AI RESEARCH
Cosine-Gated Adam-Decay: Drop-In Staleness-Aware Outer Optimization for Decoupled DiLoCo
arXiv CS.LG
arXiv:2605.09126v1 Announce Type: new Asynchronous DiLoCo systems may receive pseudo-gradients computed several outer rounds earlier, yet the standard Nesterov outer optimizer does not explicitly condition its update on per-update age. This can make the outer momentum buffer brittle under large controlled delays.
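To make the age-blindness concrete, here is a minimal sketch of a plain Nesterov-momentum outer step applied to a pseudo-gradient. It is not the paper's cosine-gated method, which this summary does not spell out. The class name `NesterovOuter`, the `age` argument, and the learning-rate and momentum values are illustrative assumptions; the update rule follows the common "buffer, then look-ahead" formulation of Nesterov momentum.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NesterovOuter:
    """Hypothetical minimal Nesterov outer optimizer (illustration only)."""
    lr: float = 0.7          # assumed outer learning rate
    momentum: float = 0.9    # assumed outer momentum
    buf: List[float] = field(default_factory=list)

    def step(self, params: List[float], pseudo_grad: List[float], age: int = 0) -> List[float]:
        # `age` = outer rounds elapsed since the pseudo-gradient was computed.
        # It is accepted but never used: the update is age-blind.
        if not self.buf:
            self.buf = [0.0] * len(params)
        # The momentum buffer absorbs every pseudo-gradient with equal weight.
        self.buf = [self.momentum * b + g for b, g in zip(self.buf, pseudo_grad)]
        # Nesterov look-ahead: current pseudo-gradient plus scaled buffer.
        update = [g + self.momentum * b for g, b in zip(pseudo_grad, self.buf)]
        return [p - self.lr * u for p, u in zip(params, update)]


# A fresh and a stale pseudo-gradient produce exactly the same outer update.
fresh = NesterovOuter().step([1.0, 2.0], [0.5, -0.5], age=0)
stale = NesterovOuter().step([1.0, 2.0], [0.5, -0.5], age=8)
assert fresh == stale
```

Because a stale pseudo-gradient enters the buffer at full weight and then persists through the momentum term, a staleness-aware variant would need to scale or gate the contribution as a function of `age`.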