
Cosine-Gated Adam-Decay: Drop-In Staleness-Aware Outer Optimization for Decoupled DiLoCo

arXiv cs.LG

arXiv:2605.09126v1 Announce Type: new

Asynchronous DiLoCo systems may receive pseudo-gradients computed several outer rounds earlier, yet the standard Nesterov outer optimizer does not explicitly condition its update on each pseudo-gradient's age. This can make the outer momentum buffer brittle under large controlled delays.
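The staleness-blind behavior described above can be made concrete with a minimal sketch of a Nesterov outer step in plain Python. It assumes the `torch.optim.SGD(nesterov=True)` update convention and a DiLoCo-style pseudo-gradient (the outer parameters a worker started from minus its parameters after the inner steps). The names `pseudo_gradient`, `NesterovOuter`, and the `age` argument are hypothetical, and the defaults `lr=0.7`, `momentum=0.9` are illustrative only. This is the baseline the abstract criticizes, not the paper's Cosine-Gated Adam-Decay method.

```python
from dataclasses import dataclass, field
from typing import List


def pseudo_gradient(base: List[float], local: List[float]) -> List[float]:
    # Hypothetical helper: base outer parameters minus the worker's parameters
    # after its inner steps. In an asynchronous system `base` may be several
    # outer rounds older than the current outer parameters.
    return [b - l for b, l in zip(base, local)]


@dataclass
class NesterovOuter:
    """Staleness-blind Nesterov outer optimizer (illustrative sketch)."""
    lr: float = 0.7          # illustrative outer learning rate (assumption)
    momentum: float = 0.9    # illustrative momentum coefficient (assumption)
    buf: List[float] = field(default_factory=list)

    def step(self, params: List[float], pseudo_grad: List[float],
             age: int = 0) -> List[float]:
        # `age` (outer rounds since the pseudo-gradient's base parameters) is
        # accepted but never used: a stale pseudo-gradient enters the momentum
        # buffer exactly like a fresh one.
        if not self.buf:
            self.buf = [0.0] * len(params)
        new_params = []
        for i, (p, g) in enumerate(zip(params, pseudo_grad)):
            self.buf[i] = self.momentum * self.buf[i] + g   # momentum buffer
            update = g + self.momentum * self.buf[i]        # Nesterov look-ahead
            new_params.append(p - self.lr * update)
        return new_params


if __name__ == "__main__":
    fresh, stale = NesterovOuter(), NesterovOuter()
    g = pseudo_gradient([1.0], [0.9])
    # Same pseudo-gradient, different ages: identical outer updates.
    print(fresh.step([1.0], g, age=0), stale.step([1.0], g, age=5))
```

Because `age` never reaches the arithmetic, a pseudo-gradient that is five rounds old moves the parameters and the momentum buffer exactly as much as a fresh one; any staleness-aware outer optimizer has to feed that age into the update somewhere.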