Scaling the Memory of Balanced Adam

arXiv cs.LG

arXiv:2605.10119v1 (Announce Type: new)

Recent evidence suggests that Adam performs robustly when its two momentum parameters are tied, $\beta_1=\beta_2$, which leaves a single momentum parameter $\beta$. However, how this parameter should be chosen is still poorly understood. We argue that, in balanced Adam, $\beta$ should not be treated as a dimensionless constant: it defines a statistical memory horizon $H_\beta=(1-\beta)^{-1}$ …
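To make the horizon concrete, here is a minimal Python sketch, assuming the standard Adam update with bias correction (Kingma and Ba). The helpers `memory_horizon`, `beta_for_horizon` and `balanced_adam_step` are hypothetical names for illustration, not code from the paper. Tying $\beta_1=\beta_2=\beta$ means both moment estimates average over the same window of roughly $H_\beta$ steps. Because $(1-1/H)^H \to e^{-1}$, when $\beta$ is close to 1 a gradient's weight in the average decays by about a factor of $e$ after $H_\beta$ steps.

```python
import numpy as np


def memory_horizon(beta: float) -> float:
    """Memory horizon H_beta = 1 / (1 - beta), measured in optimizer steps."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return 1.0 / (1.0 - beta)


def beta_for_horizon(horizon: float) -> float:
    """Inverse map (hypothetical helper): the beta whose horizon is `horizon` steps."""
    if horizon < 1.0:
        raise ValueError("horizon must be at least 1 step")
    return 1.0 - 1.0 / horizon


def balanced_adam_step(param, grad, m, v, t, lr=1e-3, beta=0.95, eps=1e-8):
    """One Adam update with tied momentum parameters beta1 = beta2 = beta.

    Assumption: this is textbook Adam with bias correction; the only
    "balanced" ingredient is that both EMAs share the same beta.
    `t` is the 1-based step count used for bias correction.
    """
    m = beta * m + (1.0 - beta) * grad          # first-moment EMA
    v = beta * v + (1.0 - beta) * grad**2       # second-moment EMA, same beta
    m_hat = m / (1.0 - beta**t)                 # bias-corrected first moment
    v_hat = v / (1.0 - beta**t)                 # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    # Rounded horizons for a few common beta values.
    for beta in (0.9, 0.95, 0.99, 0.999):
        print(f"beta={beta}: H_beta ~ {round(memory_horizon(beta))} steps")
```

For example, $\beta=0.9$, $0.99$ and $0.999$ correspond to horizons of about 10, 100 and 1000 steps, so moving $\beta$ from 0.99 to 0.999 changes it by less than 1% yet lengthens the memory tenfold.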