AI RESEARCH
Why Adam Works Better with $\beta_1 = \beta_2$: The Missing Gradient Scale Invariance Principle
arXiv CS.AI
•
ArXi:2601.21739v2 Announce Type: replace-cross Adam has been at the core of large-scale