AI RESEARCH

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

arXiv cs.AI

arXiv:2602.16746v2 Announce Type: replace-cross

Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight trajectories reveals that