AI RESEARCH
Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking
arXiv cs.AI
arXiv:2602.16746v2 Announce Type: replace-cross Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight trajectories reveals that