A Systematic Empirical Study of Grokking: Depth, Architecture, Activation, and Regularization

arXiv:2603.25009v1 Announce Type: new

Grokking, the delayed transition from memorization to generalization in neural networks, remains poorly understood, in part because prior empirical studies confound the roles of architecture, optimization, and regularization. We present a controlled study that systematically disentangles these factors on modular addition (mod 97), with matched and carefully tuned