AI RESEARCH
First-Passage Prediction of Grokking Delay: A Calibrated Law under AdamW with Causal Validation
arXiv CS.AI
arXiv:2605.18845v1 Announce Type: cross We give the first quantitative prediction of grokking delay under AdamW. Treating the delay as a first-passage time, we derive a closed-form law T_grok - T_mem = (1 / (2 kappa_LL eta lambda)) log(V_mem / V_star), where V_t = ||theta_t||^2 is the squared parameter norm, V_star is an architecture-dependent threshold, and kappa_LL absorbs the AdamW correction to the clean-SGD contraction rate 2 eta lambda.
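As a rough illustration of how the closed-form law could be evaluated, the sketch below computes the predicted delay and compares it with a toy first-passage count. It rests on assumptions not stated in the abstract: the function names are hypothetical, the delay is measured in optimizer steps, the numeric values are purely illustrative, and the toy simulation assumes V_t contracts by a constant factor (1 - 2 kappa_LL eta lambda) per step, which is only one simple dynamics consistent with the stated rate.

```python
import math


def grokking_delay(v_mem, v_star, eta, lam, kappa_ll=1.0):
    """Predicted T_grok - T_mem from the closed-form first-passage law.

    v_mem    : squared parameter norm at memorization, V_mem
    v_star   : architecture-dependent threshold, V_star
    eta, lam : learning rate and weight decay
    kappa_ll : AdamW correction factor (1.0 is an illustrative default,
               corresponding to the clean-SGD rate 2 * eta * lam)
    """
    if not (v_mem > v_star > 0):
        raise ValueError("expected v_mem > v_star > 0")
    if min(eta, lam, kappa_ll) <= 0:
        raise ValueError("eta, lam and kappa_ll must be positive")
    return math.log(v_mem / v_star) / (2.0 * kappa_ll * eta * lam)


def simulated_first_passage(v_mem, v_star, eta, lam, kappa_ll=1.0):
    """Count steps until V first reaches v_star.

    ASSUMPTION: V shrinks by a constant factor (1 - rate) per step.
    This toy dynamics is not taken from the source.
    """
    rate = 2.0 * kappa_ll * eta * lam
    v, steps = v_mem, 0
    while v > v_star:
        v *= 1.0 - rate
        steps += 1
    return steps


if __name__ == "__main__":
    # Illustrative values only; none of these come from the paper.
    args = dict(v_mem=100.0, v_star=10.0, eta=1e-3, lam=1.0)
    print("closed form :", grokking_delay(**args))
    print("toy sim     :", simulated_first_passage(**args))
```

Under these assumptions the two numbers should agree closely whenever 2 kappa_LL eta lambda is small, since the per-step contraction then approximates continuous exponential decay. The law also implies that doubling kappa_LL, eta, or lambda halves the predicted delay.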