AI RESEARCH

On-Average Stability of Multipass Preconditioned SGD and Effective Dimension

arXiv cs.LG

arXiv:2603.11989v1 Announce Type: new We study how the trade-offs among population risk curvature, the geometry of the noise, and preconditioning affect the generalisation ability of multipass Preconditioned Stochastic Gradient Descent (PSGD). Many practical optimisation heuristics implicitly navigate this trade-off in different ways: for instance, some aim to whiten gradient noise, while others aim to align updates with the expected loss curvature.
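
For readers unfamiliar with the update rule, the following is a minimal sketch of multipass preconditioned SGD. It is an illustration, not the paper's algorithm or experimental setup. It assumes a synthetic least-squares problem, and the names `psgd`, `P_identity`, and `P_curvature`, the step sizes, and the ridge term `1e-3` are all hypothetical choices. The curvature-aligned preconditioner is taken here to be a regularised inverse of the empirical Hessian.

```python
import numpy as np

def psgd(X, y, P, lr, passes=5, seed=0):
    """Multipass preconditioned SGD on squared loss: w <- w - lr * P @ grad_i(w)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(passes):                  # multipass: several epochs over the same sample
        for i in rng.permutation(n):         # one shuffled pass over the data
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x_i . w - y_i)^2
            w = w - lr * (P @ grad)          # preconditioned step
    return w

def mse(X, y, w):
    """Mean squared error of the linear predictor w."""
    return float(np.mean((X @ w - y) ** 2))

# Synthetic, badly conditioned regression problem (illustrative assumption).
rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d)) * np.array([3.0, 2.0, 1.0, 0.5, 0.25])
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

H = X.T @ X / n                                    # empirical Hessian of the squared loss
P_identity = np.eye(d)                             # plain SGD
P_curvature = np.linalg.inv(H + 1e-3 * np.eye(d))  # assumed curvature-aligned choice

w_plain = psgd(X, y, P_identity, lr=0.01)
w_precond = psgd(X, y, P_curvature, lr=0.05)

print("initial loss:       ", mse(X, y, np.zeros(d)))
print("plain SGD loss:     ", mse(X, y, w_plain))
print("preconditioned loss:", mse(X, y, w_precond))
```

A noise-whitening heuristic would instead pass a preconditioner built from an estimate of the gradient-noise covariance, for example its inverse square root, in place of `P_curvature`. The sketch only reports training loss, so it says nothing on its own about generalisation, which is what the abstract is concerned with.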