Learning Dynamics of Zeroth-Order Optimization: A Kernel Perspective

arXiv:2605.03373v1 Announce Type: new Classical optimization theory establishes that zeroth-order (ZO) algorithms suffer from a dimension-dependent slowdown: compared to first-order methods, their convergence rates typically degrade as the model dimension grows. In contrast to these theoretical expectations, however, a growing body of recent work demonstrates the successful application of ZO methods to fine-tuning Large Language Models (LLMs) with billions of parameters.