Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression

arXiv:2603.05691v1

It is increasingly common in machine learning to use learned models to label data and then use that data to train capable models. The phenomenon of weak-to-strong generalization exemplifies the advantage of this two-stage procedure: a strong student is trained on imperfect labels obtained from a weak teacher, and yet the strong student outperforms the weak teacher. In this paper, we show that the potential improvement is substantial, in the sense that it affects the scaling law followed by the test error.
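The two-stage procedure described above can be illustrated with a minimal NumPy sketch of random feature ridge regression. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear target, the ReLU feature map, the Gaussian inputs, the sample sizes, the feature counts, and the function names `random_features`, `ridge_fit`, and `weak_to_strong` are all hypothetical. The weak teacher uses few random features and is fit on noisy ground-truth labels; the strong student uses many random features and sees only the teacher's predictions.

```python
import numpy as np

def random_features(X, W):
    """ReLU random features max(X W^T, 0), scaled by 1/sqrt(number of features)."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])

def ridge_fit(Phi, y, lam):
    """Closed-form ridge solution (Phi^T Phi + lam I)^{-1} Phi^T y."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)

def weak_to_strong(n_teacher=200, n_student=2000, n_test=2000, d=20,
                   p_weak=50, p_strong=1000, lam=1e-2, noise=0.1, seed=0):
    """Return (teacher test MSE, student test MSE) for one random draw.

    All settings are illustrative assumptions, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d) / np.sqrt(d)  # hypothetical linear target

    # Fixed random first-layer weights: few for the teacher, many for the student.
    W_weak = rng.standard_normal((p_weak, d)) / np.sqrt(d)
    W_strong = rng.standard_normal((p_strong, d)) / np.sqrt(d)

    # Stage 1: fit the weak teacher on noisy ground-truth labels.
    X_t = rng.standard_normal((n_teacher, d))
    y_t = X_t @ beta + noise * rng.standard_normal(n_teacher)
    a_weak = ridge_fit(random_features(X_t, W_weak), y_t, lam)

    # Stage 2: label fresh inputs with the teacher, then fit the strong
    # student on those imperfect labels only.
    X_s = rng.standard_normal((n_student, d))
    y_pseudo = random_features(X_s, W_weak) @ a_weak
    a_strong = ridge_fit(random_features(X_s, W_strong), y_pseudo, lam)

    # Evaluate both models against the noiseless target on held-out inputs.
    X_test = rng.standard_normal((n_test, d))
    y_test = X_test @ beta
    err_teacher = np.mean((random_features(X_test, W_weak) @ a_weak - y_test) ** 2)
    err_student = np.mean((random_features(X_test, W_strong) @ a_strong - y_test) ** 2)
    return float(err_teacher), float(err_student)

if __name__ == "__main__":
    err_teacher, err_student = weak_to_strong()
    print(f"teacher test MSE: {err_teacher:.4f}")
    print(f"student test MSE: {err_student:.4f}")
```

Whether the student's error falls below the teacher's in this toy setting depends on the chosen sizes and regularization; the sketch only shows the mechanics of the procedure and makes no claim about the scaling laws studied in the paper.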