Generalization performance of narrow one-hidden layer networks in the teacher-student setting

arXiv:2507.00629v4 Announce Type: replace-cross Understanding the generalization properties of neural networks on simple input-output distributions is key to explaining their performance on real datasets. The classical teacher-student setting, where a network is trained on data generated by a teacher model, provides a canonical theoretical test bed. In this context, a complete theoretical characterization of fully connected one-hidden-layer networks with generic activation functions remains missing.
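
To make the teacher-student setting concrete, the following is a minimal sketch of how training data could be generated by a one-hidden-layer teacher. It is an illustration under stated assumptions, not the construction used in the paper: the Gaussian inputs and weights, the `tanh` activation, the `1/sqrt(d)` scaling, the sizes `d`, `k`, `n`, and the helper names `make_teacher` and `teacher_labels` are all hypothetical choices made for this example.

```python
import numpy as np

def make_teacher(d, k, rng):
    """Draw random weights for a one-hidden-layer teacher (Gaussian is an assumption)."""
    W = rng.standard_normal((k, d))  # first-layer weights, one row per hidden unit
    a = rng.standard_normal(k)       # second-layer (readout) weights
    return W, a

def teacher_labels(X, W, a, activation=np.tanh):
    """Noiseless labels y = a . activation(W x / sqrt(d)) for each row x of X."""
    d = X.shape[1]
    pre = X @ W.T / np.sqrt(d)       # pre-activations, shape (n, k)
    return activation(pre) @ a       # labels, shape (n,)

# Illustrative sizes only: input dimension d, hidden width k, sample count n.
rng = np.random.default_rng(0)
d, k, n = 50, 3, 200
W, a = make_teacher(d, k, rng)
X = rng.standard_normal((n, d))      # assumed i.i.d. standard Gaussian inputs
y = teacher_labels(X, W, a)          # dataset (X, y) on which a student would be trained
```

A student network with the same architecture would then be fitted to `(X, y)`, and its generalization error measured on fresh inputs labeled by the same teacher.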