Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks

arXiv:2605.10395v1 Announce Type: cross We study the information-theoretic limits of learning a one-hidden-layer teacher network with hierarchical features from noisy queries, in the context of knowledge transfer to a smaller student model. We work in the high-dimensional regime where the teacher width $k$ scales linearly with the input dimension $d$, a setting that captures large-but-finite-width networks and has only recently become analytically tractable.
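
To make the setting concrete, the following is a minimal sketch of noisy queries to a one-hidden-layer teacher whose width grows linearly with the input dimension, $k = \gamma d$. Everything beyond that proportionality is an assumption made for illustration and is not taken from the abstract: the Gaussian inputs and weights, the tanh activation, the $1/\sqrt{d}$ and $1/\sqrt{k}$ normalizations, the additive Gaussian noise, and the names `make_teacher`, `query_teacher`, `gamma`, and `noise_std`. The hierarchical structure of the features is not modeled here.

```python
import numpy as np

def make_teacher(d, gamma=0.5, seed=0):
    """Draw a teacher with width k = round(gamma * d) (extensive-width regime).

    Gaussian first-layer and readout weights are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    k = max(1, int(round(gamma * d)))
    W = rng.standard_normal((k, d))   # first-layer weights, shape (k, d)
    a = rng.standard_normal(k)        # readout weights, shape (k,)
    return W, a

def query_teacher(W, a, X, noise_std=0.1, seed=1):
    """Return noisy teacher outputs for the rows of X.

    The tanh activation and additive Gaussian noise are assumptions.
    """
    rng = np.random.default_rng(seed)
    k, d = W.shape
    pre = X @ W.T / np.sqrt(d)               # pre-activations, shape (n, k)
    clean = np.tanh(pre) @ a / np.sqrt(k)    # noiseless outputs, shape (n,)
    return clean + noise_std * rng.standard_normal(X.shape[0])

# Example: d = 200 inputs, teacher width k = 100, n = 1000 noisy queries.
d = 200
W, a = make_teacher(d, gamma=0.5)
X = np.random.default_rng(2).standard_normal((1000, d))
y = query_teacher(W, a, X)
```

Holding `gamma` fixed while increasing `d` keeps the ratio $k/d$ constant, which is the proportional scaling the abstract describes; a student with a smaller width could then be fit to the pairs `(X, y)`.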