Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

arXiv:2605.17767v1

We study feature learning in two-layer neural networks in the linear-width regime, where the number of hidden neurons, the sample size, and the input dimension scale proportionally. Recent work has analyzed feature learning via a single step of gradient descent, but such updates are fundamentally limited: they are approximately rank-one, so they capture only a single direction, and they require the target function to have an information exponent of one.
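The rank-one limitation of a single step can be checked numerically. The sketch below is our own illustration, not the authors' code, and every modeling choice in it is an assumption: Gaussian inputs, a single-index target y = σ*(⟨β, x⟩) with unit-norm β, squared loss, tanh activation, first-layer weights w_j ~ N(0, I_d/d), and small second-layer weights a_j ~ N(0, 1/N) so that the initial network output is close to zero. The illustrative sizes are n = 2d and N = d. The hypothetical helpers `first_layer_gradient` and `one_step_spike` form the first-layer gradient G at initialization, report the ratio of its top two singular values, and measure how well its top right singular vector aligns with β. A tanh target has a nonzero first Hermite coefficient (information exponent one), while z² − 1 = He₂(z) has information exponent two.

```python
import numpy as np


def first_layer_gradient(X, y, W, a, act, act_prime):
    """Gradient of L(W) = (1/2n) * sum_i (f(x_i) - y_i)^2 with respect to W,
    where f(x) = a^T act(W x) / sqrt(N) is a two-layer network (assumed setup)."""
    n, N = X.shape[0], W.shape[0]
    pre = W @ X.T                          # (N, n): pre-activations w_j^T x_i
    f = a @ act(pre) / np.sqrt(N)          # (n,): network outputs at initialization
    r = f - y                              # (n,): residuals
    # Row j: (1/n) * sum_i r_i * a_j * act'(w_j^T x_i) * x_i / sqrt(N)
    return (a[:, None] * act_prime(pre) * r[None, :]) @ X / (n * np.sqrt(N))


def one_step_spike(target, d=1000, seed=0):
    """Return (s1/s2, |<v1, beta>|) for the gradient matrix G at initialization."""
    rng = np.random.default_rng(seed)
    n, N = 2 * d, d                        # illustrative proportional scaling
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)           # unit-norm target direction
    X = rng.standard_normal((n, d))        # Gaussian inputs x_i ~ N(0, I_d)
    y = target(X @ beta)                   # single-index target y = sigma*(<beta, x>)
    W = rng.standard_normal((N, d)) / np.sqrt(d)   # w_j ~ N(0, I_d / d)
    a = rng.standard_normal(N) / np.sqrt(N)        # small second layer, so f is near 0
    G = first_layer_gradient(X, y, W, a, np.tanh,
                             lambda z: 1.0 - np.tanh(z) ** 2)
    _, s, Vt = np.linalg.svd(G, full_matrices=False)
    # Spectral gap and alignment of the top right singular vector with beta
    return s[0] / s[1], abs(Vt[0] @ beta)


ratio, align = one_step_spike(np.tanh)                 # information exponent one
ratio2, align2 = one_step_spike(lambda z: z ** 2 - 1)  # He_2 target: information exponent two
print(f"tanh target: s1/s2 = {ratio:.1f}, |cos(v1, beta)| = {align:.2f}")
print(f"He2 target:  s1/s2 = {ratio2:.1f}, |cos(v1, beta)| = {align2:.2f}")
```

Under these assumptions one would expect a large singular-value gap in both cases, since the step is dominated by a single direction. For the tanh target that direction should correlate substantially with β; for the He₂ target it should be nearly orthogonal to β, which illustrates why a single step needs information exponent one.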