Linearly Separable Features in Shallow Nonlinear Networks: Width Scales Polynomially with Intrinsic Data Dimension

ArXi:2501.02364v2 Announce Type: replace Deep neural networks have attained remarkable success across diverse classification tasks. Recent empirical studies have shown that deep networks learn features that are linearly separable across classes. However, these findings often lack rigorous justifications, even under relatively simple settings. In this work, we address this gap by examining the linear separation capabilities of shallow nonlinear networks.