Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets

ArXi:2511.20888v2 Announce Type: replace-cross This paper argues that DNNs implement a computational Occam's razor -- finding the `simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging success over traditional statistical methods. We start with the discovery that the set of real-valued function $f$ that can be $\epsilon$-approximated with a binary circuit of size at most $c\epsilon^{-\gamma}$ becomes convex in the `Harder than Monte Carlo' (HTMC) regime, when $\gamma>2$, allowing for the definition of a HTMC norm on functions.