[R] Architecture Determines Optimization: Deriving Weight Updates from Network Topology (seeking arXiv endorsement - cs.LG)

We derive neural network weight updates from first principles, without assuming gradient descent or a specific loss function. We start from the error equation E = y − f(x) and linearize with respect to the free parameters, the weights, since the data x and target y are fixed observations. This yields a linear constraint on the weight perturbations. The minimum-norm solution to this underdetermined system is algebraically identical to the standard weight gradient. Gradient descent is therefore a consequence of making the smallest weight change that corrects the error.
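A minimal numeric sketch of the argument above, under assumptions that are not in the post: a single training example, a scalar output, and a hypothetical one-unit model f(x; w) = tanh(w · x). The squared error 0.5 * E^2 is used only as a reference point for "the standard weight gradient"; all function names here are illustrative.

```python
import math

def f(w, x):
    """Toy one-unit network (assumed for illustration): f(x; w) = tanh(w . x)."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def grad_f(w, x):
    """Gradient of f with respect to w: (1 - tanh^2(w . x)) * x."""
    a = f(w, x)
    return [(1.0 - a * a) * xi for xi in x]

def min_norm_update(w, x, y):
    """Minimum-norm dw solving the linearized constraint g . dw = E."""
    E = y - f(w, x)
    g = grad_f(w, x)
    gg = sum(gi * gi for gi in g)  # squared norm of g
    return [E * gi / gg for gi in g]

def neg_loss_grad(w, x, y):
    """Negative gradient of the assumed reference loss 0.5 * E^2, which is E * g."""
    E = y - f(w, x)
    return [E * gi for gi in grad_f(w, x)]

# Arbitrary illustrative values.
w = [0.3, -0.2, 0.5]
x = [1.0, 2.0, -1.0]
y = 0.4

E = y - f(w, x)
g = grad_f(w, x)
dw = min_norm_update(w, x, y)
ng = neg_loss_grad(w, x, y)

# Residual of the linearized constraint: E - g . dw (should be ~0).
residual = E - sum(gi * di for gi, di in zip(g, dw))

# Component-wise ratio dw_i / ng_i (should be the same constant for every i).
ratios = [di / ni for di, ni in zip(dw, ng)]

print("linearized residual:", residual)
print("dw / (-grad L) per component:", ratios)
```

In this toy case the minimum-norm perturbation satisfies the linearized constraint and points along the negative squared-error gradient. The two vectors differ by the positive scalar 1/||g||^2, which plays the role of a step size. With several outputs or a batch of examples, g becomes a Jacobian and the minimum-norm solution involves its pseudoinverse, which this sketch does not cover.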