AI RESEARCH

Ordinary Least Squares is a Special Case of Transformer

arXiv cs.AI

arXiv:2604.13656v1 Announce Type: cross The statistical essence of the Transformer architecture has long remained elusive: is it a universal approximator, or a neural-network version of known computational algorithms? Through rigorous algebraic proof, we show that the latter better describes the Transformer's basic nature: Ordinary Least Squares (OLS) is a special case of the single-layer Linear Transformer.
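To make the headline claim concrete, here is a minimal numerical sketch of one way a softmax-free (linear) attention readout can reproduce an OLS prediction. It is not taken from the paper and may differ from the authors' construction. It assumes the keys are the training inputs passed through a hypothetical projection `W_K = (X^T X)^{-1}`, the values are the training targets, and the query is the test input. All names (`linear_attention`, `W_K`, `x_query`) and the synthetic data are illustrative assumptions.

```python
import numpy as np


def linear_attention(query, keys, values):
    """Softmax-free attention readout: sum_i (query . keys[i]) * values[i]."""
    scores = keys @ query      # shape (n,): one unnormalized score per training point
    return scores @ values     # scalar when values has shape (n,)


# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
x_query = rng.normal(size=d)

# Assumption: the key projection is the inverse Gram matrix of the inputs.
# This choice is ours for illustration, not a construction quoted from the paper.
W_K = np.linalg.inv(X.T @ X)
keys = X @ W_K                 # row i is (W_K x_i)^T, since W_K is symmetric

# Attention output: x_query^T (X^T X)^{-1} X^T y
attn_prediction = linear_attention(x_query, keys, y)

# Reference: ordinary least squares fit, then predict at x_query.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_prediction = x_query @ beta_ols

print(np.isclose(attn_prediction, ols_prediction))
```

Under these assumptions the attention output equals `x_query^T (X^T X)^{-1} X^T y`, which is the OLS prediction at `x_query`. In this sketch the projection depends on the data through `X^T X`; how the paper encodes that dependence in the Transformer's parameters is not stated in the abstract.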