AI RESEARCH

Perceptrons and localization of attention's mean-field landscape

arXiv cs.LG

arXiv:2601.21366v2 Announce Type: replace

The forward pass of a Transformer can be viewed as an interacting particle system on the unit sphere: layers play the role of time, token embeddings play the role of particles, and the unit sphere idealizes layer normalization. For certain choices of weights, the system is even a gradient flow of an explicit energy, and the infinite-context-length (mean-field) limit can be formalized using Wasserstein gradient flows.
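A small simulation makes the particle picture concrete. The sketch below assumes the simplest setting in which the gradient-flow structure holds exactly: the query-key product equals beta times the identity, the value matrix is the identity, and the attention weights are unnormalized (no softmax denominator). Under these assumptions the velocity of token i is the tangent projection of (1/n) Σ_j exp(β⟨x_i, x_j⟩) x_j, which is exactly the Riemannian gradient of E(x) = (1/(2βn)) Σ_{i,j} exp(β⟨x_i, x_j⟩) on the product of unit spheres. The function names, the parameters (n, d, beta, dt, steps) and the explicit Euler discretization with renormalization are illustrative choices, not the method of the paper.

```python
import numpy as np


def normalize(x):
    """Map each row of x back onto the unit sphere (idealized layer norm)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def energy(x, beta):
    """Interaction energy E(x) = 1/(2*beta*n) * sum_{i,j} exp(beta * <x_i, x_j>)."""
    n = x.shape[0]
    return np.exp(beta * (x @ x.T)).sum() / (2.0 * beta * n)


def attention_velocity(x, beta):
    """Assumed setting: Q^T K = beta * I, V = I, unnormalized attention.
    Returns the tangent projection of (1/n) sum_j exp(beta <x_i, x_j>) x_j,
    i.e. the Riemannian gradient of energy() with respect to each token."""
    n = x.shape[0]
    weights = np.exp(beta * (x @ x.T))       # unnormalized attention scores
    v = weights @ x / n                      # attention output for each token
    radial = np.sum(x * v, axis=1, keepdims=True)
    return v - radial * x                    # P_x(v) = v - <x, v> x


def simulate(n=32, d=3, beta=1.0, dt=0.05, steps=200, seed=0):
    """Hypothetical discretization: each Euler step stands in for one layer,
    followed by renormalization onto the sphere."""
    rng = np.random.default_rng(seed)
    x = normalize(rng.standard_normal((n, d)))
    energies = [energy(x, beta)]
    for _ in range(steps):
        x = normalize(x + dt * attention_velocity(x, beta))
        energies.append(energy(x, beta))
    return x, np.array(energies)


x_final, energies = simulate()
print(f"energy: {energies[0]:.3f} -> {energies[-1]:.3f}")
print("min pairwise cosine:", (x_final @ x_final.T).min())
```

Because each step is gradient ascent on E with a small step size, the recorded energies should be non-decreasing, and the tokens drift toward one another. Replacing the average over the n tokens by an integral against the distribution of tokens gives the mean-field dynamics that the Wasserstein gradient-flow view describes.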