AI RESEARCH
Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers
arXiv cs.LG
arXiv:2605.07772v1 Announce Type: new Transformers perform inference by iteratively transforming token representations across layers. This layerwise computation has been studied empirically, and recent mean-field theories of Transformer dynamics explain how attention can drive token distributions toward clustering. However, existing mean-field analyses largely treat model parameters as prescribed, leaving open how