Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime

arXiv cs.LG

arXiv:2605.10931v1 Announce Type: cross

Transformers, with self-attention modules as their core components, have become an integral architecture in modern large language and foundation models. In this paper, we study the evolution of tokens in deep encoder-only transformers at inference time, which is described in the large-token limit by a mean-field continuity equation.
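
To make the setting concrete, the following is a minimal sketch of a token system of the kind this abstract refers to. The abstract does not state the model, so everything here is an assumption borrowed from simplified self-attention dynamics: tokens are points on the unit sphere, the query, key, and value matrices are taken to be the identity, depth plays the role of time, and `beta` is a hypothetical inverse-temperature parameter, so that large `beta` would correspond to the low-temperature regime of the title. The function names `attention_step` and `simulate` are illustrative only. As the number of tokens grows, the empirical distribution of such a system is formally described by a continuity equation of the form $\partial_t \mu + \nabla \cdot (\mu\, v[\mu]) = 0$, where the velocity field $v[\mu]$ depends on the token distribution itself.

```python
import numpy as np


def attention_step(x, beta, dt):
    """One explicit Euler step of simplified self-attention dynamics.

    x    : (n_tokens, dim) array of unit-norm tokens
    beta : inverse temperature (assumed parameterization; large = low temperature)
    dt   : step size, playing the role of one thin layer
    """
    # Softmax attention weights, with identity query/key matrices (assumption).
    logits = beta * (x @ x.T)
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Attention-weighted average of tokens, with an identity value matrix.
    drift = weights @ x

    # Remove the radial component so each token moves tangentially to the sphere.
    drift -= np.sum(drift * x, axis=1, keepdims=True) * x

    # Euler update, then renormalize to stay exactly on the unit sphere.
    x_new = x + dt * drift
    return x_new / np.linalg.norm(x_new, axis=1, keepdims=True)


def simulate(n_tokens=64, dim=3, beta=4.0, dt=0.05, n_steps=200, seed=0):
    """Evolve randomly initialized tokens through n_steps layers."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_tokens, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    for _ in range(n_steps):
        x = attention_step(x, beta, dt)
    return x


if __name__ == "__main__":
    tokens = simulate()
    # Average pairwise inner product: values near 1 indicate tightly clustered tokens.
    print(float(np.mean(tokens @ tokens.T)))
```

Running the script for several values of `beta` and comparing the printed average inner product is one simple way to explore how token concentration depends on temperature in this toy system; it is not the analysis carried out in the paper.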