AI RESEARCH
Kinetic theory for Transformers and the lost-in-the-middle phenomenon
arXiv cs.LG
arXiv:2605.09213v1 Announce Type: cross

We study causal self-attention dynamics (a toy model for decoder Transformers), which we interpret as a non-exchangeable interacting particle system. Adapting cumulant expansions to the triangular causal dependency structure of the model, and appealing to non-hierarchical methods to estimate correlations using Glauber calculus, we prove a quantitative mean-field limit and a next-order characterization of correlations.
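The abstract does not state the dynamics explicitly. The sketch below therefore assumes one common toy form of self-attention dynamics. In this form, tokens are particles on the unit sphere, and each token moves toward a softmax-weighted average of the tokens it can see. It is meant to illustrate the two structural features named above, not to reproduce the paper's model. The causal mask gives token i access only to tokens 0..i. This triangular dependency makes the system non-exchangeable: relabeling the tokens changes the dynamics. The names `causal_attention_step`, `beta` (inverse temperature) and `dt` (time step) are hypothetical. The update is an explicit Euler step followed by renormalization, which agrees to first order in `dt` with projecting the velocity onto the sphere's tangent space.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def causal_attention_step(xs: List[Vector], beta: float, dt: float) -> List[Vector]:
    """One Euler step of an ASSUMED (hypothetical) causal self-attention dynamics.

    Token i moves toward the softmax-weighted average of tokens 0..i only,
    then is renormalized back onto the unit sphere. All tokens are updated
    synchronously from the old positions `xs`.
    """
    new_xs = []
    for i, xi in enumerate(xs):
        # Causal mask: token i sees only itself and earlier tokens (j <= i).
        scores = [beta * dot(xi, xs[j]) for j in range(i + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Attention-weighted average of the visible tokens.
        avg = [
            sum(w * xs[j][k] for j, w in enumerate(weights)) / z
            for k in range(len(xi))
        ]
        # Explicit Euler step, then project back onto the sphere.
        stepped = [a + dt * b for a, b in zip(xi, avg)]
        new_xs.append(normalize(stepped))
    return new_xs
```

For example, `tokens = [[1.0, 0.0], normalize([0.6, 0.8]), normalize([-1.0, 0.2])]` can be iterated with `tokens = causal_attention_step(tokens, beta=1.0, dt=0.05)`. Under these assumptions the first token attends only to itself and stays fixed. Changing a later token never alters the update of an earlier one. That one-directional coupling is the triangular dependency structure the abstract refers to.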