Scaling Limits of Long-Context Transformers

arXiv:2605.08505v1 Announce Type: cross We study the long-context limit of softmax self-attention with a fixed query and a random context of $n$ i.i.d. keys on the sphere, viewing the inverse temperature $\beta_n$ as the scaling parameter that decides whether attention degenerates into uniform averaging or collapses onto the single closest key.
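
The two regimes named in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes keys drawn uniformly on the unit sphere, a unit-norm query, and attention weights proportional to $\exp(\beta \langle q, k_i \rangle)$. The dimension `d`, the context length `n`, the list of `beta` values, and the helper names `random_unit_vector` and `attention_weights` are all hypothetical choices made for illustration.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a point uniformly on the unit sphere in R^d by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def attention_weights(query, keys, beta):
    """Softmax weights proportional to exp(beta * <query, key>).

    The maximum score is subtracted before exponentiating for numerical stability.
    """
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed illustration parameters (not taken from the paper).
rng = random.Random(0)
d, n = 8, 1000
query = random_unit_vector(d, rng)
keys = [random_unit_vector(d, rng) for _ in range(n)]

# Compare the largest attention weight with the uniform value 1/n as beta grows.
for beta in (0.0, 1.0, 10.0, 100.0, 1000.0):
    w = attention_weights(query, keys, beta)
    print(f"beta={beta:7.1f}  max weight={max(w):.4f}  1/n={1 / n:.4f}")
```

At `beta = 0` every weight equals `1/n`, which is the uniform-averaging regime. As `beta` grows with `n` held fixed, the largest weight increases toward 1 and the output concentrates on the key closest to the query. How fast $\beta_n$ must grow with $n$ to move between these regimes is the question the abstract poses, and this sketch does not attempt to answer it.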