RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably

arXiv:2605.15514v1 Announce Type: cross We identify intrinsic limitations of Rotary Positional Embeddings (RoPE) in Transformer-based long-context language models. Our theoretical analysis abstracts away from the specific content of the context and depends only on its length. We prove that as context length increases, RoPE-based attention becomes unpredictable and loses two properties that are central to its effectiveness. First, it loses its locality bias: RoPE is no more likely to favor nearer positions than substantially farther ones.
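To make the locality claim concrete, the following is a minimal sketch of the standard RoPE construction, not the paper's analysis or proof. It rotates each consecutive pair of coordinates by an angle proportional to the token position and computes the attention logit as a dot product. The function names (`rope_rotate`, `rope_score`, `scores_by_distance`), the frequency base of 10000, and the pairing of adjacent coordinates are illustrative assumptions; real implementations differ in layout and scaling.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Apply a RoPE-style rotation to an even-length vector at position `pos`.

    Assumption: coordinates (i, i+1) form one 2-D pair, rotated by the angle
    pos * base ** (-i / d), which is the commonly used frequency schedule.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even-dimensional vector")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_score(q, k, q_pos, k_pos, base=10000.0):
    """Unnormalized attention logit between a rotated query and a rotated key."""
    rq = rope_rotate(q, q_pos, base)
    rk = rope_rotate(k, k_pos, base)
    return sum(a * b for a, b in zip(rq, rk))


def scores_by_distance(q, k, max_dist, base=10000.0):
    """Logits for a fixed query/key pair at relative distances 0..max_dist.

    The query sits at position max_dist; the key moves from there back to 0.
    """
    return [rope_score(q, k, max_dist, max_dist - dist, base)
            for dist in range(max_dist + 1)]
```

Because the rotations compose, the logit depends only on the relative offset between the two positions, so `scores_by_distance(q, k, n)` traces how a fixed query/key pair scores as the key moves farther away. Inspecting that list for a large `n` is one informal way to see whether nearer offsets score higher than distant ones for a given pair; it illustrates the quantity the abstract discusses and is not evidence for the theorem itself.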