When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

arXiv:2605.12922v1 Announce Type: new Large language models can follow complex instructions in a single turn, yet over long multi-turn interactions they often lose the thread of instructions, persona, and rules. This degradation has been measured behaviorally but not mechanistically explained. We propose a channel-transition account: goal-defining tokens become less accessible through attention, while goal-related information may persist in residual representations. We