How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing

arXiv:2603.13259v1 Announce Type: cross When a language model is fed a wrong answer, what happens inside the network? Current understanding treats truthfulness as a static property of individual-layer representations: a direction to be probed, a feature to be extracted. Less is known about the dynamics: how internal representations diverge across the full depth of the network when the model processes correct versus incorrect continuations.