AI RESEARCH

Don't Lose Focus: Activation Steering via Key-Orthogonal Projections

arXiv cs.CL

arXiv:2605.06342v1 (announce type: new)

Activation steering guides an LLM toward a target behaviour by intervening on its internal representations, yet it often degrades reasoning and retrieval performance. We argue that a primary cause of this trade-off is attention rerouting: steering vectors alter query-key matching, shifting attention away from contextually important tokens toward less informative ones. To address this, we propose Steering via Key-Orthogonal Projections (SKOP), a steering method that limits harmful attention rerouting without eliminating steering efficacy.
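The abstract does not spell out how SKOP is constructed, so the following is only a toy illustration of the attention-rerouting idea, not the authors' method. It assumes a single attention head, a steering vector added directly in query space (a real model would add it to the residual stream and pass it through learned query and key projections), and fewer keys than dimensions. All names (`project_orthogonal_to_keys`, `steer`, `keys`) are hypothetical. The sketch removes every component of the steering vector that lies in the span of the keys, which leaves the attention logits over those keys unchanged.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def project_orthogonal_to_keys(v, keys):
    """Remove from v every component lying in the span of the key vectors.

    keys has shape (n_keys, d) with n_keys < d; returns a vector of shape (d,).
    """
    # Reduced QR of keys.T yields an orthonormal basis (columns) for span(keys).
    basis, _ = np.linalg.qr(keys.T)
    return v - basis @ (basis.T @ v)


rng = np.random.default_rng(0)
d, n_keys = 16, 4
keys = rng.normal(size=(n_keys, d))   # hypothetical key vectors of context tokens
query = rng.normal(size=d)            # hypothetical query vector
steer = 3.0 * rng.normal(size=d)      # hypothetical steering vector (query space)

# Attention weights without steering, with naive steering, and with the
# key-orthogonal component of the steering vector only.
base_attn = softmax(keys @ query / np.sqrt(d))
naive_attn = softmax(keys @ (query + steer) / np.sqrt(d))
safe_steer = project_orthogonal_to_keys(steer, keys)
proj_attn = softmax(keys @ (query + safe_steer) / np.sqrt(d))

print("attention shift, naive steering:    ", np.abs(naive_attn - base_attn).max())
print("attention shift, projected steering:", np.abs(proj_attn - base_attn).max())
print("fraction of steering norm kept:     ",
      np.linalg.norm(safe_steer) / np.linalg.norm(steer))
```

In this toy setting the projected vector changes no attention weight while keeping the part of the steering direction that lies outside the key span. The abstract describes constraining only the harmful rerouting, so the actual method is presumably more selective than this blanket projection.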