Don't Lose Focus: Activation Steering via Key-Orthogonal Projections
arXiv CS.CL
arXiv:2605.06342v1 Announce Type: new

Activation steering guides LLM behaviour toward a target by intervening on internal representations, yet it often degrades reasoning and retrieval performance. We argue that a primary cause of this trade-off is attention rerouting: steering vectors alter query-key matching, shifting attention away from contextually important tokens toward less informative ones. To address this, we propose Steering via Key-Orthogonal Projections (SKOP), a steering method that constrains harmful attention rerouting without eliminating steering efficacy.
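
The attention-rerouting argument can be illustrated with a toy single-head example. The sketch below is not the SKOP algorithm, which the abstract does not specify; it is one plausible reading of "key-orthogonal", under the assumptions that there is no layer normalisation and that the same steering vector `v` is added to every token's hidden state. Under those assumptions the only term that changes the softmax is the steered query shift dotted with each key, so removing the component of `v` that lies along the directions `W_Q k_j` leaves the attention pattern intact. All names here (`attention`, `key_orthogonal`, `W_Q`, `W_K`, `H`) are hypothetical.

```python
import numpy as np

def attention(H, W_Q, W_K):
    """Softmax attention of the last token over all tokens (toy single head)."""
    q = H[-1] @ W_Q                       # query of the last token
    K = H @ W_K                           # one key per token, shape (n, d_head)
    logits = K @ q / np.sqrt(W_K.shape[1])
    w = np.exp(logits - logits.max())     # numerically stable softmax
    return w / w.sum()

def key_orthogonal(v, H, W_Q, W_K):
    """Project v so the query shift v @ W_Q is orthogonal to every key.

    Assumption for illustration only: keys are taken from the unsteered
    hidden states H, and all tokens are treated as worth protecting.
    """
    K = H @ W_K                           # (n, d_head)
    M = K @ W_Q.T                         # (n, d_model); row j is W_Q k_j
    B, _ = np.linalg.qr(M.T)              # orthonormal basis for the rows of M
    return v - B @ (B.T @ v)              # remove the component inside that span

rng = np.random.default_rng(0)
n, d_model, d_head = 6, 32, 8
H = rng.normal(size=(n, d_model))                       # toy hidden states
W_Q = rng.normal(size=(d_model, d_head)) / np.sqrt(d_model)
W_K = rng.normal(size=(d_model, d_head)) / np.sqrt(d_model)
v = 2.0 * rng.normal(size=d_model)                      # toy steering vector

v_perp = key_orthogonal(v, H, W_Q, W_K)
base = attention(H, W_Q, W_K)
naive = attention(H + v, W_Q, W_K)            # plain steering: attention shifts
projected = attention(H + v_perp, W_Q, W_K)   # projected steering: attention kept

print("max shift, plain steering:    ", np.abs(naive - base).max())
print("max shift, projected steering:", np.abs(projected - base).max())
print("fraction of steering norm kept:", np.linalg.norm(v_perp) / np.linalg.norm(v))
```

In this toy setting the projection removes at most `n` of the `d_model` directions, so most of the steering vector survives while the attention weights stay the same. A real model has layer normalisation, many heads, and many layers, so the exact invariance shown here should not be expected to carry over.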