AI RESEARCH
Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment
arXiv cs.LG
arXiv:2604.03867v1. Steering vectors have emerged as a lightweight and effective approach for aligning large language models (LLMs) at inference time, enabling control over model behavior by shifting LLM representations toward a target behavior. However, existing methods typically apply steering vectors at a single, globally fixed layer, implicitly assuming that the optimal intervention layer is the same for every input.
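As a rough illustration of the fixed-layer setup described above, the following sketch builds a steering vector as the difference of mean activations between examples that do and do not exhibit the target behavior. This is a common contrastive recipe, assumed here rather than taken from the paper. The sketch then adds that vector, scaled by a hypothetical `alpha`, to the hidden state after one chosen layer. The toy `layers` list stands in for a model's blocks and is purely hypothetical; it is not the paper's method for selecting a layer.

```python
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Assumed recipe: mean activation on 'target behavior' examples
    minus mean activation on contrasting examples."""
    return [p - q for p, q in zip(mean(pos_acts), mean(neg_acts))]


def forward(layers: List[Layer], x: Vector,
            steer: Optional[Vector] = None,
            layer_idx: Optional[int] = None,
            alpha: float = 1.0) -> Vector:
    """Run the toy layer stack; if `steer` is given, add alpha * steer
    to the hidden state right after layer `layer_idx` (a fixed layer)."""
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if steer is not None and i == layer_idx:
            h = [hi + alpha * si for hi, si in zip(h, steer)]
    return h


# Hypothetical two-layer "model": a linear scaling, then a ReLU-like step.
layers: List[Layer] = [
    lambda h: [2.0 * v for v in h],
    lambda h: [max(0.0, v - 1.0) for v in h],
]

steer = steering_vector([[1.0, 1.0], [3.0, 1.0]], [[0.0, 0.0], [2.0, 0.0]])
x = [1.0, 0.0]

print(steer)                                        # [1.0, 1.0]
print(forward(layers, x))                           # [1.0, 0.0]
print(forward(layers, x, steer, layer_idx=0, alpha=0.5))  # [1.5, 0.0]
print(forward(layers, x, steer, layer_idx=1, alpha=0.5))  # [1.5, 0.5]
```

Because the second toy layer is nonlinear, injecting the same vector after layer 0 versus layer 1 yields different outputs. Even in this tiny example, the effect of steering depends on where it is applied, which is the sensitivity that a single global layer choice ignores.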