Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions

arXiv:2605.05983v1

Recently, steering vectors (SVs) have emerged as an effective and lightweight approach to steering the behaviors of large language models (LLMs). Among them, fine-tuned SVs are more effective than optimization-free ones. However, current approaches to fine-tuned SVs suffer from two limitations. First, they require careful selection of steering factors on a per-SV basis to balance steering effectiveness and generation quality at inference time.
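To make the role of the steering factor concrete, the sketch below assumes the common additive formulation of activation steering, h' = h + αv, where h is a hidden activation, v is the steering vector, and α is the steering factor. This formulation and the function name `apply_steering` are illustrative assumptions, not taken from this paper. The sketch shows that how far the activation is pushed scales with both α and the norm of v, which is why a factor that works well for one SV may under-steer or over-steer (and degrade generation quality) for another.

```python
import math
from typing import List


def apply_steering(hidden: List[float], sv: List[float], factor: float) -> List[float]:
    """Hypothetical helper: additive steering h' = h + factor * sv (element-wise).

    Assumes the common additive formulation of activation steering; this is
    not claimed to be the method of the paper.
    """
    if len(hidden) != len(sv):
        raise ValueError("hidden state and steering vector must have the same dimension")
    return [h + factor * v for h, v in zip(hidden, sv)]


def shift_size(sv: List[float], factor: float) -> float:
    """Euclidean length of the perturbation added to the activation: |factor| * ||sv||."""
    return abs(factor) * math.sqrt(sum(v * v for v in sv))


if __name__ == "__main__":
    h = [1.0, 2.0]
    small_sv = [0.5, -1.0]
    large_sv = [5.0, -10.0]  # same direction, 10x the norm
    # The same factor produces a 10x larger shift for the larger-norm SV,
    # so each SV would need its own factor to reach a comparable effect.
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, shift_size(small_sv, alpha), shift_size(large_sv, alpha))
```

In this toy setting, a single global α cannot give two SVs of different norms the same steering strength, which mirrors the per-SV tuning burden the abstract describes.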