AI RESEARCH

Analysing the Safety Pitfalls of Steering Vectors

arXiv cs.CL

arXiv:2603.24543v1 Announce Type: cross

Activation steering has emerged as a powerful tool to shape LLM behavior without the need for weight updates. While its inherent brittleness and unreliability are well-documented, its safety implications remain underexplored. In this work, we present a systematic safety audit of steering vectors obtained with Contrastive Activation Addition (CAA), a widely used steering approach, under a unified evaluation protocol.
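For readers unfamiliar with CAA, the core idea can be sketched in a few lines. This is a toy illustration and not the paper's code. It assumes the steering vector is the mean activation on examples that show a behavior minus the mean activation on examples that do not, and that this vector is added to a hidden state with a scaling coefficient. The function names, the coefficient `alpha`, and the three-dimensional activations are all hypothetical. In practice the activations would be read from a chosen layer of a real model.

```python
from statistics import fmean


def caa_steering_vector(pos_acts, neg_acts):
    """Difference of mean activations: positive examples minus negative examples."""
    dim = len(pos_acts[0])
    pos_mean = [fmean(a[i] for a in pos_acts) for i in range(dim)]
    neg_mean = [fmean(a[i] for a in neg_acts) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def apply_steering(hidden, vector, alpha):
    """Add the scaled steering vector to one hidden state; no weights change."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Invented toy activations (hypothetical values, not taken from any model).
pos = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
neg = [[0.0, 2.0, 1.0], [0.0, 2.0, 1.0]]

v = caa_steering_vector(pos, neg)
steered = apply_steering([0.5, 0.5, 0.5], v, alpha=2.0)
print(v)        # [2.0, 0.0, -1.0]
print(steered)  # [4.5, 0.5, -1.5]
```

The sign and size of `alpha` set the direction and strength of the intervention, so any audit of steering has to consider the coefficient as well as the vector.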