AI RESEARCH

Steering Awareness: Detecting Activation Steering from Within

arXiv cs.AI

arXiv:2511.21399v3

Activation steering -- adding a vector to a model's residual stream to modify its behavior -- is widely used in safety evaluations on the assumption that the model cannot detect the intervention. We test this assumption,