AI RESEARCH
Mechanistic Indicators of Steering Effectiveness in Large Language Models
arXiv cs.CL
arXiv:2602.01716v3 Announce Type: replace Activation-based steering enables Large Language Models (LLMs) to exhibit targeted behaviors by intervening on intermediate activations without re