AI RESEARCH

Activation Steering via Generative Causal Mediation

arXiv cs.LG

arXiv:2602.16080v2

Where should we intervene in a language model (LM) to localize and control behaviors that are diffused across many tokens of a long-form response? We