AI RESEARCH

Weight Updates as Activation Shifts: A Principled Framework for Steering

arXiv CS.LG

arXiv:2603.00425v2 Announce Type: replace

Activation steering promises to be an extremely parameter-efficient form of adaptation, but its effectiveness depends on critical design choices, such as intervention location and parameterization, that currently rely on empirical heuristics rather than a principled foundation. We establish a first-order equivalence between activation-space interventions and weight-space updates, deriving the conditions under which activation steering can replicate fine-tuning behavior.
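To make the idea of a first-order equivalence concrete, here is a minimal sketch for a single toy layer. It is an illustration under stated assumptions, not the paper's derivation: the layer form `tanh(W x)`, the dimension, the perturbation scale, and all function names are hypothetical choices made for this example. For the linear map itself, a weight update `dW` acts exactly like an input-dependent shift `delta = dW x` added to the pre-activation; after the nonlinearity, a first-order Taylor expansion approximates the same effect when `dW` is small.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def add(a, b):
    """Elementwise sum of two vectors."""
    return [u + v for u, v in zip(a, b)]


def weight_update_output(W, dW, x):
    """Toy layer after a weight-space update: tanh((W + dW) x)."""
    W_new = [add(row, d_row) for row, d_row in zip(W, dW)]
    return [math.tanh(v) for v in matvec(W_new, x)]


def activation_shift_output(W, dW, x):
    """Same layer with frozen W and a pre-activation shift delta = dW x."""
    delta = matvec(dW, x)  # input-dependent steering vector (assumption of this sketch)
    return [math.tanh(v) for v in add(matvec(W, x), delta)]


def first_order_output(W, dW, x):
    """First-order Taylor expansion of the output around dW = 0."""
    pre = matvec(W, x)
    delta = matvec(dW, x)
    # d/dz tanh(z) = 1 - tanh(z)^2
    return [math.tanh(p) + (1.0 - math.tanh(p) ** 2) * d for p, d in zip(pre, delta)]


random.seed(0)
dim = 4  # hypothetical toy dimension
W = [[random.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
dW = [[random.gauss(0.0, 0.001) for _ in range(dim)] for _ in range(dim)]  # small update
x = [random.gauss(0.0, 1.0) for _ in range(dim)]

updated = weight_update_output(W, dW, x)
shifted = activation_shift_output(W, dW, x)
linearized = first_order_output(W, dW, x)

# Gap between the weight update and the equivalent pre-activation shift
# (zero up to floating-point rounding).
max_gap_shift = max(abs(a - b) for a, b in zip(updated, shifted))
# Gap between the weight update and its first-order approximation
# (second order in the size of dW).
max_gap_first_order = max(abs(a - b) for a, b in zip(updated, linearized))

print("weight update vs. activation shift:", max_gap_shift)
print("weight update vs. first-order expansion:", max_gap_first_order)
```

In this toy setting the shift depends on the input `x`, which hints at why design choices such as intervention location and parameterization matter: a fixed steering vector can only match a weight update under additional conditions, which the abstract says the paper derives.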