From Attribution to Action: A Human-Centered Application of Activation Steering

arXiv:2604.11467v1 Announce Type: new Explainable AI (XAI) methods reveal which features influence model predictions, yet provide limited means for practitioners to act on these explanations. Activation steering of components identified via XAI offers a path toward actionable explanations, although its practical utility remains understudied. We