
FishBack: Pullback Fisher Geometry for Optimal Activation Steering in Transformers

arXiv cs.LG

arXiv:2605.17231v1

Activation steering methods modify the intermediate representations of language models to control output behavior, but universally assume that the activation space is Euclidean. We show this assumption fails drastically. The local geometry induced by the model's own output behavior (the Fisher information metric of the softmax layer, pulled back through the Jacobian of the subsequent layers) deviates from the Euclidean metric by over 97% in relative spectral norm on GPT-2, and its effective dimensionality is only 2 to 17% of the ambient space.
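To make the pulled-back metric concrete, here is a minimal NumPy sketch under stated assumptions, not the paper's implementation. It uses the standard fact that the Fisher information of a softmax (categorical) distribution with respect to its logits is diag(p) - p pᵀ, and pulls it back through a Jacobian J as G = Jᵀ F J. The toy linear readout `W`, the helper names, and the use of the participation ratio as an "effective dimensionality" measure are all illustrative assumptions; the abstract does not say how that quantity is computed.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def softmax_fisher(logits):
    # Fisher information of a categorical distribution w.r.t. its logits:
    # F = diag(p) - p p^T (symmetric, positive semidefinite).
    p = softmax(logits)
    return np.diag(p) - np.outer(p, p)

def pullback_metric(jacobian, logits):
    # G = J^T F J, where J maps activation perturbations to logit perturbations.
    return jacobian.T @ softmax_fisher(logits) @ jacobian

def participation_ratio(G):
    # Assumed effective-dimension measure: (sum of eigenvalues)^2 / sum of squares.
    eig = np.clip(np.linalg.eigvalsh(G), 0.0, None)
    return eig.sum() ** 2 / (eig ** 2).sum()

# Hypothetical toy setup: a linear readout, so the Jacobian is W itself.
rng = np.random.default_rng(0)
vocab, d = 50, 8
W = rng.normal(size=(vocab, d))
h = rng.normal(size=d)          # an activation vector

G = pullback_metric(W, W @ h)   # d x d metric on activation space
eff_dim = participation_ratio(G)

# Squared length of a steering direction v under each geometry.
v = rng.normal(size=d)
euclidean_sq = float(v @ v)
fisher_sq = float(v @ G @ v)
```

In a real transformer, J would be the Jacobian of the logits with respect to the activations at the steered layer, obtained by automatic differentiation rather than a fixed matrix. Comparing `fisher_sq` with `euclidean_sq` shows how differently the two geometries can weight the same steering direction.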