AI RESEARCH

Stream separation improves Bregman conditioning in transformers

arXiv cs.LG

arXiv:2603.21317v1 Announce Type: new

Linear methods for steering transformer representations, including probing, activation engineering, and concept erasure, implicitly assume that representation space has Euclidean geometry. Park [2026] showed that the softmax induces a curved Bregman geometry whose metric tensor is the Hessian of the log-normalizer, $H(\lambda) = \mathrm{Cov}[\gamma \mid \lambda]$. Ignoring this curvature causes Euclidean steering to leak probability mass to unintended tokens. That analysis applies at the output layer.
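A small numerical sketch may make the output-layer claim concrete. It assumes, as is standard for a softmax over a vocabulary, that $\lambda$ is the logit vector and $\gamma$ is the one-hot token indicator. Then the log-normalizer is $A(\lambda) = \log \sum_i \exp \lambda_i$, and its Hessian is $\mathrm{diag}(p) - pp^\top$ with $p = \mathrm{softmax}(\lambda)$, which is exactly the covariance of $\gamma$. The code checks that identity by finite differences. It then compares two ways of moving mass from token $a$ to token $b$: a plain Euclidean step along $e_b - e_a$ in logit space, and a step that solves $H(\lambda)\,\Delta\lambda = \delta$ for a desired first-order probability change $\delta$. The second step is only an illustrative form of metric-aware steering, not the method of Park or of this paper. The logits, token indices, step size, and all function names are hypothetical.

```python
import numpy as np


def softmax(lam):
    # Numerically stable softmax over a logit vector lambda
    z = np.exp(lam - lam.max())
    return z / z.sum()


def log_normalizer(lam):
    # A(lambda) = log sum_i exp(lambda_i), computed stably
    m = lam.max()
    return m + np.log(np.exp(lam - m).sum())


def one_hot_covariance(lam):
    # Cov[gamma | lambda], enumerating the one-hot outcomes gamma = e_i,
    # each drawn with probability p_i = softmax(lambda)_i
    p = softmax(lam)
    outcomes = np.eye(lam.size)              # row i is e_i
    mean = p @ outcomes                       # E[gamma] = p
    second = (outcomes.T * p) @ outcomes      # E[gamma gamma^T] = diag(p)
    return second - np.outer(mean, mean)


def numerical_hessian(f, x, h=1e-4):
    # Central finite-difference Hessian of a scalar function f at x
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            di = np.zeros(n)
            dj = np.zeros(n)
            di[i] = h
            dj[j] = h
            hess[i, j] = (f(x + di + dj) - f(x + di - dj)
                          - f(x - di + dj) + f(x - di - dj)) / (4 * h * h)
    return hess


def leakage(dp, a, b):
    # Total |change| on tokens other than a and b, relative to the gain on b
    others = [i for i in range(dp.size) if i not in (a, b)]
    return np.abs(dp[others]).sum() / dp[b]


lam = np.array([2.0, 1.0, 0.5, -1.0])   # hypothetical logits, 4-token vocabulary
p = softmax(lam)
H = one_hot_covariance(lam)

# 1) Metric tensor check: Hessian of the log-normalizer equals Cov[gamma | lambda]
print("Hessian == covariance:",
      np.allclose(numerical_hessian(log_normalizer, lam), H, atol=1e-6))

# 2) Goal: move probability mass from token a to token b
a, b, eps = 0, 1, 1e-3

# Euclidean step: push the logits straight along e_b - e_a
d_euc = np.zeros_like(lam)
d_euc[b], d_euc[a] = eps, -eps
dp_euc = softmax(lam + d_euc) - p

# Metric-aware step: solve H d = delta for the desired first-order change
# delta = eps * (e_b - e_a). H is singular along the all-ones direction (a uniform
# logit shift leaves softmax unchanged), so add ones * ones^T, which makes the
# system invertible without changing H's action on zero-sum vectors.
delta = np.zeros_like(lam)
delta[b], delta[a] = eps, -eps
d_geo = np.linalg.solve(H + np.ones_like(H), delta)
dp_geo = softmax(lam + d_geo) - p

euc_leak = leakage(dp_euc, a, b)
geo_leak = leakage(dp_geo, a, b)
print(f"Euclidean leakage:    {euc_leak:.2e}")
print(f"metric-aware leakage: {geo_leak:.2e}")
```

To first order, the Euclidean step changes $p_c$ for every other token $c$ by $-\varepsilon(p_b - p_a)p_c$, which vanishes only when $p_a = p_b$. The solved step cancels that term, leaving only second-order leakage.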