RT-Transformer: The Transformer Block as a Spherical State Estimator
arXiv cs.AI
arXiv:2605.11007v1 Announce Type: cross
We show that the core components of the Transformer block (attention, residual connections, and normalization) arise naturally from a single geometric estimation problem. Modeling the latent state as a direction on the hypersphere, with noise defined in the tangent plane at the current estimate, yields a precision-weighted directional inference procedure in which attention aggregates evidence, residual connections implement incremental state updates, and normalization retracts the updated state back onto the hypersphere.
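
To make the three roles concrete, here is a minimal sketch of one such update step in plain Python. It is not the paper's algorithm: the abstract gives no formulas, so the softmax weighting, the scalar per-observation precisions, the `step` size, and every function name below are assumptions chosen for illustration only.

```python
import math


def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Retract a nonzero vector onto the unit hypersphere."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def tangent_project(x, v):
    """Project v onto the tangent plane at the unit vector x."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [ei / z for ei in e]


def spherical_update(state, observations, precisions, step=1.0):
    """One illustrative update of a unit-norm state (assumed form).

    1. Attention: weight each observation by a softmax over its
       precision-scaled similarity to the current state.
    2. Residual: add the tangent-plane part of the aggregated
       evidence to the current state.
    3. Normalization: retract the result back onto the sphere.
    """
    scores = [k * dot(state, o) for o, k in zip(observations, precisions)]
    weights = softmax(scores)
    dim = len(state)
    evidence = [
        sum(w * o[i] for w, o in zip(weights, observations))
        for i in range(dim)
    ]
    delta = tangent_project(state, evidence)
    updated = [s + step * d for s, d in zip(state, delta)]
    return normalize(updated)


if __name__ == "__main__":
    state = normalize([1.0, 0.0, 0.0])
    observations = [normalize([0.0, 1.0, 0.0]), normalize([1.0, 1.0, 0.0])]
    precisions = [1.0, 2.0]  # hypothetical per-observation confidences
    print(spherical_update(state, observations, precisions))
```

Because the increment lies in the tangent plane at a unit-norm state, the pre-normalization vector always has norm at least 1, so the final retraction is well defined. In this sketch the state moves toward the observations with higher precision-scaled similarity while staying on the sphere.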