Transformers Linearly Represent Highly Structured World Models

arXiv:2605.18847v1 Announce Type: cross

Do transformers, when trained on sequential reasoning traces, build internal models of the underlying task? And if so, does the structure of those internal representations mirror the structure of the domain? We train an 8-layer transformer on Sudoku solving traces and perform a mechanistic analysis of its internal computation. We establish two results.