Stability and Generalization in Looped Transformers

arXiv:2604.15259v1 Announce Type: cross

Looped transformers promise test-time compute scaling by spending more iterations on harder problems, but it remains unclear which architectural choices let them extrapolate to harder problems at test time rather than memorize
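The abstract describes looped transformers only at a high level. Below is a minimal sketch of the looping pattern, assuming the common weight-tied formulation: a single block is reapplied a variable number of times, so the test-time iteration count can exceed the count used in training. The block here is a hypothetical residual update `x + tanh(W x + b)` standing in for a full attention-plus-MLP block. `make_shared_block`, `looped_forward` and `state_norms` are illustrative names only, not code from the paper.

```python
import math
from typing import Callable, List

Vector = List[float]


def make_shared_block(weights: List[List[float]], bias: Vector) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for one transformer block.

    Computes the residual update x + tanh(W x + b). A real looped transformer
    would use self-attention and an MLP here; only the weight sharing matters
    for this sketch.
    """
    def block(x: Vector) -> Vector:
        out = []
        for i, row in enumerate(weights):
            pre = sum(w * xi for w, xi in zip(row, x)) + bias[i]
            out.append(x[i] + math.tanh(pre))
        return out
    return block


def looped_forward(block: Callable[[Vector], Vector], x: Vector, num_iterations: int) -> Vector:
    """Apply the same (weight-tied) block num_iterations times.

    The parameter count does not depend on num_iterations, so more compute
    can be spent at test time without adding weights.
    """
    for _ in range(num_iterations):
        x = block(x)
    return x


def state_norms(block: Callable[[Vector], Vector], x: Vector, num_iterations: int) -> List[float]:
    """Euclidean norm of the hidden state after each iteration.

    Watching this sequence is one simple way to see whether the loop stays
    bounded or drifts as the iteration count grows past the training range.
    """
    norms = []
    for _ in range(num_iterations):
        x = block(x)
        norms.append(math.sqrt(sum(v * v for v in x)))
    return norms
```

For example, a loop trained with 4 iterations could be run with 12 at test time simply by calling `looped_forward(block, x, 12)`. With this particular toy update, a constant nonzero bias makes the state grow linearly with the iteration count, which illustrates why stability under extra iterations is not automatic.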