Distinct mechanisms underlying in-context learning in transformers

ArXi:2604.12151v1 Announce Type: new Modern distributed networks, notably transformers, acquire a remarkable ability (termed `in-context learning') to adapt their computation to input statistics, such that a fixed network can be applied to data from a broad range of systems. Here, we provide a complete mechanistic characterization of this behavior in transformers trained on a finite set $S$ of discrete Marko chains. The transformer displays four algorithmic phases, characterized by whether the network memorizes and generalizes, and whether it uses 1-point or 2-point statistics.