Effective Context in Transformers: An Analysis of Fragmentation and Tokenization

arXiv:2605.13485v1

Transformers predict over a representation of a sequence. The same data can be written as bytes, characters, or subword tokens, and these representations can all be lossless. Yet under a fixed context window, they need not expose the same information to the model. This raises a basic question: how does the choice of representation change what a finite-context predictor can achieve? We study this question on Markov sources and uncover two complementary phenomena.
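To make the fixed-window point concrete, here is a small sketch. It counts how many underlying characters a window of W units spans when the same string is represented as UTF-8 bytes, as characters, or as subword tokens. The two-letter Greek alphabet, the toy vocabulary, and the greedy longest-match tokenizer (`greedy_tokenize`) are hypothetical choices made for illustration. They are not the paper's construction. Span is also only a proxy: it says how far back the window reaches, not how much predictive information it carries about the next symbol.

```python
# Hypothetical illustration, not the paper's setup: how many source characters
# does a fixed window of W units cover under bytes, characters, and subwords?


def chars_covered_by_bytes(text, window):
    """Characters whose full UTF-8 encoding fits in the last `window` bytes."""
    covered, used = 0, 0
    for ch in reversed(text):
        n = len(ch.encode("utf-8"))  # a character may fragment into several bytes
        if used + n > window:
            break
        used += n
        covered += 1
    return covered


def greedy_tokenize(text, vocab):
    """Hypothetical greedy longest-match tokenizer; single characters always match."""
    max_len = max(len(v) for v in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens


def chars_in_last_window(tokens, window):
    """Number of source characters covered by the last `window` tokens."""
    return sum(len(t) for t in tokens[-window:])


text = "αβ" * 8          # 16 characters, each 2 bytes in UTF-8
vocab = {"αβ", "αβαβ"}   # toy subword vocabulary (assumption)
W = 3

print(chars_covered_by_bytes(text, W))                             # 1
print(chars_in_last_window(list(text), W))                         # 3
print(chars_in_last_window(greedy_tokenize(text, vocab), W))       # 12
```

With the same budget of three units, the byte window sees one character, the character window sees three, and the subword window sees twelve. Fragmentation shrinks the reach of a fixed window and merging extends it, which is the kind of representation-dependent effect the abstract refers to.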