A Theory of Online Learning with Autoregressive Chain-of-Thought Reasoning

arXiv:2605.06819v1 Announce Type: new Autoregressive generation is the core mechanism of large language models. It can be viewed as the repeated application of a next-token generator: starting from an input string (the prompt), the generator is applied for $M$ steps, each time to the sequence produced so far, and the last generated token is taken as the final output. [Joshi, 2025] proposed a PAC model for studying the learnability of the input-output maps arising from this process.
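
The generation process described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the paper: the function `autoregressive_output`, the integer token type, and the toy `parity_generator` are all hypothetical. The sketch only shows how an input-output map arises from applying a next-token generator $M$ times and keeping the last token.

```python
from typing import Callable, List, Sequence

# Hypothetical types for illustration: tokens are ints, and a next-token
# generator maps the sequence so far to a single new token.
Token = int
NextTokenGenerator = Callable[[Sequence[Token]], Token]


def autoregressive_output(
    generator: NextTokenGenerator, prompt: Sequence[Token], num_steps: int
) -> Token:
    """Apply `generator` for `num_steps` steps and return the last token."""
    if num_steps < 1:
        raise ValueError("num_steps (M) must be at least 1")
    sequence: List[Token] = list(prompt)
    for _ in range(num_steps):
        # Each step sees the prompt plus every token generated so far.
        sequence.append(generator(sequence))
    # The intermediate tokens act as the chain of thought; only the
    # final token is treated as the output of the input-output map.
    return sequence[-1]


def parity_generator(sequence: Sequence[Token]) -> Token:
    """Toy generator (an assumption, not from the paper): parity of the sum."""
    return sum(sequence) % 2
```

For a fixed generator and a fixed $M$, `autoregressive_output` defines one map from prompts to final tokens; this is the kind of map whose learnability is under study.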