Emergent Slow Thinking in LLMs as Inverse Tree Freezing

arXiv:2509.23629v3 Announce Type: replace-cross Reinforcement learning with verifiable rewards (RLVR) enables large language models to acquire slow, multi-step reasoning from sparse final-answer signals. We provide a statistical-physics picture of this emergence. We show that an autoregressive model's finite capacity forces it to compress its exponentially large prefix space into a Markov network of predictive states, on which slow thinking unfolds as a random walk -- the Concept Network (CoNet) picture.
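
As a rough illustration of the random-walk picture, the sketch below runs a walk on a tiny hand-built Markov network until it reaches an absorbing "answer" state. Everything in it is an assumption made for illustration: the state names, the transition probabilities, and the helper `random_walk` are hypothetical and are not taken from the paper, which describes predictive states learned by the model rather than a fixed table.

```python
import random

# Hypothetical toy network (not from the paper): each state maps to a list of
# (next_state, probability) pairs. "answer" has no successors, so it is absorbing.
TRANSITIONS = {
    "start": [("a", 0.5), ("b", 0.5)],
    "a": [("start", 0.3), ("answer", 0.7)],
    "b": [("start", 0.6), ("a", 0.4)],
    "answer": [],
}


def random_walk(transitions, start, goal, max_steps=1000, rng=None):
    """Walk from `start` until `goal` or a dead end is reached, or max_steps elapse."""
    if rng is None:
        rng = random.Random()
    state = start
    path = [state]
    for _ in range(max_steps):
        successors = transitions[state]
        if state == goal or not successors:
            break
        next_states, weights = zip(*successors)
        # Sample the next state in proportion to its transition probability.
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    return path


# A seeded generator makes the run repeatable.
path = random_walk(TRANSITIONS, "start", "answer", rng=random.Random(0))
print(f"{len(path) - 1} steps: {' -> '.join(path)}")
```

In this toy setting, the number of steps taken before absorption plays the role of the length of a reasoning chain; backtracking to `start` lengthens the walk without changing where it ends.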