LaDi-RL: Latent Diffusion Reasoning Prevents Entropy Collapse in Reinforcement Learning

arXiv:2602.01705v3 Announce Type: replace

Reinforcement learning has become a central paradigm for improving LLM reasoning, but most existing methods optimize policies over discrete token sequences. This creates a mismatch between the optimization space and the structure of reasoning: many important decisions are semantic, global, and trajectory-level rather than local token choices. Continuous latent-space RL offers a promising alternative by allowing policies to explore higher-level reasoning representations. However, simply moving to latent space is not sufficient.