Hybrid Latent Reasoning with Decoupled Policy Optimization

ArXi:2604.20328v1 Announce Type: new Chain-of-Thought (CoT) reasoning significantly elevates the complex problem-solving capabilities of multimodal large language models (MLLMs). However, adapting CoT to vision typically discretizes signals to fit LLM inputs, causing early semantic collapse and discarding fine-grained details. While external tools can mitigate this, they