Reasoning Compression with Mixed-Policy Distillation

arXiv:2605.08776v1

Reasoning-centric large language models (LLMs) achieve strong performance by generating intermediate reasoning trajectories, but often incur excessive token usage and high inference-time decoding cost. We observe that, when solving the same problems, larger reasoning models can often produce concise traces, whereas smaller reasoning models tend to generate longer, more redundant trajectories. This is especially problematic in real-world deployment, where memory, latency, and serving-cost constraints often favor smaller models.