Learning from Partial Chain-of-Thought via Truncated-Reasoning Self-Distillation

arXiv:2603.13274v1 Announce Type: cross

Reasoning-oriented language models achieve strong performance by generating long chain-of-thought traces at inference time. However, this capability comes with a substantial and often excessive computational cost, which can manifest as redundant or inefficient reasoning. We study this setting and