CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

arXiv:2605.20075v1 Announce Type: cross Chain-of-thought (CoT) is a standard approach for eliciting reasoning capabilities from large language models (LLMs). However, the common CoT paradigm treats thinking as a prerequisite for answering, which can delay access to plausible answers and incur unnecessary token costs even when the model can identify an answer before extended thinking, a behavior known as performative reasoning. In this paper, we