Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs

arXiv:2603.03538v2

Large Language Models (LLMs) with chain-of-thought generation have demonstrated great potential for solving complex reasoning and planning tasks. However, the output of current LLMs is not fully reliable and requires careful verification. Even as LLMs become more accurate over time, learned verifiers can help increase trust, enforce safety constraints, and ensure alignment with personal preferences.