Efficient Test-Time Scaling of Multi-Step Reasoning by Probing Internal States of Large Language Models

arXiv:2511.06209v4

Large language models (LLMs) can solve complex tasks by generating long, multi-step reasoning chains. Test-time scaling (TTS) can further improve LLM performance by sampling multiple variants of intermediate reasoning steps, verifying their correctness, and strategically choosing the best steps for continuation. However, existing verification approaches, such as Process Reward Models (PRMs), are computationally expensive, limited to specific domains, and require large-scale human or model-generated annotations.
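The step-level TTS loop described above (sample several candidate next steps, score each with a verifier, keep the best, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. `step_level_search`, `linear_probe_score`, and the `toy_*` helpers are hypothetical names. The verifier is pluggable: a PRM would fit in the `score` slot. `linear_probe_score` only hints at the title's idea of probing internal states, modeled here as a logistic probe over a feature vector. The toy features stand in for real hidden states.

```python
import math
from typing import Callable, List, Sequence

# A partial reasoning chain is modeled as a list of step strings (assumption).
Chain = List[str]


def step_level_search(
    propose: Callable[[Chain, int], List[str]],
    score: Callable[[Chain, str], float],
    is_done: Callable[[Chain], bool],
    num_candidates: int = 4,
    max_steps: int = 8,
) -> Chain:
    """Greedy step-level TTS: at each step, sample `num_candidates`
    continuations, score each with a verifier, and keep the highest-scoring one."""
    chain: Chain = []
    for _ in range(max_steps):
        if is_done(chain):
            break
        candidates = propose(chain, num_candidates)
        best = max(candidates, key=lambda step: score(chain, step))
        chain.append(best)
    return chain


def linear_probe_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical probe verifier: logistic regression on a feature vector,
    returning an estimated probability that the step is correct."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# --- Toy usage: a stand-in generator and stand-in "hidden-state" features ---
def toy_propose(chain: Chain, k: int) -> List[str]:
    # Candidate i at depth n is named "s{n}c{i}".
    n = len(chain)
    return [f"s{n}c{i}" for i in range(k)]


def toy_features(chain: Chain, step: str) -> List[float]:
    # Fake 2-d feature vector: [candidate index, constant 1.0].
    idx = int(step.split("c")[1])
    return [float(idx), 1.0]


def toy_score(chain: Chain, step: str) -> float:
    return linear_probe_score(toy_features(chain, step), weights=[0.7, -1.0], bias=0.0)


result = step_level_search(toy_propose, toy_score, is_done=lambda c: len(c) >= 3)
print(result)  # ['s0c3', 's1c3', 's2c3']
```

Greedy selection is the simplest policy. Beam search over the top-k scored steps is a common alternative. Either way, the verifier's cost is paid once per candidate per step, which is why a cheap scorer matters for TTS.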