Adaptive Test-Time Compute Allocation for Reasoning LLMs via Constrained Policy Optimization

arXiv:2604.14853v1

Test-time compute scaling, the practice of spending extra computation during inference via repeated sampling, search, or extended reasoning, has become a powerful lever for improving large language model performance.