Entropy Centroids as Intrinsic Rewards for Test-Time Scaling
arXiv cs.AI
arXiv:2604.26173v1 Announce Type: cross
An effective way to scale up the test-time compute of large language models is to sample multiple responses and then select the best one, as in Grok Heavy and Gemini Deep Think. Existing selection methods often rely on external reward models, which requires
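The sample-then-select loop described above is commonly called best-of-N selection. The following is a minimal sketch of that loop. It is not the paper's entropy-centroid method, which the excerpt does not describe. `best_of_n` and `mean_logprob` are hypothetical helpers. Mean token log-probability is a swapped-in, model-intrinsic score used only to show where a reward that needs no external reward model would plug in.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Return the highest-scoring candidate (ties go to the earliest one)."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    # max() keeps the first element among equal maxima.
    return max(candidates, key=score)


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Hypothetical intrinsic score: average per-token log-probability.

    This is a stand-in, not the entropy-centroid reward from the paper.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


# Each sampled response is (text, per-token log-probabilities); values are made up.
Sample = Tuple[str, Sequence[float]]
samples: Sequence[Sample] = [
    ("A", [-0.1, -0.2]),
    ("B", [-0.05, -0.05]),
    ("C", [-1.0]),
]
best = best_of_n(samples, lambda s: mean_logprob(s[1]))
print(best[0])  # B
```

Passing the scorer as a function keeps the selection step independent of the reward, so an external reward model or an intrinsic score can be swapped in without changing `best_of_n`.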