Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty

arXiv:2604.10072v1 Announce Type: new Recent advancements in the Generative Reward Model (GRM) have demonstrated its potential to enhance the reasoning abilities of LLMs through Chain-of-Thought (CoT) prompting. Despite these gains, existing implementations of GRM suffer from two critical limitations. First, CoT prompting is applied indiscriminately to all inputs regardless of their inherent complexity. This