AI RESEARCH

Step-wise Rubric Rewards for LLM Reasoning

arXiv cs.LG

arXiv:2605.17291v1 Announce Type: new Reinforcement Learning with Verifiable Rewards (RLVR) is widely used to improve reasoning in large language models, but it rewards only final-answer correctness and provides no supervision over intermediate steps. Rubric-based methods such as Rubrics as Rewards (RaR)