AI RESEARCH

Direct Reasoning Optimization: Token-Level Reasoning Reflectivity Meets Rubric Gates for Unverifiable Tasks

arXiv CS.AI

arXiv:2506.13351v3 Announce Type: replace-cross Reinforcement learning (RL)