Direct Reasoning Optimization: Token-Level Reasoning Reflectivity Meets Rubric Gates for Unverifiable Tasks
arXiv CS.AI
arXiv:2506.13351v3 Announce Type: replace-cross
Reinforcement learning (RL)