Exploring Reasoning Reward Model for Agents

arXiv:2601.22154v2 Announce Type: replace-cross Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse outcome-based rewards for