ToolRLA: Multiplicative Reward Decomposition for Tool-Integrated Agents

arXiv:2603.01620v4

Tool-integrated agents that interleave reasoning with API calls are promising for complex tasks, yet aligning them for high-stakes, domain-specific deployment remains challenging: existing reinforcement learning approaches rely on coarse binary rewards that cannot distinguish tool selection errors from malformed parameters. We present ToolRLA, a three-stage post-