RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models

arXiv:2603.18859v1 Announce Type: new Reinforcement learning (RL) holds significant promise for enhancing the agentic reasoning capabilities of large language models (LLMs) that interact with external environments. However, the inherent sparsity of terminal rewards hinders fine-grained, state-level optimization. Although process reward modeling offers a promising alternative