Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis

arXiv:2604.24198v1 Announce Type: cross Process Reward Models (PRMs) have achieved remarkable success in augmenting the reasoning capabilities of Large Language Models (LLMs) within static domains such as mathematics. However, their potential in dynamic data analysis tasks remains underexplored. In this work, we first present an empirical study revealing that general-domain PRMs struggle to supervise data analysis agents.