AgentEval: DAG-Structured Step-Level Evaluation for Agentic Workflows with Error Propagation Tracking

arXiv cs.CL

arXiv:2604.23581v1 Announce Type: cross. Agentic systems that chain reasoning, tool use, and synthesis into multi-step workflows are entering production, yet prevailing evaluation practices, such as end-to-end outcome checks and ad-hoc trace inspection, systematically mask the intermediate failures that dominate real-world error budgets.