PROTEA: Offline Evaluation and Iterative Refinement for Multi-Agent LLM Workflows

arXiv cs.CL

arXiv:2605.18032v1 (Announce Type: new)

Multi-agent LLM workflows (systems composed of multiple role-specific LLM calls) often outperform single-prompt baselines, but they remain difficult to debug and refine. Failures can originate in subtle errors in intermediate outputs that propagate to downstream nodes, forcing developers to inspect long traces and infer which agent to modify. We present PROTEA, a unified interface for offline, test-driven improvement of multi-agent workflows.
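The abstract does not say how PROTEA represents workflows or localizes faults. The sketch below is a minimal, hypothetical illustration of the general problem it describes: a workflow is run offline on a fixed test case, every intermediate output is recorded as a trace, and the earliest node whose output fails a per-node check is flagged as the likely agent to modify. `Node`, `run_workflow`, `first_failing_node`, the toy agents, and the "blame the earliest failing node" heuristic are all assumptions for illustration, not PROTEA's API or method; plain Python functions stand in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Node:
    """Hypothetical workflow step; `fn` stands in for a role-specific LLM call."""
    name: str                  # role name, e.g. "extractor"
    inputs: List[str]          # upstream node names or workflow input keys
    fn: Callable[..., str]

def run_workflow(nodes: List[Node], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run nodes in the given (already topological) order, recording every output."""
    trace: Dict[str, str] = dict(inputs)
    for node in nodes:
        trace[node.name] = node.fn(*(trace[k] for k in node.inputs))
    return trace

def first_failing_node(
    nodes: List[Node],
    trace: Dict[str, str],
    checks: Dict[str, Callable[[str], bool]],
) -> Optional[str]:
    """Return the earliest node (in execution order) whose output fails its check."""
    for node in nodes:
        check = checks.get(node.name)
        if check is not None and not check(trace[node.name]):
            return node.name
    return None

# Toy pipeline: the extractor has a planted bug (it takes the first line
# instead of the line that answers the question), which corrupts the writer.
nodes = [
    Node("planner", ["question"], lambda q: f"Find the fact that answers: {q}"),
    Node("extractor", ["planner", "facts"], lambda plan, facts: facts.splitlines()[0]),
    Node("writer", ["extractor"], lambda fact: "Answer: " + fact.split()[0]),
]

# One offline test case: fixed inputs plus per-node expectations.
test_inputs = {
    "question": "What is the capital of France?",
    "facts": "Lyon is large.\nParis is the capital.",
}
checks = {
    "extractor": lambda out: "capital" in out,
    "writer": lambda out: out == "Answer: Paris",
}

trace = run_workflow(nodes, test_inputs)
print(trace["writer"])                           # Answer: Lyon
print(first_failing_node(nodes, trace, checks))  # extractor

# Refinement step: swap in a revised extractor and re-run the same test offline.
fixed_extractor = Node(
    "extractor", ["planner", "facts"],
    lambda plan, facts: next(line for line in facts.splitlines() if "capital" in line),
)
fixed = [nodes[0], fixed_extractor, nodes[2]]
fixed_trace = run_workflow(fixed, test_inputs)
print(fixed_trace["writer"])                           # Answer: Paris
print(first_failing_node(fixed, fixed_trace, checks))  # None
```

The final output check fails at the writer, but the per-node checks trace the root cause one step upstream to the extractor. This is the kind of inference the abstract says developers otherwise make by reading long traces by hand.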