AgentCollabBench: Diagnosing When Good Agents Make Bad Collaborators

arXiv:2605.08647v1 Announce Type: cross Multi-agent systems achieve state-of-the-art outcomes through peer collaboration. However, when an agent in the pipeline silently drops a constraint, the system's final output may look correct even though the reasoning chain was quietly corrupted, and existing outcome-based evaluations are blind to such multi-hop process failures. To make these vulnerabilities measurable before deployment, we