Efficient Failure Management for Multi-Agent Systems with Reasoning Trace Representation

arXiv:2603.21522v1 Announce Type: cross

Large Language Model (LLM)-based Multi-Agent Systems (MASs) have emerged as a new paradigm in software system design, increasingly demonstrating strong reasoning and collaboration capabilities. As these systems become more complex and autonomous, effective failure management is essential to ensure reliability and availability. However, existing approaches often rely on per-trace reasoning, which leads to low efficiency. They also neglect historical failure patterns, which limits diagnostic accuracy.