AI RESEARCH
When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems
arXiv cs.LG
arXiv:2605.01133v1 Announce Type: cross Large language model (LLM)-powered multi-agent systems (MAS) enable agents to communicate and share information, achieving strong performance on complex tasks. However, this communication also creates an attack surface where malicious agents can propagate misinformation and manipulate group decisions, undermining MAS safety. Existing embedding-based defenses aim to detect and prune suspicious agents, but their effectiveness depends on a clear separation between the text embeddings of malicious and benign messages.
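To make the dependence on embedding separation concrete, here is a minimal sketch of a generic embedding-based filter. It is an illustration only, not the method evaluated in the paper: the function names (`cosine`, `centroid`, `flag_suspicious`), the centroid-similarity rule, the threshold of 0.8, and the hand-written 2-D vectors standing in for real text embeddings are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def flag_suspicious(embeddings, threshold=0.8):
    """Return indices of agents whose message embedding has cosine
    similarity to the group centroid below `threshold`.

    Hypothetical rule for illustration; real defenses may use learned
    classifiers or graph-based scores instead.
    """
    center = centroid(embeddings)
    return [i for i, e in enumerate(embeddings) if cosine(e, center) < threshold]


# Toy stand-ins for message embeddings (assumed, not real model output).
benign = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]

# Case 1: the malicious message embeds far from the benign ones.
separated = benign + [[-0.2, 1.0]]

# Case 2: the malicious message is phrased so that it embeds close to
# the benign ones, so the separation the filter relies on is gone.
camouflaged = benign + [[0.95, 0.15]]

print(flag_suspicious(separated))
print(flag_suspicious(camouflaged))
```

In the first case the outlying agent (index 3) falls below the threshold and would be pruned. In the second case no agent is flagged, even though the last one is assumed to be malicious, which is the failure mode the abstract points to when malicious and benign embeddings overlap.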