GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations

arXiv:2605.14498v1

Large Language Model (LLM) agents increasingly serve as personal assistants and workplace collaborators, where their utility depends on memory systems that extract, retrieve, and apply information across long-running conversations. However, existing memory systems and benchmarks alike are built around the dyadic, single-user setting, even though real deployments routinely span groups and channels in which multiple users interact with the agent and with each other.