Evaluating Long-Horizon Memory for Multi-Party Collaborative Dialogues

arXiv:2602.01313v3 Announce Type: replace-cross

Long-term conversational memory in practical LLM applications is inherently collaborative: information is produced by multiple participants, scattered across groups and channels, revised over time, and implicitly grounded in roles and social context. Yet no established benchmark evaluates memory under interaction patterns resembling real-world deployment, as existing benchmarks largely focus on dyadic or single-topic dialogues. In this paper, we.