MEME: Multi-entity & Evolving Memory Evaluation

arXiv:2605.12477v1. LLM-based agents increasingly operate in persistent environments where they must update and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks covering the full space spanned by the multi-entity and evolving axes, including three not scored by prior work: Cascade and Absence (dependency reasoning) and Deletion (post-removal state