From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents

arXiv:2604.20006v1

Personalized agents that interact with users over long periods must maintain persistent memory across sessions and update it as circumstances change. However, existing benchmarks predominantly frame long-term memory evaluation as fact retrieval from past conversations, which offers limited insight into an agent's ability to consolidate memory over time or to handle frequent knowledge updates. We