When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory

arXiv CS.AI

arXiv:2605.07313v1 Announce Type: new Memory-agent evaluations report fixed-snapshot accuracy or retrieval quality, but these scores do not show whether evidence remains usable as irrelevant sessions (sessions not annotated as task-relevant evidence for the query) accumulate. We present a scale-conditioned evaluation protocol for agent memory under evidence-preserving growth: for each query, task evidence is held fixed while irrelevant sessions are added.
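
As a rough illustration of evidence-preserving growth, the sketch below builds one memory snapshot per scale for a single query: the evidence sessions are identical in every snapshot, and only the number of irrelevant sessions changes. This is not the authors' implementation. The function name `build_scaled_memories`, the representation of sessions as strings, and the choice to nest snapshots and shuffle session order are all assumptions made for this example.

```python
import random
from typing import Dict, List, Sequence


def build_scaled_memories(
    evidence: Sequence[str],
    irrelevant_pool: Sequence[str],
    scales: Sequence[int],
    seed: int = 0,
) -> Dict[int, List[str]]:
    """Return one memory snapshot per scale for a single query.

    Hypothetical helper: every snapshot contains all evidence sessions,
    plus the first n irrelevant sessions from one fixed random ordering.
    """
    if max(scales, default=0) > len(irrelevant_pool):
        raise ValueError("not enough irrelevant sessions for the largest scale")

    # Fix a single ordering of irrelevant sessions so that a larger
    # snapshot always contains every session of a smaller one.
    order = list(irrelevant_pool)
    random.Random(seed).shuffle(order)

    snapshots: Dict[int, List[str]] = {}
    for n in scales:
        sessions = list(evidence) + order[:n]
        # Assumption: shuffle so evidence does not always sit first in memory.
        random.Random(seed * 1_000_003 + n).shuffle(sessions)
        snapshots[n] = sessions
    return snapshots


if __name__ == "__main__":
    evidence = ["evidence-session-1", "evidence-session-2"]
    pool = [f"irrelevant-session-{i}" for i in range(100)]
    for n, memory in build_scaled_memories(evidence, pool, [0, 10, 100]).items():
        print(n, len(memory))
```

Running the same query against each snapshot would then give a per-scale score, so any drop can be attributed to accumulated irrelevant sessions rather than to missing evidence.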