Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

arXiv:2511.20857v2

Statefulness is essential for large language model (LLM) agents to perform long-term planning and problem-solving. This makes memory a critical component, yet its management and evolution remain largely underexplored. Existing evaluations mostly focus on static conversational settings, where memory is passively retrieved from dialogue to answer queries, overlooking the dynamic ability to accumulate and reuse experience across evolving task streams.