Benchmarked 4 agent memory systems: Mem0 scores 49% recall (worse than a coin flip), Zep uses 340x more tokens for a 15-point improvement. Here's what's actually going on.

r/LocalLLaMA

I've been digging into how AI coding agents actually handle memory - not what the marketing says, but what the code and benchmarks show. Here's what I found.

TL;DR: Every agent memory system in 2026 is either too simple (can't search), too expensive (600K tokens per conversation), or too clever (burns tokens on memory management instead of actual work). The real unsolved problem isn't remembering - it's forgetting.

How the major systems actually work

Claude Code - Reads CLAUDE.md at session start. Entire file goes into context. No vector DB, no semantic search.
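
To make the "entire file goes into context" pattern concrete, here's a minimal sketch of what that style of memory amounts to. This is not Claude Code's actual source; the function names, the prompt layout, and the 4-characters-per-token estimate are all my own assumptions for illustration.

```python
from pathlib import Path


def build_system_prompt(base_prompt: str, memory_path: Path) -> str:
    """Append the whole memory file to the prompt.

    Hypothetical helper: no retrieval, no ranking, no vector DB.
    Whatever is in the file is what the model sees.
    """
    if not memory_path.exists():
        # No memory file yet: the agent starts with the base prompt only.
        return base_prompt
    memory = memory_path.read_text(encoding="utf-8")
    # The section header format here is an assumption, not a real convention.
    return f"{base_prompt}\n\n# Project memory ({memory_path.name})\n{memory}"


def rough_token_count(text: str) -> int:
    """Crude estimate assuming ~4 characters per token (a rule of thumb only)."""
    return len(text) // 4


if __name__ == "__main__":
    prompt = build_system_prompt("You are a coding agent.", Path("CLAUDE.md"))
    print(f"~{rough_token_count(prompt)} tokens spent before any work starts")
```

The tradeoff is visible right in the sketch: there's nothing to search and nothing to break, but every line in the file costs context on every session, whether or not it's relevant to the task at hand.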