[D] Production gaps in context-window compression for AI agent memory

I've been working on AI memory infrastructure and recently spent a few weeks reading through the source code of an open-source context-window compression system - the kind that replaces retrieval entirely by having background LLM agents compress conversation history into structured observations, then prepends the entire block to every turn. The approach just hit 90+% on LongMemEval, which is impressive.