#1 on memory benchmark LongMemEval with Gemini Flash, not Pro [R]

r/MachineLearning

Disclosure: I'm the first author. This is an evaluation of an experimental memory retrieval system on LongMemEval (Wang, 2024). I figured the results might interest people here, particularly the deliberate use of a smaller answering model to isolate retrieval quality from model capability.

Result: 96.4% with top-50 retrieval and Gemini 3 Flash as the answering model. For comparison, reported scores for other systems (all using Gemini 3 Pro): Mem0 94.8%, Honcho 92.6%, HydraDB 90.79%, Supermemory 85.2%.

The retrieval architecture draws on episodic memory theory (Tulving, 1972), reconstructive recall (Bartlett, 1932), and temporal context models (Howard & Kahana, 2002).
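For anyone unfamiliar with temporal context models, here is a minimal Python sketch of the context-drift rule from Howard & Kahana (2002). It is a textbook illustration under stated assumptions, not the retrieval system evaluated in this post: the function names, the one-hot item vectors, and β = 0.5 are all hypothetical. Each new item nudges a slowly drifting, unit-length context vector (c_i = ρ·c_{i-1} + β·c_in), and stored episodes are then ranked by how well their encoding context matches a cue context.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def drift(context, item_input, beta):
    """One TCM step: c_i = rho * c_{i-1} + beta * c_in.

    rho is chosen so c_i stays unit-length, assuming both context and
    item_input are unit vectors and 0 < beta <= 1.
    """
    d = dot(context, item_input)
    rho = math.sqrt(1 + beta ** 2 * (d ** 2 - 1)) - beta * d
    return [rho * c + beta * x for c, x in zip(context, item_input)]


def encode_episodes(item_inputs, initial_context, beta):
    """Store the temporal context in effect when each (unit-length) item arrives."""
    context = initial_context
    stored = []
    for item in item_inputs:
        context = drift(context, item, beta)
        stored.append(context)
    return stored


def retrieve(cue_context, stored_contexts, k):
    """Return indices of the k episodes whose encoding context best matches the cue."""
    ranked = sorted(
        range(len(stored_contexts)),
        key=lambda i: dot(cue_context, stored_contexts[i]),
        reverse=True,
    )
    return ranked[:k]


def one_hot(index, dim):
    return [1.0 if j == index else 0.0 for j in range(dim)]


# Hypothetical toy data: 8 orthogonal item vectors plus an orthogonal start context.
N_ITEMS, BETA = 8, 0.5
DIM = N_ITEMS + 1
items = [one_hot(i, DIM) for i in range(N_ITEMS)]
stored = encode_episodes(items, one_hot(N_ITEMS, DIM), BETA)

# Cueing with episode 5's context favours episode 5 itself, then its temporal neighbours.
print(retrieve(stored[5], stored, 3))
```

With orthogonal items, the similarity between two stored contexts works out to ρ^lag with ρ = √(1 − β²), so retrieval falls off geometrically with temporal distance. That recency/contiguity gradient is what TCM is meant to capture.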