Google’s TurboQuant Is Quietly Rewriting the Rules of AI Memory

Towards AI
AI Research

A new compression algorithm from Google Research shrinks AI’s working memory by up to 10x - with near-zero accuracy loss. Here is how it works, and why it matters.

Every time you have a long conversation with an AI, ask it to summarize a document, or run a complex semantic search, the model is quietly filling up a working memory called the key-value cache. It is the model’s fast-access notepad - storing what it has already processed so it does not have to recompute everything from scratch with each new word.
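To make the notepad idea concrete, here is a minimal sketch of a key-value cache for a single attention head, written in Python with NumPy. The class name `KVCache`, the helper `attend_from_scratch`, the random weights, and the tiny dimensions are all illustrative assumptions rather than anything taken from Google's work or a real library. It shows only the caching idea, not TurboQuant's compression.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Toy single-head key-value cache (hypothetical, for illustration only)."""

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        # Random projections standing in for trained weights (an assumption).
        self.Wq = rng.standard_normal((d_model, d_model)) * scale
        self.Wk = rng.standard_normal((d_model, d_model)) * scale
        self.Wv = rng.standard_normal((d_model, d_model)) * scale
        self.keys = []    # one cached key vector per token seen so far
        self.values = []  # one cached value vector per token seen so far

    def step(self, x):
        """Process one new token embedding x, reusing all cached keys and values."""
        # Only the new token's key and value are computed; older ones are reused.
        self.keys.append(x @ self.Wk)
        self.values.append(x @ self.Wv)
        K = np.stack(self.keys)
        V = np.stack(self.values)
        q = x @ self.Wq
        weights = softmax(q @ K.T / np.sqrt(q.shape[-1]))
        return weights @ V

def attend_from_scratch(cache, tokens):
    """Recompute every key and value, then return the output for the last token."""
    K = tokens @ cache.Wk
    V = tokens @ cache.Wv
    q = tokens[-1] @ cache.Wq
    weights = softmax(q @ K.T / np.sqrt(q.shape[-1]))
    return weights @ V

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    tokens = rng.standard_normal((5, 8))  # 5 made-up token embeddings of width 8
    cache = KVCache(d_model=8)
    for t in tokens:
        out = cache.step(t)
    # The cached path should match the full recomputation for the latest token.
    print(np.allclose(out, attend_from_scratch(cache, tokens)))
    print(len(cache.keys), "key vectors held in the cache")
```

The cached path does a fixed amount of projection work per new token, while the from-scratch path redoes it for the whole sequence each time. The price is that `keys` and `values` grow by one entry for every token processed, which is the memory footprint that compression methods for the key-value cache aim to reduce.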