[P] TurboQuant Pro: Open-source vector compression toolkit — 5-42x smaller embeddings with 0.97+ recall [R]

r/MachineLearning

TL;DR: We built an open-source toolkit that compresses high-dimensional vectors (embeddings, KV cache, anything in pgvector/FAISS) by 5-42x while maintaining 0.95+ cosine similarity. Benchmarked 6 methods on 2.4M real embeddings. MIT licensed.

GitHub:

Install: pip install turboquant-pro

The Problem

Vector databases are eating RAM. If you're running RAG with BGE-M3 (1024-dim float32), each embedding is 4KB. At 1M vectors that's 4GB just for embeddings. At 10M you need 40GB. pgvector, FAISS, Pinecone - they all have this problem.
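The memory figures above are easy to sanity-check. Here is a rough sketch of the arithmetic in plain Python. The helper name `embedding_bytes` is made up for illustration and is not part of turboquant-pro. It counts raw vector storage only and ignores index overhead, and the compressed sizes assume the 5x and 42x ratios apply uniformly.

```python
# Hypothetical helper for illustration; not part of the turboquant-pro API.
def embedding_bytes(n_vectors: int, dim: int = 1024, bytes_per_value: int = 4) -> int:
    """Raw storage for dense embeddings (float32 by default), ignoring index overhead."""
    return n_vectors * dim * bytes_per_value


GB = 10 ** 9  # decimal gigabytes, matching the rounded figures in the post

for n in (1_000_000, 10_000_000):
    raw = embedding_bytes(n)
    # Assumption: the quoted 5x and 42x ratios apply uniformly to every vector.
    print(
        f"{n:,} vectors: {raw / GB:.1f} GB raw, "
        f"{raw / 5 / GB:.2f} GB at 5x, {raw / 42 / GB:.2f} GB at 42x"
    )
```

A single 1024-dim float32 vector comes out to 4096 bytes, which is where the 4KB per embedding comes from. The 4GB and 40GB totals are the same number scaled up and rounded.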