Google’s TurboQuant Compression May Support Faster Inference, Same Accuracy on Less Capable Hardware
InfoQ AI/ML
Google Research unveiled TurboQuant, a novel quantization algorithm that compresses large language models’ Key-Value caches by up to 6x. With 3.5-bit compression, near-zero accuracy loss, and no re