TurboQuant - Extreme KV Cache Quantization · ggml-org/llama.cpp · Discussion #20969
r/LocalLLaMA
"14+ independent validators now across Metal, CUDA, HIP, Vulkan, and MLX. Apple Silicon, NVIDIA (4090, 5090, H100, A100, V100, 1080 Ti), AMD (RX 9070 XT, RX 6600). from M1 to Blackwell. the data converges." - u/Pidtom

This is an all-in-one thread for checking all the discussions and benchmarks on TurboQuant.

Submitted by /u/pmttyji