Deepseek v4 Flash, Gemma/Qwen KV Cache Quantization & 384K Context

Today's Highlights

Deepseek v4 is now available on HuggingFace in Flash and non-Flash variants, with a maximum output length of 384K tokens. Meanwhile, new research details KV cache quantization for Gemma 4 and Qwen 3.6, offering insights into optimizing local inference.

Deepseek V4 Flash and Non-Flash Out on HuggingFace (r/LocalLLaMA)