Qwen3.6-35B-A3B KLDs - INTs and NVFPs
r/LocalLLaMA
KLD for INTs and NVFP4s. As always, use case is important: accuracy versus speed versus native kernels on your GPUs.

Things to note again:

- This is done in vLLM, with real logits.
- My repo has made changes in the vLLM "hot path", so it's real, it's on GPU, and it takes ~3-5 minutes on RTX 6000s.
- KLD does not lie; it's just raw math against logits.
- KLD tells a story of divergence.
- Evals are still important for your specific use case.
- A quant can have a worse KLD and still score better on a given eval than a quant with a better KLD. This is benchmaxing, and it's real.

Choose the quant for your use case.
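For anyone wondering what "raw math against logits" looks like, here is a minimal sketch. It assumes KLD means the mean per-token KL(reference || quant) over softmaxed logits; the function names and toy logits are made up for illustration, and this is not the vLLM hot-path code from the repo.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one token's vocabulary logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kld(ref_logits, quant_logits):
    # KL(P_ref || P_quant) in nats for a single token position.
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(ref_rows, quant_rows):
    # Average the per-token KLD over all positions.
    klds = [token_kld(r, q) for r, q in zip(ref_rows, quant_rows)]
    return sum(klds) / len(klds)

def top1_agreement(ref_rows, quant_rows):
    # Fraction of positions where both models pick the same argmax token.
    same = sum(
        r.index(max(r)) == q.index(max(q))
        for r, q in zip(ref_rows, quant_rows)
    )
    return same / len(ref_rows)

# Toy logits (hypothetical): 2 token positions, vocabulary of 3.
ref = [[2.0, 1.0, 0.1], [0.5, 2.5, -1.0]]
quant_close = [[2.1, 0.9, 0.1], [0.4, 2.6, -1.0]]  # small perturbation
quant_far = [[1.0, 2.0, 0.1], [2.5, 0.5, -1.0]]    # argmax flipped

for name, quant in [("close", quant_close), ("far", quant_far)]:
    print(name, mean_kld(ref, quant), top1_agreement(ref, quant))
```

KLD compares the whole distribution, while an eval mostly cares whether the picked token (or final answer) is right. That is why the two can disagree on which quant looks better.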