Qwen3.6-27B KLDs - INTs and NVFPs

r/LocalLLaMA
AI Research

Will do more, but here's a start as you're choosing your models. Remember, USE-CASE is important.

Notice the larger size of the THoTD NVFP versus the other. This is because THoTD is an NVFP4A16 (4-bit weights, 16-bit activations) versus NVFP4(A4). NVFP4(A4) should stay in 4-bit the whole time, so if you are doing batching, NVFP4(A4) may see better performance as batching occurs.

Notice that huge size increase for Cyan from INT4 to __TECH_PRESERVE_1TECH_PRESERVE_0__. Food for thought. Mixed-precision is amazing, but takes space. Is 0.02 accuracy worth losing 6GB of context? Up to you to decide.
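For anyone wondering what a KLD number actually measures, here is a rough sketch. It assumes the usual setup: the same text is run through the unquantized model and through the quant, and the KL divergence between the two next-token distributions is averaged over token positions. The function names and the toy logits are made up for illustration; this is not the tool used for the numbers in this post.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kld(ref_logits, quant_logits):
    """KL(ref || quant) in nats for a single token position."""
    log_p = log_softmax(ref_logits)    # reference (unquantized) model
    log_q = log_softmax(quant_logits)  # quantized model
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(ref_rows, quant_rows):
    """Average per-token KLD over all positions."""
    klds = [token_kld(r, q) for r, q in zip(ref_rows, quant_rows)]
    return sum(klds) / len(klds)

# Toy logits (hypothetical): 2 token positions, vocabulary of 3.
ref_rows = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
quant_rows = [[1.9, 1.1, 0.0], [0.4, 0.7, 2.8]]

print(mean_kld(ref_rows, ref_rows))    # identical outputs give 0.0
print(mean_kld(ref_rows, quant_rows))  # small positive value
```

Lower is better: 0 means the quant produces exactly the same token probabilities as the reference, and the value grows as the quant's predictions drift away from it.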