llama.cpp benchmark: native vs. non-native NVFP4 on Blackwell - summary

r/LocalLLaMA

I tested two llama.cpp builds on the same Qwen3.6-27B-NVFP4 model. llama-bench reports the model label as qwen35 27B NVFP4, but the actual tested model is Qwen3.6-27B-NVFP4.

Test platform

GPU: NVIDIA GeForce RTX 5090
CPU: AMD Ryzen 9 9950X3D
RAM: 128 GB DDR5 5600 CL36
Backend: CUDA

Tested builds

b8966 - last build without native NVFP4
b8967 - first build with native NVFP4

Both runs used the same model and settings: Qwen3.6-27B-NVFP4, 17.50 GiB, 26.90B parameters, CUDA backend, ngl=999, fa=1.

Main.
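For anyone who wants to repeat the comparison, here is a minimal sketch of how the two runs could be scripted. It is not the exact script I used. The build directory layout and the model filename are hypothetical placeholders, so adjust them to your setup. The flags (-m, -ngl, -fa, -o) are standard llama-bench options and mirror the ngl=999, fa=1 settings listed above. The speedup_percent helper is a convenience I added here for comparing tokens/s between the two builds.

```python
def bench_command(build_dir, model_path, ngl=999, flash_attn=1):
    """Build the llama-bench argument list for one llama.cpp build.

    build_dir and model_path are placeholders (hypothetical layout);
    ngl and flash_attn default to the settings used in this post.
    """
    return [
        f"{build_dir}/bin/llama-bench",  # assumed location of the binary
        "-m", str(model_path),           # model file
        "-ngl", str(ngl),                # number of layers offloaded to GPU
        "-fa", str(flash_attn),          # flash attention on (1) or off (0)
        "-o", "json",                    # machine-readable output
    ]


def speedup_percent(baseline_tps, candidate_tps):
    """Relative change in tokens/s of candidate vs. baseline, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline tokens/s must be positive")
    return (candidate_tps - baseline_tps) / baseline_tps * 100.0


if __name__ == "__main__":
    # Hypothetical paths: one checkout per build tag.
    builds = {
        "b8966": "llama.cpp-b8966/build",
        "b8967": "llama.cpp-b8967/build",
    }
    model = "models/Qwen3.6-27B-NVFP4.gguf"  # hypothetical filename
    for tag, build_dir in builds.items():
        # Print the command; pass the list to subprocess.run to execute it.
        print(tag, " ".join(bench_command(build_dir, model)))
```

Run each printed command on an otherwise idle machine, then feed the reported tokens/s from b8966 (baseline) and b8967 (candidate) into speedup_percent to get the relative difference.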