llama.cpp - native NVFP4 support on Blackwell starting from b8967

r/LocalLLaMA

It looks like we finally have it! Time to test!

Platform: RTX 5090 (plus an RTX 5060 Ti, not used during the test), Ryzen 9 9950X3D, 128 GB DDR5-5600 CL36.

Test:

```
CUDA_VISIBLE_DEVICES=0 /home/marcin/llama.cpp/llama-bench \
  -m /home/marcin/llama.cpp_models/Qwen3.6-27B-NVFP4/Qwen3.6-27B-NVFP4.gguf \
  -ngl 999 \
  -fa 1 \
  -p 512,2048 \
  -n 128,512 \
  -d 0,4096,8192,16384,32768 \
  -r 5 \
  -o md | tee /home/marcin/qwen3.6-27b-nvfp4-gpu0-bench-depth.md
```

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90B | CUDA | 999 | 1 | pp512 | 5546.93 ± 220.29 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90B | CUDA | 999 | 1 | pp2048 | |
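If you want to compare runs across depths, a small script can read back the Markdown file that `tee` saved. Below is a minimal sketch, assuming the eight-column layout of the results above (model, size, params, backend, ngl, fa, test, t/s). The names `parse_md_rows`, `expected_tests` and `BenchRow` are hypothetical, not part of llama.cpp. `expected_tests` assumes llama-bench runs every `-p` value as a prompt-only test and every `-n` value as a generation-only test at each `-d` depth, and that non-zero depths are labelled like `pp512 @ d4096`; check that against your own output. `bits_per_weight` just divides the reported file size by the parameter count (about 5.6 bits here, versus NVFP4's nominal ~4.5 bits for 4-bit values plus an 8-bit scale per 16-value block, presumably because some tensors are kept at higher precision).

```python
import re
from dataclasses import dataclass

# Hypothetical record for one row of llama-bench's `-o md` table.
@dataclass
class BenchRow:
    model: str
    size_gib: float
    params_b: float
    test: str
    tps_mean: float
    tps_std: float

    def bits_per_weight(self) -> float:
        # Average bits per parameter: GiB -> bits, divided by parameter count.
        return self.size_gib * 2**30 * 8 / (self.params_b * 1e9)

# Matches a t/s cell such as "5546.93 ± 220.29" (mean ± std over -r runs).
TPS_RE = re.compile(r"([\d.]+)\s*±\s*([\d.]+)")

def parse_md_rows(text: str) -> list[BenchRow]:
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip blank lines, the header row and the |---| separator row.
        if len(cells) != 8 or cells[0] == "model" or set(cells[0]) <= set("-: "):
            continue
        m = TPS_RE.fullmatch(cells[7])
        if m is None:  # e.g. a truncated row with no t/s value
            continue
        rows.append(BenchRow(
            model=cells[0],
            size_gib=float(cells[1].split()[0]),  # "17.50 GiB" -> 17.5
            params_b=float(cells[2].rstrip("B")),  # "26.90B" -> 26.9
            test=cells[6],
            tps_mean=float(m.group(1)),
            tps_std=float(m.group(2)),
        ))
    return rows

def expected_tests(prompts, gens, depths):
    # Assumed label format: "pp<N>" / "tg<N>", plus " @ d<depth>" when depth > 0.
    labels = []
    for d in depths:
        suffix = f" @ d{d}" if d else ""
        labels += [f"pp{p}{suffix}" for p in prompts]
        labels += [f"tg{n}{suffix}" for n in gens]
    return labels

sample = """| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90B | CUDA | 999 | 1 | pp512 | 5546.93 ± 220.29 |
"""

rows = parse_md_rows(sample)
print(rows[0].test, rows[0].tps_mean, rows[0].tps_std)  # pp512 5546.93 220.29
print(round(rows[0].bits_per_weight(), 2))  # 5.59
print(len(expected_tests([512, 2048], [128, 512], [0, 4096, 8192, 16384, 32768])))  # 20
```

With the flags used in this post that is 4 tests at each of 5 depths, i.e. 20 table rows, each averaged over 5 repetitions.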