Nemotron-3-Super-120B-A12B NVFP4 inference benchmark on one RTX Pro 6000 Blackwell

r/LocalLLaMA
AI Hardware AI Research AI Tools

Ran Nemotron-3-Super-120B-A12B NVFP4 through a full benchmark sweep on a single RTX Pro 6000 using vLLM. fp8 KV cache (per Nvidia's setup, unclear if their metrics were tested at fp8 KV cache or not). Context from 1K to 512K, 1 to 5 concurrent requests, 1024 output tokens per request. No prompt caching. Numbers are steady-state averages across sustained load. This is a team-oriented benchmark, not tuned for peak single-user performance. Methodology details at the bottom.