RTX6k (Server, 450w) Qwen3.5-122B-A10B (MXFP4_MOE) Benchmarks
r/LocalLLaMA
Date: 2026-03-08
Hardware: NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM), single GPU
Server: llama.cpp (llama-server), 4 parallel slots, 262K context
Model: Qwen3.5-122B-A10B-MXFP4_MOE (~63 GB on disk)
Tool: llama-benchy v0.3.4
Container: llm-qwen35 on gpus.local.lan

Summary

| Metric | Value |
|---|---|
| Prompt processing (pp) | 2,100-2,900 t/s |
| Token generation (tg), single stream | ~80 t/s |
| Token generation (tg), 4 concurrent | ~143 t/s total (~36 t/s per request) |
| TTFT at 512 prompt tokens | ~220 ms |
| TTFT at 65K context depth | ~23 s |
| TG degradation at 65K context | ~72 t/s (−10% vs no context) |

Phase 1: Baseline (Single.
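For anyone wanting to cross-check the Summary figures, here is a rough sketch of the arithmetic behind the derived numbers (per-request throughput, the 65K degradation, and the prefill rate implied by TTFT). The inputs are copied from the Summary. The helper names are made up for illustration, and reading "65K" as 65,536 tokens is an assumption on my part. TTFT also includes some non-prefill overhead, so the implied prefill rate is only a lower bound on pp speed.

```python
# Inputs copied from the Summary; everything below is plain arithmetic.
TG_SINGLE = 80.0      # t/s, single stream
TG_TOTAL_4 = 143.0    # t/s, aggregate across 4 concurrent requests
TG_AT_65K = 72.0      # t/s, single stream at 65K context depth
TTFT_512_S = 0.220    # seconds, 512 prompt tokens
TTFT_65K_S = 23.0     # seconds, 65K context depth
DEPTH_65K = 65_536    # ASSUMPTION: "65K" taken to mean 65,536 tokens


def per_request(total_tps: float, n_requests: int) -> float:
    """Average generation speed seen by each concurrent request."""
    return total_tps / n_requests


def pct_change(new: float, base: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


def implied_prefill(tokens: int, ttft_s: float) -> float:
    """Lower-bound prefill rate if TTFT were spent entirely on prefill."""
    return tokens / ttft_s


print(f"per request at 4 concurrent: {per_request(TG_TOTAL_4, 4):.2f} t/s")
print(f"aggregate gain vs single:    {TG_TOTAL_4 / TG_SINGLE:.2f}x")
print(f"tg change at 65K depth:      {pct_change(TG_AT_65K, TG_SINGLE):.1f}%")
print(f"implied pp at 512 tokens:    {implied_prefill(512, TTFT_512_S):.0f} t/s")
print(f"implied pp at 65K depth:     {implied_prefill(DEPTH_65K, TTFT_65K_S):.0f} t/s")
```

Under these assumptions, both implied prefill rates fall inside the reported 2,100-2,900 t/s pp range, and four slots give roughly 1.8x the aggregate throughput of a single stream at a bit under half the per-request speed.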