Throughput and TTFT comparisons of Qwen 3.6 27B, Qwen 3.6 35B A3B and Gemma 4 models on H100
r/LocalLLaMA
I wanted to figure out which of the newer small and mid-size models are actually worth running on a single H100, so I put 8 of them through a proper vLLM benchmark and recorded the results.

The setup was simple: one H100 80GB, vLLM 0.19.1, the built-in vllm bench serve tool, 100 prompts per run, 128 input tokens and 128 output tokens.

I ran each model at four concurrency levels (1, 4, 8, and 16 simultaneous requests) and measured two things:

- Throughput in tokens/second, which tells you how much the GPU can produce overall once requests are flowing.