TPU v7x Ironwood vs Nvidia B200

r/LocalLLaMA
AI Hardware

Google published Ironwood inference benchmarks in their AI-Hypercomputer/tpu-recipes repo. Nvidia has InferenceMAX numbers for B200. Nobody has compared them head-to-head under identical conditions, and Ironwood skipped MLPerf v6.0, so there's no neutral third-party result to lean on either. I rented B200s on Vast.ai and ran exactly the same FP8 configs Google published, on two models: Qwen3-32B (dense) and Qwen3-Coder-480B-A35B (MoE). Same quantization (FP8 e4m3 for weights, activations, and KV cache), same sequence lengths, same concurrency, same prompt count, same seed. Every arg was copied from Google's recipe YAML.
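To make "identical conditions" concrete, here is a minimal sketch of the kind of sanity check you can run before trusting a comparison like this: diff the pinned benchmark args between the two runs and refuse to compare if anything differs. The field names and the numeric values below are hypothetical placeholders for illustration, not the keys or numbers from Google's recipe YAML; only the FP8 e4m3 setting comes from the post.

```python
from typing import Any

# Fields that must match for the comparison to be apples-to-apples.
# These names are assumptions for illustration, not the recipe's real keys.
PINNED_FIELDS = (
    "quantization",
    "kv_cache_dtype",
    "input_len",
    "output_len",
    "max_concurrency",
    "num_prompts",
    "seed",
)


def config_mismatches(
    reference: dict[str, Any], candidate: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (reference_value, candidate_value)} for each pinned
    field that differs or is missing from either config."""
    diffs = {}
    for field in PINNED_FIELDS:
        ref_val = reference.get(field)
        cand_val = candidate.get(field)
        if field not in reference or field not in candidate or ref_val != cand_val:
            diffs[field] = (ref_val, cand_val)
    return diffs


# Placeholder values only; substitute whatever the published recipe specifies.
tpu_run = {
    "quantization": "fp8_e4m3",
    "kv_cache_dtype": "fp8_e4m3",
    "input_len": 1024,
    "output_len": 1024,
    "max_concurrency": 64,
    "num_prompts": 1000,
    "seed": 0,
}
b200_run = dict(tpu_run)  # copied verbatim, as described in the post

mismatches = config_mismatches(tpu_run, b200_run)
if mismatches:
    raise SystemExit(f"configs differ, comparison is not valid: {mismatches}")
print("pinned args match")
```

If a single arg drifts (say the seed), the function reports that field with both values, so it is obvious which knob broke the comparison.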