2x RTX Pro 6000 vs 2x A100 80GB dense model inference

r/LocalLLaMA

Has anyone compared inference performance on these two setups, using the largest dense model (not sparse or MoE) that fits on both?

* 2x RTX Pro 6000 Blackwell 96GB (workstation, not Max-Q) on a PCIe Gen5 x16 bus: NVFP4 quantized
* 2x A100 80GB Ampere, triple NVLink'd: W4A16 quantized

submitted by /u/RealTime3392