DiffusionLLM - Inception Mercury 2 - 11,000 tokens per second on NVIDIA H100 GPUs.

r/LocalLLaMA
AI Hardware

Submitted by /u/Revolutionary_Ask154