How do I get the superfast DFlash / MTP tokens-per-second speeds I'm seeing on here? (Dual 3090s)
r/LocalLLaMA
I'm trying to reach the high tokens-per-second numbers people are posting here with the new speculative decoding techniques (DFlash, MTP). Hardware: 2x RTX 3090, AMD Ryzen 9 9900X, 32GB RAM, Gigabyte B850 motherboard.