How NVIDIA Cut DeepSeek Sparse Attention’s Top-K Time in Half by Exploiting a Quirk of Autoregressive Decoding
Towards AI