How NVIDIA Cut DeepSeek Sparse Attention’s Top-K Time in Half by Exploiting a Quirk of Autoregressive Decoding

Towards AI