Faster Sparse Attention with IndexCache (GitHub Repo)

TLDR AI
Open Source AI

IndexCache reduces the cost of DeepSeek Sparse Attention by reusing top-k token indices across layers instead of recomputing them at every layer. The approach removes most indexer computations while maintaining model quality.
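
To make the idea concrete, here is a minimal toy sketch of cross-layer index reuse in plain Python. It is not the repository's code or API. The function names, the dot-product scoring used as a stand-in indexer, and the `refresh_every` schedule that decides which layers recompute indices are all assumptions made for illustration.

```python
import random


def indexer_topk(query, keys, k):
    """Stand-in indexer (assumption): score each key by dot product with the
    query and return the positions of the k highest-scoring keys."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_indices_per_layer(queries, keys_per_layer, k, refresh_every):
    """Return (indices used at each layer, number of indexer calls).

    Hypothetical schedule: layers where layer % refresh_every == 0 run the
    indexer; every other layer reuses the most recently cached indices.
    """
    cached = None
    calls = 0
    per_layer = []
    for layer, (query, keys) in enumerate(zip(queries, keys_per_layer)):
        if cached is None or layer % refresh_every == 0:
            cached = indexer_topk(query, keys, k)
            calls += 1
        per_layer.append(cached)
    return per_layer, calls


def demo(num_layers=8, seq_len=32, dim=4, k=4, refresh_every=4, seed=0):
    """Build random per-layer queries and keys, then run the selection."""
    rng = random.Random(seed)
    queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_layers)]
    keys_per_layer = [
        [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
        for _ in range(num_layers)
    ]
    return select_indices_per_layer(queries, keys_per_layer, k, refresh_every)


if __name__ == "__main__":
    per_layer, calls = demo()
    print("indexer calls:", calls, "of", len(per_layer), "layers")
```

With the default settings (8 layers, `refresh_every=4`), the indexer runs only at layers 0 and 4, so the toy makes 2 indexer calls instead of 8. Whether reused indices preserve quality in a real model is an empirical question that this sketch does not address.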