Faster Sparse Attention with IndexCache (GitHub Repo)
TLDR AI • Open Source AI
IndexCache reduces the cost of DeepSeek Sparse Attention (DSA) by reusing the top-k token indices selected in one layer across subsequent layers, instead of recomputing them at every layer. This removes most of the indexer computation while maintaining model quality.
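Below is a minimal sketch of the idea in pure Python. A layer runs the indexer and selects its top-k token indices, and the following layers attend only over those cached indices instead of running their own indexer. Several parts are hypothetical simplifications and are not IndexCache's actual implementation or layer-selection strategy: the dot-product `indexer_scores` stand-in, the fixed `refresh_every` schedule, and the single-query, single-head setup.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[Vector, List[Vector], List[Vector]]  # (query, keys, values) per layer


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def indexer_scores(query: Vector, keys: List[Vector]) -> List[float]:
    # Hypothetical stand-in for the DSA indexer: one dot-product score per key.
    return [dot(query, k) for k in keys]


def top_k_indices(scores: List[float], k: int) -> List[int]:
    # Positions of the k highest-scoring tokens, returned in ascending order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query: Vector, keys: List[Vector],
                     values: List[Vector], idx: List[int]) -> Vector:
    # Scaled softmax attention restricted to the selected token indices.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, idx):
        for d, v in enumerate(values[i]):
            out[d] += (w / total) * v
    return out


def run_layers(layers: List[Layer], k: int,
               refresh_every: int) -> Tuple[List[Vector], int]:
    """Run sparse attention over all layers, calling the indexer only on every
    `refresh_every`-th layer (hypothetical schedule) and reusing the cached
    top-k indices on the layers in between."""
    cached_idx: List[int] = []
    indexer_calls = 0
    outputs = []
    for layer_id, (query, keys, values) in enumerate(layers):
        if layer_id % refresh_every == 0:  # layer 0 always refreshes
            cached_idx = top_k_indices(indexer_scores(query, keys), k)
            indexer_calls += 1
        # Reuse layers still use their own query/keys/values, only the indices are shared.
        outputs.append(sparse_attention(query, keys, values, cached_idx))
    return outputs, indexer_calls


# Usage: 8 toy layers, 16 tokens, model dimension 4 (random illustrative data).
rng = random.Random(0)
n_tokens, dim = 16, 4
layers: List[Layer] = []
for _ in range(8):
    q = [rng.gauss(0, 1) for _ in range(dim)]
    ks = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    vs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    layers.append((q, ks, vs))

outputs, calls = run_layers(layers, k=4, refresh_every=4)
print(calls)  # indexer runs on layers 0 and 4 only
```

With `refresh_every=4`, the eight layers run the indexer twice rather than eight times. This toy does not show how much model quality survives a given reuse schedule. That has to be established empirically for the real method.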