AI RESEARCH
From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation
arXiv cs.AI
arXiv:2605.18865v1 Announce Type: cross Self-attention serves as the core foundation of large-scale transformer pre