AI RESEARCH
AdaSplash-2: Faster Differentiable Sparse Attention
arXiv CS.LG
arXiv:2604.15180v1 Sparse attention has been proposed as a way to alleviate the quadratic cost of transformers, a central bottleneck in long-context