AdaSplash-2: Faster Differentiable Sparse Attention

arXiv:2604.15180v1

Sparse attention has been proposed as a way to alleviate the quadratic cost of transformers, a central bottleneck in long-context