Dynamic Sparse Attention: Access Patterns and Architecture

arXiv:2603.13430v1

Dynamic sparse attention (DSA) reduces per-token attention bandwidth by restricting computation to a top-k subset of cached key-value (KV) entries, but its token-dependent selection pattern