AI RESEARCH
Nearly Optimal Attention Coresets
arXiv CS.AI
•
ArXi:2605.05602v1 Announce Type: cross We consider the problem of estimating the Attention mechanism in small space, and prove the existence of coresets for it of nearly optimal size. Specifically, we show that for any set of unit-norm keys and values $(K,V)$ in $\mathbb{R}^d$, there exists a subset $(K',V')$ of size at most $O({\sqrt{d} e^{\rho+o(\rho)}/\varepsilon})$ such that \[ \left\| \operatorname{Attn}(q,K,V)- \operatorname{Attn}(q,K',V') \right\| \le \varepsilon \] simultaneously for all queries whose norm is bounded by $\rho.