AI RESEARCH
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps
arXiv CS.CL
arXiv:2605.16928v1 Announce Type: new
Long-context inference in large language models is bottlenecked by the quadratic cost of full attention. Existing efficient alternatives often rely either on native sparse