AI RESEARCH

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

arXiv cs.CL

arXiv:2605.16928v1 (announce type: new)

Long-context inference in large language models is bottlenecked by the quadratic cost of full attention. Existing efficient alternatives often rely either on native sparse