UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification

arXiv:2605.06221v1. As large language models (LLMs) advance rapidly, they are becoming more capable and are increasingly required to process ever-longer contexts. To make long-context inference more efficient, several low-complexity hybrid architectures have recently been proposed that substantially reduce its computational cost.