AI RESEARCH

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing

arXiv cs.CL

arXiv:2604.19351v1 Announce Type: new

The quadratic computational complexity of the standard attention mechanism constitutes a fundamental bottleneck for large language models in long-context inference. While existing KV cache compression methods alleviate memory pressure, they often sacrifice generation quality and fail to address the high overhead of floating-point arithmetic. This paper