AI RESEARCH
Semantic Integrity Matters: Benchmarking and Preserving High-Density Reasoning in KV Cache Compression
arXiv CS.AI
arXiv:2502.01941v3 Announce Type: replace-cross While Key-Value (KV) cache compression is essential for efficient LLM inference, current evaluations disproportionately focus on sparse retrieval tasks, potentially masking the degradation of High-Density Reasoning where Chain-of-Thought (CoT) coherence is critical. We