RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction

arXiv:2605.04075v1 Announce Type: cross

Multimodal Large Language Models (MLLMs) face severe challenges in computational efficiency and memory consumption because the visual key-value (KV) cache expands substantially when processing long visual contexts. Existing KV cache compression methods typically rely on the "persistence of importance" hypothesis to prune tokens: a token that was important at earlier decoding steps is assumed to remain important at later ones.
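To make the hypothesis concrete, here is a minimal sketch of a generic importance-based eviction baseline. It is not RetentiveKV and is not taken from any specific library. It assumes that a token's importance can be proxied by the attention weights it has accumulated so far. The function name `evict_by_accumulated_attention` and the `attn_history` layout are hypothetical choices made for illustration.

```python
from typing import List


def evict_by_accumulated_attention(attn_history: List[List[float]], budget: int) -> List[int]:
    """Hypothetical baseline: return indices of cached tokens to keep.

    attn_history[t][i] is the attention weight that query step t assigned to
    cached token i. Under the persistence-of-importance assumption, the sum of
    past attention a token received is treated as a proxy for its future value.
    """
    if not attn_history:
        raise ValueError("attn_history must contain at least one step")
    num_tokens = len(attn_history[0])

    # Accumulate each token's attention mass over all observed query steps.
    scores = [0.0] * num_tokens
    for step in attn_history:
        for i, weight in enumerate(step):
            scores[i] += weight

    # Keep the `budget` highest-scoring tokens; return them in cache order.
    ranked = sorted(range(num_tokens), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    history = [
        [0.5, 0.1, 0.3, 0.1],  # query step 0
        [0.4, 0.2, 0.3, 0.1],  # query step 1
    ]
    print(evict_by_accumulated_attention(history, budget=2))  # [0, 2]
```

The eviction decision is irreversible in this sketch: once a token is dropped, later queries cannot attend to it. A token whose relevance only rises later is therefore lost, and that is precisely the case the persistence-of-importance assumption rules out.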