AI RESEARCH

PiKV: KV Cache Management System for Mixture of Experts

arXiv CS.AI

arXiv:2508.06526v3. As large language models continue to grow in both size and context length, the memory and communication cost of key-value (KV) cache storage has become a major bottleneck in multi-GPU and multi-node inference. While Mixture-of-Experts (MoE) architectures sparsify computation across experts, the corresponding KV caches remain dense and globally synchronized, resulting in significant overhead.