SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging

arXiv:2603.14303v1 Announce Type: new

Existing KV cache compression methods generally operate on discrete tokens or non-semantic chunks. Such approaches often cause semantic fragmentation: linguistically coherent units are split apart, which leads to irreversible information loss and degraded model performance. To address this, we