YOCO++: Enhancing YOCO with KV Residual Connections for Efficient LLM Inference

arXiv:2604.13556v1 Announce Type: new Cross-layer key-value (KV) compression has been found to be effective for efficient inference of large language models (LLMs). Although they reduce the memory consumption of the KV cache, such methods usually