AI RESEARCH
YOCO++: Enhancing YOCO with KV Residual Connections for Efficient LLM Inference
arXiv cs.CL
arXiv:2604.13556v1 Announce Type: new Cross-layer key-value (KV) compression has been found to be effective for efficient inference of large language models (LLMs). Although such methods reduce the memory consumption of the KV cache, they usually