ICaRus: Identical Cache Reuse for Efficient Multi Model Inference
arXiv cs.AI
arXiv:2603.13281v1 Announce Type: cross
Multi-model inference has recently emerged as a prominent paradigm, particularly in the development of agentic AI systems. However, in such scenarios, each model must maintain its own Key-Value (KV) cache for the identical prompt, leading to substantial memory consumption. This explosive growth of KV caches forces LLM serving systems to evict previously stored caches, which in turn…
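The memory argument in the abstract can be made concrete with the standard per-model KV cache size estimate: 2 (one tensor for keys, one for values) × layers × KV heads × head dimension × tokens × bytes per element. The minimal Python sketch below assumes a hypothetical configuration (32 layers, 8 KV heads, head dimension 128, 16-bit values, an 8,192-token prompt, four models). None of these numbers come from the paper, and the code only illustrates the duplication the abstract describes, not how ICaRus avoids it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical model shape; values used below are illustrative assumptions."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # assumes fp16/bf16 storage


def kv_cache_bytes(cfg: ModelConfig, num_tokens: int) -> int:
    """Bytes one model needs to hold keys and values for num_tokens tokens."""
    # Factor 2: keys and values are cached separately in every layer.
    return (2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim
            * num_tokens * cfg.bytes_per_elem)


def separate_caches_bytes(cfgs: list[ModelConfig], num_tokens: int) -> int:
    """Total KV memory when every model caches the same prompt on its own."""
    return sum(kv_cache_bytes(c, num_tokens) for c in cfgs)


if __name__ == "__main__":
    cfg = ModelConfig(num_layers=32, num_kv_heads=8, head_dim=128)
    prompt_len = 8192
    models = [cfg] * 4  # four models receiving the identical prompt

    one = kv_cache_bytes(cfg, prompt_len)
    total = separate_caches_bytes(models, prompt_len)
    print(f"one cache: {one / 2**30:.2f} GiB")          # one cache: 1.00 GiB
    print(f"4 separate caches: {total / 2**30:.2f} GiB")  # 4 separate caches: 4.00 GiB
```

With these assumed numbers each model's cache is exactly 1 GiB, so four models holding separate caches for the same prompt occupy 4 GiB, and the duplicated memory grows linearly with the number of models sharing that prompt.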