RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse

arXiv:2603.13289v1 Announce Type: cross The increasing complexity of AI tasks has shifted the paradigm from monolithic models toward multi-agent large language model (LLM) systems. However, these collaborative architectures