AI RESEARCH

CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems

arXiv cs.LG

arXiv:2603.10726v1 Announce Type: cross Large Language Models (LLMs) rely on optimizations such as Automatic Prefix Caching (APC) to accelerate inference. APC reuses the previously computed states for the beginning of a request (its prefix) when another request starts with the same text. While APC improves throughput, it
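The prefix-reuse idea described in the abstract can be illustrated with a toy sketch. This is not the paper's system or any serving framework's actual implementation: the block size, the chained-hash scheme, and the names `block_hashes` and `ToyPrefixCache` are all assumptions made for illustration, and a placeholder string stands in for the cached model states.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # hypothetical block size, kept small for illustration


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one chained hash per full block of tokens.

    Each hash covers the block's tokens and the previous block's hash,
    so two requests share a hash only if they share the whole prefix.
    """
    hashes: List[str] = []
    parent = ""
    full = len(tokens) - len(tokens) % block_size  # ignore the partial tail
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        payload = parent + "|" + ",".join(map(str, block))
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


class ToyPrefixCache:
    """Toy cache keyed by chained block hashes (assumed design)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def serve(self, tokens: List[int]) -> Tuple[int, int]:
        """Return (reused_tokens, computed_tokens) for one request."""
        hashes = block_hashes(tokens)
        hit_blocks = 0
        for h in hashes:
            if h not in self._store:
                break  # reuse stops at the first block that differs
            hit_blocks += 1
        for h in hashes[hit_blocks:]:
            self._store[h] = "kv-state-placeholder"  # stand-in for real states
        reused = hit_blocks * BLOCK_SIZE
        return reused, len(tokens) - reused


cache = ToyPrefixCache()
first = cache.serve(list(range(10)))                   # cold cache
second = cache.serve(list(range(8)) + [99, 100, 101])  # shares 8 leading tokens
third = cache.serve([42] + list(range(8)))             # same tokens, shifted
print(first, second, third)
```

In this sketch the second request reuses the two blocks it shares with the first, while the third reuses nothing because its very first block differs, even though most of its tokens appeared earlier.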