AI RESEARCH

StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving

arXiv CS.AI

arXiv:2603.28795v1 Announce Type: cross

We address LLM serving workloads where repeated requests share a common solution structure but differ in localized constraints, such as output schema, variable names, or numeric constants. Prior caching approaches typically reuse either full responses (semantic caching) or model-internal KV/prefix states. The former is brittle under partial changes, and the latter is tightly coupled to specific backends.
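To make the brittleness of full-response reuse concrete, here is a minimal sketch of a toy semantic cache. It is not the paper's method: the class `FullResponseCache`, the token-overlap (Jaccard) similarity, and the 0.8 threshold are all assumptions chosen for illustration, and real semantic caches typically compare embeddings instead. The point is only that two requests differing in a single numeric constant can look similar enough to trigger a hit, which returns a response that is wrong for the new request.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens; a stand-in for an embedding (assumption)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Token-set overlap between two prompts, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


class FullResponseCache:
    """Hypothetical full-response cache: returns a stored answer verbatim
    whenever a new prompt is similar enough to a cached one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (prompt, response) pairs

    def put(self, prompt, response):
        self.entries.append((prompt, response))

    def get(self, prompt):
        # Pick the most similar cached prompt, if any.
        best = max(self.entries, key=lambda e: jaccard(prompt, e[0]), default=None)
        if best is not None and jaccard(prompt, best[0]) >= self.threshold:
            return best[1]
        return None


cache = FullResponseCache()
cache.put("Return JSON with field total for 3 items at price 10", '{"total": 30}')

# Same structure, one numeric constant changed (10 -> 12).
hit = cache.get("Return JSON with field total for 3 items at price 12")
print(hit)  # the cached response for price 10 is reused, although 3 * 12 = 36
```

In this sketch the second prompt shares 10 of 12 distinct tokens with the cached one, so it clears the threshold and the response computed for the old constant is served unchanged. Raising the threshold would avoid this particular stale hit but would also forfeit reuse of the shared structure, which is the trade-off the abstract describes.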