EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models

arXiv:2603.18489v1 Announce Type: new Diffusion-based large language models (dLLMs) rely on bidirectional attention, which prevents lossless KV caching and requires a full forward pass at every denoising step. Existing approximate KV caching methods reduce this cost by selectively updating cached states, but their decision overhead scales with context length or model depth. We propose EntropyCache, a