AI RESEARCH

HeatKV: Head-tuned KV-cache Compression for Visual Autoregressive Modeling

arXiv CS.CV

arXiv:2605.14877v1 Announce Type: new Visual Autoregressive (VAR) models have recently demonstrated impressive image generation quality while maintaining low latency. However, they suffer from severe KV-cache memory constraints, often requiring gigabytes of memory per generated image. We