AI RESEARCH

Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

arXiv cs.LG

arXiv:2602.02958v4 Announce Type: replace

Despite rapid progress in autoregressive video diffusion, an emerging system-algorithm bottleneck limits both deployability and generation capability: KV cache memory. In autoregressive video generation models, the KV cache grows with generation history and quickly dominates GPU memory, often exceeding 30 GB, which prevents deployment on widely available hardware. Critically, constrained KV cache budgets restrict the effective working memory, directly degrading long-horizon consistency in identity, layout, and motion.
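To make the memory growth concrete, the following is a rough back-of-the-envelope sketch, not the paper's method. It estimates KV cache size as a function of generation history and compares 16-bit storage with the 2-bit storage named in the title. All model dimensions (`LAYERS`, `KV_HEADS`, `HEAD_DIM`, `TOKENS_PER_FRAME`) are hypothetical placeholders, and the 2-bit figure ignores the per-group scale and zero-point metadata that a real quantizer would also store.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bits_per_value):
    """Estimate bytes needed to cache keys and values for num_tokens tokens."""
    # Factor of 2: one key vector and one value vector per token, head, and layer.
    num_values = 2 * num_layers * num_kv_heads * head_dim * num_tokens
    return num_values * bits_per_value / 8


# Hypothetical model configuration (assumed for illustration, not from the source).
LAYERS = 30
KV_HEADS = 12
HEAD_DIM = 128
TOKENS_PER_FRAME = 1560

if __name__ == "__main__":
    for frames in (20, 80, 320):
        tokens = frames * TOKENS_PER_FRAME
        fp16 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, tokens, 16)
        int2 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, tokens, 2)
        print(f"{frames:4d} frames: 16-bit {fp16 / 2**30:6.2f} GiB, "
              f"2-bit {int2 / 2**30:6.2f} GiB")
```

Under these assumptions the cache grows linearly with the number of cached frames, and moving from 16 bits to 2 bits per value shrinks the payload by a factor of 8 before metadata overhead.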