KV Cache Quantization for Self-Forcing Video Generation: A 33-Method Empirical Study

arXiv CS.AI

arXiv:2603.27469v1 Announce Type: cross

Self-forcing video generation extends a short-horizon video model to longer rollouts by repeatedly feeding generated content back in as context. This scaling path immediately exposes a systems bottleneck: the key-value (KV) cache grows with rollout length, so longer videos require not only better generation quality but also substantially better memory behavior. We present a comprehensive empirical study of KV-cache compression for self-forcing video generation on a Wan2.1-based Self-Forcing stack.
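To make the memory bottleneck concrete, the following is a rough back-of-the-envelope sketch of how KV-cache size scales with rollout length and storage bit width. Every number in `CONFIG` (layer count, head count, head dimension, tokens per frame) is a hypothetical placeholder chosen for illustration, not the configuration of Wan2.1 or of the Self-Forcing stack, and the function name `kv_cache_bytes` is likewise an assumption. The estimate also ignores the per-group scale and zero-point metadata that real quantization schemes store, so actual savings would be somewhat smaller.

```python
def kv_cache_bytes(num_layers, num_heads, head_dim,
                   tokens_per_frame, num_frames, bits_per_value):
    """Estimate the bytes needed to cache keys and values for all cached frames.

    Assumes every layer caches one key and one value vector per head per token,
    and ignores quantization metadata such as scales and zero points.
    """
    values_per_token = 2 * num_layers * num_heads * head_dim  # 2 = keys + values
    total_values = values_per_token * tokens_per_frame * num_frames
    return total_values * bits_per_value // 8


# Hypothetical model shape, for illustration only (not taken from the paper).
CONFIG = dict(num_layers=30, num_heads=12, head_dim=128, tokens_per_frame=1560)


def report(frame_counts=(21, 84, 336), bit_widths=(16, 8, 4)):
    """Return (frames, bits, GiB) rows for each rollout length and bit width."""
    rows = []
    for frames in frame_counts:
        for bits in bit_widths:
            size = kv_cache_bytes(num_frames=frames, bits_per_value=bits, **CONFIG)
            rows.append((frames, bits, size / 2**30))
    return rows


if __name__ == "__main__":
    for frames, bits, gib in report():
        print(f"{frames:4d} frames @ {bits:2d}-bit: {gib:6.2f} GiB")
```

Under these assumptions the cache grows linearly in the number of cached frames, while lowering the bit width shrinks it by a constant factor. That is why compression alone postpones the memory limit on rollout length rather than removing it.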