KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

arXiv:2605.14278v1

Aligning streaming autoregressive (AR) video generators with human preferences is challenging. Existing reinforcement learning (RL) methods rely predominantly on noise-based exploration and on SDE-based surrogate policies, which are mismatched to the deterministic ODE dynamics of distilled AR models. This noise-based exploration also tends to perturb low-level appearance rather than the high-level semantic storyline progression that is critical for long-horizon coherence.
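The ODE/SDE mismatch can be made concrete with a generic sketch. This is not the KVPO method, whose details are not given here. A deterministic ODE (Euler) step has no sampling density. A stochastic Euler-Maruyama step injects Gaussian noise, so each transition has a log-probability that a policy-gradient method can use. GRPO-style training then normalizes rewards within a group of samples. All function names are hypothetical. The `drift` argument is a placeholder: practical SDE samplers add a score-based correction term so that their marginals match the ODE, which is omitted here.

```python
import math
import random


def ode_euler_step(x, velocity, dt):
    # Deterministic ODE step: identical inputs always give identical outputs,
    # so there is no transition density (log-prob) to differentiate.
    return [xi + vi * dt for xi, vi in zip(x, velocity)]


def sde_euler_maruyama_step(x, drift, sigma, dt, rng):
    # Stochastic surrogate step (hypothetical placeholder drift): Gaussian
    # noise with std sigma * sqrt(dt) is added around the Euler mean.
    std = sigma * math.sqrt(dt)
    mean = [xi + di * dt for xi, di in zip(x, drift)]
    sample = [m + std * rng.gauss(0.0, 1.0) for m in mean]
    return sample, mean, std


def gaussian_log_prob(sample, mean, std):
    # Log-density of an isotropic Gaussian transition; SDE-based RL methods
    # use a quantity of this form as log pi(x_{t+1} | x_t).
    d = len(sample)
    sq = sum((s - m) ** 2 for s, m in zip(sample, mean))
    return -0.5 * sq / std ** 2 - d * math.log(std) - 0.5 * d * math.log(2 * math.pi)


def group_relative_advantages(rewards, eps=1e-8):
    # GRPO-style advantage: center and scale each reward by its group's
    # mean and (population) standard deviation.
    n = len(rewards)
    mu = sum(rewards) / n
    var = sum((r - mu) ** 2 for r in rewards) / n
    return [(r - mu) / (math.sqrt(var) + eps) for r in rewards]
```

For example, with `rng = random.Random(0)`, calling `ode_euler_step` twice on the same state returns the same result. `sde_euler_maruyama_step` returns a sample together with the mean and std needed by `gaussian_log_prob`. As `sigma` approaches 0, the SDE step collapses onto the deterministic update and its log-density degenerates. This is why an SDE surrogate is needed for likelihood-based RL, and why it fits poorly with models that are sampled deterministically.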