MSG Score: Automated Video Verification for Reliable Multi-Scene Generation

arXiv:2411.19121v2 Announce Type: replace

While text-to-video diffusion models have advanced significantly, creating coherent long-form content remains unreliable due to stochastic sampling artifacts. This necessitates generating multiple candidates, yet verifying them creates a severe bottleneck: manual review does not scale, and existing automated metrics lack the adaptability and speed required for runtime monitoring.