UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

arXiv:2605.00658v1 Announce Type: new Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We present UniVidX, a unified multimodal framework that leverages VDM priors for versatile video generation.