Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

arXiv:2604.08121v1 Announce Type: new Unified multimodal models integrating visual understanding and generation face a fundamental challenge: visual generation incurs substantially higher computational costs than understanding, particularly for video. This imbalance motivates us to invert the conventional paradigm: rather than extending understanding-centric MLLMs to generation, we propose Uni-ViGU, a framework that unifies video generation and understanding by extending a video generator as the foundation.