OneCAT and InternVL-U, two new models

r/StableDiffusion
Generative AI

The papers for InternVL-U and OneCAT both present advancements in Unified Multimodal Models (UMMs) that integrate understanding, reasoning, generation, and editing. While they share the goal of architectural unification, they differ significantly in their fundamental design philosophies, inference efficiencies, and specialized capabilities.

Architecture and Methodology Comparison

InternVL-U is designed as a streamlined ensemble model that combines a state-of-the-art Multimodal Large Language Model (MLLM) with a specialized visual generation head.