InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

arXiv:2603.09877v1 Announce Type: new Unified multimodal models (UMMs) that integrate understanding, reasoning, generation, and editing face inherent trade-offs between maintaining strong semantic comprehension and acquiring powerful generation capabilities. In this report, we present InternVL-U, a lightweight 4B-parameter UMM that democratizes these capabilities within a unified framework.