Enhancing Alignment for Unified Multimodal Models via Semantically-Grounded Supervision

arXiv:2603.19807v1 Announce Type: cross

Unified Multimodal Models (UMMs) have emerged as a promising paradigm that integrates multimodal understanding and generation within a unified modeling framework. However, current generative