AI RESEARCH

LatentUMM: Dual Latent Alignment for Unified Multimodal Models

arXiv cs.CV

arXiv:2605.17766v1

Unified multimodal models (UMMs) achieve strong performance in both understanding and generation by learning a shared latent space, yet they often exhibit functional inconsistency between these two capabilities. We observe that this issue stems not from a lack of shared representations but from the absence of explicit alignment between the transformations that map into and out of the latent space. As a result, generation and re-encoding can follow inconsistent trajectories, leading to semantic drift under modality transitions.
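
To make the idea of drift under repeated generation and re-encoding concrete, here is a minimal toy sketch. It is not the paper's method or model: the 2x2 linear "encoder" and "decoder", the rotation used to misalign them, and all names (`ENCODE`, `ALIGNED_DECODE`, `MISALIGNED_DECODE`, `cycle_similarity`) are assumptions made purely for illustration. It shows that when the map out of a latent space is not the exact counterpart of the map into it, each decode/re-encode round trip moves the latent a little further from where it started.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def rotation(theta):
    """2x2 rotation matrix, used here as a hypothetical source of misalignment."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def cosine(u, v):
    """Cosine similarity between two 2D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def cycle_similarity(encode, decode, z, cycles):
    """Cosine similarity between latent z and its image after
    `cycles` round trips of decode (generate) then encode (re-encode)."""
    current = z
    for _ in range(cycles):
        current = matvec(encode, matvec(decode, current))
    return cosine(z, current)

# Toy stand-in for the map into the latent space (assumption, not from the paper).
ENCODE = [[2.0, 1.0], [0.0, 1.0]]

# Aligned case: the decoder is the exact inverse of the encoder.
ALIGNED_DECODE = inverse_2x2(ENCODE)

# Misaligned case: the decoder first rotates the latent by 0.1 rad,
# so one round trip equals that rotation instead of the identity.
MISALIGNED_DECODE = matmul(ALIGNED_DECODE, rotation(0.1))

if __name__ == "__main__":
    z = [1.0, 0.0]
    for k in (1, 5, 10):
        print(k,
              cycle_similarity(ENCODE, ALIGNED_DECODE, z, k),
              cycle_similarity(ENCODE, MISALIGNED_DECODE, z, k))
```

In this toy setting both decoders operate on the same shared latent space, yet only the aligned pair returns each latent to itself. With the misaligned pair, similarity to the starting latent falls as cos(0.1 k) after k round trips, a simple analogue of semantic drift accumulating across modality transitions.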