TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training

arXiv:2604.10784v1 Announce Type: new Recent advances in unified multimodal models (UMMs) have led to a proliferation of architectures capable of understanding, generating, and editing across visual and textual modalities. However, developing a unified framework for UMMs remains challenging due to the diversity of model architectures and the heterogeneity of