Beyond Accuracy: Benchmarking Cross-Task Consistency in Unified Multimodal Models

arXiv:2604.25072v1

Unified Multimodal Models (uMMs) aim to support both visual understanding and visual generation within a shared representation. However, existing evaluation protocols assess these two capabilities independently and do not examine whether they are semantically aligned. As a result, it remains unclear whether current uMMs learn coherent unified representations that stay consistent across tasks for a given visual concept. We