Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking

arXiv:2602.21435v2 Announce Type: replace

Unified Vision-Language Models (UVLMs) aim to advance multimodal learning by supporting both understanding and generation within a single framework. However, existing approaches largely focus on architectural unification and overlook the need for explicit interaction between the two capabilities during task solving. As a result, current models treat understanding and generation as parallel skills rather than synergistic processes. To achieve real synergy, we