When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm

arXiv:2603.24079v1 Announce Type: new Recently, multimodal large language models (MLLMs) have emerged as a unified paradigm for language and image generation. Compared with diffusion models, MLLMs possess a much stronger capability for semantic understanding, enabling them to process complex textual inputs and comprehend richer contextual meanings. However, this enhanced semantic ability may also