When Cultures Meet: Multicultural Text-to-Image Generation

arXiv:2502.15972v2 Announce Type: replace-cross

Text-to-image generation models have achieved strong performance in culturally homogeneous settings, yet their ability to generate multicultural scenes, where people and landmarks originate from different cultures, remains largely unexplored. As one strategy for composing cultural and graphic information, we explore MosAIG, a Multi-Agent framework that enhances multicultural Image Generation by leveraging LLMs with distinct cultural personas.