Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models

arXiv:2504.01137v3 Announce Type: replace Text-to-image generation models suffer from alignment problems, where generated images fail to accurately capture the objects and relations in the text prompt. Prior work has focused on improving alignment by refining the diffusion process, ignoring the role of the text encoder, which guides the diffusion.