Steer Away From Mode Collisions: Improving Composition In Diffusion Models

arXiv:2509.25940v2 Announce Type: replace-cross We propose to improve multi-concept prompt fidelity in text-to-image diffusion models. We begin with common failure cases: prompts like "a cat and a dog" that sometimes yield images where one concept is missing, faint, or colliding awkwardly with another. We hypothesize that this happens when the diffusion model drifts into mixed modes that over-emphasize a single concept it learned strongly during