Relevance Feedback in Text-to-Image Diffusion: A Training-Free And Model-Agnostic Interactive Framework

ArXi:2603.14936v1 Announce Type: new Text-to-image generation using diffusion models has achieved remarkable success. However, users often possess clear visual intents but struggle to express them precisely in language, resulting in ambiguous prompts and misaligned images. Existing methods struggle to bridge this gap, typically relying on high-load textual dialogues, opaque black-box inferences, or expensive fine-tuning. They fail to simultaneously achieve low cognitive load, interpretable preference inference, and remain