AI RESEARCH

FOCUS: Optimal Control for Multi-Entity World Modeling in Text-to-Image Generation

arXiv cs.CV

arXiv:2510.02315v2 Announce Type: replace

Text-to-image (T2I) models excel on single-entity prompts but struggle with multi-entity scenes, often exhibiting attribute leakage, identity entanglement, and subject omissions. We present a principled theoretical framework that steers sampling toward multi-subject fidelity by casting flow matching (FM) as stochastic optimal control (SOC), yielding a trade-off, controlled by a single hyperparameter, between fidelity and object-centric state separation / binding consistency. Within this framework, we derive two architecture-agnostic algorithms: (i) a.