MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation

arXiv:2605.12134v1 Announce Type: cross Recent text-to-image models produce high-quality images, yet the ambiguity of text prompts hinders precise control when specific styles or objects are required. A number of recent works address learning and composing multiple objects and patterns. However, current work focuses almost entirely on image content, overlooking imaging factors such as camera lens, sensor type, imaging viewpoint, and the domain characteristics of scenes. We