AI RESEARCH

Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning

arXiv CS.CV

arXiv:2604.04746v1 Announce Type: new Humans paint images incrementally: they plan a global layout, sketch a coarse draft, inspect, and refine details, and most importantly, each step is grounded in the evolving visual states. However, can unified multimodal models trained on text-image interleaved datasets also imagine the chain of intermediate states? In this paper, we