AI RESEARCH

From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing

arXiv CS.CV

arXiv:2605.15181v1 Announce Type: new Modern image editing models produce realistic results but struggle with abstract, multi-step instructions (e.g., "make this advertisement vegetarian-friendly"). Prior agent-based methods decompose such tasks but rely on handcrafted pipelines or teacher imitation, which limits flexibility and decouples learning from actual editing outcomes. We propose an experiential framework for long-horizon image editing, in which a planner generates structured atomic decompositions and an orchestrator selects tools and regions to execute each step.
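
The planner/orchestrator split described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the `AtomicStep` fields, the rule-based `plan` stub (standing in for a learned planner), the `TOOLS` table, the tool names, and the `(x, y, w, h)` region format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical region format: (x, y, width, height) in pixels.
Region = Tuple[int, int, int, int]


@dataclass
class AtomicStep:
    """One atomic edit in a decomposition (field names are assumptions)."""
    action: str  # e.g. "replace" or "rewrite_text"
    target: str  # what in the image the step applies to
    detail: str  # what the edit should achieve


def plan(instruction: str) -> List[AtomicStep]:
    """Stub planner: a learned model would produce this decomposition."""
    if "vegetarian" in instruction.lower():
        return [
            AtomicStep("replace", "burger patty", "grilled vegetable patty"),
            AtomicStep("rewrite_text", "slogan", "mention plant-based ingredients"),
        ]
    # Fallback: treat the whole instruction as a single step.
    return [AtomicStep("edit", "whole image", instruction)]


# Hypothetical mapping from action type to an editing tool name.
TOOLS: Dict[str, str] = {
    "replace": "inpainting_tool",
    "rewrite_text": "text_editing_tool",
}


def orchestrate(
    step: AtomicStep, regions: Dict[str, Region]
) -> Tuple[str, Optional[Region]]:
    """Stub orchestrator: choose a tool and a region for one step."""
    tool = TOOLS.get(step.action, "general_edit_tool")
    region = regions.get(step.target)  # None means "apply to the whole image"
    return tool, region


def run(
    instruction: str, regions: Dict[str, Region]
) -> List[Tuple[AtomicStep, str, Optional[Region]]]:
    """Plan first, then pick a tool and region for each atomic step."""
    return [(step, *orchestrate(step, regions)) for step in plan(instruction)]


if __name__ == "__main__":
    detected = {"burger patty": (120, 80, 200, 90)}  # made-up detection
    for step, tool, region in run(
        "Make this advertisement vegetarian-friendly", detected
    ):
        print(step.action, "->", tool, region)
```

In the framework the abstract describes, both components are learned from editing outcomes; the fixed rules here only show how a plan of atomic steps could be handed to a component that picks a tool and a region per step.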