Early Semantic Grounding in Image Editing Models for Zero-Shot Referring Image Segmentation

arXiv:2605.13122v1 Announce Type: new Instruction-based image editing (IIE) models have recently demonstrated strong capability in modifying specific image regions according to natural language instructions, which implicitly requires identifying where an edit should be applied. This indicates that such models inherently perform language-conditioned visual semantic grounding.