Open-Source Image Editing Models Are Zero-Shot Vision Learners

arXiv:2605.04566v1 Announce Type: cross Recent studies have shown that large generative models can solve vision tasks they were not explicitly trained for. However, existing evidence relies on closed-source models (Veo 3, Nano Banana Pro) or requires task-specific instruction tuning, leaving open whether publicly available image-editing models possess zero-shot vision abilities out of the box.