AI RESEARCH

Decoupling Endpoint and Semantic Transition Learning for Zero-Shot Composed Image Retrieval

arXiv cs.AI

arXiv:2605.08389v1 (cross-listed)

Zero-shot composed image retrieval (ZS-CIR) retrieves a target image given a reference image and a text modification, without training on human-annotated CIR triplets. Projection-based ZS-CIR methods are attractive because they do not call an LLM at inference time and remain lightweight. However, they often underperform LLM-based approaches when the modification requires complex semantic changes.
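The abstract does not describe the projection mechanism itself. As a rough illustration of what a projection-based ZS-CIR query path generally looks like, here is a minimal sketch built on loud assumptions. Embeddings are plain lists of floats standing in for the outputs of a frozen vision-language model. The projection is a hypothetical linear map (`project_image_to_pseudo_token`) from the image embedding to a pseudo-word token. The composition step (`compose_query`) is a normalized sum, a crude stand-in for encoding a prompt that contains the pseudo-token and the modification text. None of these names come from the paper, and this is not the authors' decoupled endpoint/transition method. It only shows why such pipelines stay lightweight: one small projection plus a cosine-similarity ranking, with no LLM called at query time.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def project_image_to_pseudo_token(
    image_emb: Sequence[float], weight: Sequence[Sequence[float]]
) -> Vector:
    """HYPOTHETICAL linear projection: image-embedding space -> token space.

    In real projection-based methods this would be a small learned network;
    here it is just a matrix-vector product.
    """
    return [sum(w * x for w, x in zip(row, image_emb)) for row in weight]


def compose_query(pseudo_token: Sequence[float], text_emb: Sequence[float]) -> Vector:
    """HYPOTHETICAL composition: normalized sum of pseudo-token and text embedding.

    A stand-in for running the text encoder on a prompt that mixes the
    pseudo-token with the modification text.
    """
    return l2_normalize([p + t for p, t in zip(pseudo_token, text_emb)])


def retrieve(query: Sequence[float], gallery: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the top-k gallery images by cosine similarity."""
    scores = [cosine(query, g) for g in gallery]
    # sorted() is stable, so tied scores keep their original gallery order.
    ranked = sorted(range(len(gallery)), key=lambda i: -scores[i])
    return ranked[:k]


if __name__ == "__main__":
    # Toy 3-D "embeddings"; values are illustrative only.
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    reference_image = [1.0, 0.0, 0.0]
    modification_text = [0.0, 1.0, 0.0]
    gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]

    token = project_image_to_pseudo_token(reference_image, identity)
    query = compose_query(token, modification_text)
    print(retrieve(query, gallery, k=2))
```

In this toy setup, the gallery item that mixes both the reference content and the modification ranks first. The design choice to keep every step a fixed linear or similarity operation mirrors the lightweight appeal described above. It also hints at the weakness the abstract names: a single projected token plus a simple composition has limited capacity to express complex semantic modifications.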