Decoupling Endpoint and Semantic Transition Learning for Zero-Shot Composed Image Retrieval
arXiv cs.AI
arXiv:2605.08389v1 Announce Type: cross

Zero-shot composed image retrieval (ZS-CIR) retrieves a target image given a reference image and a text modification, without training on human-annotated CIR triplets (reference image, modification text, target image). Projection-based ZS-CIR methods are attractive because they remain lightweight and do not rely on large language models (LLMs) at inference, but they often underperform LLM-based approaches on complex semantic modifications.
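To make the retrieval setting concrete, here is a minimal sketch of a generic projection-based ZS-CIR query, not the method of this paper. It assumes a learned projection (here a hypothetical `project` function with a toy weight matrix) that maps a reference-image embedding to a pseudo-word token, and it replaces the frozen text encoder with a hypothetical `compose_query` that simply averages the pseudo-token with a text embedding. Gallery images are then ranked by cosine similarity. All names, dimensions, and the averaging step are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def project(image_emb: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical linear projection: image embedding -> pseudo-word token embedding."""
    return [sum(w * x for w, x in zip(row, image_emb)) for row in weight]


def compose_query(pseudo_token: Sequence[float], text_emb: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a frozen text encoder: average the pseudo-token
    with the modification-text embedding, then normalize."""
    return l2_normalize([(a + b) / 2 for a, b in zip(pseudo_token, text_emb)])


def retrieve(query: Sequence[float], gallery: Dict[str, Sequence[float]], k: int = 1) -> List[str]:
    """Return the names of the k gallery images with the highest cosine similarity."""
    scores = {
        name: sum(q * g for q, g in zip(query, l2_normalize(emb)))
        for name, emb in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]          # toy projection weights
    reference_image = [1.0, 0.0]                 # toy reference-image embedding
    modification_text = [0.0, 1.0]               # toy text-modification embedding
    gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

    query = compose_query(project(reference_image, identity), modification_text)
    print(retrieve(query, gallery, k=1))
```

In this toy setup the composed query points halfway between the reference image and the modification, so the gallery item combining both directions (`"c"`) ranks first. No LLM is called at query time, which is the lightweight property the abstract attributes to projection-based methods.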