AI RESEARCH
UniCVR: From Alignment to Reranking for Unified Zero-Shot Composed Visual Retrieval
arXiv CS.CV
arXiv:2604.20318v1 Announce Type: new Composed image retrieval, multi-turn composed image retrieval, and composed video retrieval all share a common paradigm: composing a reference visual with modification text to retrieve the desired target. Despite this shared structure, the three tasks have been studied in isolation, with no prior work proposing a unified framework, let alone a zero-shot solution. In this paper, we propose UniCVR, the first unified zero-shot composed visual retrieval framework that jointly addresses all three tasks without any task-specific human-annotated data.