TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval
arXiv cs.CV
arXiv:2604.21806v1 Announce Type: new Composed Image Retrieval (CIR) is an important image retrieval paradigm that lets users retrieve a target image with a multimodal query consisting of a reference image and modification text. Although research on CIR has made significant progress, prevailing setups still rely on simple modification texts that typically cover only a limited range of salient changes. This induces two limitations that are highly relevant to practical applications: Insufficient Entity Coverage and Clause-Entity Misalignment.