Caption-Matching: A Multimodal Approach for Cross-Domain Image Retrieval

arXiv:2403.15152v3
Cross-Domain Image Retrieval (CDIR) is a challenging task in computer vision that aims to match images across different visual domains, such as sketches, paintings, and photographs. Existing CDIR methods rely either on supervised learning with labeled cross-domain correspondences or on methods that require