WRF4CIR: Weight-Regularized Fine-Tuning Network for Composed Image Retrieval

arXiv CS.CV

arXiv:2604.05583v1

The Composed Image Retrieval (CIR) task aims to retrieve target images based on reference images and modification texts. Current CIR methods primarily rely on fine-tuning vision-language pre-trained (VLP) models. However, we find that these approaches commonly suffer from severe overfitting, which poses challenges for CIR given the limited triplet data available. To better understand this issue, we present a systematic study of overfitting in VLP-based CIR, revealing a significant and previously overlooked generalization gap across different models and datasets.