Learning to Align Generative Appearance Priors for Fine-grained Image Retrieval

arXiv:2605.09859v1 Announce Type: new Fine-grained image retrieval (FGIR) typically relies on supervision from seen categories to learn discriminative embeddings for retrieving unseen categories. However, such supervision often biases retrieval models toward the semantics of seen categories rather than the underlying appearance characteristics that generalize across categories, thereby limiting retrieval performance on unseen categories.