MSD-Score: Multi-Scale Distributional Scoring for Reference-Free Image Caption Evaluation
arXiv cs.CV
arXiv:2605.06080v1 Announce Type: new

Evaluating image captions without references remains challenging because global embedding similarity often misses fine-grained mismatches such as hallucinated objects, missing attributes, or incorrect relations. We propose MSD-Score, a reference-free metric that models image patch and text token embeddings as von Mises-Fisher mixtures on the unit hypersphere. Instead of treating each modality as a single point, MSD-Score formulates image-text matching as a multi-scale distributional scoring problem.
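The abstract does not give the scoring function, so what follows is only a minimal sketch of the general idea, not the authors' MSD-Score. A von Mises-Fisher (vMF) density on the unit hypersphere in R^p is f(x; μ, κ) = C_p(κ) exp(κ μᵀx), with C_p(κ) = κ^(p/2-1) / ((2π)^(p/2) I_(p/2-1)(κ)), where I_v is the modified Bessel function of the first kind. The sketch assumes one mixture component per L2-normalized image patch, uniform weights and a single shared concentration κ. It scores a caption by the mean log-density of its token embeddings under that mixture. Treating "scale" as a sweep over κ (small κ for coarse matching, large κ for fine-grained matching) is our assumption. The names `vmf_mixture_logpdf` and `toy_multiscale_score` and the κ values are hypothetical.

```python
import math

import numpy as np

# Hypothetical sketch, not the MSD-Score reference implementation.
# Assumptions: one vMF component per image patch, uniform weights,
# a shared concentration kappa, and "scale" modelled as a sweep over kappa.


def log_bessel_iv(v: float, x: float, terms: int = 300) -> float:
    """log I_v(x) for x > 0 via the power series, summed in log space.

    I_v(x) = sum_m (x/2)^(2m+v) / (m! * Gamma(m+v+1)); adequate for moderate x.
    """
    log_half_x = math.log(x / 2.0)
    logs = [(2 * m + v) * log_half_x - math.lgamma(m + 1) - math.lgamma(m + v + 1)
            for m in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(t - top) for t in logs))


def vmf_log_normalizer(p: int, kappa: float) -> float:
    """log C_p(kappa) for the vMF density on the unit sphere in R^p."""
    v = p / 2.0 - 1.0
    return v * math.log(kappa) - (p / 2.0) * math.log(2.0 * math.pi) - log_bessel_iv(v, kappa)


def normalize_rows(x: np.ndarray) -> np.ndarray:
    """Project each embedding (row) onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def vmf_mixture_logpdf(points: np.ndarray, means: np.ndarray, kappa: float) -> np.ndarray:
    """Log-density of each unit vector in `points` (n, p) under a uniform-weight
    vMF mixture whose component means are the unit rows of `means` (k, p)."""
    k, p = means.shape
    # (n, k) matrix: log(1/k) + log C_p(kappa) + kappa * mu_k . x
    comp = -math.log(k) + vmf_log_normalizer(p, kappa) + kappa * (points @ means.T)
    top = comp.max(axis=1, keepdims=True)
    # Numerically stable log-sum-exp over the k components.
    return (top + np.log(np.exp(comp - top).sum(axis=1, keepdims=True)))[:, 0]


def toy_multiscale_score(patch_emb: np.ndarray, token_emb: np.ndarray,
                         kappas=(5.0, 20.0, 50.0)) -> float:
    """Mean token log-density under the patch mixture, averaged over kappas."""
    patches = normalize_rows(patch_emb)
    tokens = normalize_rows(token_emb)
    per_scale = [vmf_mixture_logpdf(tokens, patches, kappa).mean() for kappa in kappas]
    return float(np.mean(per_scale))


if __name__ == "__main__":
    eye = np.eye(8)
    patches = eye[:4]            # toy "image patches": basis directions e0..e3
    grounded = eye[[0, 1]]       # caption tokens that coincide with patches
    hallucinated = eye[[5, 6]]   # tokens orthogonal to every patch
    print(toy_multiscale_score(patches, grounded) > toy_multiscale_score(patches, hallucinated))  # True
```

Because each component contributes exp(κ μᵀx), a token that is far from every patch, such as a hallucinated object, receives a low density at every κ, while larger κ penalizes near-misses more sharply. Summing the Bessel series directly keeps the example dependency-light but loses accuracy for large κ. A sturdier version would use an exponentially scaled Bessel routine such as `scipy.special.ive`.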