ITIScore: An Image-to-Text-to-Image Rating Framework for the Image Captioning Ability of MLLMs

arXiv:2604.03765v1 Announce Type: new Recent advances in multimodal large language models (MLLMs) have greatly improved image understanding and captioning capabilities. However, existing image captioning benchmarks typically suffer from limited diversity in caption length, the absence of recent advanced MLLMs, and insufficient human annotations, which potentially