Evaluating Remote Sensing Image Captions Beyond Metric Biases

arXiv:2604.22855v1 Announce Type: new

The core objective of image captioning is to achieve lossless semantic compression of visual signals into text. However, relying on manually curated reference texts for evaluation forces models to mimic specific human annotation styles, thereby masking the true descriptive capabilities of advanced foundation models.