AI RESEARCH

Lost in Translation: Do LVLM Judges Generalize Across Languages?

arXiv cs.CL

arXiv:2604.19405v1

Automatic evaluators such as reward models play a central role in the alignment and evaluation of large vision-language models (LVLMs). Despite their growing importance, these evaluators are assessed almost exclusively on English-centric benchmarks, leaving open the question of how well they generalize across languages. To answer this question, we