AI RESEARCH

CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks

arXiv CS.AI

ArXi:2604.19262v1 Announce Type: cross Large language models (LLMs) are now deployed worldwide, inspiring a surge of benchmarks that measure their multilingual and multicultural abilities. However, these benchmarks prioritize generic language understanding or superficial cultural trivia, leaving the evaluation of grounded tasks -- where models must reason within real-world, context-rich scenarios -- largely unaddressed. To fill this gap, we present CulturALL, a comprehensive and challenging benchmark to assess LLMs' multilingual and multicultural competence on grounded tasks.