GreekMMLU: A Native-Sourced Multitask Benchmark for Evaluating Language Models in Greek

arXiv:2602.05150v2 Announce Type: replace Large Language Models (LLMs) are commonly trained on multilingual corpora that include Greek, yet reliable evaluation benchmarks for Greek, particularly those based on authentic, native-sourced content, remain limited. Existing datasets are often machine-translated from English and so fail to capture Greek linguistic and cultural characteristics. We