PaliBench: A Multi-Reference Blueprint for Classical Language Translation Benchmarks

arXiv:2605.16881v1 Announce Type: new Digital humanities projects increasingly rely on machine translation and large language models to widen access to classical, religious, and otherwise under-translated textual traditions. Yet standard translation benchmarks are poorly suited to such materials: they typically compare a system output against a single reference translation, even though classical texts often admit multiple faithful renderings that differ in terminology, register, and interpretation. This article.