TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models

arXiv:2602.22827v2 Announce Type: replace-cross

This paper presents a comprehensive evaluation framework for assessing the cultural competence of large language models (LLMs) in Persian. Existing Persian cultural benchmarks rely predominantly on multiple-choice formats and English-centric metrics, which fail to capture Persian's morphological complexity and semantic nuance. Our framework