Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues

arXiv:2605.00119v1 Announce Type: cross There is a significant gap in evaluating cultural reasoning in LLMs using conversational datasets that capture culturally rich and dialectal contexts. Most Arabic benchmarks focus on short text snippets in Modern Standard Arabic (MSA), overlooking the cultural nuances that naturally arise in dialogues. To address this gap, we