MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese

arXiv:2604.25926v1

The use of large language models (LLMs) for complex mathematical reasoning is an emerging area of research, with rapid progress in methods, models, and benchmark datasets. However, most mathematical reasoning evaluations exhibit a significant linguistic bias: the vast majority of benchmark datasets are exclusively in English or, at best, translated from English. We address this limitation by