TAPO: Translation Augmented Policy Optimization for Multilingual Mathematical Reasoning

arXiv:2603.25419v1 Announce Type: new Large Language Models (LLMs) have demonstrated remarkable proficiency in English mathematical reasoning, yet a significant performance disparity persists in multilingual contexts, largely attributed to deficiencies in language understanding. To bridge this gap, we