Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

arXiv:2603.13045v1 Announce Type: new

Large Language Models (LLMs) have demonstrated remarkable capability in machine translation on high-resource language pairs, yet their performance on low-resource translation still lags behind. Existing post-