AI RESEARCH
Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation
arXiv CS.CL
arXiv:2603.13045v1 Announce Type: new Large Language Models (LLMs) have demonstrated remarkable capability in machine translation on high-resource language pairs, yet their performance on low-resource translation still lags behind. Existing post-