PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning

arXiv:2602.03352v2 Announce Type: replace Reinforcement learning (RL) has shown strong promise for LLM-based machine translation, with recent methods such as GRPO demonstrating notable gains; nevertheless, translation-oriented RL remains challenged by noisy learning signals arising from Monte Carlo return estimation, as well as a large trajectory space that favors global exploration over fine-grained local optimization. We