CLewR: Curriculum Learning with Restarts for Machine Translation Preference Learning

arXiv:2601.05858v2

Large language models (LLMs) have demonstrated competitive performance in zero-shot multilingual machine translation (MT). Some follow-up works have further improved MT performance via preference optimization, but they leave a key aspect largely underexplored: the order in which data samples are given during