Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax

arXiv:2605.14366v1 Announce Type: cross Extending large language models (LLMs) to low-resource languages often incurs an "alignment tax": improvements in the target language come at the cost of catastrophic forgetting of general capabilities. We argue that this trade-off arises from the rigidity of supervised fine-tuning (SFT), which enforces token-level surface imitation on narrow and biased data distributions.