AI RESEARCH
Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order
arXiv CS.LG
arXiv:2512.04277v3 Announce Type: replace Post-