AI RESEARCH
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
arXiv cs.LG
arXiv:2605.10899v1 Announce Type: cross