AI RESEARCH
Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing
arXiv CS.AI
arXiv:2602.03452v2 Announce Type: replace-cross Reinforcement learning with verifiable rewards (RLVR) is effective for