AI RESEARCH
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
arXiv CS.LG
arXiv:2510.18821v3 Announce Type: replace Reinforcement learning with verifiable rewards (RLVR) has become the mainstream technique for