JURY-RL: Votes Propose, Proofs Dispose for Label-Free RLVR
arXiv cs.AI
arXiv:2604.25419v1 Announce Type: new
Reinforcement learning with verifiable rewards (RLVR) improves the reasoning of large language models (LLMs), but standard RLVR often depends on human-annotated answers or carefully curated reward specifications. In machine-checkable domains, label-free alternatives such as majority voting or LLM-as-a-judge remove the annotation cost but can