Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains

arXiv:2605.18261v1 Announce Type: new Reinforcement learning with verifiable rewards (RLVR) has demonstrated promising potential to enhance the reasoning capabilities of large language models (LLMs) in domains such as mathematics and coding. However, its application to knowledge-intensive domains has not been effectively explored due to the scarcity of high-quality verifiable data. Furthermore, current RLVR focuses solely on the correctness of final answers, which can lead to flawed reasoning and sparse reward signals.