SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans

arXiv:2603.07853v1 Announce Type: cross Research agents enable models to gather information from the web using tools to answer user queries, requiring them to dynamically interleave internal reasoning with tool use. While such capabilities can in principle be learned via reinforcement learning with verifiable rewards (RLVR), we observe that agents often exhibit poor exploration behaviors, including premature termination and biased tool usage. As a result, RLVR alone yields limited improvements.