DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning

arXiv:2510.09255v4

Enhancing LLMs with the ability to actively search external knowledge is crucial for complex, real-world tasks. Current approaches either rely on prompting to elicit the model's innate agentic capabilities, or hit performance ceilings and suffer training collapse when RL is applied to complex interactive tasks, leaving the model's true agentic potential untapped. To address this, we