Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents

arXiv:2510.14967v2 Announce Type: replace-cross Large language model (LLM)-based agents are increasingly trained with reinforcement learning (RL) to enhance their ability to interact with external environments through tool use, particularly in search-based settings that require multi-turn reasoning and knowledge acquisition. However, existing approaches typically rely on outcome-based rewards that are provided only upon generating the final answer.