Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction

arXiv:2603.23550v1 Announce Type: new Multi-turn human-AI collaboration is fundamental to deploying interactive services such as adaptive tutoring, conversational recommendation, and professional consultation. However, optimizing these interactions via reinforcement learning is hindered by the sparsity of verifiable intermediate rewards and the high stochasticity of user responses. To address these challenges, we