AI RESEARCH

On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents

arXiv cs.AI

arXiv:2603.12109v1 Announce Type: new Reinforcement learning (RL) with outcome-based rewards has achieved significant success in