Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration

arXiv:2603.22273v1 Announce Type: new The process of discovery requires active exploration: the act of collecting new and informative data. However, efficient autonomous exploration remains a major unsolved problem. The dominant paradigm addresses this challenge by using Reinforcement Learning (RL) to train agents with intrinsic motivation, so that they maximize a composite objective combining extrinsic and intrinsic rewards.