AI RESEARCH

When (and How) to Trust the Expert: Diagnosing Query-Time Expert-Guided Reinforcement Learning

arXiv CS.AI

arXiv:2605.09109v1 Announce Type: new Many continuous-control problems ship with a competent but suboptimal controller (a tuned PID, a hand-designed gait). A growing family of methods uses such controllers as queryable experts during RL, but each method has been proposed in isolation, on a different benchmark, and without testing against imperfect experts. We harmonize the comparison on a shared SAC backbone, common HPO and evaluation protocols, 100/50 seeds per (env, method), and a degradation sweep over expert undertuning, action bias, and observation noise.
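
To make the three degradation axes concrete, here is a minimal sketch of how a queryable expert could be wrapped so that it is undertuned, biased, or fed noisy observations. The abstract does not say how the paper parameterizes these degradations, so everything here is an assumption for illustration: the `pd_expert` controller, the `degrade_expert` wrapper, the parameter names `gain_scale`, `action_bias`, and `obs_noise_std`, and the action range of [-1, 1] are all hypothetical.

```python
import numpy as np


def pd_expert(obs, kp=2.0, kd=0.5):
    """Hypothetical expert: a PD controller driving position obs[0] to zero."""
    pos, vel = obs[0], obs[1]
    return np.array([-kp * pos - kd * vel])


def degrade_expert(expert, gain_scale=1.0, action_bias=0.0,
                   obs_noise_std=0.0, seed=0):
    """Wrap an expert with three illustrative degradations (assumed forms).

    gain_scale    : scales the expert's output; for a linear controller this
                    is equivalent to scaling its gains (a stand-in for
                    undertuning).
    action_bias   : constant offset added to every action.
    obs_noise_std : std of Gaussian noise added to the observation the
                    expert sees (the learner's observation is untouched).
    """
    rng = np.random.default_rng(seed)

    def degraded(obs):
        obs = np.asarray(obs, dtype=float)
        noisy_obs = obs + rng.normal(0.0, obs_noise_std, size=obs.shape)
        action = gain_scale * expert(noisy_obs) + action_bias
        # Assumed action bounds; real environments define their own.
        return np.clip(action, -1.0, 1.0)

    return degraded


# Query the clean and degraded experts at the same state.
state = [0.2, 0.0]
clean = degrade_expert(pd_expert)
undertuned = degrade_expert(pd_expert, gain_scale=0.5)
biased = degrade_expert(pd_expert, action_bias=0.1)
print(clean(state), undertuned(state), biased(state))
```

A sweep in this style would evaluate each method against a grid of such wrapper settings, holding the learner and the environment fixed so that only expert quality varies.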