Oracle-Guided Soft Shielding for Safe Move Prediction in Chess

arXiv:2603.08506v1 Announce Type: new In high-stakes environments, agents relying purely on imitation learning or reinforcement learning often struggle to avoid safety-critical errors during exploration. Existing reinforcement learning approaches for environments such as chess require hundreds of thousands of episodes and substantial computational resources to converge. Imitation learning, on the other hand, is sample-efficient but brittle under distributional shift, and it lacks mechanisms for proactive risk avoidance.