Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring

arXiv:2509.25438v2 Announce Type: replace-cross When the environment contains an unlearnable source of randomness (a noisy-TV), an agent that explores by naively following intrinsic reward gets stuck at that source and fails to explore. Intrinsic rewards based on uncertainty estimation or distribution similarity do eventually escape noisy-TVs, but they suffer from poor sample efficiency and high computational cost.
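
The noisy-TV trap can be illustrated with a minimal sketch. It assumes, purely for illustration, that the intrinsic reward is the squared prediction error of a simple online scalar predictor; the abstract does not specify this form. The function name `mean_prediction_error` and its parameters are hypothetical. On a learnable signal the error (and hence the reward) decays toward zero, while on pure noise it stays high indefinitely, which is what keeps a naive agent watching the noise.

```python
import random


def mean_prediction_error(sample_fn, steps=2000, lr=0.1, seed=0):
    """Average squared prediction error over the final 10% of steps.

    Assumption: intrinsic reward == squared error of an online running
    estimate. This is an illustrative stand-in, not the paper's method.
    """
    rng = random.Random(seed)
    estimate = 0.0
    errors = []
    for _ in range(steps):
        target = sample_fn(rng)
        errors.append((target - estimate) ** 2)
        # Move the estimate a small step toward the observed target.
        estimate += lr * (target - estimate)
    tail = errors[-(steps // 10):]
    return sum(tail) / len(tail)


# Learnable source: a constant signal the predictor can fit exactly.
learnable = mean_prediction_error(lambda rng: 1.0)

# Noisy-TV: uniform noise that no predictor can anticipate.
noisy_tv = mean_prediction_error(lambda rng: rng.uniform(-1.0, 1.0))

print(f"late reward on learnable source: {learnable:.6f}")
print(f"late reward on noisy-TV:         {noisy_tv:.6f}")
```

Under these assumptions the late-stage reward from the noise source remains far larger than the reward from the learnable source, so an agent that greedily maximizes this reward has no incentive to leave the noise.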