Chunk-Guided Q-Learning

arXiv:2603.13971v1

In offline reinforcement learning (RL), single-step temporal-difference (TD) learning can suffer from bootstrapping error accumulation over long horizons. Action-chunked TD methods mitigate this by backing up over multiple steps, but can