Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning

arXiv:2605.05544v1 Announce Type: new Offline-to-online reinforcement learning with action chunking eliminates multi-step off-policy bias and enables temporally coherent exploration, but all existing methods use the same fixed chunk size in every state. This is suboptimal: near contact events, the agent needs short chunks for reactive control, while during free-space motion, long chunks provide better credit assignment.