Bellman Calibration for $V$-Learning in Offline Reinforcement Learning

arXiv:2512.23694v2

Reliable long-horizon value prediction is difficult in offline reinforcement learning because fitted value methods combine bootstrapping, function approximation, and distribution shift, while standard guarantees often require Bellman completeness or realizability. We