Bellman Calibration for $V$-Learning in Offline Reinforcement Learning
arXiv CS.LG
arXiv:2512.23694v2 Announce Type: replace-cross

Reliable long-horizon value prediction is difficult in offline reinforcement learning because fitted value methods combine bootstrapping, function approximation, and distribution shift, while standard guarantees often require Bellman completeness or realizability. We