AI RESEARCH
Sequential Off-Policy Learning with Logarithmic Smoothing
arXiv cs.LG
arXiv:2506.10664v2 Announce Type: replace-cross Off-policy learning enables