AI RESEARCH

When Policies Cannot Be Retrained: A Unified Closed-Form View of Post-Training Steering in Offline Reinforcement Learning

arXiv cs.LG

arXiv:2604.22873v1 Announce Type: new In many offline reinforcement learning (RL) applications, the trained actor cannot be retrained because of data, cost, or governance constraints. We study deployment-time adaptation for frozen offline actors using Product-of-Experts (PoE) composition with a goal-conditioned prior.
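
To make the PoE composition concrete, here is a minimal sketch under assumptions the abstract does not state: both the frozen actor and the goal-conditioned prior are taken to be diagonal Gaussians over actions, and the prior is tempered by a hypothetical steering weight `beta`. The function name `poe_gaussian` and all parameters are illustrative and are not taken from the paper. Under these assumptions the product of the two densities is again Gaussian, with precisions adding and the mean given by a precision-weighted average, so steering needs no gradient step on the actor.

```python
def poe_gaussian(mu_actor, var_actor, mu_prior, var_prior, beta=1.0):
    """Closed-form product of two diagonal Gaussians (illustrative sketch).

    Computes the Gaussian proportional to
        N(a; mu_actor, var_actor) * N(a; mu_prior, var_prior) ** beta
    independently per action dimension.

    beta is a hypothetical steering weight: beta = 0 recovers the frozen
    actor unchanged, larger beta pulls the action toward the prior.
    """
    means, variances = [], []
    for m_a, v_a, m_p, v_p in zip(mu_actor, var_actor, mu_prior, var_prior):
        prec_a = 1.0 / v_a          # precision of the frozen actor
        prec_p = beta / v_p         # tempered precision of the prior
        prec = prec_a + prec_p      # precisions add under a product
        variances.append(1.0 / prec)
        means.append((prec_a * m_a + prec_p * m_p) / prec)
    return means, variances


# Usage: a 2-D action, actor and prior equally confident in each dimension.
mean, var = poe_gaussian(
    mu_actor=[0.0, 1.0], var_actor=[1.0, 0.5],
    mu_prior=[2.0, 1.0], var_prior=[1.0, 0.5],
)
```

With equal variances the composed mean lands halfway between the two experts and the variance halves; where the experts already agree, the mean is unchanged and only the variance shrinks. Whether the paper's closed form matches this Gaussian case is an assumption of the sketch.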