AI RESEARCH

Lever: Inference-Time Policy Reuse under Support Constraints

arXiv cs.LG

arXiv:2604.20174v1 Announce Type: new Reinforcement learning (RL) policies are typically trained for fixed objectives, making reuse difficult when task requirements change. We study inference-time policy reuse: given a library of pre-trained policies and a new composite objective, can a high-quality policy be constructed entirely offline, without additional environment interaction? We