Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning

arXiv:2603.25464v1. Zero-shot reinforcement learning (RL) algorithms aim to learn a family of policies from a reward-free dataset and to recover optimal policies for any reward function directly at test time. Naturally, the quality of the pre