Hierarchical Behaviour Spaces

arXiv:2604.24558v1 (Announce Type: cross)

Recent work in hierarchical reinforcement learning has successfully scaled to billions of timesteps when learning over a set of predefined option reward functions. We show that, instead of assigning a single reward function to each option, the reward functions can be used to induce a space of behaviours. The controller specifies linear combinations over the reward functions, which allows an expressive set of policies to be represented. We call this method Hierarchical Behaviour Spaces.
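A minimal sketch of the core idea follows. It assumes that each option reward function maps a transition (state, action, next_state) to a scalar, and that the controller emits a real-valued weight vector w. Under those assumptions, the reward for the selected behaviour is r_w(s, a, s') = sum_i w_i * r_i(s, a, s'). The function name `combined_reward` and the transition signature are hypothetical illustrations, not taken from the paper.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical signature (assumption): each option reward function maps a
# transition (state, action, next_state) to a scalar reward.
RewardFn = Callable[[np.ndarray, int, np.ndarray], float]


def combined_reward(
    reward_fns: Sequence[RewardFn],
    weights: np.ndarray,
    state: np.ndarray,
    action: int,
    next_state: np.ndarray,
) -> float:
    """Reward for the behaviour selected by `weights`: sum_i w_i * r_i(s, a, s')."""
    if len(weights) != len(reward_fns):
        raise ValueError("need exactly one weight per reward function")
    # Evaluate every base reward function on the same transition.
    base = np.array([r(state, action, next_state) for r in reward_fns], dtype=float)
    # Linear combination chosen by the controller.
    return float(np.dot(weights, base))


if __name__ == "__main__":
    # Two illustrative (hypothetical) base rewards: move right, and stay low.
    r_right = lambda s, a, s2: float(s2[0] - s[0])
    r_low = lambda s, a, s2: float(-s2[1])
    s, s2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
    # A one-hot weight vector recovers the single-reward-per-option case.
    print(combined_reward([r_right, r_low], np.array([1.0, 0.0]), s, 0, s2))
    # A mixed weight vector selects a blended behaviour.
    print(combined_reward([r_right, r_low], np.array([1.0, 0.5]), s, 0, s2))
```

In this sketch, a one-hot w reproduces the conventional setup with one reward function per option, while any other w picks out a point in the induced behaviour space.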