Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach

arXiv:2411.00361v4 Announce Type: replace

Hierarchical reinforcement learning (HRL) enables agents to solve complex, long-horizon tasks by decomposing them into manageable sub-tasks. However, HRL methods face two fundamental challenges: (i) non-stationarity caused by the evolving lower-level policy during