AI RESEARCH
Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach
arXiv cs.LG
arXiv:2411.00361v4 Announce Type: replace Hierarchical reinforcement learning (HRL) enables agents to solve complex, long-horizon tasks by decomposing them into manageable sub-tasks. However, HRL methods face two fundamental challenges: (i) non-stationarity caused by the evolving lower-level policy during