Learning Multi-Timescale Abstractions for Hierarchical Combinatorial Planning

arXiv:2605.17058v1

The combination of exponentially large action spaces, stochastic dynamics, and long-horizon decision-making under limited resources makes Sequential Stochastic Combinatorial Optimization (SSCO) particularly challenging for reinforcement learning. Hierarchical Reinforcement Learning (HRL) offers a natural decomposition, but it places the high-level policy in a Semi-Markov Decision Process (SMDP) in which actions have variable durations, which makes it difficult to learn a world model suitable for planning. We.