Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

arXiv:2605.14297v1 Announce Type: new We study reinforcement learning in hybrid discrete-continuous action spaces, such as settings where the discrete component selects a regime (or index) and the continuous component optimizes within it, a structure common in robotics, control, and operations problems. Standard model-free policy gradient methods rely on score-function (SF) estimators, which suffer from severe credit-assignment issues in high-dimensional settings and therefore yield poor gradient quality.
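The claim about SF estimators in high dimensions can be illustrated with a toy experiment. The sketch below is not the method of this paper. It assumes a hypothetical setup chosen only for illustration: a unit-variance Gaussian policy over the continuous component with mean zero, and a differentiable quadratic reward. It compares per-sample SF and pathwise (reparameterization) estimates of the gradient with respect to the first coordinate of the mean. The names `reward`, `grad_samples`, and `summarize` are invented for this example.

```python
import random
import statistics


def reward(a):
    # Hypothetical differentiable reward (an assumption for this sketch):
    # negative squared distance to the all-ones vector.
    return -sum((x - 1.0) ** 2 for x in a)


def grad_samples(dim, n_samples, seed=0):
    """Per-sample estimates of d E[reward(a)] / d mu[0] at mu = 0, sigma = 1."""
    rng = random.Random(seed)
    sf, pw = [], []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        a = eps  # a = mu + sigma * eps with mu = 0 and sigma = 1
        # Score function: reward(a) * d log N(a; mu, I) / d mu[0] = reward(a) * eps[0].
        # The whole scalar reward multiplies the score, so noise from every
        # other coordinate leaks into the estimate for coordinate 0.
        sf.append(reward(a) * eps[0])
        # Pathwise: (d reward / d a[0]) * (d a[0] / d mu[0]) = -2 * (a[0] - 1).
        pw.append(-2.0 * (a[0] - 1.0))
    return sf, pw


def summarize(dim, n_samples=20000):
    """Mean and variance of both estimators for a given action dimension."""
    sf, pw = grad_samples(dim, n_samples)
    return {
        "sf_mean": statistics.fmean(sf),
        "sf_var": statistics.variance(sf),
        "pw_mean": statistics.fmean(pw),
        "pw_var": statistics.variance(pw),
    }


if __name__ == "__main__":
    for d in (2, 20):
        print(d, summarize(d))
```

In this toy problem both estimators are unbiased for the true gradient, which equals 2. The variance of the SF estimate grows with the action dimension, while the variance of the pathwise estimate stays fixed at 4, because the pathwise estimate depends only on the coordinate being differentiated. The pathwise route requires a differentiable reward and a reparameterizable distribution, so it does not apply directly to a discrete action component.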