Restless Bandits with Individual Penalty Constraints: A New Near-Optimal Index Policy and How to Learn It

arXiv:2604.04101v1

This paper investigates the Restless Multi-Armed Bandit (RMAB) framework under individual penalty constraints to address resource allocation challenges in dynamic wireless networked environments. Unlike conventional RMAB models, our model allows each user (arm) to have its own distinct and stringent performance constraints, such as energy limits, activation limits, or age-of-information minimums, which lets the model capture diverse objectives, including fairness and efficiency.
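To make the setting concrete, the sketch below simulates a small restless bandit in which every arm carries its own penalty budget. It rests on several assumptions that are not from the paper. Each arm is a hypothetical two-state Markov chain (0 = bad, 1 = good) that evolves whether or not it is activated. The penalty is an activation limit (1 per activation). Each constraint bounds the time-averaged penalty of that arm. At most `m` arms can be activated per step. The `budget_aware_policy` rule is a simple greedy placeholder that only shows what checking per-arm constraints looks like. It is not the near-optimal index policy the paper proposes.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    # Hypothetical two-state arm: probability of moving to the good state (1)
    p_good_active: float   # ... when the arm is activated
    p_good_passive: float  # ... when the arm is left passive (it still evolves: "restless")
    penalty_budget: float  # assumed bound on the arm's long-run average penalty


def budget_aware_policy(arms):
    """Greedy placeholder rule (NOT the paper's index policy)."""
    def policy(states, penalties, t, m):
        # Only arms whose cumulative penalty stays within budget * (t + 1) may be activated.
        eligible = [
            i for i, arm in enumerate(arms)
            if penalties[i] + 1.0 <= arm.penalty_budget * (t + 1)
        ]
        # Prefer arms in the bad state, then arms with the most unused budget.
        eligible.sort(key=lambda i: (states[i], penalties[i] - arms[i].penalty_budget * (t + 1)))
        return set(eligible[:m])
    return policy


def simulate(arms, policy, m, horizon, seed=0):
    rng = random.Random(seed)
    states = [0] * len(arms)
    penalties = [0.0] * len(arms)
    total_reward = 0.0
    for t in range(horizon):
        active = policy(states, penalties, t, m)
        assert len(active) <= m  # global activation budget per step
        for i, arm in enumerate(arms):
            is_active = i in active
            total_reward += states[i]          # reward 1 while the arm is in the good state
            if is_active:
                penalties[i] += 1.0            # activation-limit penalty
            p = arm.p_good_active if is_active else arm.p_good_passive
            states[i] = 1 if rng.random() < p else 0
    avg_penalty = [p / horizon for p in penalties]
    feasible = [avg_penalty[i] <= arms[i].penalty_budget + 1e-9 for i in range(len(arms))]
    return total_reward / horizon, avg_penalty, feasible


# Example: three arms with distinct individual budgets, one activation per step.
arms = [Arm(0.9, 0.3, 0.5), Arm(0.8, 0.2, 0.2), Arm(0.7, 0.4, 0.1)]
avg_reward, avg_penalty, feasible = simulate(arms, budget_aware_policy(arms), m=1, horizon=1000)
print(avg_reward, avg_penalty, feasible)
```

The eligibility check keeps every arm's cumulative penalty at or below `penalty_budget * (t + 1)` after each step. As a result, the per-arm constraints hold by construction over any horizon. The global limit of `m` activations per step is what separates the individually constrained problem from running each arm in isolation.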