Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning

arXiv:2506.04626v2

Motivated by real-world settings where data collection and policy deployment are costly, whether for a single agent or across multiple agents, we study on-policy single-agent reinforcement learning (RL) and federated RL (FRL). We focus on minimizing two kinds of cost: burn-in costs (the sample sizes needed to reach near-optimal regret) and policy switching or communication costs.