Enhancing Sample Efficiency in Multi-Agent RL with Uncertainty Quantification and Selective Exploration

arXiv:2506.02841v3 Announce Type: replace-cross Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty of exploring a large joint action space and by the high variance intrinsic to MARL environments.