Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning

ArXi:2605.01865v1 Announce Type: cross Cooperative multi-agent reinforcement learning (MARL) requires agents to discover joint strategies in a combinatorially large state-action space, yet effective coordination configurations are exceedingly rare. Intrinsic motivation, which augments task rewards with novelty bonuses, is a popular approach for driving exploration, but its effectiveness hinges on the exploration intensity $\beta$, where too large a value overwhelms the task signal and causes coordination collapse, while too small a value prevents discovery of rare strategies.