The Horizon Threshold in Cooperative Multi-Agent Reward-Free Exploration

arXiv:2602.01453v3

We study cooperative multi-agent reinforcement learning in the setting of reward-free exploration, where multiple agents jointly explore an unknown MDP to learn its dynamics without observing rewards. We focus on a tabular finite-horizon MDP and adopt a phased learning framework. In each learning phase, the agents interact with the environment independently: each agent is assigned a policy, executes it, and observes the resulting trajectory.
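
To make the phased interaction protocol concrete, the following is a minimal sketch of one way it could be simulated on a small tabular finite-horizon MDP. It is illustrative only and is not the algorithm of the paper: the function names (`run_phases`, `estimate_dynamics`), the fixed initial state, the uniformly random policy assignment, and the uniform fallback for unvisited state-action pairs are all assumptions made for this example.

```python
import numpy as np

def run_phases(P, H, num_agents, num_phases, rng):
    """Simulate phased exploration. P holds the true transitions, shape (H, S, A, S).

    Returns transition visit counts with the same shape as P.
    """
    _, S, A, _ = P.shape
    counts = np.zeros_like(P)
    for _ in range(num_phases):
        # Assign each agent a deterministic policy pi[h, s] -> action.
        # Uniform random assignment is a placeholder (assumption), not the
        # policy design studied in the paper.
        policies = rng.integers(A, size=(num_agents, H, S))
        trajectories = []
        for pi in policies:  # agents do not communicate within a phase
            s = 0  # fixed initial state (assumption)
            traj = []
            for h in range(H):
                a = pi[h, s]
                s_next = rng.choice(S, p=P[h, s, a])
                traj.append((h, s, a, s_next))
                s = s_next
            trajectories.append(traj)
        # Trajectories are pooled only once the phase has ended.
        for traj in trajectories:
            for h, s, a, s_next in traj:
                counts[h, s, a, s_next] += 1
    return counts

def estimate_dynamics(counts):
    """Empirical transition model; unvisited (h, s, a) fall back to uniform."""
    n = counts.sum(axis=-1, keepdims=True)
    S = counts.shape[-1]
    return np.where(n > 0, counts / np.maximum(n, 1), 1.0 / S)

rng = np.random.default_rng(0)
H, S, A = 3, 4, 2
P = rng.dirichlet(np.ones(S), size=(H, S, A))  # random true dynamics
counts = run_phases(P, H, num_agents=5, num_phases=20, rng=rng)
P_hat = estimate_dynamics(counts)
```

No reward signal appears anywhere in the loop: the only data collected are transitions, which is what makes the setting reward-free. Each trajectory contributes exactly `H` transitions, so the total number of samples is `num_agents * num_phases * H`.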