Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem

arXiv:2509.15519v2 Announce Type: replace

This paper studies fully decentralized cooperative multi-agent reinforcement learning, where each agent observes only the states, its own local actions, and the shared rewards. Because an agent cannot access the other agents' actions, its value function updates are often non-stationary and its value function estimates are prone to relative overgeneralization, both of which hinder effective cooperative policy learning.
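
Relative overgeneralization can be illustrated with a toy example that is not taken from the paper. The sketch below assumes a hypothetical 3x3 cooperative matrix game (a climbing-game-style payoff chosen for illustration) and assumes each agent estimates the value of its own action by averaging the shared reward over a teammate who acts uniformly at random. The names `PAYOFF`, `marginal_values`, `argmax`, and `best_joint` are illustrative only.

```python
# Hypothetical shared-reward matrix: PAYOFF[a][b] is the reward both agents
# receive when agent 1 plays a and agent 2 plays b. Values are illustrative.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def marginal_values(payoff, agent):
    """Average reward of each local action, assuming (as a simplification)
    that the unobserved teammate picks its action uniformly at random."""
    n = len(payoff)
    if agent == 0:  # agent 1 chooses the row and averages over columns
        return [sum(payoff[a][b] for b in range(n)) / n for a in range(n)]
    # agent 2 chooses the column and averages over rows
    return [sum(payoff[a][b] for a in range(n)) / n for b in range(n)]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def best_joint(payoff):
    """Joint action with the highest shared reward."""
    n = len(payoff)
    pairs = ((a, b) for a in range(n) for b in range(n))
    return max(pairs, key=lambda ab: payoff[ab[0]][ab[1]])


# Each agent greedily follows its own averaged estimate.
greedy = (argmax(marginal_values(PAYOFF, 0)), argmax(marginal_values(PAYOFF, 1)))
optimal = best_joint(PAYOFF)

print("greedy joint action:", greedy, "reward:", PAYOFF[greedy[0]][greedy[1]])
print("optimal joint action:", optimal, "reward:", PAYOFF[optimal[0]][optimal[1]])
```

Under these assumptions, the large penalties for miscoordination pull down the averaged value of each agent's part of the optimal joint action, so both agents settle on a safer action whose joint reward is lower. This is only one simple way to exhibit the effect; the learning setting described in the abstract involves states and value function updates that this one-shot game omits.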