AI RESEARCH

Structural Equivalence and Learning Dynamics in Delayed MARL

arXiv cs.LG

arXiv:2605.04345v1 Announce Type: new

We formally establish the equivalence between Observation Delay (OD) and Action Delay (AD) in cooperative partially observable multi-agent systems using observation-action histories. We show that both systems generate identical admissible joint-policy sets and that their induced state-action-observation trajectories are identical in distribution, leading to identical optimal solutions in Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs).
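
The intuition behind the equivalence can be illustrated with a toy sketch. This is not the paper's construction: it assumes a single agent, a deterministic and fully observed ring environment, a fixed delay `d`, and a memoryless policy, all of which are hypothetical choices made only for illustration (`step`, `policy`, and both rollout functions are invented names). Under observation delay, the action executed at time t is chosen from the observation made at time t - d; under action delay, the action chosen at time t - d is executed at time t. In both cases the executed action depends on the same information, so the trajectories coincide.

```python
from collections import deque

def step(state, action):
    # Hypothetical toy dynamics: a deterministic ring of 5 states.
    return (state + action) % 5

def policy(obs):
    # Hypothetical memoryless policy: depends only on the observation given.
    return 1 if obs % 2 == 0 else 2

def rollout_observation_delay(s0, d, horizon, default_action=0):
    # At time t the agent sees the state from time t - d and acts immediately.
    states, actions = [s0], []
    for t in range(horizon):
        a = default_action if t < d else policy(states[t - d])
        actions.append(a)
        states.append(step(states[t], a))
    return states, actions

def rollout_action_delay(s0, d, horizon, default_action=0):
    # At time t the agent sees the current state; its chosen action
    # is queued and only executed d steps later.
    states, executed = [s0], []
    pending = deque([default_action] * d)
    for t in range(horizon):
        pending.append(policy(states[t]))
        a = pending.popleft()
        executed.append(a)
        states.append(step(states[t], a))
    return states, executed

od = rollout_observation_delay(s0=0, d=2, horizon=10)
ad = rollout_action_delay(s0=0, d=2, horizon=10)
print(od == ad)
```

The comparison covers both the state sequence and the executed-action sequence. The abstract's result is far more general (multiple agents, partial observability, history-dependent policies, equality in distribution), so this sketch should be read only as a sanity check of the underlying idea.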