Robust Multi-Agent Path Finding under Observation Attacks: A Principled Adversarial-Plus-Smoothing Training Recipe

ArXi:2605.11469v1 Announce Type: new Decentralized multi-agent path finding (MAPF) routes a team of agents on a shared grid, each acting from its own local view. The standard solution trains one shared neural policy with Proximal Policy Optimization (PPO), a popular on-policy reinforcement learning algorithm. Such a policy works well on clean observations, but a small input perturbation on one agent often changes its action, which then blocks a neighbour, and the team jams. In this paper we present two.