Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards

arXiv:2510.16187v2. Real-world multi-agent systems may require ad hoc teaming, in which an agent must coordinate with previously unseen teammates to solve a task in a zero-shot manner. Prior work typically either selects a pretrained policy based on an inferred model of the new teammates, or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting.
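The title names generalized policy improvement (GPI) as the mechanism for using every pretrained policy at once. The following is a minimal sketch of the textbook GPI action-selection rule, not the authors' implementation. It assumes that each pretrained policy pi_i provides an estimate Q^{pi_i}(s, a) of its value at the current state. The function name `gpi_action` and the list-of-rows input layout are hypothetical. The sketch does not show how the paper estimates these values for unseen teammates or how it incorporates difference rewards.

```python
from typing import Sequence


def gpi_action(q_values: Sequence[Sequence[float]]) -> int:
    """Textbook generalized policy improvement (illustrative sketch only).

    q_values[i][a] is assumed to hold Q^{pi_i}(s, a): the estimated value of
    taking action a in the current state s and then following pretrained
    policy pi_i. GPI acts greedily with respect to max_i Q^{pi_i}(s, a).
    """
    num_actions = len(q_values[0])
    # Pointwise maximum over the library of pretrained policies, per action.
    best_per_action = [max(q[a] for q in q_values) for a in range(num_actions)]
    # Greedy action; ties resolve to the lowest action index.
    return max(range(num_actions), key=lambda a: best_per_action[a])


# Two pretrained policies, three actions.
print(gpi_action([[1.0, 0.5, 0.2], [0.3, 0.9, 1.5]]))  # → 2
```

Here neither policy alone is best for every action. GPI takes the best estimate per action across the whole library and then acts greedily. This is why the resulting policy can perform at least as well as any single pretrained policy, provided the value estimates are accurate.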