Metric-Gradient Projection for Stable Multi-Agent Policy Learning

arXiv:2605.18809v1 Announce Type: cross

General-sum multi-agent learning is often governed by a stacked update field in which each agent's policy update changes the optimization landscape faced by the others. This coupling can entangle an integrable component of collective improvement with cyclic interaction dynamics, leading to slow or unstable multi-agent learning.
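
The entanglement described above can be made concrete with a toy example. The sketch below is not the method of this paper; it assumes a hypothetical two-player quadratic game with losses f1(x, y) = a*x^2/2 + b*x*y and f2(x, y) = a*y^2/2 - b*x*y, whose stacked gradient field has a constant Jacobian. Splitting that Jacobian into symmetric and antisymmetric parts separates an integrable (potential-like) component from a cyclic (rotational) one. The function names, step size, and step count are illustrative assumptions.

```python
import numpy as np

def stacked_field_jacobian(a, b):
    """Jacobian of the stacked field v(x, y) = (df1/dx, df2/dy)
    for the assumed toy losses; v = (a*x + b*y, a*y - b*x)."""
    return np.array([[a, b], [-b, a]], dtype=float)

def split_jacobian(J):
    """Return the symmetric (integrable) and antisymmetric (cyclic) parts."""
    S = 0.5 * (J + J.T)
    A = 0.5 * (J - J.T)
    return S, A

def simulate(J, z0, lr=0.1, steps=100):
    """Run simultaneous gradient descent z <- z - lr * J z
    and return the final distance from the equilibrium at the origin."""
    z = np.asarray(z0, dtype=float)
    for _ in range(steps):
        z = z - lr * (J @ z)
    return np.linalg.norm(z)

if __name__ == "__main__":
    S, A = split_jacobian(stacked_field_jacobian(1.0, 2.0))
    print("symmetric part:\n", S)
    print("antisymmetric part:\n", A)
    # Purely cyclic game (a = 0): iterates spiral away from the equilibrium.
    print("cyclic only:", simulate(stacked_field_jacobian(0.0, 1.0), [1.0, 0.0]))
    # Adding an integrable component (a = 1): iterates contract.
    print("mixed:", simulate(stacked_field_jacobian(1.0, 1.0), [1.0, 0.0]))
```

In this toy setting, each simultaneous step multiplies the distance to equilibrium by sqrt((1 - lr*a)^2 + (lr*b)^2). With a = 0 that factor exceeds 1 for any positive step size, so the purely cyclic game diverges, while a sufficiently strong symmetric component makes the same update contract.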