Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models

arXiv:2603.10098v1 Announce Type: cross

Recent advances in multi-agent reinforcement learning, particularly Policy-Space Response Oracles (PSRO), have enabled the computation of approximate game-theoretic equilibria in increasingly complex domains. However, these methods rely on deep reinforcement learning oracles that produce 'black-box' neural network policies, making them difficult to interpret, trust or debug. We