MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings

arXiv:2603.15418v1 Announce Type: cross Multi-agent reinforcement learning (MARL) commonly relies on a centralized critic to estimate the value function. However, learning such a critic from scratch is highly sample-inefficient, and the resulting critic often generalizes poorly across environments.