MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings
arXiv cs.AI
arXiv:2603.15418v1 Announce Type: cross
Multi-agent reinforcement learning (MARL) commonly relies on a centralized critic to estimate the value function. However, learning such a critic from scratch is highly sample-inefficient and often fails to generalize across environments.
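To make "a centralized critic learned from scratch" concrete, here is a minimal sketch of the conventional baseline the abstract contrasts against. It is not MA-VLCM's method (which, per the title, uses a vision-language model as the critic). It assumes a linear value function over the concatenated observations of all agents, trained with TD(0) on a shared team reward. The names `joint_features` and `CentralizedLinearCritic` are hypothetical, and the observation values in the usage lines are made up.

```python
from typing import List, Sequence


def joint_features(observations: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-agent observations into one joint feature vector.

    A centralized critic conditions on information from all agents; plain
    concatenation plus a bias term is an illustrative assumption here.
    """
    features: List[float] = []
    for obs in observations:
        features.extend(obs)
    features.append(1.0)  # bias term
    return features


class CentralizedLinearCritic:
    """Hypothetical linear critic V(o_1, ..., o_n) = w . phi(o_1, ..., o_n)."""

    def __init__(self, num_features: int, lr: float = 0.1, gamma: float = 0.99):
        self.w = [0.0] * num_features  # "from scratch": all weights start at zero
        self.lr = lr
        self.gamma = gamma

    def value(self, observations: Sequence[Sequence[float]]) -> float:
        phi = joint_features(observations)
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def td_update(self, obs, team_reward: float, next_obs, done: bool) -> float:
        """One TD(0) step toward r + gamma * V(s') using the shared team reward."""
        if done:
            target = team_reward
        else:
            target = team_reward + self.gamma * self.value(next_obs)
        td_error = target - self.value(obs)
        phi = joint_features(obs)
        self.w = [wi + self.lr * td_error * xi for wi, xi in zip(self.w, phi)]
        return td_error


# Usage: two agents, each with a 2-dimensional observation (made-up numbers).
critic = CentralizedLinearCritic(num_features=5)
obs = [[1.0, 0.0], [0.0, 1.0]]
first_error = critic.td_update(obs, team_reward=1.0, next_obs=obs, done=True)
for _ in range(49):
    critic.td_update(obs, team_reward=1.0, next_obs=obs, done=True)
print(round(critic.value(obs), 3))  # prints 1.0
```

Even in this toy case the critic needs dozens of updates on a single repeated transition before its estimate settles, and the learned weights are tied to one fixed observation layout. That gives a small-scale feel for the sample-efficiency and generalization concerns the abstract raises.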