VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments

arXiv:2506.02387v3 Announce Type: replace

Recent advancements in Vision Language Models (VLMs) have expanded their capabilities to interactive agent tasks, yet existing benchmarks remain limited to single-agent or text-only environments. In contrast, real-world scenarios often involve multiple agents interacting within rich visual and textual contexts, posing challenges with both multimodal observations and strategic interactions. To bridge this gap, we