AI RESEARCH

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

arXiv cs.AI

arXiv:2604.03016v1. Multimodal Large Language Models (MLLMs) are evolving from passive observers into active agents that solve problems through Visual Expansion (invoking visual tools) and Knowledge Expansion (open-web search). However, existing evaluations fall short: they lack flexible tool integration, test visual and search tools separately, and evaluate primarily by final answers. Consequently, they cannot verify whether tools were actually invoked, applied correctly, or used efficiently. To address this, we