Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents

arXiv:2511.18685v3 Announce Type: replace

Multimodal Large Language Models (MLLMs) show promising results as decision-making engines for embodied agents operating in complex, physical environments. However, existing benchmarks often prioritize high-level planning or spatial reasoning, leaving the fine-grained action intelligence required for embodied physical interaction underexplored. To address this gap, we