AI RESEARCH
Video-Only ToM: Enhancing Theory of Mind in Multimodal Large Language Models
arXiv CS.CV
arXiv:2603.24484v1 Announce Type: new As large language models (LLMs) continue to advance, there is increasing interest in their ability to infer human mental states and demonstrate a human-like Theory of Mind (ToM). Most existing ToM evaluations, however, are centered on text-based inputs, while scenarios relying solely on visual information receive far less attention. This leaves a gap, since real-world human-AI interaction typically requires multimodal understanding.