AI RESEARCH
StoryTR: Narrative-Centric Video Temporal Retrieval with Theory of Mind Reasoning
arXiv cs.AI
arXiv:2604.23198v1 Announce Type: new

Current video moment retrieval methods excel at action-centric tasks but struggle with narrative content. Models can see what is happening but fail to reason about why it matters. This semantic gap stems from a lack of Theory of Mind (ToM): the cognitive ability to infer implicit intentions, mental states, and narrative causality from surface-level observations. We