AI RESEARCH
SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning
arXiv cs.CV
arXiv:2506.05425v2

Understanding social interaction, which encompasses perceiving numerous subtle multimodal cues, inferring unobservable mental states and relations, and dynamically predicting others' behavior, is the foundation for achieving human-machine interaction. Despite rapid advances in Multimodal Large Language Models (MLLMs), the rich and multifaceted nature of social interaction has hindered the development of benchmarks that holistically evaluate and guide the social interaction abilities of these models.