AI RESEARCH
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models
arXiv CS.AI
arXiv:2603.16859v1 Announce Type: new
Omni-modal large language models (OLMs) redefine human-machine interaction by natively integrating audio, vision, and text. However, existing OLM benchmarks remain anchored to static, accuracy-centric tasks, leaving a critical gap in assessing social interactivity, the fundamental capacity to navigate dynamic cues in natural dialogues.