AI RESEARCH

ViBES: A Conversational Agent with Behaviorally-Intelligent 3D Virtual Body

arXiv cs.CV

arXiv:2512.14234v2

Human communication is inherently multimodal and social: words, prosody, and body language jointly carry intent. Yet most prior systems model human behavior as a translation task (co-speech gesture or text-to-motion) that maps a fixed utterance to motion clips, without requiring agentic decision-making about when to move, what to do, or how to adapt across multi-turn dialogue. This leads to brittle timing, weak social grounding, and fragmented stacks where speech, text, and motion are trained or inferred in isolation. We.