AI RESEARCH

Reading the Mood Behind Words: Integrating Prosody-Derived Emotional Context into Socially Responsive VR Agents

arXiv CS.AI

ArXi:2603.09324v1 Announce Type: cross In VR interactions with embodied conversational agents, users' emotional intent is often conveyed by how something is said than by what is said. However, most VR agent pipelines rely on speech-to-text processing, discarding prosodic cues and often producing emotionally incongruent responses despite correct semantics. We propose an emotion-context-aware VR interaction pipeline that treats vocal emotion as explicit dialogue context in an LLM-based conversational agent.