AI RESEARCH

Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling

arXiv CS.AI

arXiv:2603.11971v1 Announce Type: cross

Emotion recognition in in-the-wild video data remains a challenging problem due to large variations in facial appearance, head pose, illumination, and background noise, as well as the inherently dynamic nature of human affect. Relying on a single modality, such as facial expressions or speech, is often insufficient to capture these complex emotional cues. To address this issue, we propose a multimodal emotion recognition framework for the Expression (EXPR) Recognition task in the 10th Affective Behavior Analysis in-the-wild (ABAW) Challenge.