Team RAS in 10th ABAW Competition: Multimodal Valence and Arousal Estimation Approach

arXiv CS.AI

arXiv:2603.13056v1

Continuous emotion recognition in terms of valence and arousal under in-the-wild (ITW) conditions remains a challenging problem. It is difficult because of large variations in appearance, head pose, illumination, and occlusion, as well as subject-specific patterns of affective expression. We present a multimodal method for valence-arousal estimation under ITW conditions. Our method combines three complementary modalities: face, behavior, and audio. The face modality relies on GRADA-based frame-level embeddings and Transformer-based temporal regression.
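To make "frame-level embeddings plus Transformer-based temporal regression" more concrete, here is a minimal sketch of the general idea. It is not the authors' architecture. The single attention head, the sinusoidal positional encoding, the residual-plus-normalization step, the tanh output bound, and the class name `TemporalVARegressor` are all illustrative assumptions. The code also assumes that valence and arousal targets are scaled to [-1, 1]. Random vectors stand in for GRADA embeddings, and the weights are random and untrained, so the outputs only demonstrate shapes and data flow.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def positional_encoding(T, d):
    # Sinusoidal positional encoding: sin on even dims, cos on odd dims.
    pos = np.arange(T)[:, None]
    i = np.arange(d)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


class TemporalVARegressor:
    """Hypothetical one-layer, one-head self-attention regressor.

    Maps a sequence of frame embeddings (T, d_model) to per-frame
    (valence, arousal) predictions (T, 2). Weights are random, not trained.
    """

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.d = d_model
        self.Wq = rng.normal(0.0, scale, (d_model, d_model))
        self.Wk = rng.normal(0.0, scale, (d_model, d_model))
        self.Wv = rng.normal(0.0, scale, (d_model, d_model))
        self.W_out = rng.normal(0.0, scale, (d_model, 2))
        self.b_out = np.zeros(2)

    def __call__(self, frames):
        T, d = frames.shape
        assert d == self.d, "embedding size must match d_model"
        # Inject frame order so attention can use temporal position.
        x = frames + positional_encoding(T, d)
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        # Each frame attends to every frame in the window (scaled dot product).
        attn = softmax(q @ k.T / np.sqrt(d), axis=-1)
        h = x + attn @ v  # residual connection
        # Per-frame normalization over the feature dimension.
        h = (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + 1e-6)
        # Linear head; tanh keeps predictions inside [-1, 1].
        return np.tanh(h @ self.W_out + self.b_out)


# Usage: 16 frames of synthetic 32-d embeddings standing in for GRADA features.
rng = np.random.default_rng(42)
frames = rng.normal(size=(16, 32))
model = TemporalVARegressor(d_model=32)
va = model(frames)  # column 0: valence, column 1: arousal
print(va.shape)
```

In a real system, the projections and the output head would be trained against annotated valence and arousal, and several attention layers would typically be stacked. The other two modalities (behavior and audio) and the way they are combined with the face branch are not specified here, so they are deliberately left out of the sketch.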