MS-Mix: Sentiment-Guided Adaptive Augmentation for Multimodal Sentiment Analysis

arXiv:2510.11579v3

Multimodal Sentiment Analysis (MSA) integrates complementary features from text, video, and audio for robust emotion understanding in human interactions. However, MSA models suffer from data scarcity and high annotation costs, which severely limit real-world deployment in social media analytics and human-computer systems. Existing Mixup-based augmentation techniques, when naively applied to MSA, often produce semantically inconsistent samples and amplify label noise because they ignore emotional semantics across modalities.
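
To make the problem concrete, here is a minimal sketch of what "naively applied" Mixup could look like for MSA. It is standard Mixup, not the MS-Mix method. The function name `naive_mixup`, the dictionary layout with `text`, `audio`, `video`, and `label` keys, and the toy feature values are all assumptions made for illustration. A single mixing ratio drawn from a Beta distribution is applied uniformly to every modality and to the sentiment label, with no check on whether the two samples are emotionally compatible.

```python
import random


def naive_mixup(sample_a, sample_b, alpha=1.0, rng=None):
    """Blend two multimodal samples with one shared Mixup ratio.

    Each sample is a dict (hypothetical layout) with per-modality feature
    lists under "text", "audio", "video" and a scalar sentiment "label".
    """
    rng = rng or random.Random()
    # Standard Mixup: lambda ~ Beta(alpha, alpha), shared by all modalities.
    lam = rng.betavariate(alpha, alpha)

    mixed = {}
    for modality in ("text", "audio", "video"):
        mixed[modality] = [
            lam * a + (1.0 - lam) * b
            for a, b in zip(sample_a[modality], sample_b[modality])
        ]
    # The label is interpolated with the same ratio, regardless of how
    # different the two sentiments are.
    mixed["label"] = lam * sample_a["label"] + (1.0 - lam) * sample_b["label"]
    return mixed, lam


# Toy example: a strongly positive and a strongly negative sample.
positive = {"text": [0.9, 0.1], "audio": [0.8], "video": [0.7, 0.2, 0.1], "label": 2.0}
negative = {"text": [0.1, 0.9], "audio": [0.2], "video": [0.1, 0.3, 0.6], "label": -2.0}

mixed, lam = naive_mixup(positive, negative, alpha=1.0, rng=random.Random(0))
```

When the two inputs carry opposite sentiment, as in this toy pair, the blended features correspond to no coherent emotional state and the interpolated label can land near neutral. This is one way the semantic inconsistency and label noise described above can arise.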