AI RESEARCH

SOUPLE: Enhancing Audio-Visual Localization and Segmentation with Learnable Prompt Contexts

arXiv CS.CV • March 25, 2026

arXiv:2603.22732v1 Announce Type: new Large-scale pre-trained image-text models exhibit robust multimodal representations, yet applying the Contrastive Language-Image Pre-
