AI RESEARCH
Robust Audio-Text Retrieval via Cross-Modal Attention and Hybrid Loss
arXiv CS.CL
arXiv:2604.23323v1. Audio-text retrieval enables semantic alignment between audio content and natural language queries, supporting applications in multimedia search, accessibility, and surveillance. However, current state-of-the-art approaches struggle with long, noisy, and weakly labeled audio due to their reliance on contrastive learning and large-batch