AI RESEARCH

Generalizable Audio-Visual Navigation via Binaural Difference Attention and Action Transition Prediction

arXiv cs.AI

arXiv:2604.05007v1 Announce Type: cross In Audio-Visual Navigation (AVN), agents must locate sound sources in unseen 3D environments using visual and auditory cues. However, existing methods often struggle with generalization in unseen scenarios, as they tend to overfit to semantic sound features and specific