Generalizable Prompt Tuning for Audio-Language Models via Semantic Expansion

arXiv:2601.20867v2

Prompt tuning has achieved remarkable progress in vision-language models (VLMs) and has recently been adopted for audio-language models (ALMs). However, its generalization ability in ALMs remains largely underexplored. We observe that conventional prompt tuning for ALMs also suffers from the Base-New Tradeoff, and we trace this issue to a disrupted semantic structure in the embedding space.