Geo-ATBench: A Benchmark for Geospatial Audio Tagging with Geospatial Semantic Context

arXiv:2603.10623v1

Environmental sound understanding in computational auditory scene analysis (CASA) is often formulated as an audio-only recognition problem. This formulation has a persistent weakness in multi-label audio tagging (AT): acoustically similar events can be difficult to tell apart from the waveform alone. In such cases, the cues needed to disambiguate them often lie outside the waveform.