DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models

arXiv:2603.18048v1 Announce Type: new Recent Audio Multimodal Large Language Models (Audio MLLMs) demonstrate impressive performance on speech benchmarks, yet it remains unclear whether these models genuinely process acoustic signals or rely on text-based semantic inference. To systematically study this question, we