MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

arXiv:2507.23511v3 Announce Type: replace-cross While large audio-language models have advanced open-ended audio understanding, they still fall short of nuanced human-level comprehension. This gap persists largely because current benchmarks, limited by data annotations and evaluation metrics, fail to reliably distinguish between generic and highly detailed model outputs. To this end, this work