Adaptive Discovery of Interpretable Audio Attributes with Multimodal LLMs for Low-Resource Classification

arXiv:2603.06991v1 Announce Type: cross In predictive modeling for low-resource audio classification, extracting attributes that are both highly accurate and interpretable is critical, and in high-reliability applications interpretable audio attributes are indispensable. Human-driven attribute discovery is effective, but its low throughput makes it a bottleneck. We propose a method for adaptively discovering interpretable audio attributes using Multimodal Large Language Models (MLLMs