Masked Autoencoders with Limited Data: Does It Work? A Fine-Grained Bioacoustics Case Study

arXiv:2605.14031v1 Announce Type: cross Bioacoustic recognition requires fine-grained acoustic understanding to distinguish similar-sounding species. However, many large-scale data repositories such as iNaturalist are weakly annotated, often with only a single positive species label per recording, making supervised learning particularly challenging. Inspired by advances in computer vision, recent approaches have shifted toward self-supervised learning to capture the underlying structure of audio without relying on exhaustive annotations.