AaSP: Aliasing-aware Self-Supervised Pre-Training for Audio Spectrogram Transformers

arXiv:2512.03637v2 Announce Type: replace-cross

Transformer-based audio self-supervised learning (SSL) models commonly use spectrograms, vision-style Transformers, and masked modeling objectives. However, convolutional patchification with temporal downsampling lowers the effective Nyquist frequency and
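
The effect of temporal downsampling on the Nyquist frequency can be illustrated with a minimal sketch in Python with NumPy. It is not the paper's method. The frame rate of 100 frames per second, the patch stride of 16 frames, and the 5 Hz temporal modulation are hypothetical values chosen for illustration, and plain strided subsampling stands in for a convolutional patch embedding (a real patch embedding applies a learned kernel before striding, so this sketch shows the unfiltered case).

```python
import numpy as np


def nyquist_after_stride(frame_rate_hz: float, stride: int) -> float:
    """Nyquist frequency of the time axis after keeping every `stride`-th frame."""
    return frame_rate_hz / stride / 2.0


def aliased_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a real sinusoid at f_hz once sampled at sample_rate_hz."""
    f_mod = f_hz % sample_rate_hz
    return min(f_mod, sample_rate_hz - f_mod)


# Hypothetical settings (assumptions, not taken from the paper).
frame_rate = 100.0   # spectrogram frames per second
stride = 16          # temporal stride of the patch embedding
modulation_hz = 5.0  # temporal modulation present along the time axis

new_rate = frame_rate / stride                   # patch tokens per second
nyq = nyquist_after_stride(frame_rate, stride)   # highest representable modulation

# One spectrogram bin whose energy oscillates at `modulation_hz` for 16 seconds.
t = np.arange(1600) / frame_rate
x = np.cos(2.0 * np.pi * modulation_hz * t)

# Strided subsampling with no anti-aliasing filter (stand-in for patchification).
y = x[::stride]

# Locate the dominant modulation frequency in the downsampled sequence.
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1.0 / new_rate)
peak = freqs[np.argmax(spectrum)]

print(f"Nyquist after stride: {nyq} Hz")
print(f"Predicted alias of {modulation_hz} Hz: {aliased_frequency(modulation_hz, new_rate)} Hz")
print(f"Observed spectral peak: {peak} Hz")
```

Under these assumptions the 5 Hz modulation lies above the post-stride Nyquist frequency, so it reappears at a lower frequency in the token sequence rather than being removed.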