WST-X Series: Wavelet Scattering Transform for Interpretable Speech Deepfake Detection

arXiv:2602.02980v2 Announce Type: replace-cross In this work, we focus on front-end design for speech deepfake detectors, the component that determines which discriminative acoustic cues reach the classifier. Existing front-ends fall mainly into two categories. Hand-crafted filterbank features are transparent but limited in their ability to capture higher-level information. Self-supervised learning (SSL) features, by contrast, lack interpretability and may overlook fine-grained spectral anomalies.