AI RESEARCH

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

arXiv CS.CL

arXiv:2509.22220v2 Announce Type: replace

Prevalent semantic speech tokenizers, designed to capture linguistic content, are surprisingly fragile. We find they are not robust to meaning-irrelevant acoustic perturbations; even at high Signal-to-Noise Ratios (SNRs) where speech is perfectly intelligible, their output token sequences can change drastically, increasing the learning burden for downstream LLMs. This instability stems from two flaws: a brittle single-path quantization architecture and a distant.
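The fragility described in the abstract can be probed with a simple experiment: mix noise into a waveform at a chosen SNR, tokenize both versions, and compare the two token sequences. The sketch below is illustrative only and is not taken from the paper. The normalized edit distance is one plausible instability measure, chosen here as an assumption, and `toy_tokenizer` is a hypothetical stand-in for a real semantic speech tokenizer.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Return the signal plus white Gaussian noise scaled to a target SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def token_instability(clean_tokens, noisy_tokens):
    """Edit distance normalized by the clean length (0.0 means identical)."""
    if not clean_tokens:
        return 0.0 if not noisy_tokens else 1.0
    return edit_distance(clean_tokens, noisy_tokens) / len(clean_tokens)


def toy_tokenizer(signal, frame=4, levels=8):
    """Hypothetical tokenizer: quantizes each frame's mean amplitude to an integer."""
    tokens = []
    for start in range(0, len(signal) - frame + 1, frame):
        mean = sum(signal[start:start + frame]) / frame
        index = int((mean + 1.0) / 2.0 * levels)
        tokens.append(min(levels - 1, max(0, index)))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * i / 32) for i in range(256)]
    noisy = add_noise_at_snr(clean, snr_db=30, rng=rng)
    score = token_instability(toy_tokenizer(clean), toy_tokenizer(noisy))
    print(f"token instability at 30 dB SNR: {score:.3f}")
```

With a real tokenizer substituted for `toy_tokenizer`, a score well above zero at high SNR would indicate the kind of instability the abstract describes.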