Closing the Speech-Text Gap with Limited Audio for Effective Domain Adaptation in LLM-Based ASR

arXiv:2604.06487v1 Announce Type: new Conventional end-to-end automatic speech recognition (ASR) systems rely on paired speech-text data for domain adaptation. Recent LLM-based ASR architectures connect a speech encoder to a large language model via a projection module, enabling adaptation with text-only data. However, this