When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

arXiv:2605.02782v1 Announce Type: cross Automatic speech recognition (ASR) systems remain brittle on dysarthric and other atypical speech. Recent audio-language models raise the possibility of improving performance by conditioning on additional clinical context at inference time, but it is unclear whether these models can make use of such information. We