Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs

arXiv:2605.12242v1

Automatic Speech Recognition (ASR) transcripts often contain disfluencies such as fillers, repetitions, and false starts, which reduce readability and hinder downstream applications like chatbots and voice assistants. Left unaddressed, these disfluencies can significantly degrade the reliability of such systems. Most existing approaches rely on classical models that focus on identifying disfluent tokens for removal.