Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models

arXiv:2604.15383v1 Announce Type: cross Large audio-language models (LALMs) generalize across speech, sound, and music, but unified decoders can exhibit a \emph{temporal smoothing bias}: transient acoustic cues may be underutilized in favor of temporally smooth context that is better supported by language priors, leading to less specific audio-grounded outputs. We propose \emph{Temporal Contrastive Decoding} (TCD), a