Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio

arXiv:2508.20476v3 Announce Type: replace

Audio is the primary modality for human communication and has driven the success of Automatic Speech Recognition (ASR) technologies. However, such audio-centric systems inherently exclude individuals who are deaf or hard of hearing. Visual alternatives such as sign language and lip reading offer effective substitutes, and recent advances in Sign Language Translation (SLT) and Visual Speech Recognition (VSR) have improved audio-less communication.