SENS-ASR: Semantic Embedding injection in Neural-transducer for Streaming Automatic Speech Recognition

arXiv:2603.10005v1 Announce Type: cross

Many Automatic Speech Recognition (ASR) applications require audio to be processed in streaming mode. In streaming mode, an ASR system must start transcribing the input stream before it is complete, i.e., it has to process the stream with limited (or no) future context. Compared with offline mode, this reduced future context degrades the performance of streaming ASR systems, especially under low-latency constraints.