WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition

arXiv:2604.25611v1

Real-time automatic speech recognition (ASR) systems face a fundamental trade-off between transcription accuracy and computational efficiency, particularly when deploying large-scale transformer models like Whisper. Existing streaming approaches either sacrifice accuracy through aggressive chunking or incur prohibitive memory costs through unbounded context accumulation.