UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation

arXiv:2602.21772v2 Announce Type: replace-cross

A universal audio representation should capture fine-grained speech cues and high-level semantics for environmental sounds and music in a single encoder. Existing encoders often excel in one domain but degrade in others. We propose UniWhisper, an efficient continual multi-task