How do Granite-4.0-1b-speech, Qwen3-ASR-1.7B, and Voxtral Mini 4B Realtime compare?
r/LocalLLaMA
I haven’t been following open-source ASR much recently, but I have a new use case, so I’m diving back in. The current top 3 options on HuggingFace look quite different: IBM’s **Granite-4.0-1b-speech** (1B params), Alibaba’s **Qwen3-ASR-1.7B** (1.7B params), and Mistral’s **Voxtral Mini 4B Realtime** (4B params). All are Apache 2.0 licensed and all target speech recognition, but they seem to be solving fundamentally different problems. I’d love to hear from anyone who’s actually deployed these or benchmarked them head-to-head.