MiMo-V2.5-ASR is an 8B open-source speech recognition model from Xiaomi that transcribes Mandarin, English, eight Chinese dialects, code-switched speech, and song lyrics. Built for ML engineers, researchers, and developers building real-world voice applications.
Whisper changed what people expected from open-source ASR. Three years later, the leaderboard looks very different.
What it is: MiMo-V2.5-ASR is an 8B open-source speech recognition model from Xiaomi MiMo, MIT-licensed and available on HuggingFace, built for bilingual Chinese-English transcription across dialects, noisy audio, code-switched speech, and song lyrics.
The problem: most ASR models are benchmarked on clean studio data and deployed into the real world, where audio is noisy, speakers overlap, and people switch languages mid-sentence. The gap between benchmark accuracy and production accuracy is where voice products quietly fail.
The solution: staged training combining large-scale mid-training, supervised fine-tuning, and a reinforcement learning algorithm specifically targeting the scenarios where conventional models break down. Native punctuation from prosody means transcripts arrive ready to use.
What makes it different: on the Open ASR Leaderboard, MiMo-V2.5-ASR posts 5.73% average WER on English, below Whisper large-v3 at 7.44%. On Wu dialect it scores 19.55% vs FunASR-1.5 at 29.08%. On lyrics, 3.95% on m4singer vs Gemini 2.5 Pro at 4.25%. These are not cherry-picked scenarios — they are the hard ones.
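Those percentages are word error rate (WER), the standard ASR metric: the minimum number of word substitutions, insertions, and deletions needed to turn the model's transcript into the reference, divided by the reference word count. A minimal sketch of the computation (standard word-level Levenshtein distance, not code from the MiMo project):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words,
    computed as word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One deletion against six reference words: 1/6 ≈ 0.167, i.e. 16.7% WER
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A 5.73% average WER therefore means roughly one word error per 17 reference words, aggregated across the leaderboard's English test sets.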
Key features:
Eight Chinese dialects natively supported, including Wu, Cantonese, Hokkien, Sichuanese
Chinese-English code-switching with no language tags
Lyrics transcription under accompaniment and pitch variation
Multi-speaker and noisy environment robustness
Native punctuation, no post-processing needed
MIT license, Python API, Gradio demo, self-hostable
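Since the model is self-hostable with a Python API, a typical integration wraps whatever ASR callable you load so downstream code only sees punctuated text. This is a hypothetical sketch, not the project's documented API: the model repo id and the use of the HuggingFace `transformers` ASR pipeline are assumptions to verify against the actual HuggingFace page.

```python
from typing import Callable

def make_transcriber(asr: Callable[[str], dict]) -> Callable[[str], str]:
    """Wrap an ASR pipeline (any callable returning {"text": ...})
    so callers get plain, trimmed, punctuated text back."""
    def transcribe(audio_path: str) -> str:
        result = asr(audio_path)
        return result["text"].strip()
    return transcribe

# Hypothetical real usage (model id and pipeline support are unverified assumptions):
#   from transformers import pipeline
#   asr = pipeline("automatic-speech-recognition", model="XiaomiMiMo/MiMo-V2.5-ASR")
#   transcribe = make_transcriber(asr)
#   print(transcribe("meeting.wav"))
```

Injecting the pipeline as a callable keeps the wrapper testable without downloading model weights, and lets you swap in a different ASR backend without touching downstream code.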
Benefits:
Production-grade accuracy on the audio conditions that actually exist in the field
One model replaces multiple regional or domain-specific ASR solutions
Self-hosting eliminates per-call API costs and keeps data on your infra
Ready-to-use punctuated output cuts one step from every downstream pipeline
Who it's for: ML engineers and voice product teams building bilingual or Chinese-language transcription pipelines who need accuracy that holds up outside the lab.
Open-source ASR has been catching up to closed models for years. MiMo-V2.5-ASR is evidence that the gap is now very small, and in some scenarios closed entirely.
About MiMo-V2.5 Voice on Product Hunt
“Bilingual ASR for dialects, code-switching, and songs”
MiMo-V2.5 Voice launched on Product Hunt on April 25th, 2026 and earned 114 upvotes and 1 comment, placing #6 on the daily leaderboard. MiMo-V2.5-ASR is an 8B open-source speech recognition model from Xiaomi that transcribes Mandarin, English, eight Chinese dialects, code-switched speech, and song lyrics. Built for ML engineers, researchers, and developers building real-world voice applications.
MiMo-V2.5 Voice was featured in API (98.1k followers), Open Source (68.3k followers), Artificial Intelligence (466.8k followers) and GitHub (41.2k followers) on Product Hunt. Together, these topics include over 128.8k products, making this a competitive space to launch in.
Who hunted MiMo-V2.5 Voice?
MiMo-V2.5 Voice was hunted by Rohan Chaubey and Kumar Abhishek. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.
Want to see how MiMo-V2.5 Voice stacked up against nearby launches in real time? Check out the live launch dashboard for upvote speed charts, proximity comparisons, and more analytics.