Bilingual ASR for dialects, code-switching, and songs
MiMo-V2.5-ASR is an 8B open-source speech recognition model from Xiaomi that transcribes Mandarin, English, eight Chinese dialects, code-switched speech, and song lyrics. Built for ML engineers, researchers, and developers building real-world voice applications.
Whisper changed what people expected from open-source ASR. Three years later, the leaderboard looks very different.
What it is: MiMo-V2.5-ASR is an 8B open-source speech recognition model from Xiaomi MiMo, MIT-licensed and available on HuggingFace, built for bilingual Chinese-English transcription across dialects, noisy audio, code-switched speech, and song lyrics.
The problem: most ASR models are benchmarked on clean studio data and deployed into the real world, where audio is noisy, speakers overlap, and people switch languages mid-sentence. The gap between benchmark accuracy and production accuracy is where voice products quietly fail.
The solution: staged training combining large-scale mid-training, supervised fine-tuning, and a reinforcement learning algorithm specifically targeting the scenarios where conventional models break down. Native punctuation from prosody means transcripts arrive ready to use.
What makes it different: on the Open ASR Leaderboard, MiMo-V2.5-ASR posts 5.73% average WER on English, below Whisper large-v3 at 7.44%. On Wu dialect it scores 19.55% vs FunASR-1.5 at 29.08%. On lyrics, 3.95% on m4singer vs Gemini 2.5 Pro at 4.25%. These are not cherry-picked scenarios — they are the hard ones.
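The headline figures are word error rates (WER): the word-level edit distance between the model's transcript and a reference transcript, divided by the reference length. A minimal sketch of the metric behind the numbers above (production evaluations typically add text normalization before scoring):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    # (counts substitutions, insertions, and deletions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution against a six-word reference: WER = 1/6
print(round(wer("the cat sat on the mat", "the cat sit on the mat"), 4))
```

So a 5.73% average WER means roughly one word in seventeen is wrong; cutting Wu-dialect WER from 29.08% to 19.55% removes about a third of the errors.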
Key features:
Eight Chinese dialects natively supported, including Wu, Cantonese, Hokkien, Sichuanese
Chinese-English code-switching with no language tags
Lyrics transcription under accompaniment and pitch variation
Multi-speaker and noisy environment robustness
Native punctuation, no post-processing needed
MIT license, Python API, Gradio demo, self-hostable
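Because the weights are MIT-licensed and on HuggingFace, self-hosting can be as simple as a transformers pipeline. A minimal sketch, assuming the checkpoint is published as a standard automatic-speech-recognition pipeline (the repo id below is a placeholder; check the actual model card for the real name and recommended loading code):

```python
from transformers import pipeline

def load_asr(model_id: str = "XiaomiMiMo/MiMo-V2.5-ASR"):
    # model_id is hypothetical -- substitute the repo name from the model card.
    return pipeline("automatic-speech-recognition", model=model_id)

if __name__ == "__main__":
    asr = load_asr()
    # Punctuation is produced natively, so the text is ready for
    # downstream use without a post-processing pass.
    result = asr("meeting_recording.wav")
    print(result["text"])
```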
Benefits:
Production-grade accuracy on the audio conditions that actually exist in the field
One model replaces multiple regional or domain-specific ASR solutions
Self-hosting eliminates per-call API costs and keeps data on your infra
Ready-to-use punctuated output cuts one step from every downstream pipeline
Who it's for: ML engineers and voice product teams building bilingual or Chinese-language transcription pipelines who need accuracy that holds up outside the lab.
Open-source ASR has been catching up to closed models for years. MiMo-V2.5-ASR is a data point suggesting the gap is now very small, and in some scenarios gone.
About MiMo-V2.5 Voice on Product Hunt
“Bilingual ASR for dialects, code-switching, and songs”
MiMo-V2.5 Voice launched on Product Hunt on April 25th, 2026 and earned 110 upvotes and 1 comment, placing #6 on the daily leaderboard. MiMo-V2.5-ASR is an 8B open-source speech recognition model from Xiaomi that transcribes Mandarin, English, eight Chinese dialects, code-switched speech, and song lyrics. Built for ML engineers, researchers, and developers building real-world voice applications.
On the analytics side, MiMo-V2.5 Voice competes within API, Open Source, Artificial Intelligence and GitHub — topics that collectively have 674.4k followers on Product Hunt.
Who hunted MiMo-V2.5 Voice?
MiMo-V2.5 Voice was hunted by Rohan Chaubey and Kumar Abhishek. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.