
MiMo-V2.5 Voice

Bilingual ASR for dialects, code-switching, and songs

API
Open Source
Artificial Intelligence
GitHub

Hunted by Rohan Chaubey and Kumar Abhishek

Dashboard (data not captured): product upvotes, comments, and upvote speed vs the next 3 products; product upvotes and comments.

MiMo-V2.5-ASR is an 8B-parameter open-source speech recognition model from Xiaomi that transcribes Mandarin, English, eight Chinese dialects, code-switched speech, and song lyrics. It is aimed at ML engineers, researchers, and developers building real-world voice applications.

Top comment

Whisper changed what people expected from open-source ASR. Three years later, the leaderboard looks very different.

What it is: MiMo-V2.5-ASR is an 8B-parameter open-source speech recognition model from Xiaomi MiMo, MIT-licensed and available on Hugging Face, built for bilingual Chinese-English transcription across dialects, noisy audio, code-switched speech, and song lyrics.

The problem: most ASR models are benchmarked on clean studio data and deployed into the real world, where audio is noisy, speakers overlap, and people switch languages mid-sentence. The gap between benchmark accuracy and production accuracy is where voice products quietly fail.

The solution: staged training that combines large-scale mid-training, supervised fine-tuning, and a reinforcement learning algorithm aimed specifically at the scenarios where conventional models break down. Because punctuation is predicted natively from prosody, transcripts arrive ready to use.

What makes it different: on the Open ASR Leaderboard, MiMo-V2.5-ASR posts a 5.73% average WER on English, below Whisper large-v3 at 7.44% (lower WER is better). On Wu dialect it scores 19.55% vs FunASR-1.5 at 29.08%. On lyrics, it scores 3.95% on m4singer vs Gemini 2.5 Pro at 4.25%. These are not cherry-picked scenarios: they are the hard ones.
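
Word error rate counts the substitutions (S), deletions (D), and insertions (I) needed to turn a model's transcript into the reference, divided by the number of reference words N: WER = (S + D + I) / N. The minimal Python sketch below computes it with a word-level Levenshtein alignment and adds the relative reduction implied by the figures quoted above. It is an illustration only: the function names are hypothetical, and real benchmarks typically normalize text (case, punctuation, numbers) before scoring, which this sketch skips.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution, or a free match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / n


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction of the baseline error rate that the new system removes."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    print(f"{wer('the cat sat on the mat', 'the cat sat on mat'):.3f}")  # 0.167
    print(f"{relative_reduction(7.44, 5.73):.1%}")    # 23.0%: English vs Whisper large-v3
    print(f"{relative_reduction(29.08, 19.55):.1%}")  # 32.8%: Wu dialect vs FunASR-1.5
    print(f"{relative_reduction(4.25, 3.95):.1%}")    # 7.1%: m4singer vs Gemini 2.5 Pro
```

Relative reduction is a useful companion to the raw gap: a 1.71-point drop from 7.44% removes roughly a quarter of the baseline's errors, while the 0.30-point lyrics gap removes about 7%.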

Key features:

  • Eight Chinese dialects natively supported, including Wu, Cantonese, Hokkien, and Sichuanese

  • Chinese-English code-switching without language tags

  • Lyrics transcription under accompaniment and pitch variation

  • Multi-speaker and noisy environment robustness

  • Native punctuation, no post-processing needed

  • MIT license, Python API, Gradio demo, self-hostable

Benefits:

  • Production-grade accuracy on the audio conditions that actually exist in the field

  • One model replaces multiple regional or domain-specific ASR solutions

  • Self-hosting eliminates per-call API costs and keeps data on your infra

  • Ready-to-use punctuated output cuts one step from every downstream pipeline

Who it's for: ML engineers and voice product teams building bilingual or Chinese-language transcription pipelines who need accuracy that holds up outside the lab.

Open-source ASR has been catching up to closed models for years. MiMo-V2.5-ASR is a data point that the gap is now very small, and in some scenarios gone.

About MiMo-V2.5 Voice on Product Hunt

MiMo-V2.5 Voice launched on Product Hunt on April 25th, 2026 and earned 110 upvotes and 1 comment, placing #6 on the daily leaderboard.

MiMo-V2.5 Voice is listed under the API, Open Source, Artificial Intelligence, and GitHub topics, which collectively have 674.4k followers on Product Hunt. The dashboard above tracks how MiMo-V2.5 Voice performed against the three products that launched closest to it on the same day.

Who hunted MiMo-V2.5 Voice?

MiMo-V2.5 Voice was hunted by Rohan Chaubey and Kumar Abhishek. A “hunter” on Product Hunt is the community member who submits a product to the platform, uploading the images and link and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community.

Reviews

MiMo-V2.5 Voice has received 1 review on Product Hunt with an average rating of 5.00/5.
