MAI-Transcribe-1 is Microsoft’s new multilingual speech-to-text model built for real-world audio. It delivers best-in-class accuracy across 25 languages, strong robustness in noisy environments, faster batch transcription, and pricing aimed at production speech workflows.
Hi everyone!
Benchmarks are only part of the story for ASR. In real products like voice agents, meeting transcription, and call center analytics, audio is rarely clean — and MAI-Transcribe-1 is clearly built for that reality.
Microsoft is positioning it around three things that actually matter in production: best-in-class accuracy across 25 languages, strong robustness to noisy real-world audio, and much better price-performance at $0.36 per hour of audio. On top of that, they say batch transcription is 2.5x faster than their current Azure Fast Transcription offering.
An ASR model that reportedly outperforms Scribe v2 and Whisper-large-v3 definitely seems worth testing in a real integration. 🤔
I run Whisper in prod for a voice input feature, and accents and background noise break it constantly. If this actually handles noisy multilingual audio better, that alone makes it worth switching. $0.36/hr is solid too. Gonna try it this week.
Multilingual ASR is a hard problem, especially with noisy audio. We deal with this at NexClip AI too, where accurate timestamps on every word are critical for topic-based video editing. How well do MAI-Transcribe-1's word-level timestamps hold up across languages?
I've been meaning to try an ASR tool, and this looks perfect for me. Thanks, Zac, for hunting it! I have a feeling I'm going to love it.
Congrats on the launch! 👏
Also launching today. Curious: what worked best for you to get your first users?