MAI-Transcribe-1

Production ASR for noisy multilingual audio

API
Artificial Intelligence
Audio

MAI-Transcribe-1 is Microsoft’s new multilingual speech-to-text model built for real-world audio. It delivers best-in-class accuracy across 25 languages, strong robustness in noisy environments, faster batch transcription, and pricing aimed at production speech workflows.

Top comment

Hi everyone! Benchmarks are only part of the story for ASR. In real products like voice agents, meeting transcription, and call center analytics, audio is rarely clean, and MAI-Transcribe-1 is clearly built for that reality. Microsoft is positioning it around three things that actually matter in production: best-in-class accuracy across 25 languages, strong robustness to noisy real-world audio, and much better price-performance at $0.36 per hour of audio. On top of that, they say batch transcription is 2.5x faster than their current Azure Fast offering. An ASR model that actually outperforms models like Scribe v2 and Whisper-large-v3 definitely seems worth testing out in a real integration. 🤔
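To put the $0.36 per hour figure in context, here is a rough cost sketch. It assumes billing scales linearly with audio duration, with no minimum charge, rounding increment, or volume tier, none of which the launch post confirms. The function name and the monthly volume are made up for illustration.

```python
# Price quoted in the launch post: $0.36 per hour of audio.
PRICE_PER_AUDIO_HOUR_USD = 0.36


def transcription_cost_usd(audio_hours: float,
                           price_per_hour: float = PRICE_PER_AUDIO_HOUR_USD) -> float:
    """Estimate transcription cost, assuming simple linear per-hour billing.

    Hypothetical helper: real billing may add minimums or round durations.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return round(audio_hours * price_per_hour, 2)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 hours of call recordings per month.
    monthly_hours = 1000
    cost = transcription_cost_usd(monthly_hours)
    print(f"{monthly_hours} h of audio: about ${cost:.2f}")
```

Swapping in your own monthly volume gives a quick first-pass number to compare against whatever you pay for your current ASR setup.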

Comment highlights

I run Whisper in prod for a voice input thing — accents and background noise break it constantly. If this actually handles noisy multilingual audio better, that alone is worth switching. $0.36/hr is solid too. Gonna try it this week.

Multilingual ASR is a hard problem — especially for noisy audio. We deal with this at NexClip AI too, where accurate timestamps on every word are critical for topic-based video editing. Curious how MAI-Transcribe-1 handles word-level timestamp accuracy across languages?

I need to try ASR, and this looks perfect for me. Thanks, Zac, for hunting it! I feel like I'm going to love it.

Congrats on the launch! 👏

Also launching today — curious, what worked best for you to get your first users?