Microsoft's most expressive TTS model yet: voice cloning from short samples, fine-grained emotional control, and consistent voice identity across 15 languages. Now live in Azure AI Foundry at $22 per million characters, with integrations rolling out in VS Code, Dynamics 365 Contact Center, and Teams. For builders shipping voice agents who need production-grade prosody without the OpenAI Realtime API price tag.
I build voice agents for service businesses — mostly healthcare and home services — and the #1 unsolved problem in this space is prosody. The "is this a robot?" moment usually happens in the first 8 seconds of a call.
MAI-Voice-2 is the first TTS I've A/B tested where my pilot users couldn't tell it apart from a human voice. The $22/M chars pricing lands below ElevenLabs and matches gpt-realtime's TTS layer.
If you're shipping voice and wedded to OpenAI Realtime, worth running the side-by-side. Curious if Microsoft is planning sub-200ms first-token latency via WebRTC streaming next.
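For anyone sizing the $22 per million characters price against their own traffic, here is a minimal sketch of the arithmetic. The function names and the example workload (10,000 calls at 1,500 synthesized characters each) are hypothetical, chosen only for illustration; the listing gives the per-character price and nothing about volume tiers or minimums, so treat the result as a rough estimate.

```python
from decimal import Decimal

# Launch price quoted in the listing: $22 per million characters.
PRICE_PER_MILLION_CHARS = Decimal("22")


def estimate_tts_cost(num_chars: int,
                      price_per_million: Decimal = PRICE_PER_MILLION_CHARS) -> Decimal:
    """Return the estimated synthesis cost in USD for num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    # Decimal avoids float rounding noise in money arithmetic.
    return Decimal(num_chars) * price_per_million / Decimal(1_000_000)


def monthly_cost(calls_per_month: int, avg_chars_per_call: int) -> Decimal:
    """Estimate monthly spend from call volume and average characters spoken per call."""
    return estimate_tts_cost(calls_per_month * avg_chars_per_call)


if __name__ == "__main__":
    # Hypothetical workload, not a figure from the listing.
    print(f"${monthly_cost(10_000, 1_500)} per month")
```

Swapping in another vendor's per-million rate through the `price_per_million` argument gives a like-for-like comparison on the same workload.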
Incredible that these voice models are becoming indistinguishable from real human voices. Are there any benchmarks or detailed tests of how complex the questions are that the models can answer? That gap has been a major obstacle to my adopting AI voice agents that handle customer support without assistance, and I'm curious how this is evolving.
The consistent voice identity across 15 languages is what stands out to me here. I work on a voice companion that calls aging parents every day, and a lot of our families are immigrants whose parents are most at ease in their first language. A warm, familiar voice that holds up in Tagalog or Mandarin is often the difference between a call someone looks forward to and one they let ring out. Question for the team: how stable is the cloned identity and emotional control over a full 10-minute conversation, or does the prosody drift toward neutral as the session runs longer?
About Microsoft MAI-Voice-2 on Product Hunt
“Expressive TTS with voice cloning in 15 languages”
Microsoft MAI-Voice-2 launched on Product Hunt on June 5th, 2026 and earned 106 upvotes and 5 comments, placing #13 on the daily leaderboard.
Microsoft MAI-Voice-2 was featured in Productivity (653.8k followers), Developer Tools (514k followers) and Artificial Intelligence (471k followers) on Product Hunt. Together, these topics include over 311.7k products, making this a competitive space to launch in.
Who hunted Microsoft MAI-Voice-2?
Microsoft MAI-Voice-2 was hunted by Habib Ferdous. A “hunter” on Product Hunt is the community member who submits a product to the platform, uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community.