Microsoft MAI-Voice-2
Expressive TTS with voice cloning in 15 languages
Productivity
Developer Tools
Artificial Intelligence

Upvotes: 106

Comments: 5

Featured on June 5th, 2026

Hunted by Habib Ferdous

Microsoft's most expressive TTS model yet: voice cloning from short samples, fine-grained emotional control, and consistent voice identity across 15 languages. Now live in Azure AI Foundry at $22 per million characters, with integrations rolling out in VS Code, Dynamics 365 Contact Center, and Teams. For builders shipping voice agents who need production-grade prosody without the OpenAI Realtime API price tag.

Top comment

Product of the Day: 13th

I build voice agents for service businesses — mostly healthcare and home services — and the #1 unsolved problem in this space is prosody. The "is this a robot?" moment usually happens in the first 8 seconds of a call. MAI-Voice-2 is the first TTS I've A/B tested where my pilot users couldn't tell. The $22/M chars pricing lands below ElevenLabs and matches gpt-realtime's TTS layer. If you're shipping voice and wedded to OpenAI Realtime, worth running the side-by-side. Curious if Microsoft is planning sub-200ms first-token latency via WebRTC streaming next.

Comment highlights

Incredible that these voice models are becoming indistinguishable from real human voices. I was wondering if there are any benchmarks or detailed tests of the complexity of questions that the models can answer? This gap has been a major obstacle to my adopting AI voice agents that handle customer support without assistance, and I'm curious how this is evolving.

The consistent voice identity across 15 languages is what stands out to me here. I work on a voice companion that calls aging parents every day, and a lot of our families are immigrants whose parents are most at ease in their first language. A warm, familiar voice that holds up in Tagalog or Mandarin is often the difference between a call someone looks forward to and one they let ring out. Question for the team: how stable is the cloned identity and emotional control over a full 10-minute conversation, or does the prosody drift toward neutral as the session runs longer?

About Microsoft MAI-Voice-2 on Product Hunt

“Expressive TTS with voice cloning in 15 languages”

Microsoft MAI-Voice-2 launched on Product Hunt on June 5th, 2026 and earned 106 upvotes and 5 comments, placing #13 on the daily leaderboard.

Microsoft MAI-Voice-2 was featured in Productivity (655.7k followers), Developer Tools (515.5k followers) and Artificial Intelligence (473.2k followers) on Product Hunt. Together, these topics include over 325.9k products, making this a competitive space to launch in.

Who hunted Microsoft MAI-Voice-2?

Microsoft MAI-Voice-2 was hunted by Habib Ferdous. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.
