New OpenAI audio models for developers: gpt-4o powered speech-to-text (more accurate than Whisper) and steerable text-to-speech. Build voice agents, transcription pipelines, and more.
Voice is the future, and OpenAI's new audio models are accelerating that shift! They've just launched three new models in their API:
🎤 gpt-4o-transcribe & gpt-4o-mini-transcribe (STT): Beating Whisper on accuracy, even in noisy environments. Great for call centers, meeting transcription, and more (see the API sketch after this list).
🗣️ gpt-4o-mini-tts (TTS): This is the game-changer. Steerable voice output – you control the style and tone! Think truly personalized voice agents.
🛠️ Easy Integration: Works with the OpenAI API and Agents SDK, supporting both speech-to-speech and chained voice-agent architectures.
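As a rough illustration of how the STT side plugs in, here's a minimal sketch using the OpenAI Python SDK. The file name `meeting.mp3` is a placeholder of my own, not from the announcement:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Transcribe a local recording with the new STT model.
# "meeting.mp3" is a hypothetical file used for illustration.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" for lower cost
        file=audio_file,
    )

print(transcript.text)
```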
Experience the steerable TTS for yourself: OpenAI.fm
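If you'd rather drive the steering from code than from the demo page, here's a minimal sketch of the same idea via the API. The voice choice and the wording of the `instructions` prompt are assumptions for illustration, not taken from the announcement:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Steerable TTS: the instructions parameter nudges style and tone.
# The voice and prompt below are illustrative choices.
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="Thanks for calling! Your order shipped today and should arrive Friday.",
    instructions="Speak in a warm, upbeat customer-service tone.",
)

# The response body is raw audio bytes; save them as an MP3.
with open("reply.mp3", "wb") as f:
    f.write(response.content)
```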
Are there any other products that outperform OpenAI's? E.g., does ElevenLabs do a better job?
I like the sound. I listened to the article at 1.5x speed; sometimes the pronunciation seemed to slow down, sometimes to speed up. I'd like to see a 1.25x playback speed in the future, but even so it's already quite pleasant!)
The first thought I had when I saw this was "This is HUGE!" Steerable TTS is a game-changer, and the improvement in STT accuracy is fantastic.
The Alloy and Shimmer voices always sounded 10x better than the others. And tbh, having tried 11labs a lot, Alloy and Shimmer are the bar to beat. Love the testing UX on openai.fm, though; you used to only be able to test these voices in OpenAI's internal playground dashboard.