Universal-Streaming delivers everything voice agents need from streaming speech-to-text in one robust API: ultra-fast immutable transcripts, higher accuracy, built-in endpointing, and transparent pricing at $0.15/hour with unlimited concurrency.
Six months ago, we launched Universal-2 to tackle last-mile accuracy in speech recognition. Today, we’re excited to introduce Universal-Streaming, our purpose-built, real-time speech-to-text model designed specifically for voice agents.
Universal-Streaming isn’t just fast. It’s built around what real-time apps need:
⚡ ~300ms immutable transcripts with no partial/final tradeoff
🧠 Intelligent endpointing that smooths out awkward pauses and interruptions
🔒 Accurate on the tokens that matter—emails, codes, names
🌎 Unlimited concurrency at just $0.15/hour with no surprise fees
And it’s not just about the benchmarks—developers building real-world voice agents are already seeing the difference: more natural interactions, higher task completion, and easier scaling from 5 to 50,000+ concurrent users.
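To make the pricing concrete, here is a rough back-of-the-envelope sketch of what that scaling range could cost. It assumes billing is simply the $0.15/hour rate multiplied by total stream-hours, which this post does not spell out, and `estimate_cost_usd` is a hypothetical helper for illustration, not part of any SDK.

```python
from decimal import Decimal

# Rate quoted in this post. How usage is metered is an assumption here:
# we treat every hour of every concurrent stream as one billable hour.
PRICE_PER_HOUR_USD = Decimal("0.15")


def estimate_cost_usd(concurrent_streams: int, hours_per_stream: float) -> Decimal:
    """Estimate cost as stream-hours times the hourly rate (hypothetical helper)."""
    if concurrent_streams < 0 or hours_per_stream < 0:
        raise ValueError("inputs must be non-negative")
    stream_hours = Decimal(concurrent_streams) * Decimal(str(hours_per_stream))
    # Round to whole cents for display.
    return (stream_hours * PRICE_PER_HOUR_USD).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for streams in (5, 50_000):
        print(f"{streams} streams for 1 hour: ${estimate_cost_usd(streams, 1)}")
```

Under that assumption the cost grows linearly with usage, so the same formula applies at 5 streams and at 50,000.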
Whether you’re shipping AI assistants, real-time transcription tools, or something entirely new, Universal-Streaming gives you the power to build voice products your users will actually love to talk to.
We can’t wait to see what you build—try it out and tell us what you think!