Introducing Lightning V3, Smallest AI's most advanced text-to-speech model. With 100ms latency, a 3.98 WVMOS score, and support for 15+ languages including English, Hindi, Spanish, and Tamil, V3 was preferred over OpenAI's GPT-4o-mini-TTS by listeners 76.2% of the time. It outputs 44.1 kHz audio and powers voice assistants, IVR systems, content creation, and conversational AI with human-like speech. Instant voice cloning from just 10 seconds of audio. Real-time. Expressive. Enterprise-ready.
How well does it handle code-switching? Like mixing Hindi + English in the same sentence?
Super cool. Any plans for regional accents within languages (like Indian English vs US English)? I can use it for my SaaS tutorials.
Does it support emotion control via API? Like being able to dial up/down expressiveness depending on use case?
76.2% preference over GPT-4o-mini-TTS is impressive. Would love to know how big the test group was and what kind of prompts were used?
This is actually very cool. 100ms latency coupled with decent prosody is kinda the holy grail for voice agents.
Hi guys, do you have a published comparison vs ElevenLabs for conversational use cases?
Interesting positioning. A lot of TTS products talk about sounding natural, but for voice agents the latency piece is just as important as the voice quality itself.
100ms is the part that really caught my attention here. How much of that performance holds up in real production settings once people add full conversational pipelines around it?
TTS for voice agents has a different bar than TTS for content - it's not just naturalness, it's latency under real conditions. An agent that pauses 800ms before responding feels broken even if the audio quality is great. Curious how Lightning V3 handles the tradeoff between quality and time-to-first-audio in streaming mode.
Huge congrats team!! 🚀 Voice AI that actually sounds human is still rare tbh. Gonna test this later today, been looking for smth like this for a side project.
Pretty cool... How are you guys balancing low latency vs prosody quality, since expressive speech usually needs more context?
I used the voice clone feature! I'm able to use the realistic voices in videos that I create, and I'm loving the experience.
Yo, been following the TTS space closely while building voice agents, and Lightning V3 genuinely surprised me. Getting real-time performance and natural prosody in the same model has always felt like a trade-off; this is the first time it hasn't. The multilingual support is a big deal for me as well. Congrats on the launch.
Hey Product Hunt!
Lightning V3 delivers 100ms latency at 20 concurrent requests. That's real-time voice AI that actually scales. In blind listening tests, listeners preferred it over OpenAI's GPT-4o-mini-TTS 76.2% of the time, with a WVMOS score of 3.98.
But speed means nothing if it sounds robotic. Lightning V3 scores 3.33/5 on intonation and 3.07/5 on prosody, meaning it doesn't just read text: it speaks with natural rhythm, pauses, and expression. The kind of voice your users won't realize is AI.
It supports 15+ languages, with more being added regularly (Indic & European languages included). It handles voice cloning from just 5-15 seconds of audio and gives you flexible streaming via HTTP, SSE, or WebSocket. Whatever fits your stack.
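To give a feel for how the streaming side slots into a pipeline, here is a rough Python sketch of calling a chunked HTTP TTS endpoint and measuring time to first audio. The URL, payload field names (voice_id, language, sample_rate), and the SMALLEST_API_KEY variable are placeholders for illustration, not the documented Smallest AI API; check the official docs for the real request shape and audio format.

```python
# Minimal sketch of streaming TTS over HTTP, assuming a hypothetical
# endpoint and payload; the real Smallest AI API may differ.
import os
import time
import requests

API_URL = "https://api.smallest.ai/v1/lightning-v3/stream"  # placeholder URL
API_KEY = os.environ["SMALLEST_API_KEY"]                     # assumed env var

payload = {
    "text": "Hello! Your order has shipped and should arrive on Friday.",
    "voice_id": "default",   # assumed parameter name
    "language": "en",        # assumed parameter name
    "sample_rate": 44100,    # 44.1 kHz output mentioned in the launch post
}

start = time.monotonic()
first_chunk_at = None

# Stream the response so playback (or the telephony leg of an IVR call)
# can begin on the first chunk instead of waiting for the full clip.
with requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    stream=True,
    timeout=30,
) as resp:
    resp.raise_for_status()
    with open("reply.wav", "wb") as f:
        for chunk in resp.iter_content(chunk_size=4096):
            if first_chunk_at is None:
                first_chunk_at = time.monotonic()
            f.write(chunk)

if first_chunk_at is not None:
    print(f"time to first audio: {(first_chunk_at - start) * 1000:.0f} ms")
```

The same idea carries over to the SSE and WebSocket transports: start feeding audio to the player as soon as the first chunk arrives, which is what keeps perceived latency close to the model's time-to-first-audio rather than its total synthesis time.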
We built this for developers shipping voice assistants, conversational AI, IVR systems, customer support bots, and anything that needs immediate, human-sounding voice feedback. Whether you're a solo builder or an enterprise team, the API is simple and the docs are solid.
We've been heads down on this for a while and we're genuinely proud of where V3 lands. Would love for you to try it and tell us what you think!