We've open-sourced Fish Audio S2, a new generation of expressive TTS that lets you direct voices with natural language. Add cues like [whisper] or [laughing nervously], generate multi-speaker dialogue in one pass, and create scary-real voices across 80+ languages.
This is a big step beyond S1, redefining expressive voice AI. Write emotion cues anywhere in the text and hear the speech flow exactly how [emphasis] YOU direct it.
The pricing itself makes a huge difference compared to competitors, and the quality is on par with most "high-end" TTS models.
Unfortunately, what I don't like is that you ask for data and a subscription before I, as a user, have tried your voices. Better UX would be to at least let the user generate one custom voice to prove its power and magic, which is the thing you're selling on your marketing page … Other than that, it looks very promising ✨
Fish Audio has outstanding technical strength. Their voice synthesis is natural, expressive, and highly stable, showing both strong research capability and excellent engineering execution.
The focus on emotion and nuance in TTS is really interesting. A lot of voice models sound technically good but still feel a bit flat, so the idea of capturing rhythm and speaking habits is compelling.
Also impressive that voice cloning works with just ~10 seconds of audio. Curious how you’re handling consent and voice ownership safeguards as this gets adopted more widely?
As someone who used to lead a team that created dozens of voice-overs for different markets, I think these tools are a game-changer.
This is a big unlock for anyone building voice-driven products. Directing voices with natural language cues like [whisper] or [laughing nervously] instead of fiddling with sliders is so much more intuitive. Love that it's open source too. What languages are you seeing the most community demand for?
I just found Fish Audio this year and was surprised by the API and the S1 model. Well, S2 is now absolutely mind-blowing. Great work!
Really cool how fast it is to clone my voice. Should I be giving it multiple recordings with different emotions so that it has a better register of what I sound like?
What's the basis for the intonation or emphasis? Congrats on the launch, @hehe6z!
As a content creator, I've been looking for a product like this for a long time! I hope it meets my expectations.
How does Fish Audio maintain consistent emotional prosody and rhythmic nuance across long-form content, and what specific architectural improvements over So-VITS-SVC allow for such high-fidelity cloning from only 10 seconds of source audio?
About Fish Audio S2 on Product Hunt
“Real Expressive AI Voices”
Fish Audio S2 launched on Product Hunt on March 10th, 2026 and earned 345 upvotes and 55 comments, placing #5 on the daily leaderboard.
Fish Audio S2 was featured in Open Source (68.3k followers), Artificial Intelligence (466.1k followers), GitHub (41.2k followers) and Audio (2k followers) on Product Hunt. Together, these topics include over 119.9k products, making this a competitive space to launch in.
Who hunted Fish Audio S2?
Fish Audio S2 was hunted by Kevin William David. A “hunter” on Product Hunt is the community member who submits a product to the platform: uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community.
Hi our beloved PH!
[excited] [slightly nervous]
Today we’re launching Fish Audio S2, our new text-to-speech model.
[long pause]
Hear Fish S2 Read This!
This is a big step beyond S1, redefining expressive voice AI. Write emotion cues anywhere in the text and hear the speech flow exactly how [emphasis] YOU direct it.
And, [inhale] we’re open-sourcing all of it.
GitHub: https://github.com/fishaudio/fish-speech/
HuggingFace: https://huggingface.co/fishaudio/s2-pro/
Shout out to SGLang for powering our stack.
There’s much more to S2.
Try it yourself now: https://fish.audio/s2/
As always, we want to give back to the community. For the launch, we’re offering free generation credits and an exclusive 50% OFF promo code: PH-FishS2
Go build weird things with it :)
We’d love to hear what you make.