Fish Audio S1 is the most expressive and emotionally rich TTS model, creating lifelike voices that capture emotion, rhythm, and nuance. Clone any voice from a 10-second sample, preserving accent, tone, and speaking habits with unmatched realism.
Fish Audio lowers the barrier to entry for individual creators and developers.
Affordable: For many users, its biggest advantage is its price. Reports indicate that its service costs about one-sixth as much as ElevenLabs, making professional-grade voice cloning more affordable.
Open Source Heritage and Accessibility: The Fish Audio team has a strong open-source background, having created popular projects like Bert-VITS2. They have also open-sourced a smaller model, S1-mini, on GitHub and Hugging Face for community experimentation and contribution.
I've been using it extensively for three months now, and it's a very useful product, especially for voice cloning!
Good! I've tried every TTS model out there and have landed on Fish. Congratulations, Helena, on publicly launching! We've been using the API for months, and it's been amazing.
Another major feature that's working exceptionally well is the voice cloning. It has enabled a whole new world of use cases that weren't possible before.
What stood out most to me during testing was how natural and expressive Fish sounded compared to everything else. It was a lot easier than I expected to get the emotion right.
Really impressive demo! The emotion and expressiveness feel far closer to a real human than most TTS I've tried. Curious: how do you handle consent and IP when cloning voices from short samples (10 seconds)?
So excited to see a more affordable and powerful solution in the field of TTS, which is both imaginative and practical. Congratulations on the launch!
The voice quality is insanely realistic. Can’t believe it only needs 10 seconds of input!🔥
🎧 Just gave Fish Audio a spin, and wow, the emotional depth is next level. I've played around with other TTS tools before, but they often fall flat when it comes to tone and expressiveness. This one? Feels like it gets the soul of the voice. 😮‍💨
I’m especially impressed by the 10-second voice cloning — tested it with a friend’s audio snippet, and the result was uncanny.
Curious: how does it compare with commercial models like ElevenLabs in multilingual scenarios? Have you stress-tested accents or emotion transfer in other languages?
Massive props to the team behind So-VITS-SVC and Bert-VITS2 — open-source + this level of polish is rare. 🔥
I’m still learning how to use the website, so I’m not sure if you already have something like this in mind, but one feature I’d love to see implemented is the ability to record my own voice and use that as a reference. What I mean is, I’d like the model to capture the cadence, tone, and emotional range of my voice, while still keeping the generated voice intact unless I choose to fully replace it.
For example, if I wanted to add more emotion to a line—like sadness, excitement, or frustration—it would be great if the AI could analyze my sample and then mirror that same energy or inflection. Right now, some of the voices, even though they sound great, don’t always capture those subtle nuances or the emotional texture I’m aiming for.
It would also be helpful to have an option to upload a short voice sample, maybe a few seconds long, without having to go through complicated prompts or on-screen steps. That would make it much easier for people who can’t see what’s on the screen or find it hard to navigate the interface visually.
Ideally, the system could take that single clean sample and let me adjust the tone, such as making it slightly higher, softer, or deeper, while maintaining the original emotional feel. Maybe there could also be a toggle or slider for mood, so if I wanted to sound calmer or more intimate, I could easily tweak that. It would also help to be able to get female and male versions of the voices directly in the main interface.
And it’d be amazing if the system eventually allowed for accent customization too—like being able to choose between English, Scottish, or other regional accents while still reflecting my own speaking rhythm.
Basically, what I’m hoping for is a way to be a more active participant in the process—using my voice not just as text input but as an emotional guide for how I want the final result to sound.
I love Fish Audio! It has far better accuracy with voice cloning than ElevenLabs, and far less unnecessary censorship as well. If a speech-to-speech option were introduced, it would make things even better. I really love your service as an alternative to Play.HT, which pulled the rug out from under its users in July. I use it to create my audio stories. Keep up the wonderful work! I can't wait to see what's next for Fish Audio.
Cool. We're heavy users of 11labs, but we've noticed it isn't good at some languages. Hope to see new tools. One quick question: how do you prevent illegal cloning, especially by fraudsters?
Good voice tech is usually extremely expensive. Nice to see a high-quality option that's actually affordable.
Impressed by how expressive the voices are; the emotion sits in the pauses and timing. I dropped in a 10-second sample, and it sounded surprisingly human, with little quirks that made it feel like a real person.
Voice tech is getting sooo good.
Side note: I'd recommend establishing a passphrase with your family. ;)