Chatterbox Turbo is a 350M-parameter open-source TTS model. It features paralinguistic tags (to control laughs, sighs, etc.) and zero-shot cloning, and it runs 6x faster than real-time. It also includes built-in PerTh watermarking for safety.
This is a really generous release from the Resemble AI team. The "paralinguistic tags" feature is super interesting: being able to simply type [laugh] or [sigh] to control the emotion is a very practical touch for getting natural results.
I also really appreciate that it includes the PerTh watermarking by default. It is rare to see safety features baked directly into an MIT-licensed model like this.
Fast, expressive, and traceable. This model has huge potential in the open source TTS space.
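For anyone wondering what the tag syntax looks like from the input side, here is a rough sketch of checking a script before sending it to the model. This is not Chatterbox's API: `split_tags`, `unknown_tags`, and `KNOWN_TAGS` are hypothetical helpers written for illustration, and the only tags assumed are the two mentioned above, [laugh] and [sigh].

```python
import re

# Only the tags mentioned in this thread; the model's full tag set is an
# unknown here, so treat this set as an assumption.
KNOWN_TAGS = {"laugh", "sigh"}

# Matches a lowercase bracketed tag such as [laugh].
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def split_tags(text):
    """Split text into ("text", str) and ("tag", name) segments, in order."""
    segments = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("text", before))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


def unknown_tags(text):
    """Return the sorted tag names in text that are not in KNOWN_TAGS."""
    found = {name for kind, name in split_tags(text) if kind == "tag"}
    return sorted(found - KNOWN_TAGS)


if __name__ == "__main__":
    line = "That was close [laugh] let's go again. [sigh]"
    print(split_tags(line))
    print(unknown_tags("Hi [laugh] [cough]"))
```

A check like this catches typos in tags before synthesis, which matters if an unrecognized tag would otherwise be read aloud as literal text (whether the model does that is not something this thread confirms).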
Very interesting! Which languages are supported? If I provide a sample in one language, can I copy my voice and have the service read something in another language using my voice?
This is amazing, man! Audio editing is usually a pain. If this actually simplifies it, that’s a big win.
Neat. I do light VO/podcast stuff and the speech-to-speech + quick edits are what I care about. Zero-shot clone for pickups sounds handy. Big plus on watermarking + detection—feels safer. Curious how natural the laughs/sighs controls come out.
Hi everyone!