Sesame

Conversational speech model that achieves voice presence

Sesame's Conversational Speech Model (CSM) creates AI voices that go beyond text-to-speech, aiming for truly natural and engaging conversations.

Top comment

Hi everyone!

I'm sharing Sesame's Conversational Speech Model (CSM), a big step beyond typical text-to-speech. The goal is to achieve what Sesame calls "voice presence": making spoken interactions feel real, understood, and valued.

Here is a PH version of the model's system card :)

😃 Emotional Context: It tries to understand and respond to the emotion in the conversation.
⏱️ Conversational Dynamics: It aims for natural timing, pauses, and intonation.
🧠 Contextual Awareness: It adapts its tone and style to the situation.
👤 Consistent Personality: It keeps a coherent persona throughout the conversation.
👂 Multimodal: It understands both text and audio input.
🗣️ End-to-End: It generates speech directly, in a single stage, for greater efficiency.
🔓 Open Source: The models will be released under the Apache 2.0 license.

They've built a custom evaluation suite to measure these conversational aspects, because traditional metrics (like Word Error Rate) don't really capture how natural the speech sounds.
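To see why WER falls short here, it helps to look at what it actually measures: the word-level edit distance between a reference transcript and a transcript of the generated audio, divided by the reference length. Below is a minimal Python sketch of that standard definition. It is my own illustration, not Sesame's evaluation code, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + substitution,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Identical words score 0.0 no matter how the audio was delivered.
    print(word_error_rate("the cat sat on the mat", "the cat sat on the mat"))
    # One substituted word out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because the score depends only on the transcribed words, a flat, robotic reading and a warm, well-timed one both get a WER of 0 as long as the words match, which is the gap a conversational evaluation suite has to cover.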

The model itself is based on the Llama architecture, but with a clever split-transformer design.

You can try the demo to experience the conversational voice (it's magical, believe me).

Hunting credits to @sentry_co 🙌