SAM Audio is a unified model that separates any sound from any source. Use text ("dog barking"), visual clicks on video, or time spans to isolate specific audio. It unifies speech, music, and sound effect separation into one promptable model.
To be honest, when I first saw this, I didn't think much of it. But after looking closer... SAM Audio is absolutely mind-blowing.
Attention to all makers building audio-related products: Do not ignore this model.
Just like the original SAM changed image segmentation forever, SAM Audio breaks the "fragmented" world of audio processing.
Old Way: You needed separate tools for noise reduction, vocal isolation, and speaker diarization. It was a mess of "signal processing."
The SAM Way: It understands Semantic Intent. You don't filter frequencies, you just tell it what you want.
-> "Isolate the guitar" (Text Prompt)
-> Click on the car in the video (Visual Prompt)
-> Select this specific timestamp (Span Prompt)
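For makers wondering what this looks like on the product side, here is a rough sketch of how an app might model the three prompt types before handing them to a separation backend. Everything in it is hypothetical: `TextPrompt`, `VisualPrompt`, `SpanPrompt`, and `summarize` are illustrative names, not the SAM Audio API, and the fields (frame index, pixel coordinates, seconds) are assumptions about what each prompt would need to carry.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TextPrompt:
    """Hypothetical: describe the target sound in words, e.g. "guitar"."""
    description: str


@dataclass(frozen=True)
class VisualPrompt:
    """Hypothetical: a click on a video frame (frame index plus pixel position)."""
    frame_index: int
    x: int
    y: int


@dataclass(frozen=True)
class SpanPrompt:
    """Hypothetical: a time range, in seconds, where the target sound is audible."""
    start_s: float
    end_s: float

    def __post_init__(self) -> None:
        # Reject empty, reversed, or negative spans up front.
        if not 0 <= self.start_s < self.end_s:
            raise ValueError("span must satisfy 0 <= start_s < end_s")


# One union type, so the UI can pass any prompt kind through a single code path.
Prompt = Union[TextPrompt, VisualPrompt, SpanPrompt]


def summarize(prompt: Prompt) -> str:
    """Return a short human-readable label for a prompt (for logs or UI)."""
    if isinstance(prompt, TextPrompt):
        return f"text: {prompt.description}"
    if isinstance(prompt, VisualPrompt):
        return f"visual: frame {prompt.frame_index} at ({prompt.x}, {prompt.y})"
    if isinstance(prompt, SpanPrompt):
        return f"span: {prompt.start_s:.1f}s to {prompt.end_s:.1f}s"
    raise TypeError(f"unsupported prompt type: {type(prompt).__name__}")
```

The point of the single `Prompt` union is that the rest of an editor only needs one "separate this" entry point, whichever way the user chose to describe the sound.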
It basically shifts audio editing from "engineering" to "describing." And since the inference is pretty fast, the engineering potential here is massive.
P.S. Checked the license—commercial use allowed!✌️
Wow, Meta sounds amazing! The SAM Audio feature is seriously impressive - being able to isolate specific sounds like that is wild. Curious, how well does it handle overlapping sound events when separating audio in real-time?
This really nails it. It makes audio work feel far more approachable. As an engineer and creator, I will find it quite helpful. Excited to see how people actually use this day to day.
That's pretty impressive! 🚀
I think there'll be a lot of new SaaS products built around controlling this new model, as well as it being integrated into existing video editing software. Looking forward to that!
About SAM Audio on Product Hunt
“Segment any sound with text, visual, or time prompts”
SAM Audio launched on Product Hunt on December 19th, 2025 and earned 158 upvotes and 4 comments, placing #5 on the daily leaderboard.
SAM Audio was featured in Open Source (68.3k followers), Artificial Intelligence (466.2k followers) and Audio (2k followers) on Product Hunt. Together, these topics include over 100.9k products, making this a competitive space to launch in.
Who hunted SAM Audio?
SAM Audio was hunted by Zac Zuo. A “hunter” on Product Hunt is the community member who submits a product to the platform: uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community.