Hush
Open-source noise suppression for voice AI agents
Open Source
Developer Tools
Artificial Intelligence
GitHub

Upvotes: 186

Comments: 26

Featured on June 23rd, 2026

Hunted by

Hasan Ali

Hush removes competing voices, background noise, and audio interference from real-time calls so your voice AI agents always hear what matters.

Top comment

Product of the Day: 6th

sub-1ms-per-frame on cpu is the easy number to benchmark — on real-time agent pipelines the harder thing is frame jitter compounding across the STT→LLM→TTS hop, where any lookahead the suppressor needs eats the latency budget you saved. gain-mask + deep filtering also has the target-speaker vs blind-separation question lurking when two voices overlap mid-utterance.

Comment highlights

Clean input audio is half the battle for voice agents and most teams underrate it. Open-sourcing it is generous. Will be poking at the repo. Congrats on the launch Atul!

Real-time noise suppression always involves tradeoffs - curious what the actual pipeline latency looks like end-to-end, not just model inference. WebRTC jitter buffers, chunking, and resampling all add overhead on top of the model itself, and for voice AI phone agents that budget is already tight with STT + LLM + TTS in the chain. Also wondering how it handles overlapping speakers mid-sentence vs. steady-state noise - that's usually where suppression models fall apart. How does it compare to what Deepgram or Twilio already offer natively in their voice pipelines?

This is exactly the kind of voice-agent infra where the test set matters more than the demo clip. I would love to see three numbers side by side: added latency per frame, word deletion rate for quiet primary speakers, and false retention when a second speaker is louder than the caller. The open-source angle is especially useful if teams can run the same stress clips before deploying it into live calls.

What’s the latency like in real time calls, and does it ever clip or distort the speaker’s voice?

Sub-ms matters because voice UX breaks when the audio path gets clever but slow. The edge case I would watch is the handoff between suppression and downstream turn detection; a clean stream is useful only if it preserves the timing signals.

Thanks everyone for the amazing support so far! We're excited to hear your thoughts and answer any questions you have. Your feedback will help shape the future of Hush.

Excited to test and use this in my ongoing project. Running on CPU only is a game changer. Thank you for making this open source; I was searching for something like this!

Most noise suppression libraries are built for human listeners, where "good enough" means the person on the other end doesn't notice. For voice AI agents the bar is different because the model is doing ASR first, and artifacts that a human brain filters out can wreck transcription accuracy pretty badly. Curious whether Hush is tuned specifically for that ASR pipeline use case or whether it's general-purpose suppression you're applying upstream. Also wondering how it handles near-field keyboard noise and fan hum during long agent sessions, since those tend to be the consistent offenders in real deployments.

Hey everyone 👋 I'm the maker of Hush. Here's the story behind why we built it.
We build Voice AI at Weya. AI agents that handle live phone calls for businesses. And the #1 issue that kept breaking our pipeline wasn't the LLM, wasn't the TTS. It was background speech.
A caller phones in from a busy restaurant. Their colleague is talking next to them. A TV is blaring in the background. What happens? The background speaker's words get picked up, transcribed, and fed into the AI agent as if the caller said them. The entire conversation derails.
We tried every open-source noise cancellation model out there: DeepFilterNet3, RNNoise, SEGAN, MetricGAN+, DNS Challenge entrants. They all do a great job suppressing stationary noise (fans, traffic, HVAC). But none of them treat a competing human voice as a first-class problem. When the interference is another person speaking, speech looks like speech in every feature these models have learned. They either let it leak through, or they suppress both speakers and destroy intelligibility.
So we built Hush from scratch to fix exactly this.
What it does: Hush removes both background noise AND background speech from live audio, isolating only the primary speaker. It's an 8 MB model that runs fully on CPU in real time (<1 ms per 10 ms frame), at 16 kHz (native telephony sample rate).
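To put the numbers above in context, here is a back-of-the-envelope sketch of the frame budget. The 16 kHz rate, 10 ms frame, and 1 ms upper bound come from the post; the function names are hypothetical, and the streams-per-core figure is an idealized ceiling that ignores scheduling, jitter buffers, resampling, and I/O, so it is not a measured result.

```python
# Values quoted in the post.
SAMPLE_RATE_HZ = 16_000       # 16 kHz telephony audio
FRAME_MS = 10                 # one frame covers 10 ms of audio
PROCESS_MS_PER_FRAME = 1.0    # upper bound implied by "<1 ms per frame"


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of audio samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def real_time_factor(process_ms: float, frame_ms: float) -> float:
    """Processing time divided by audio duration; below 1.0 means faster than real time."""
    return process_ms / frame_ms


def max_streams_per_core(process_ms: float, frame_ms: float) -> int:
    """Idealized ceiling on concurrent streams for one fully dedicated core (assumption)."""
    return int(frame_ms // process_ms)


if __name__ == "__main__":
    print(samples_per_frame(SAMPLE_RATE_HZ, FRAME_MS))
    print(real_time_factor(PROCESS_MS_PER_FRAME, FRAME_MS))
    print(max_streams_per_core(PROCESS_MS_PER_FRAME, FRAME_MS))
```

At the stated bound, each frame holds 160 samples and the model uses at most a tenth of the time the audio takes to arrive. How that holds up under many concurrent calls is something only a benchmark on the target hardware can answer.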
How we did it: We extended DeepFilterNet3 with one targeted change: teaching the encoder to distinguish speakers, not just speech from noise.
Training data that reflects the real problem: 60% of our training samples include a competing human speaker mixed in. The model cannot pass training without learning to suppress speech that sounds like speech.
Auxiliary Separation Head: A lightweight Linear(256→32) + Sigmoid head attached to the encoder bottleneck, trained with L1 loss to predict an ERB-domain mask for background speakers. This is a training-only objective. It forces the encoder to carry speaker-discriminative features without adding any inference cost.
Production runtime in Rust: We built libweya_nc, a C-ABI shared library (Rust + tract for ONNX inference) that ships as a ~10 MB .so/.dylib/.dll with no embedded model. It shares compiled model weights across concurrent sessions via Arc<TypedSimplePlan>, so each session costs only a few KB of memory. Plug it into any C, C++, or Python application.
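For readers who want to picture the auxiliary separation head described above, here is a minimal pure-Python sketch, not the actual Hush training code. Only the shapes (256 encoder features in, 32 ERB bands out), the sigmoid, and the L1 loss come from the post. The function names, the random initialisation, and the random stand-in features and target mask are assumptions for illustration.

```python
import math
import random

ENC_DIM = 256    # encoder bottleneck width, from the post
ERB_BANDS = 32   # size of the ERB-domain mask, from the post


def make_head(rng):
    """Randomly initialised Linear(256 -> 32) parameters (hypothetical init scheme)."""
    scale = 1.0 / math.sqrt(ENC_DIM)
    weights = [[rng.uniform(-scale, scale) for _ in range(ENC_DIM)]
               for _ in range(ERB_BANDS)]
    bias = [0.0] * ERB_BANDS
    return weights, bias


def predict_mask(head, features):
    """Linear layer followed by a sigmoid: one value in (0, 1) per ERB band."""
    weights, bias = head
    mask = []
    for row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(row, features)) + b
        mask.append(1.0 / (1.0 + math.exp(-z)))
    return mask


def l1_loss(pred, target):
    """Mean absolute error between predicted and target masks."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


rng = random.Random(0)
head = make_head(rng)
features = [rng.gauss(0.0, 1.0) for _ in range(ENC_DIM)]   # stand-in bottleneck frame
target_mask = [rng.random() for _ in range(ERB_BANDS)]      # stand-in background-speaker mask
mask = predict_mask(head, features)
loss = l1_loss(mask, target_mask)
```

In training, a loss like this would be added to the main enhancement loss so that gradients flow back into the encoder. At inference the head is simply not evaluated, which matches the post's point that it adds no runtime cost.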
We trained on 10,000+ hours of mixed audio: LibriSpeech, VCTK, Common Voice for clean speech, DNS Challenge + FreeSound + ESC-50 for noise, and MIT IR Survey + OpenAIR for room impulse responses.
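The idea of mixing a competing speaker into 60% of training samples can be sketched roughly as follows. This is an illustrative assumption, not the released data pipeline. Only the 60% figure comes from the post; the level range, the function names, and the omission of noise and room impulse responses are simplifications made here.

```python
import math
import random

COMPETING_SPEAKER_PROB = 0.6  # from the post: 60% of samples include a second speaker


def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_ratio(primary, interferer, ratio_db):
    """Scale the interferer so primary RMS exceeds its RMS by ratio_db, then add it.

    Assumes both signals are non-silent and equally long.
    """
    gain = rms(primary) / (rms(interferer) * 10 ** (ratio_db / 20))
    return [p + gain * i for p, i in zip(primary, interferer)]


def make_training_example(primary, interferer, rng, ratio_db_range=(0.0, 20.0)):
    """Return (model_input, clean_target); the target is always the primary speaker alone.

    ratio_db_range is a hypothetical choice, not a value from the post.
    """
    if rng.random() < COMPETING_SPEAKER_PROB:
        ratio_db = rng.uniform(*ratio_db_range)
        return mix_at_ratio(primary, interferer, ratio_db), primary
    return list(primary), primary
```

Because the target never contains the second voice, a model trained this way is penalised whenever it lets the interfering speech through, which is the behaviour the post describes.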
Why we open-sourced it: This gap exists because the benchmarks that drive open-source development (DNS Challenge, CHiME) measure noise suppression, not speaker isolation. Models optimized for those benchmarks are not optimized for Voice AI. We want to change that. Every team building voice agents, call centre bots, real-time transcription, or conversational AI systems deserves a model that actually handles the acoustic chaos of real phone calls.
The model, training code, Rust runtime library, and pretrained weights are all on GitHub and Hugging Face. MIT / Apache 2.0 licensed.
We're also fine-tuning a v2 optimized for even louder background noise and speech. Stay tuned.
Would love your feedback. Happy to answer any questions about the architecture, training, or how to integrate it 🙌

Sub-1ms on CPU is the claim that matters most here and also the one I'd want stress-tested. What's the degradation curve? Does it hold at 1ms with a single stream, and what happens at 10 or 50 concurrent calls on commodity hardware? That's the production reality for anyone running voice agents at scale.
The open-source angle is smart for adoption but the real question is where the commercial model sits. Apache 2.0 gets you into production stacks fast. What's the wedge that converts users to paying customers?

The CPU-only, sub-1ms-per-frame number is what jumped out at me. Most enhancement I've tried adds enough latency to break the natural turn-taking on a live call. We build voice AI that phones elderly parents at home, where the hard part is exactly what you describe: a TV going in the background, a spouse talking across the room, sometimes a hearing aid whistling. My question: when the primary speaker is quiet, slurred, or unsteady (pretty common with older users), does isolating them ever clip that softer speech? Planning to test Hush on some of our real call audio.

Hey Product Hunt! I'm @lordhasanali, CEO of Weya AI.

We watched great voice AI fail in production, over and over, not because of the model, but because of the audio. Noisy environments, competing voices, background hum. Nobody was solving this properly, so we did.

Introducing Hush, our first in-house open-source speech enhancement model, which:

• Isolates the primary speaker and removes everything else in real time
• Runs entirely on CPU, under 1ms per frame - no GPU needed
• Language-agnostic - works across all spoken languages out of the box
• Apache 2.0 - free to use in production today

We launched at #5 on HuggingFace's Audio-to-Audio leaderboard, and this is just the start.

We'll be here all day answering questions. Try it, break it, and let us know what you think!

About Hush on Product Hunt

“Open-source noise suppression for voice AI agents”

Hush launched on Product Hunt on June 23rd, 2026 and earned 186 upvotes and 26 comments, placing #6 on the daily leaderboard. Hush removes competing voices, background noise, and audio interference from real-time calls so your voice AI agents always hear what matters.

Hush was featured in Open Source (68.6k followers), Developer Tools (515.7k followers), Artificial Intelligence (473.5k followers) and GitHub (41.3k followers) on Product Hunt. Together, these topics include over 220.3k products, making this a competitive space to launch in.

Who hunted Hush?

Hush was hunted by Hasan Ali. A “hunter” on Product Hunt is the community member who submits a product to the platform, uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community.
