Real-time Voice AI Agents
We are open-sourcing the easiest way for developers to build real-time Voice Agents and Virtual Avatars into any app—telephony, web, mobile, robotics, wearables, and beyond.
👋 Hey Product Hunt, I’m Arjun, co-founder of VideoSDK.
I'm beyond excited to launch our Open-Source AI Voice Agent SDK.
Today, voice is becoming the new UI. We expect agents to understand us, respond instantly, and work seamlessly across web, mobile, and even telephony. But to achieve this, developers have to stitch together STT, LLM, and TTS, glued with HTTP endpoints and a prayer.
That most often results in agents that sound robotic, hallucinate, and fail in production environments with no observability.
So we built something to solve that: end-to-end infrastructure to build, deploy, and monitor your AI Voice Agents.
Here’s what it offers:
Global WebRTC infra with <80ms latency
Native turn detection, VAD, and noise suppression
Modular pipelines for STT, LLM, TTS, avatars, and real-time model switching (sketched below)
Built-in RAG + memory for grounding and hallucination resistance
SDKs for web, mobile, Unity, IoT, and telephony — no glue code needed
Agent Cloud to scale infinitely with one-click deployments — or self-host with full control
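To make the "modular pipeline" idea concrete, here's a minimal Python sketch of how pluggable STT, LLM, and TTS stages can compose into a single turn handler. The class and method names are illustrative placeholders, not the actual Voice Agent SDK API.

```python
# Conceptual sketch only: these interfaces are illustrative placeholders,
# not the real Voice Agent SDK API.
from dataclasses import dataclass
from typing import Protocol


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """Composes pluggable STT -> LLM -> TTS stages into one turn handler."""
    stt: STT
    llm: LLM
    tts: TTS

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt.transcribe(audio_in)  # speech -> text
        reply = self.llm.complete(transcript)       # text -> response text
        return self.tts.synthesize(reply)           # response text -> speech
```

Because each stage is just an interface, swapping providers or switching models in real time means passing in a different implementation rather than rewriting glue code.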
Think of it like moving from a walkie-talkie to a modern cell network that handles thousands of calls.
VideoSDK gives you the infrastructure to build voice agents that actually work in the real world, at scale.
I'd love your thoughts and questions! Happy to dive deep into architecture, use cases, or crazy edge cases you've been struggling with.
All the best for today! I’ve been a VideoSDK customer for years, and the evolution is promising.
Getting that 80ms latency consistently is the key. Kudos for achieving this.
This Video SDK would absolutely glow on Xiaohongshu (小红书)! ✨
I built Redverse to beam dev tools just like this into China's visual content galaxy.
One tap to launch 🚀
VideoSDK looks stellar! Love how it streamlines building real-time video and audio experiences with easy-to-use APIs. What’s next for it? Are you planning upgrades like low-latency streaming, built-in recording, or advanced moderation tools?
I think I saw an SDK that looks almost 1-to-1 like yours, called LiveKit.
You have a cool product; can you tell me how you differ and how long you've been working on it? That's a huge amount of work!
This is a fantastic step toward making real-time voice AI more accessible to developers across platforms. Love that it's open-source — excited to see what the community builds with it! Congrats on the launch 🚀
Congratulations on the launch, team. Feel free to import it on LaunchIgniter for maximum visibility.
Congrats on the launch, team!!! 🥳
Introducing Voice Agent SDK — an open-source framework to build real-time voice agents that actually work in production.
Built on VideoSDK, it empowers agents to join meetings, listen, speak, and think — all with under 80ms latency.
The cascading pipeline supports STT, LLM, TTS, VAD, and Turn Detection — fully provider-agnostic (see the sketch below).
With A2A and MCP, you get multi-agent collaboration and seamless integration with external tools and services.
We can’t wait to see what the community builds with Voice Agent SDK — go create something amazing!
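For anyone curious how the cascading flow fits together, here's a rough Python sketch of a turn loop that sequences VAD, turn detection, STT, LLM, and TTS. Every name here is a hypothetical stand-in, not the SDK's real interface.

```python
# Rough conceptual sketch of a cascading turn loop. All objects and methods
# here are hypothetical stand-ins, not the Voice Agent SDK's real API.
from typing import AsyncIterator


async def turn_loop(
    audio_frames: AsyncIterator[bytes],  # incoming mic/WebRTC frames
    vad,            # hypothetical: flags speech vs. silence per frame
    turn_detector,  # hypothetical: decides when the speaker has finished
    stt,            # pluggable speech-to-text stage
    llm,            # pluggable language-model stage
    tts,            # pluggable text-to-speech stage
    playback,       # hypothetical audio sink back to the caller
) -> None:
    buffered: list[bytes] = []
    async for frame in audio_frames:
        if vad.is_speech(frame):
            buffered.append(frame)          # keep collecting the user's turn
        elif buffered and turn_detector.turn_ended(buffered):
            transcript = await stt.transcribe(b"".join(buffered))
            reply = await llm.complete(transcript)
            await playback.play(await tts.synthesize(reply))
            buffered.clear()                # ready for the next turn
```

The point of the cascade is that each stage is provider-agnostic: any STT, LLM, or TTS backend that satisfies the same small interface can be dropped in without changing the loop.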
Congrats on the launch! 🎉 @Video SDK This looks like a game-changer for voice AI development. The <80ms latency with global WebRTC infrastructure sounds impressive. Quick question - how does your native turn detection handle overlapping speech or interruptions? That's always been a challenge with voice agents. Also curious about the pricing model for the Agent Cloud vs self-hosting options!
This looks promising; it makes your main product a full-stack video & audio framework for building agents.
Congrats on the launch team!
🔥 This is a game-changer for anyone building with voice! Love how you're simplifying the entire stack for real-time Voice AI—especially the flexibility across telephony, web, mobile, and even robotics. Open-sourcing it makes it even more powerful for indie hackers and startups. Huge kudos to the team 👏