Build voice agents you can trust. Roark tracks call metrics, runs evaluations, and stress-tests your agent with simulated callers across accents, languages, and speaking styles. Failed calls become repeatable tests, giving you visibility and driving continuous improvement.
👋 Hey Product Hunt!
We’re @zammitjames & @danielgauci, co-founders of Roark (YC W25).
When we first built voice agents, we ran into the same problems every team faces:
Testing was manual - we literally called agents over and over just to check if they followed instructions.
Monitoring was missing - we didn’t know when failures happened, and even when they did, we had no idea which levers to pull to make the agent better.
Fixes didn’t stick - regressions kept popping back up without us noticing.
So we built Roark - a platform that brings reliability and visibility to Voice AI.
Here’s what Roark does today:
🔹 Monitoring & Evaluation
Capture 40+ built-in call metrics and events (latency, instruction-following, repetition detection, sentiment, etc.) - plus define your own custom ones.
Support for calls with up to 15 speakers, including automatic speaker identification.
Analyze audio with models for emotion and vocal cues - even transcription fine-tuned to your use case.
Build dashboards, schedule reports, set up alerts, and trigger webhooks so your team is always in the loop (webhook sketch below).
Evaluate calls with best-in-class evaluators you can run on demand or automate via SDK/API.
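To give a feel for the alerting flow, here's a minimal sketch of consuming one of those webhooks - the endpoint path and payload fields below are just for illustration, not our exact schema:

```ts
// Illustrative sketch: receiving a Roark-style alert webhook.
// Endpoint path and payload fields are assumptions for illustration only.
import express from "express";

interface AlertPayload {
  callId: string;    // assumed: the call that triggered the alert
  metric: string;    // assumed: e.g. "latency" or "instruction_following"
  value: number;     // assumed: the measured value
  threshold: number; // assumed: the configured alert threshold
}

const app = express();
app.use(express.json());

app.post("/webhooks/roark-alerts", (req, res) => {
  const alert = req.body as AlertPayload;
  if (alert.value > alert.threshold) {
    // Route the failing call into your triage queue, Slack, or pager here.
    console.log(`Call ${alert.callId}: ${alert.metric} = ${alert.value} (> ${alert.threshold})`);
  }
  res.sendStatus(200); // acknowledge promptly so the delivery isn't retried
});

app.listen(3000);
```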
🔹 Simulations & Personas
Run end-to-end simulations for both inbound and outbound agents, over the phone or WebSocket - so you’re testing the same paths real customers take.
Define tests as conversations - a sequence of turns between customer and agent, using a graph-based approach. This makes it easy to branch into edge cases or test variants, so your coverage reflects real-world complexity, not just happy paths (see the sketch below).
Configure personas by gender, language, accent, background noise, and speech profile (pace, clarity, disfluencies).
Layer on behavior profiles like base emotion, intent clarity, confirmation style, memory reliability - even a backstory.
Stress-test across real-world variables and automatically generate test cases from live calls (failed calls → repeatable tests).
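Here's an illustrative sketch of what a persona plus a branching conversation test can look like - pseudocode-style TypeScript, with field names chosen for illustration rather than our exact SDK types:

```ts
// Illustrative only: field names are a sketch, not the exact SDK types.
const persona = {
  gender: "female",
  language: "en-GB",
  accent: "scottish",
  backgroundNoise: "busy-cafe",
  speechProfile: { pace: "fast", clarity: "low", disfluencies: true },
  behavior: {
    baseEmotion: "frustrated",
    intentClarity: "vague",
    confirmationStyle: "reluctant",
    memoryReliability: "forgets-details",
    backstory: "Third call this week about the same billing issue.",
  },
};

// Tests are a graph of turns: each node pairs a customer turn with the
// expected agent behavior, and `next` branches cover edge cases.
const rescheduleTest = {
  start: "ask_reschedule",
  nodes: {
    ask_reschedule: {
      customerSays: "I'd like to move my appointment.",
      expectAgent: "asks for the existing appointment details",
      next: { has_details: "confirm_time", no_details: "lookup_by_name" },
    },
    confirm_time: {
      customerSays: "Tuesday at 3pm works.",
      expectAgent: "confirms the new time and closes politely",
    },
    lookup_by_name: {
      customerSays: "I don't have my booking number.",
      expectAgent: "offers to look the booking up by name instead",
    },
  },
};
```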
🔹 Developer-first Integrations
First-class SDKs for Node and Python, plus a REST API (illustrative sketch below).
Native support for LiveKit, Pipecat, VAPI, Retell, and Voiceflow.
Easiest integrations on the market - nothing bolted together overnight.
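To give a flavor of the integration, here's an illustrative sketch of reporting a call and kicking off evaluators over REST - placeholder base URL, with paths and body fields chosen for illustration, not the documented API:

```ts
// Illustrative sketch: registering a call and running evaluators over REST.
// Base URL, paths, and fields are placeholders for illustration only.
const ROARK_API = "https://api.roark.example/v1"; // placeholder base URL
const API_KEY = process.env.ROARK_API_KEY!;

async function reportAndEvaluate(recordingUrl: string): Promise<unknown> {
  // 1. Register the finished call (assumed endpoint and fields).
  const callRes = await fetch(`${ROARK_API}/calls`, {
    method: "POST",
    headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ recordingUrl, direction: "inbound" }),
  });
  const call = await callRes.json();

  // 2. Run evaluators on it (assumed endpoint and evaluator names).
  const evalRes = await fetch(`${ROARK_API}/calls/${call.id}/evaluations`, {
    method: "POST",
    headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ evaluators: ["instruction-following", "sentiment"] }),
  });
  return evalRes.json();
}
```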
👉 In the past 6 months, Roark has already processed over 10M minutes of calls for companies like Radiant Graph, Podium, Aircall and BrainCX - helping them evaluate agents and run simulations at scale.
The result? A full lifecycle platform that closes the loop: monitor your live calls → spot failures → turn them into tests → improve continuously.
Think of Roark as the QA + Observability layer for Voice AI - robust, deeply thought-out, and built to last.
If you’re building voice agents, you can sign up today for 50% off with our PH discount, book a demo here if you’d like a walkthrough, or just drop us a note at [email protected] - we’d love to help.
- James & Daniel