Coval

Simulation & evals to ship delightful voice & chat AI agents

Analytics
Developer Tools
Artificial Intelligence

Coval helps developers build reliable voice and chat agents faster with seamless simulation and evals. Create custom metrics, run thousands of scenarios, trace workflows, and integrate with CI/CD pipelines for actionable insights and peak agent performance.
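The CI/CD integration described above is easiest to picture as an ordinary automated test suite that gates a build on simulated conversations. Here is a minimal, hypothetical sketch of the idea (this is not Coval's actual API: `run_scenario`, the scenario names, and the thresholds are all illustrative stand-ins):

```python
# Hypothetical sketch: gating CI on simulated agent scenarios.
# `run_scenario` and the scenario names are illustrative stand-ins,
# not Coval's real API.
import pytest

SCENARIOS = [
    "angry_customer_requests_refund",
    "caller_interrupts_mid_sentence",
    "user_switches_language_to_spanish",
]

def run_scenario(name: str) -> dict:
    # In practice this would call out to a simulation backend and
    # return per-scenario metrics; a canned result keeps the sketch
    # self-contained and runnable.
    return {"task_completed": True, "p95_latency_ms": 850}

@pytest.mark.parametrize("scenario", SCENARIOS)
def test_agent_handles_scenario(scenario):
    result = run_scenario(scenario)
    # Fail the build if the agent did not complete the task
    # or responded too slowly in simulation.
    assert result["task_completed"], f"{scenario}: task not completed"
    assert result["p95_latency_ms"] < 2000, f"{scenario}: too slow"
```

Running this suite on every prompt change (e.g., via a CI job) is the generic shape of the workflow the listing describes.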

Top comment

Hi Product Hunt Community 🐱👋 I'm Brooke, Founder of Coval! Today, we're excited to launch Coval, a platform that transforms how you test, debug, and monitor voice and chat agents. Simulate thousands of scenarios from a few test cases: you create the prompts, and we simulate environments to test your agents from all directions.

👉 Why did I build Coval?

Before founding Coval, I led the evaluation job infrastructure team at Waymo, building simulation tools that tested every code change to ensure the Waymo Driver improved with every iteration. That shift from manual testing on racetracks to scalable, automated simulation transformed autonomous vehicles from early prototypes into reliable systems now navigating the streets of San Francisco.

Today, AI agents face similar challenges: promising prototypes often hit reliability roadblocks as they scale. Drawing on my Waymo experience, I built Coval to bring automated simulation and evaluation to AI agents, helping teams move faster and deliver reliable, real-world performance.

Coval's mission? To ensure AI agents can be trusted with critical tasks, just as simulation helped unlock the potential of self-driving cars. It's a tool built by developers, for developers: designed to save time, increase confidence, and eliminate the headaches of conversational AI development.

❓ What Problems Does Coval Solve?

👉 Manual Testing Wastes Time. Manually calling or chatting with agents is inefficient. Coval integrates into your CI/CD pipeline, running thousands of simulations automatically with each prompt change. This saves time, increases test coverage, and boosts confidence in production performance.

👉 Debugging is a Nightmare. Fixing one issue often breaks something else. Coval eliminates this frustration by providing actionable insights into agent workflows, tracking metrics for each simulation to help you pinpoint and resolve problems effectively.

👉 Production Monitoring is Hard. Identifying the root cause of agent mistakes in production can be painful. Coval's monitoring offers immediate, actionable insights into custom metrics like LLM-as-a-Judge or tool calls, making it easier to ensure reliable performance.

❓ Why Us?

Our team brings deep experience in LLM evaluations at Berkeley & Stanford, building distributed systems for Fortune 500 companies, and crafting intuitive user interfaces.

🚀 Special Launch Offer

As part of our Product Hunt launch, enjoy a free two-week trial with personalized onboarding. We'll help you set up custom metrics, run your first evaluations, and get the most out of Coval.

👉 Start Your Free Trial: https://www.coval.dev
👉 Check out our Docs: https://docs.coval.dev/overview
👉 Book a Demo Call: https://cal.com/bnhopkins/demo

Excited to help you ship reliable AI agents faster!

P.S. Drop by the comments; we'd love your feedback!
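A quick note on the "LLM-as-a-Judge" metrics mentioned above: the pattern is simply asking a second model to grade each simulated conversation against a rubric. A generic sketch under stated assumptions (this is not Coval's API; the OpenAI client, model name, judge prompt, and sample transcript are all assumptions for illustration) might look like:

```python
# Generic LLM-as-a-Judge sketch: ask a model to grade a simulated
# conversation. Illustrative only; not Coval's API.
# Assumes OPENAI_API_KEY is set; the model choice is an assumption.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading a customer-support voice agent.
Given the transcript below, answer with a single integer from 1 to 5:
did the agent resolve the caller's request politely and correctly?

Transcript:
{transcript}
"""

def judge_transcript(transcript: str) -> int:
    """Score one simulated conversation with an LLM judge (1-5)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": JUDGE_PROMPT.format(transcript=transcript)}
        ],
    )
    # A sketch-level parse; a production metric would validate the output.
    return int(response.choices[0].message.content.strip())

if __name__ == "__main__":
    sample = (
        "User: I was double-charged for my order.\n"
        "Agent: I'm sorry about that. I've issued a refund for the duplicate charge."
    )
    print("judge score:", judge_transcript(sample))
```

Running a judge like this over every simulated scenario is what turns thousands of transcripts into the per-simulation metrics the post describes.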

Comment highlights

Every time I take a Waymo, I'm struck by how safe and reliable it feels. As any AI engineer knows, that level of trust comes from rigorous evaluation frameworks implemented from day one. It's incredible that the same evaluation best practices that helped build these self-driving cars are now accessible to everyone, thanks to someone who actually built Waymo's evaluation infrastructure! This is exactly what we need.

Huge congrats on the launch @brooke_hopkins3!! Can't wait to see what overcoming this bottleneck can unlock, very exciting 🚀 🚀

Big congrats on the launch!!! I can see how much effort you're putting into promoting this product. Big things ahead for you!!! 🚀🚀🚀