Most AI agent products focus on orchestration. Bolt Foundry focuses on trust. Gambit is our open-source, local-first foundation for building agents you can inspect, test, grade, and verify. It has plain-English authoring, traceable runs, reusable graders, and repeatable verification built into the workflow.
Hi Product Hunt, Dan here from Bolt Foundry.
We built Gambit because we kept running into the same problem: teams can get an AI agent demo working, but they still cannot prove it will behave the way they expect in production. Most tools help you orchestrate agents. We wanted a better way to build agents you can actually inspect, test, grade, and verify.
Gambit is our open-source, local-first foundation for building trustworthy AI agents. Instead of hiding behavior inside a pile of prompts and glue code, it lets you author agent behavior in plain English, run real scenarios, inspect failures, turn human judgment into reusable graders, and verify behavior across repeated runs.
That workflow is the core of this launch. We are not just showing a chatbot or agent demo. We are showing a way to make agent behavior more legible, traceable, and fixable before it reaches users.
What is live today:
* open-source Gambit you can run locally
* a workflow for building, testing, grading, and verifying agent behavior
* demo assets and examples that show how we think teams should close the loop between agent behavior and trust
What we would love feedback on:
* Does this framing of “trust” vs. “orchestration” make sense to you?
* If you are building with agents today, where does reliability break down for your team?
* Which part of the workflow feels most valuable: plain-English authoring, traceable runs, graders, or verification?
We will be around all day answering questions and would love blunt feedback. Thanks for checking out Gambit.
I saw "Gambit: an open-source agent harness for reliable agents, assistants, and workflows" on Product Hunt and then checked the Bolt Foundry blog.
Your point about demos working but production behavior not being provable was spot on. The idea of inspecting, grading, and verifying agents — instead of hiding logic in prompts — feels practical.
Quick question: when teams start using Gambit locally, do they usually define grading criteria upfront, or do the criteria evolve after seeing failures?
Appreciate how clearly you framed the trust layer around agents.
Interesting direction here. The part in your post about teams being able to get an agent demo working but not being able to prove how it will behave in production hits a real pain point.
One quick observation while looking at the homepage. The idea of trust vs. orchestration comes through very clearly in your launch explanation, but the hero copy ("Open foundation for trustworthy agents") feels a bit more abstract than the story you told here.
Something closer to the production reliability angle might communicate the difference faster. Have you tested leaning into that contrast more, @siscodan?