Bench for Claude Code

Store, review, and share your Claude Code sessions

Developer Tools
Artificial Intelligence
Data Visualization

Claude Code just opened a PR. But do you really know what it did? With Bench, every session is stored automatically, so you can easily find out what happened. Spot issues at a glance, dig into every tool call and file change, and share the full context with others through a single link: no further context needed. When things go right, embed the history in your PRs. When things go wrong, send the link to a colleague and ask for help. Free, with no limits. One prompt to set up on Mac and Linux.

Top comment

Hey Product Hunt! 👋

I’m Manuel, co-founder of Silverstream AI. Since 2018, I’ve been working on AI agents across Google, Meta, and Mila. Now I’m building Bench for Claude Code with a small team.

If you use Claude Code a lot and want to store, review, or share its sessions, this tool is for you. Once connected, Bench automatically records and organizes your sessions, letting you inspect and debug them on your own or share them with your team to improve your workflows.

Getting started is simple:
• Go to bench.silverstream.ai and set it up in under a minute on Mac or Linux
• Keep using Claude Code as usual
• Open Bench when you need to understand or share a session


That’s it.

Bench is completely free. We built it for ourselves and now want as many developers as possible to try it and shape it with us.


We’ll be here all day reading and replying to feedback (without using Claude 😂). Would love to hear what you think!


Btw, support for more agents is coming soon, so stay tuned!

Comment highlights

Hey Product Hunt! I'm Omar, Founding Researcher at Silverstream AI.

We originally built Bench as an internal tool to make debugging our own agents less painful, and it's become something I reach for every day.

My favorite part? The high-level run overview. When an agent run has hundreds of steps, being able to scan the whole thing at a glance and immediately spot where something went wrong is a huge time-saver. From there, I can zoom in all the way down to the model's reasoning traces at the exact step where things broke, which makes a real difference when you're trying to understand why an agent made a certain decision, not just what it did.

As we kept adding features, we realized Bench had become too useful to keep to ourselves, so here we are! 🚀

We're starting with Claude Code, but support for more agents is on the way. Give it a try and let us know what you think!

Storing and reviewing sessions sounds like a developer convenience. But what's actually happening is something more interesting — you're creating a layer of reflection between execution and understanding.

Most tools help you move faster. This one helps you see what you did. That distinction matters more than most people realize, because the gap between building and knowing what you built is where most coordination breaks down.

I've tackled similar challenges with code reviews and context sharing, and I love how Bench automates session storage. How do you handle sensitive data in stored sessions to ensure developers aren’t accidentally sharing proprietary code?

Hey folks! I’m Simone, Co-founder and CTO of Silverstream AI.

Really happy to be launching this today. I’m excited to share it, and very curious to hear your feedback!

One habit we’ve introduced across the team is linking Bench sessions in PRs whenever Claude Code was involved in creating or debugging a change. It gives reviewers a lot more context on how a bug was found and fixed, instead of just showing the final diff.

That’s been one of the most useful workflows for us, and I’d recommend it to other teams using Claude Code too.

I’m also using Bench in a research setting, where session data helps generate detailed methodology reports showing how results were obtained. I’m already finding it useful, and I think there’s a lot more to unlock there!

Looking forward to your thoughts. I want to make Bench as useful for other devs as it's been so far for us, and your input really matters!

I've been thinking about this for a while now. Traditional git-style version control isn't optimal for the AI coding era: you lose the information in your Claude Code terminal, or in your AI coding tool of choice. Cool to see this getting productized. Congrats on the launch!

Great looking observability layer to see what's happening behind the scenes! I think it will surely help teams optimize their processes.

Congrats on the launch!

Now add observability + failure handling, otherwise it’s just scheduled guessing.

Nice.

Most people don’t need logs.

They need to understand why the agent made a bad decision and how to prevent it next time.

I love finding Claude Code related products daily on PH. This looks great!

Claude Code is so capable that we end up trusting it a little too much. But that's exactly when things get interesting:

  • I've had it silently migrate my local DB to an incompatible version while fixing a bug.

  • Another time, Claude decided the only way to fix a particularly inefficient for loop was to turn off my audio drivers.

The real problem isn't that it made mistakes. It's that I had no way to go back and understand what it did, when, and why, so I could learn from it and fine-tune my prompts. Sure, I could just scroll the Claude logs, but what if the "failures" weren't apparent until much later? Or what if the issue was at step 315 of an hour-long agent run of 500 steps?

That's why Bench is a big deal. It's not just a logger but an audit trail that makes agent actions legible: every tool call, file change, conversation, and subagent detail is there for as long as you need it, searchable and shareable. It's a great way to share your context with colleagues, and exactly what I needed to learn from my mistakes and get better at writing prompts!

I’m curious how detailed the tracking is. If I can really see every tool call and file change clearly, I can imagine using this for debugging more than anything else.

Being able to attach session history to PRs is a really smart idea. Makes collaboration much easier.

Congrats on the launch. I can see this becoming essential for teams using AI agents regularly, especially when debugging or reviewing work.

How granular is the session tracking? Can you trace decisions step-by-step, or is it more of a high-level overview?

How deep does it go when tracking tool calls and file changes across a session?

I’ve been using Claude Code quite a bit, and I often lose track of what actually happened in a session. Being able to go back and inspect everything sounds really useful to me.