This product was not featured by Product Hunt yet. It will not be visible on their landing page and won't be ranked (cannot win product of the day regardless of upvotes).
Product upvotes vs the next 3
Waiting for data. Loading
Product comments vs the next 3
Waiting for data. Loading
Product upvote speed vs the next 3
Waiting for data. Loading
Product upvotes and comments
Waiting for data. Loading
Product vs the next 3
Loading
DeepEval4Claude
Free eval for your AI agent. No keys, no SaaS
A free Claude code that scores your AI agents output against the rubric that top-tier consulting companies use to evaluate their workflows. Catches the two things that generic evals miss: sycophancy, and silent ambiguity, No API keys, no SDKs, no signup. One command to install MIT licensed.
Hi PH — Evgeny here. Quick story behind this one.
I'm a solo builder. I ship Claude-powered tools without a QA team, without a peer reviewer, and usually without enough sleep. For a long time my "eval pipeline" was: read the agent's output, squint at it, ship it, hope.
A few months ago I noticed two things kept biting me:
The agent was too agreeable. I'd give it a half-baked premise. It would build a beautiful answer on top of that premise without ever pushing back. I'd ship the answer and only later realize the premise was wrong.
The agent quietly chose interpretations. I'd give it an ambiguous input. Instead of asking, it would pick one path and run with it. The output looked great. It was solving the wrong problem.
I tried the usual eval frameworks — Ragas, G-Eval, LangSmith. They're good at "did the model hallucinate?" They're not built to catch "did the model suck up to me?" or "did the model silently make a judgment call?" Those are different failure modes.
So I built deepeval-bcg for myself, and now I'm putting it out there.
What it does
You run /deepeval-run path/to/your-agent-output.md inside Claude Code. It scores your output on 8 dimensions using the actual rubric BCG uses to evaluate analysts (calibrated 1–3 scale, weighted). Then it runs a small Skeptic Agent that throws three adversarial probes at the output — an ambiguity probe, a sycophancy probe, and a steelman-the-opposite probe. Then a Novelty Stack that asks "if I search-replace the company name, does this analysis still read the same?" If yes, the output is generic boilerplate dressed up in nice formatting.
You get back a PASS / REVISE / FAIL verdict, per-dimension scores, and a concrete fix directive. Whole thing runs in about 30 seconds.
Who it's for
Honestly: people like me.
Solo founders shipping a Claude-powered product without a team to peer-review the outputs.
Indie AI engineers building agents as side projects or small SaaS tools, where "looks fine to me" is the only QA step.
Claude Code power users who want a sanity check on their agent's work before it goes to a user.
If your agent produces strategic memos, research summaries, briefings, plans, recommendations, or anything where the quality of the thinking matters more than the correctness of facts, this skill is built for you.
What I learned along the way (the things that surprised me)
Sycophancy is invisible to the model that produced it. A model judging its own output for sycophancy is like asking someone to spot their own accent. The Skeptic Agent works because it's a structurally separate prompt with adversarial framing baked in. Subtle but it matters.
A verbatim rubric beats a paraphrased one. Early on I paraphrased the BCG anchors into "cleaner" language. Scores drifted week to week as Claude reinterpreted my paraphrases. I went back to the verbatim original anchors and drift dropped noticeably. Lesson: don't get clever with your eval criteria.
No-API-keys was a constraint that turned into a feature. I started this because I didn't want to pay another vendor for evals on a side project. It turns out a lot of solo builders feel the same way. The skill runs entirely inside your existing Claude Code session — no second bill, no key juggling, no signup wall.
What it isn't
Not a tracing / observability platform. If you need production tracing, pair it with LangSmith or similar.
Not a faithfulness/RAG eval. Ragas is great at that — this complements rather than replaces.
Not a closed-source SaaS. MIT licensed, no telemetry, no upsell on the core. If it's useful, star the repo; if it's not, fork it and rip out what you don't need.
How I hope you'll use it
Install it. Run it on whatever your agent shipped this week. If it surfaces something useful, tell me in the comments. If it gets something wrong, especially tell me — there's a built-in feedback button that fires off a pre-filled GitHub issue with one click, and I read every one.
/plugin marketplace add EvXata/deepeval-bcg
/plugin install deepeval@deepeval-bcg
/deepeval-amazon
That last command runs a complete eval on a bundled real example so you can see the output before pointing it at your own work.
Thanks for reading. Genuinely curious what issues other developers are seeing in their agents — drop them below.
About DeepEval4Claude on Product Hunt
“Free eval for your AI agent. No keys, no SaaS”
DeepEval4Claude was submitted on Product Hunt and earned 0 upvotes and 1 comments, placing #35 on the daily leaderboard. A free Claude code that scores your AI agents output against the rubric that top-tier consulting companies use to evaluate their workflows. Catches the two things that generic evals miss: sycophancy, and silent ambiguity, No API keys, no SDKs, no signup. One command to install MIT licensed.
On the analytics side, DeepEval4Claude competes within Productivity, Developer Tools, Artificial Intelligence and GitHub — topics that collectively have 1.7M followers on Product Hunt. The dashboard above tracks how DeepEval4Claude performed against the three products that launched closest to it on the same day.
Who hunted DeepEval4Claude?
DeepEval4Claude was hunted by Jake Blake. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.
For a complete overview of DeepEval4Claude including community comment highlights and product details, visit the product overview.