
Agenta

Open-source prompt management & evals for AI teams

Open Source
Developer Tools
Artificial Intelligence

Agenta is an open-source LLMOps platform for building reliable AI apps. Manage prompts, run evaluations, and debug traces. We help developers and domain experts collaborate to ship LLM applications faster and with confidence.

Top comment

Cool. Congratulations on the launch! We’re also working on an AI project right now.

Comment highlights

Tried a prompt tracing tool last year and TBH the hardest part wasn’t the traces themselves but connecting them to test suites. Agenta's evals + test-case approach sounds promising because we need deterministic tests for regression checks. In our case we were able to catch prompt drift only after a month, so automated evals would be huge. Would love to know how easy it is to author evaluators for domain-specific metrics. IMO good CI hooks and a lightweight API make the difference between a demo and something you can rely on in production.

Nice launch, but TBF I'm curious about approval workflows and audit trails. Henricook raised a good point: role-based access is useful, but some orgs need strict approvals before a prompt hits prod. Does Agenta provide an out-of-the-box approval queue or webhooks so we can tie it into our Jira/GitOps flow? Also wondering about immutable audit logs for compliance; that's non-negotiable for us. Is there any plan to add approval workflows, or will we rely on external scripts?

Hard agree on the problem statement, shipping reliable prompts is tough. IMO the devil is in audit and approvals, not just branching. We need fine-grained audit logs, immutable history, and an easy approval flow for non-dev users. Can you mark who approved each prompt and timestamp that action in the UI? If Agenta offers that plus a one-click rollback, it’d be a no-brainer for our infra.

tbh I was evaluating Langfuse and a couple of other tools over the last few months. What sold me on Agenta's pitch is the UI-first approach for non-dev SMEs. In our org, SMEs need to tweak prompts without opening PRs, and that workflow matters. Approval workflows would be a killer feature for larger orgs, TBF. Would love an API hook so we can gate prompt deploys via our CI pipeline.

TBF this looks promising. Curious how Agenta handles traces when you have async, high-latency LLM calls. We've seen trace sampling drop important edge cases in our infra and that bit us in prod. Are evaluators configurable to run off real traffic vs synthetic test sets? Also, where are the logs stored when self-hosted? Does that require extra infra, or is it included?

Congrats on the launch! Super clean workflow, love how Agenta brings the whole team into one place. How do you handle version control for prompts?

Great product! Can I integrate prompts from it to my app via API/SDK?
Can I use variables in the prompt?

I really like Agenta and chose it as my #1 tool for prompt engineering when I researched different tools. I needed a tool to teach my students at the University of Calgary how to do systematic prompt engineering studies, and this one was the best one for a non-technical audience to access all the professional tools for such studies in one place. I'm planning to get our PhD students into it now for prompt engineering studies that can turn into full-fledged research papers.

Been juggling prompts in git and evals in notebooks. Agenta looks like the boring tooling I actually want. Open source is nice. Trace debug view = clutch. If it plays nice with PostHog/LangChain, I’m in. Saving this for next sprint.

I could see how our tool could benefit from Agenta.ai’s agent-orchestration workflow to streamline complex automation tasks across our platform, and I’m definitely going to take a closer look at their launch to explore how we can make this work together.

You seldom see an open-source project for LLMOps like Agenta! Great launch and congrats, team!

@mahmoudmabrouk Love the open-source approach! 🚀

How do you handle prompt versioning when multiple teams are collaborating? Can you roll back to previous versions easily?

Huge congrats! Love seeing more open-source tooling that actually helps teams ship with confidence.

@mabrouk Love seeing more momentum in the LLMOps space, especially with an open-source approach. Most teams trying to ship AI features hit the same wall: lots of prompts, zero visibility, and no reliable way to evaluate or debug what’s actually happening under the hood.

A platform that unifies prompts, evals, and trace debugging feels like a real unlock for both devs and domain experts who don’t want to depend on guesswork.

Curious, what’s been the biggest challenge so far: capturing consistent traces, defining evaluation metrics, or helping teams collaborate around prompt changes?

This looks pretty handy: finally a place for AI teams to manage prompts, test them, and debug without chaos. Open source makes it even better; it feels like a tool teams can actually build on and trust.

As a PM, I've been trying several tools for evals; super excited to try this one!!

Oh wow, this is really amazing. Collaborating with the team on prompts and debugging with evaluations is a really cool idea. It seems like AI tools are really evolving :) Also, I see APIs, and that makes it even more exciting.
Would love to try that out.