
Evidently AI

Open-source evaluations and observability for LLM apps

Evidently is an open-source framework to evaluate, test and monitor AI-powered apps. πŸ“š 100+ built-in checks, from classification to RAG. 🚦 Both offline evals and live monitoring. πŸ›  Easily add custom metrics and LLM judges.

Top comment

Hi Makers! I'm Elena, a co-founder of Evidently AI. I'm excited to share that our open-source Evidently library is stepping into the world of LLMs! πŸš€

Three years ago, we started with testing and monitoring for what's now called "traditional" ML: think classification, regression, ranking, and recommendation systems. With over 20 million downloads, we're now bringing our toolset to help evaluate and test LLM-powered products.

As you build an LLM-powered app or feature, figuring out if it's "good enough" can be tricky. Evaluating generative AI is different from traditional software and predictive ML: it lacks clear criteria and labeled answers, making quality more subjective and harder to measure. But there is no way around it: to deploy an AI app to production, you need a way to evaluate it. For instance, you might ask:

- How does the quality compare if I switch from GPT to Claude?
- What will change if I tweak a prompt? Do my previous good answers hold?
- Where is it failing?
- What real-world quality are users experiencing?

It's not just about metrics: it's about the whole quality workflow. You need to define what "good" means for your app, set up offline tests, and monitor live quality. With Evidently, we provide the complete open-source infrastructure to build and manage these evaluation workflows. Here's what you can do:

πŸ“š Pick from a library of metrics or configure custom LLM judges
πŸ“Š Get interactive summary reports or export raw evaluation scores
🚦 Run test suites for regression testing
πŸ“ˆ Deploy a self-hosted monitoring dashboard
βš™οΈ Integrate it with any adjacent tools and frameworks

It's open-source under an Apache 2.0 license. We build it together with the community: I would love to hear how you address this problem, along with any feedback and feature requests.

Check it out on GitHub: https://github.com/evidentlyai/e..., get started in the docs: http://docs.evidentlyai.com, or join our Discord to chat: https://discord.gg/xZjKRaNp8b.
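To give a flavor of the workflow, here is a minimal sketch of an offline evaluation with the Evidently Python library. The dataset and column names are made up for illustration, and the exact preset and descriptor names may differ between library versions, so treat this as a sketch rather than a canonical example:

```python
import pandas as pd

from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import Sentiment, TextLength

# Hypothetical evaluation dataset: a few question/response pairs from an LLM app.
eval_data = pd.DataFrame({
    "question": [
        "What is Evidently?",
        "How do I install it?",
    ],
    "response": [
        "Evidently is an open-source framework for evaluating LLM apps.",
        "You can install it with `pip install evidently`.",
    ],
})

# Score every response with built-in descriptors; custom checks and
# LLM judges can be added to the same descriptor list.
report = Report(metrics=[
    TextEvals(column_name="response", descriptors=[
        Sentiment(),   # sentiment score of each response
        TextLength(),  # response length in characters
    ]),
])

report.run(reference_data=None, current_data=eval_data)
report.save_html("llm_eval_report.html")  # interactive summary report
```

The same scores can be exported as raw values, wrapped into a test suite for regression testing, or sent to the monitoring dashboard for live tracking.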