Open-source evaluations and observability for LLM apps
Evidently is an open-source framework to evaluate, test, and monitor AI-powered apps. 100+ built-in checks, from classification to RAG. Both offline evals and live monitoring. Easily add custom metrics and LLM judges.
Hi Makers!
I'm Elena, a co-founder of Evidently AI. I'm excited to share that our open-source Evidently library is stepping into the world of LLMs!
Three years ago, we started with testing and monitoring for what's now called "traditional" ML. Think classification, regression, ranking, and recommendation systems. With over 20 million downloads, we're now bringing our toolset to help evaluate and test LLM-powered products.
As you build an LLM-powered app or feature, figuring out if it's "good enough" can be tricky. Evaluating generative AI is different from traditional software and predictive ML. It lacks clear criteria and labeled answers, making quality more subjective and harder to measure. But there is no way around it: to deploy an AI app to production, you need a way to evaluate it.
For instance, you might ask:
- How does the quality compare if I switch from GPT to Claude?
- What will change if I tweak a prompt? Do my previous good answers hold?
- Where is it failing?
- What real-world quality are users experiencing?
It's not just about metrics; it's about the whole quality workflow. You need to define what "good" means for your app, set up offline tests, and monitor live quality.
With Evidently, we provide the complete open-source infrastructure to build and manage these evaluation workflows. Here's what you can do:
- Pick from a library of metrics or configure custom LLM judges
- Get interactive summary reports or export raw evaluation scores
- Run test suites for regression testing
- Deploy a self-hosted monitoring dashboard
- Integrate it with any adjacent tools and frameworks
It's open source under the Apache 2.0 license.
We build it together with the community, so I would love to hear how you approach this problem, along with any feedback and feature requests.
Check it out on GitHub: https://github.com/evidentlyai/e..., get started in the docs: http://docs.evidentlyai.com or join our Discord to chat: https://discord.gg/xZjKRaNp8b.