
Evidently AI

Open-source evaluations and observability for LLM apps

Evidently is an open-source framework to evaluate, test and monitor AI-powered apps. πŸ“š 100+ built-in checks, from classification to RAG. 🚦 Both offline evals and live monitoring. πŸ›  Easily add custom metrics and LLM judges.

Top comment

Hi Makers! I'm Elena, a co-founder of Evidently AI. I'm excited to share that our open-source Evidently library is stepping into the world of LLMs! πŸš€

Three years ago, we started with testing and monitoring for what's now called "traditional" ML: think classification, regression, ranking, and recommendation systems. With over 20 million downloads, we're now bringing our toolset to help evaluate and test LLM-powered products.

As you build an LLM-powered app or feature, figuring out if it's "good enough" can be tricky. Evaluating generative AI is different from traditional software and predictive ML: it lacks clear criteria and labeled answers, making quality more subjective and harder to measure. But there is no way around it: to deploy an AI app to production, you need a way to evaluate it. For instance, you might ask:

- How does the quality compare if I switch from GPT to Claude?
- What will change if I tweak a prompt? Do my previous good answers hold?
- Where is it failing?
- What real-world quality are users experiencing?

It's not just about metrics: it's about the whole quality workflow. You need to define what "good" means for your app, set up offline tests, and monitor live quality. With Evidently, we provide the complete open-source infrastructure to build and manage these evaluation workflows. Here's what you can do:

πŸ“š Pick from a library of metrics or configure custom LLM judges
πŸ“Š Get interactive summary reports or export raw evaluation scores
🚦 Run test suites for regression testing
πŸ“ˆ Deploy a self-hosted monitoring dashboard
βš™οΈ Integrate it with any adjacent tools and frameworks

It's open-source under an Apache 2.0 license. We build it together with the community: I would love to hear how you address this problem, along with any feedback and feature requests.

Check it out on GitHub: https://github.com/evidentlyai/e..., get started in the docs: http://docs.evidentlyai.com, or join our Discord to chat: https://discord.gg/xZjKRaNp8b.
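To give a flavor of the workflow, here is a minimal sketch of an offline evaluation with the Evidently Python library. The dataset and column names are made up for illustration, and the exact preset and descriptor names may differ between library versions, so treat this as a sketch rather than a canonical example:

```python
import pandas as pd

from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import Sentiment, TextLength

# Hypothetical evaluation dataset: a few question/response pairs from an LLM app.
eval_data = pd.DataFrame({
    "question": [
        "What is Evidently?",
        "How do I install it?",
    ],
    "response": [
        "Evidently is an open-source framework for evaluating LLM apps.",
        "You can install it with `pip install evidently`.",
    ],
})

# Score every response with built-in descriptors; custom checks and
# LLM judges can be added to the same descriptor list.
report = Report(metrics=[
    TextEvals(column_name="response", descriptors=[
        Sentiment(),   # sentiment score of each response
        TextLength(),  # response length in characters
    ]),
])

report.run(reference_data=None, current_data=eval_data)
report.save_html("llm_eval_report.html")  # interactive summary report
```

The same scores can be exported as raw values, wrapped into a test suite for regression testing, or sent to the monitoring dashboard for live tracking.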