Evidently AI

Open-source evaluations and observability for LLM apps

Open Source
Developer Tools
Artificial Intelligence
GitHub

Evidently is an open-source framework to evaluate, test, and monitor AI-powered apps. 📚 100+ built-in checks, from classification to RAG. 🚦 Both offline evals and live monitoring. 🛠 Easily add custom metrics and LLM judges.
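For a quick taste of what a built-in check looks like, here is a minimal sketch assuming the evidently Python package (~0.4.x API) with illustrative, randomly generated data:

```python
# Minimal sketch, assuming evidently ~0.4.x; the data is illustrative.
import numpy as np
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Stand-ins for a reference dataset and live production data.
reference = pd.DataFrame({"feature": np.random.normal(0.0, 1.0, 500)})
current = pd.DataFrame({"feature": np.random.normal(0.5, 1.0, 500)})

# One of the built-in checks: detect drift between the two datasets.
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("drift_report.html")  # interactive HTML summary
```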

Top comment

Hi Makers! I'm Elena, a co-founder of Evidently AI. I'm excited to share that our open-source Evidently library is stepping into the world of LLMs! 🚀

Three years ago, we started with testing and monitoring for what's now called "traditional" ML. Think classification, regression, ranking, and recommendation systems. With over 20 million downloads, we're now bringing our toolset to help evaluate and test LLM-powered products.

As you build an LLM-powered app or feature, figuring out if it's "good enough" can be tricky. Evaluating generative AI is different from traditional software and predictive ML: it lacks clear criteria and labeled answers, which makes quality more subjective and harder to measure. But there is no way around it: to deploy an AI app to production, you need a way to evaluate it. For instance, you might ask:

- How does the quality compare if I switch from GPT to Claude?
- What will change if I tweak a prompt? Do my previous good answers hold?
- Where is it failing?
- What real-world quality are users experiencing?

It's not just about metrics: it's about the whole quality workflow. You need to define what "good" means for your app, set up offline tests, and monitor live quality. With Evidently, we provide the complete open-source infrastructure to build and manage these evaluation workflows. Here's what you can do:

📚 Pick from a library of metrics or configure custom LLM judges
📊 Get interactive summary reports or export raw evaluation scores
🚦 Run test suites for regression testing
📈 Deploy a self-hosted monitoring dashboard
⚙️ Integrate it with any adjacent tools and frameworks

It's open-source under an Apache 2.0 license. We build it together with the community: I would love to learn how you address this problem, and I welcome any feedback and feature requests. Check it out on GitHub: https://github.com/evidentlyai/e..., get started in the docs: http://docs.evidentlyai.com, or join our Discord to chat: https://discord.gg/xZjKRaNp8b.
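To make the workflow above concrete, here is a minimal sketch of an offline text eval, assuming evidently ~0.4.x (the TextEvals preset and descriptor names follow that version's API and may differ in others; the DataFrame and column name are illustrative):

```python
# Minimal sketch, assuming evidently ~0.4.x; the dataset is illustrative
# and import paths may differ slightly between versions.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import Sentiment, TextLength

# Toy evaluation dataset: one column of LLM responses.
data = pd.DataFrame({
    "response": [
        "Sure! To reset your password, open Settings and choose 'Reset'.",
        "I'm sorry, I cannot help with that request.",
    ]
})

# Score each row with built-in descriptors and render a summary report.
report = Report(metrics=[
    TextEvals(column_name="response", descriptors=[Sentiment(), TextLength()]),
])
report.run(reference_data=None, current_data=data)
report.save_html("text_evals.html")
```

The same evaluation scores feed the test suites and the self-hosted monitoring dashboard mentioned above.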

Comment highlights

It's exciting to see an open-source platform that offers so much for AI monitoring and testing. The detailed metrics and continuous evaluation features are particularly noteworthy.

The ability to track data drift and monitor model performance seamlessly is impressive. I'm excited to see how this can streamline AI observability in real-world applications.

I'm impressed by how this tool integrates LLM observability into one platform. It's a great way to ensure my AI models are performing as expected and quickly identify issues.

Congrats on the launch! Have you found that Evidently's suggestions are pretty consistent across solutions, or does it really depend on the application at hand? For example, does it always recommend ChatGPT over Claude (or vice versa), or does it depend on the use case? (And if you can share two use cases where the answer is different, that'd be super cool!)

How does Evidently AI handle complex datasets, and what level of customization is available for the reports?

The ability to quickly identify issues in machine learning models is invaluable. This feels like a powerful way to keep everything running smoothly. Good luck with the continued development!

It's impressive how this simplifies monitoring machine learning models. The clarity it offers is really beneficial for data teams.

Evidently AI seems like a powerful tool for simplifying data insights. Looking forward to exploring its features! What makes Evidently AI stand out from other analytics tools?

Hi Elena and the Evidently AI team! 🚀 It's fantastic to see Evidently AI expanding into the LLM space. The ability to evaluate and monitor LLM performance is crucial, especially as these models become more integral to various applications. Can you provide an example or case study demonstrating how Evidently AI has successfully helped a project manage LLM evaluation and quality assurance, particularly highlighting the impact on model performance or user experience? Looking forward to seeing how Evidently AI evolves in this new domain and thanks for making such valuable tools open-source! 🛠️📊

Love that there are toolkits as part of Evidently AI to cover some of the most important AIOps use cases, like monitoring model hallucinations. I'm new to building products with AI, so I'm curious if there are learning resources for someone like me to learn more about topics like how to test AI-generated results. Or does the tool have suggestions on what methods to use? Also, big props for making the Evidently platform open source - you have my support for making this available to the world! Congrats on the launch @elenasamuylova and team!

Hello guys, congrats! How does Evidently help streamline the evaluation process for LLM-powered applications compared to traditional ML models? 🤔

Amazing team with a fantastic product that's solving a real need. Congrats, team Evidently.

Cracked team solving one of the hardest problems in LLMs today. If anyone is going to solve it, it's them.

Fantastic launch! We've been searching for an effective solution like this for quite some time. How do you tailor your solution to meet the varying needs of your clients?

This solves so many of my current pain points working with LLMs. I'm developing AI mentors and therapists and I need a better way to run evals for each update and prompt optimization. Upvoting, bookmarking, and going to try this out! Thank you Elena!

+1 amazing team, +1 amazing product. In addition: friendly open-source support (it's easy to add suggestions and see them in the next release).

Congrats @elenasamuylova and @emeli_dral. It's an amazing product. I've been recommending your course to our users and customers (https://www.evidentlyai.com/ml-o...) - it's one of the best, I think. The LLM side is exciting progress.