Athina AI

Monitor LLMs and automatically detect hallucinations in prod

Artificial Intelligence

Athina helps developers monitor and evaluate LLMs in production. Get complete visibility into your RAG pipeline and 40+ preset eval metrics to detect hallucinations and measure performance.

Top comment

Hi PH – I’m Shiv, a co-founder of Athina AI!

We started on this journey about a year ago when we realized first-hand how difficult it is to take LLMs into production. One of the biggest challenges we faced was dealing with hallucinations, and finding effective ways to measure the performance of different models, prompts, and retrieval strategies. 😅

After speaking with dozens of builders, we found this to be a universal problem, and we set out to build the product we wished we had. With Athina, developers can easily monitor their LLM application in production, measure model performance with our suite of 40+ evaluation metrics, and catch regressions in CI/CD.

After many months of hard work, we’re now processing millions of logs every week from hundreds of users, and we’re excited to finally launch Athina publicly with the PH community! 🚀

Athina takes just a few minutes to set up, and here’s what you get:

🪵 Full visibility into production logs along with usage metadata like cost, token usage, etc. Athina also includes GraphQL access.

📊 Library of 40+ evaluation metrics including retrieval score, answer relevancy, faithfulness, conversation coherence, PII detection, and many more.

📐 Support for custom evaluation metrics: easily plug in your own evaluation prompt or function.

⏱️ Compare performance across models, prompts, and topics so you can get insights like “gpt-4 has a 4.8% hallucination rate while our custom fine-tuned llama model has a 7.2% hallucination rate”.

🛝 Built-in Prompt Playground so you can quickly experiment with different prompt and model combinations.

👬 Built for collaboration: Athina supports multiple users in an organization.

🕸️ Enterprise-grade options like on-premise deployment, custom log retention, and more.

Thank you all for your support!

----
Website: https://athina.ai
Sign Up: https://app.athina.ai
Demo Video: https://bit.ly/athina-demo-feb-2024
Schedule calls with founders: https://cal.com/shiv-athina/30min
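To make the "custom evaluation metrics" and per-model hallucination-rate ideas above concrete, here is a minimal, self-contained sketch of what a plugged-in evaluation function might look like. All names here (`LogEntry`, `grounding_score`, `hallucination_rate`) are illustrative assumptions, not Athina's actual SDK, and the token-overlap check is a deliberately naive stand-in for a real faithfulness metric.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    """Hypothetical shape of one production log record."""
    query: str
    context: str   # retrieved context the answer should be grounded in
    response: str
    model: str

def grounding_score(entry: LogEntry) -> float:
    """Naive faithfulness check: the fraction of response tokens
    that also appear in the retrieved context."""
    context_tokens = set(entry.context.lower().split())
    response_tokens = entry.response.lower().split()
    if not response_tokens:
        return 0.0
    hits = sum(1 for t in response_tokens if t in context_tokens)
    return hits / len(response_tokens)

def hallucination_rate(logs, threshold: float = 0.5) -> dict:
    """Per-model share of responses scoring below the threshold --
    the kind of 'gpt-4 vs. fine-tuned llama' comparison described above."""
    by_model = {}
    for entry in logs:
        flagged = grounding_score(entry) < threshold
        total, bad = by_model.get(entry.model, (0, 0))
        by_model[entry.model] = (total + 1, bad + flagged)
    return {model: bad / total for model, (total, bad) in by_model.items()}
```

A platform-side evaluator would typically swap the token-overlap heuristic for an LLM-judged or NLI-based faithfulness score, but the contract is the same: one log entry in, one score out, aggregated per model.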

Comment highlights

Hey Shiv, a few questions: - Can I moderate hallucinations in real time and stop my LLM from sending a response to the user, for example? Or is it just alerting? - Is it possible to evaluate RAG when the vector store changes? Do you have side-by-side retrieval evaluation? - If my app has multiple chained LLM calls, can I evaluate the entire flow? Thx

Congrats on the launch Shiv and the team. Athina will drive a tangible impact. The product seems to tackle a concrete problem in an intuitive, user-friendly way. Inspiring work.

Really important work to keep LLMs in check. Congrats on the launch and best of luck!

It's inspiring to see how your team identified a common challenge in taking LLMs into production and developed a solution to address it. What specific benefits do you anticipate it bringing to developers and organizations?

Detecting Digital hallucinations, who would have thought we would need to deal with this?

Hi Shiv, this is very interesting. I think there is great value in being able to test prompts with different models.

Looks super useful. Congratulations on the progress! And thanks for having a free tier :D

Shiv & Himanshu, Huge congrats on rolling out Athina AI! It’s evident the amount of dedication and insight that went into addressing the complexities of deploying LLMs. The detailed monitoring and robust collection of evaluation metrics Athina offers seem like it could revolutionize the way developers approach model deployment. How it sheds light on model performance and identifies hallucinations - feels impressive. By the way, what’s the story behind the name 'Athina'? Excited to watch Athina AI evolve and make a significant impact on developers’ efficiency and workflow optimization! Best of luck to you and your entire team!

This is an important area of focus for anyone seriously using generative AI. Good luck.

Congrats on the launch @shivsak @himanshu_bamoria1 @akshat_athina and team! I love the product and have already been recommending it to others. Really impressed with what you all have built in such a short amount of time!

This is one of the best LLM evaluation tools I have tried and tested. Kudos to the team for the official launch :)