Circuit Tracer

Anthropic's open tools to see how AI thinks

Open Source
Artificial Intelligence
GitHub

Anthropic's open-source Circuit Tracer helps researchers understand LLMs by visualizing their internal computations as attribution graphs. You can explore existing graphs on Neuronpedia or generate your own with the library, all in the service of greater AI transparency.

Top comment

Hi everyone!

We often hear about how large language models are like "black boxes," and understanding how they arrive at their outputs is a huge challenge. Anthropic's new open-source Circuit Tracer tools offer a fascinating step towards peeling back those layers.

Rather than focusing on building bigger models, this initiative is about developing better tools to see inside the ones we already use. Researchers and enthusiasts can now generate and explore attribution graphs, which map out parts of a model's internal decision-making process for a given prompt, on open-weight models like Llama 3.2 and Gemma-2. You can even intervene by modifying internal features and observing how the outputs change.
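
For anyone who wants to try it hands-on, here's a rough sketch of what generating an attribution graph with the library might look like. The class and function names are taken from my reading of the project's README, so treat the exact API and arguments as assumptions and check the repo before running:

```python
# Rough sketch: generating an attribution graph with circuit-tracer.
# NOTE: the names below (ReplacementModel, attribute) follow the project's
# README as I understand it; treat exact signatures as assumptions and
# verify against the repository.
from circuit_tracer import ReplacementModel, attribute

# Load a supported open-weight model together with its transcoders,
# which expose interpretable internal features.
model = ReplacementModel.from_pretrained("google/gemma-2-2b", "gemma")

# Ask which internal features contributed to the model's next-token prediction.
prompt = "The capital of the state containing Dallas is"
graph = attribute(prompt, model)

# From here the graph can be exported to the bundled frontend (or uploaded
# to Neuronpedia) for interactive exploration, and features on the graph
# can be edited to observe how the model's output changes.
print(graph)
```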

As AIs get more capable, genuinely understanding their internal reasoning, how they plan, or even when they might be "faking it," is becoming more crucial for building trust, ensuring safety, and responsibly guiding their development.

Comment highlights

This will be great in so many ways. To counter things like Google's AI Overviews, we first need to understand what the model is actually "thinking" before we can build solutions. I think about problems like this daily, and I hope this tool can help with that.

Using AI, or building with it, really requires an AI mindset. It's something I've used extensively, and the learning never stops.

Good to see attention turning to this.