Circuit Tracer

Anthropic's open tools to see how AI thinks

Anthropic's open-source Circuit Tracer helps researchers understand LLMs by visualizing their internal computations as attribution graphs. Explore graphs interactively on Neuronpedia or generate your own with the library. The goal is greater AI transparency.

Top comment

Hi everyone!

We often hear about how large language models are like "black boxes," and understanding how they arrive at their outputs is a huge challenge. Anthropic's new open-source Circuit Tracer tools offer a fascinating step towards peeling back those layers.

Rather than focusing on building bigger models, this initiative is about developing better tools to see inside the ones we currently use. Researchers and enthusiasts can now generate and explore attribution graphs, which map out parts of a model's internal decision-making process for a given prompt, on open models like Llama 3.2 and Gemma-2. You can even intervene directly, modifying internal features and observing how the output changes.
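
To make that workflow more concrete, here is a minimal Python sketch of generating an attribution graph with the library. The module, class, and function names (`circuit_tracer`, `ReplacementModel`, `attribute`) are assumptions based on the project's public README and may not match the current API exactly; treat it as an illustrative outline rather than a verified recipe.

```python
import torch
# ASSUMPTION: the names below mirror the project's README; the real API may differ.
from circuit_tracer import ReplacementModel, attribute

# Load a supported open-weight model together with its pretrained transcoders,
# which expose interpretable features inside the MLP layers.
model = ReplacementModel.from_pretrained("google/gemma-2-2b", "gemma")

# Compute an attribution graph for one prompt: roughly, which internal
# features contributed (directly or via other features) to the model's
# next-token prediction.
prompt = "The capital of the state containing Dallas is"
graph = attribute(prompt=prompt, model=model)

# Persist the graph for later exploration, e.g. in the Neuronpedia interface
# or a local frontend, where you can pin or ablate features and watch the
# completion change.
torch.save(graph, "dallas_graph.pt")
```

On the hosted Neuronpedia interface the same workflow is point-and-click: enter a prompt, inspect the resulting graph, and modify features to see how the output shifts.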

As AI systems get more capable, genuinely understanding how they reason internally, how they plan, and when they might be "faking it" is becoming ever more crucial for building trust, ensuring safety, and responsibly guiding their development.