Advanced activation caching across all transformer layers and components
Integration with SAELens for Sparse Autoencoder (SAE) research
Support for 50+ model families, including GPT-2, LLaMA, and Mistral
3,983 GitHub stars
Direct logit attribution and circuit discovery patterns
Activation patching and causal tracing for model behavior analysis