Can Mechanistic Interpretability Find Real Biology?

We decomposed a DNA foundation model into 12,288 features, found one that classical metrics couldn't explain, and sent 14 DNA sequences to a wet lab to test whether it was real.

Goodfire AI · March 2026 · Cell-free protein expression via Ginkgo Bioworks

1. The Question

Can mechanistic interpretability of a DNA foundation model reveal something biologically real — something that predicts outcomes in a wet lab? Not just rediscover known science, but find a signal that classical metrics miss, and prove it matters by measuring its effect on actual protein production?

2. Background — What You Need to Know

2a. DNA to Protein (The Central Dogma)

DNA is a string of four letters: A, T, C, G. The cell reads DNA three letters at a time. Each triplet is called a codon. There are 64 possible codons (4³), but they map to only 20 amino acids (the building blocks of proteins) plus a stop signal.

This means the genetic code is redundant. Multiple codons encode the same amino acid. For example, the amino acid Alanine can be encoded by four different codons: GCT, GCC, GCA, GCG. All four produce the identical protein, but they are not equivalent for protein production.

Synonymous variants are DNA sequences that encode the exact same protein but use different codon "spellings." They are the cleanest possible test case: any difference in behavior must come from the DNA sequence itself, not the protein.

Variant A (uses E. coli's preferred codons):
ATG | GCT | GAA | GGT | CCG | AAA | GAT ...
 ↓     ↓     ↓     ↓     ↓     ↓     ↓
 M     A     E     G     P     K     D

Variant B (different spelling, identical protein):
ATG | GCA | GAG | GGC | CCA | AAG | GAC ...
 ↓     ↓     ↓     ↓     ↓     ↓     ↓
 M     A     E     G     P     K     D

Same protein. Different DNA. Potentially very different expression levels.
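To make the point concrete, the two variants above can be translated with the standard genetic code. This is a minimal sketch: the table is built from the conventional TCAG codon ordering, and the `translate` helper and variable names are ours for illustration, not part of the original analysis.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon (length must be a multiple of 3)."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# The first seven codons of the two example variants shown above.
variant_a = "ATGGCTGAAGGTCCGAAAGAT"
variant_b = "ATGGCAGAGGGCCCAAAGGAC"

print(translate(variant_a))  # MAEGPKD
print(translate(variant_b))  # MAEGPKD
print(variant_a == variant_b)  # False: same protein, different DNA
```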

2b. Protein Expression

Expression means: does the DNA successfully produce protein, and how much? When scientists want to make a protein, they put DNA instructions into a biological system and try to produce it. Expression routinely fails, even when the protein design is correct. The DNA "spelling" matters enormously.

Two classical metrics dominate expression prediction:

CAI (Codon Adaptation Index): Measures how well the sequence uses E. coli's preferred codons. Higher CAI generally means faster, more efficient translation.
MFE (Minimum Free Energy): Measures whether the mRNA folds into physical knots at its 5' end. More negative MFE = tighter knots = the ribosome (the protein-making machine) cannot land = less protein. An unfolded 5' end (MFE near 0) is good.
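For readers who want to see what CAI computes, here is a minimal sketch. CAI is conventionally the geometric mean of per-codon "relative adaptiveness" weights, where each weight is a codon's usage divided by the usage of the most-used synonymous codon. The weights below are made-up illustrative numbers, not real E. coli values, and the `cai` function is our own helper.

```python
import math

# Hypothetical relative-adaptiveness weights, for illustration only.
# Real weights are derived from a reference set of highly expressed genes.
TOY_WEIGHTS = {
    "GCT": 0.6, "GCC": 0.7, "GCA": 0.5, "GCG": 1.0,  # Alanine
    "GAA": 1.0, "GAG": 0.4,                          # Glutamate
    "AAA": 1.0, "AAG": 0.3,                          # Lysine
}


def cai(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of per-codon weights; codons without a weight are skipped."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


preferred = "GCGGAAAAA"  # Ala-Glu-Lys using the top-weighted toy codons
rare = "GCAGAGAAG"       # same peptide, lower-weighted toy codons

print(round(cai(preferred, TOY_WEIGHTS), 3))
print(round(cai(rare, TOY_WEIGHTS), 3))
```

Because the score is a geometric mean, a handful of very rare codons pulls CAI down more than an arithmetic average would.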

Cell-free expression produces protein in a test tube rather than in living cells. Purified ribosomes, amino acids, and energy molecules are combined, and they build protein directly from a DNA template. Results in hours, not days. Ginkgo Bioworks offers this as a cloud service: you upload a DNA sequence, their robotic lab runs the reaction, and you get back a concentration (nM) of how much protein was produced.

2c. The AI Model (NTv3)

NTv3-650M is a DNA foundation model developed by InstaDeep, trained on billions of DNA sequences. It processes individual nucleotides (A, T, C, G) — a finer granularity than codons. We attached a linear probe to NTv3's internal representations and trained it to predict protein expression from published datasets.

On synonymous variant expression prediction, NTv3 achieves AUROC 0.822, compared to 0.725 for the best classical baseline (XGBoost on hand-crafted features). The model has learned something real about what makes DNA express well, though, as we found, most of that knowledge overlaps with what classical metrics already capture.

2d. Sparse Autoencoders (SAEs)

A Sparse Autoencoder decomposes a neural network's internal activations into a dictionary of interpretable features. Each feature ideally corresponds to one concept the model has learned. Think of it as an MRI for the AI: instead of a single dense activation vector, you get thousands of individually meaningful signals.

We trained a BatchTopK SAE on NTv3's layer 6 activations, producing 12,288 features (expansion factor 8, k=128). The goal: find features that correspond to biological mechanisms the model learned from raw DNA sequences — and determine whether any of them capture something beyond what classical metrics already measure.
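The sparsity rule can be sketched in a few lines. This is our reading of the BatchTopK idea, not the training code used here: instead of keeping the top k activations for every input, keep the k × batch_size largest activations across the whole batch, so that k is the average number of active features per input. We assume only positive activations are eligible. The function name and toy numbers are ours.

```python
def batch_topk(acts: list[list[float]], k: int) -> list[list[float]]:
    """Keep the k * batch_size largest positive activations in the batch; zero the rest."""
    n_keep = k * len(acts)
    ranked = sorted(
        ((v, i, j) for i, row in enumerate(acts) for j, v in enumerate(row) if v > 0),
        reverse=True,
    )
    kept = {(i, j) for _, i, j in ranked[:n_keep]}
    return [
        [v if (i, j) in kept else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(acts)
    ]


# Toy batch: 3 inputs, 6 dictionary features, average budget of k = 2 per input.
pre_acts = [
    [0.9, 0.1, 0.0, 0.8, 0.7, 0.0],
    [0.2, 0.0, 0.3, 0.0, 0.0, 0.1],
    [0.0, 0.6, 0.0, 0.0, 0.5, 0.4],
]
sparse = batch_topk(pre_acts, k=2)
active_per_row = [sum(v > 0 for v in row) for row in sparse]
print(active_per_row)  # [3, 0, 3]: the budget is shared across the batch
```

With the real settings (12,288 features, k=128), the same rule would leave about 1% of features active on average.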

3. What We Found Computationally

3a. Most features = known biology

When mean-pooled across the full sequence, the dominant SAE features are redundant with CAI — the simplest classical metric. The partial correlation of mean-pooled SAE features with expression after controlling for CAI and GC content is 0.024, essentially zero. The SAE correctly decomposes the model's knowledge, but at the whole-sequence level, that knowledge is mostly what biologists have understood since the 1980s.
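A partial correlation asks how much two variables still move together once a control has been accounted for. The sketch below shows the one-control case using the standard closed-form formula. The analysis in this article controls for several covariates at once (which requires regression residuals), so treat this as an illustration of the concept with toy data, not a reproduction of the 0.024 figure.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of one control z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy data: a feature and an outcome that both simply track a control variable.
control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]           # stand-in for CAI
feature = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9]           # control plus small noise
expression = [1.1, 2.1, 2.9, 3.9, 5.1, 6.1]        # control plus unrelated noise

raw_r = pearson(feature, expression)
partial_r = partial_corr(feature, expression, control)
print(f"raw r = {raw_r:.3f}, partial r = {partial_r:.3f}")
```

In the toy data the raw correlation is close to 1 while the partial correlation is close to 0, which is the same pattern described above for the mean-pooled features.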

This is both expected and a validation: the SAE recovers known biology cleanly. 617 features show period-3 firing patterns (oscillating every 3 nucleotides), meaning the model learned the triplet structure of the genetic code without ever being told proteins exist.
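One simple way to flag a period-3 firing pattern is to measure how much of a feature's activation spectrum sits at a period of exactly three positions. The sketch below is an assumption about how such a screen could work; the article does not specify the detection method, and `period3_fraction` is a hypothetical helper.

```python
import cmath


def period3_fraction(signal):
    """Share of non-constant spectral power at period 3 (length must be a multiple of 3)."""
    n = len(signal)
    if n % 3:
        raise ValueError("length must be a multiple of 3")
    mean = sum(signal) / n
    centered = [v - mean for v in signal]

    def power(k):
        # Squared magnitude of the k-th discrete Fourier coefficient.
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(centered))
        return abs(coeff) ** 2

    total = sum(power(k) for k in range(1, n // 2 + 1))
    if total < 1e-12:  # flat signal: no periodicity to speak of
        return 0.0
    return power(n // 3) / total


codon_like = [1.0, 0.0, 0.0] * 10   # fires on every third nucleotide
alternating = [1.0, 0.0] * 15       # period-2 pattern, same length

print(round(period3_fraction(codon_like), 3))
print(round(period3_fraction(alternating), 3))
```

A feature that fires on one codon position scores near 1; a feature with some other rhythm scores near 0.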

3b. 116 features generalize across proteins

We screened all 12,288 features across two independent datasets: Nieuwkoop (1,459 mRFP synonymous variants) and Cambray (11,421 sequences across 56 different proteins). After controlling for all 64 codon frequencies, 116 features still predict expression on both datasets. All fire at the 5' end (the translation initiation site, first ~7 codons). All are Bonferroni-significant on both datasets independently.
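The Bonferroni requirement is easy to state in code. This sketch assumes a family-wise error rate of 0.05, which the article does not state, and uses a hypothetical `passes_both` helper to express the "significant on both datasets independently" rule.

```python
N_FEATURES = 12_288
ALPHA = 0.05  # assumed family-wise error rate; not stated in the article

# Bonferroni: divide the error budget by the number of features tested.
threshold = ALPHA / N_FEATURES


def passes_both(p_nieuwkoop: float, p_cambray: float) -> bool:
    """A feature is kept only if it clears the corrected threshold on both datasets."""
    return p_nieuwkoop < threshold and p_cambray < threshold


print(f"per-feature threshold: {threshold:.2e}")  # 4.07e-06
print(passes_both(1e-9, 1e-7))   # True
print(passes_both(1e-9, 1e-4))   # False: fails on the second dataset
```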

3c. The dominant signal is mRNA structure

ViennaRNA MFE_30 (the predicted folding energy of the first 30 nucleotides) has r = +0.642 with expression on the Nieuwkoop dataset. Many of the 116 SAE features encode this structural information. The model taught itself thermodynamics from sequence data alone — an impressive feat of representation learning, though not a new biological discovery.

3d. Two features resist simple explanation

Feature 5984 fires at codon position 0 (the start codon and its immediate context). It is negatively correlated with expression in both training datasets: partial r = -0.27 on Nieuwkoop, -0.11 on Cambray, after controlling for all 64 codon frequencies. About 80% of its variance can be explained by 5-mer nucleotide frequencies, but the remaining 20% cannot.

Feature 2139 fires at codon position 1. It is positively correlated with expression: partial r = +0.25 on Nieuwkoop, +0.10 on Cambray. Only 53% of its variance is explained by 5-mers. 47% remains unexplained.

Critically, both features survive controls for dinucleotide frequencies and MFE. Nonlinear models (Random Forest, Gradient Boosting) trained on 5-mer frequencies do not close the gap either. Whatever these features encode, it is not something these models could recover from 5-mer frequencies alone.

4. The Experiment Design

4a. The core idea

Design DNA sequences where the SAE features and classical metrics disagree. If the features correctly predict expression when classical metrics say "these should be equivalent," we have evidence that the model captures something real beyond known biology. If the features are wrong, we learn the signal is too weak or too noisy to matter.

4b. The red team process

Our original matched pairs had fatal confounds: GC content mismatch at the 5' end, Shine-Dalgarno motif count mismatch, and other classical differences that could explain any expression difference without invoking the SAE features. We subjected the panel to adversarial review, identified these problems, then generated 2,500 candidate synonymous variants and re-selected pairs with strict matching criteria.

The final v3 matched pairs match on: CAI (±0.011), MFE (±0.6 kcal/mol), GC at the 5' end (identical at 0.500), and SD motif count (±1). Any expression difference between these pairs is therefore hard to attribute to these classical metrics.

4c. The 14 constructs

All constructs encode the same mRFP protein (225 amino acids), except the SD dose-response group (231-aa variant with additional N-terminal residues for SD motif engineering). Only the DNA spelling differs. Each construct was tested with 12 replicates via Ginkgo's cell-free protein expression service with HiBiT quantification.

Construct	Group	CAI	MFE_30 (kcal/mol)	Conc. (nM)	Purpose
FULL-CAI-MAX	Control	0.954	-1.7	273.0	Maximum codon optimization
SD-09-hiCAI	SD dose	0.989	-1.7	269.9	9 internal SD motifs
SD-25-hiCAI	SD dose	0.947	-1.7	246.0	25 internal SD motifs
CTRL-BEST	Control	0.607	0.0	231.9	Natural best expresser
RANDOM-BASELINE-2	Random	0.563	-6.4	225.9	Random codon choice
SD-13-hiCAI	SD dose	1.000	-1.7	218.7	13 internal SD motifs
SD-21-hiCAI	SD dose	0.967	-1.7	204.8	21 internal SD motifs
NTv3-MIN-goodCAI	Oracle	0.589	0.0	204.2	Model pessimistic, classical OK
SD-17-hiCAI	SD dose	0.986	-1.7	186.4	17 internal SD motifs
F5984-MATCHED-HIGH	Matched	0.554	-1.7	146.9	Feature fires (0.000363)
F5984-MATCHED-LOW	Matched	0.565	-2.3	97.2	Feature silent (0.000)
NTv3-MAX-badCAI	Oracle	0.363	-3.4	0.0	Model loved it, CAI = 0.36
RANDOM-BASELINE-1	Random	0.571	-5.6	0.0	Random codon choice
CTRL-WORST	Control	0.581	-10.6	0.0	Natural worst expresser

5. Results

5a. Controls work

Headline numbers: FULL-CAI-MAX 273 nM · CTRL-BEST 232 nM · CTRL-WORST 0 nM · Average CV 4.2%

Figure: Control Constructs: Expression (nM)

The assay has clear dynamic range. The codon-optimized maximum (FULL-CAI-MAX) produces 273 nM of protein. The natural best expresser from the Nieuwkoop dataset (CTRL-BEST) produces 232 nM. The natural worst expresser (CTRL-WORST) produces nothing. These are synonymous variants (identical protein, different DNA), yet expression ranges from 273 nM down to 0 nM; because CTRL-WORST is zero, no finite fold change can be computed. Codon choice matters enormously. The average coefficient of variation across all constructs is 4.2%, indicating excellent reproducibility.

5b. The Feature 5984 Matched Pair

Figure: Feature 5984 Matched Pair: Expression with Individual Replicates

Metric	HIGH (feature fires)	LOW (feature silent)	Delta
CAI	0.554	0.565	0.011
MFE_30 (kcal/mol)	-1.7	-2.3	0.6
GC at 5' end	0.500	0.500	0.000
SD motif count	15	14	1
NTv3 probe score	6.6	6.6	~0.0
Feature 5984 activation	0.000363	0.000	fires vs. silent
Expression	146.9 ± 5.7 nM	97.2 ± 4.4 nM	1.51x

Key finding: The SAE feature captures a real sequence property that produces a large, highly significant expression difference (p = 2.6 × 10^-17) between constructs matched on every classical metric we controlled. These two sequences encode the same protein and have nearly identical CAI, MFE, GC content, and SD motif count, and even the same NTv3 probe score. The only designed difference is whether Feature 5984 fires, and the construct where it fires produces 1.51x more protein.
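The size of this effect can be sanity-checked from the summary statistics in the table. The sketch below assumes the ± values are sample standard deviations over the 12 replicates and uses Welch's two-sample t statistic; the article does not say which test produced the reported p-value, so this is a plausibility check and does not attempt to reproduce p itself.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and its approximate degrees of freedom from summary stats."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# (mean nM, assumed standard deviation, replicates) from the matched-pair table.
high = (146.9, 5.7, 12)
low = (97.2, 4.4, 12)

fold = high[0] / low[0]
t, df = welch_t(*high, *low)
print(f"fold change = {fold:.2f}x, t = {t:.1f}, df = {df:.1f}")
```

A t statistic above 20 with roughly 20 degrees of freedom is consistent with a very small p-value.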

The twist: Feature 5984 is negatively correlated with expression in both training datasets (partial r = -0.27 on Nieuwkoop, -0.11 on Cambray). In the training data, which comes from in-vivo E. coli experiments, high activation = worse expression. But in the cell-free wet lab, high activation = BETTER expression. The direction flipped.

Why might this happen? A cell-free expression system is fundamentally different from a living cell. There are no competing mRNAs fighting for ribosomes. The tRNA pools are different (purified, not naturally regulated). There is no mRNA degradation machinery. There is no protein quality control. The feature detects a genuine sequence property, but that property's effect on expression depends on the biological context it operates in.

5c. The NTv3 Oracle

Figure: NTv3 Oracle: Model Prediction vs. Wet Lab Reality

NTv3-MAX-badCAI is the sequence the model loved most: probe score 12.6, the highest-confidence prediction. But it has CAI = 0.36, far lower than any other construct in the panel. Result: 0 nM. Complete failure.

NTv3-MIN-goodCAI is the sequence the model was most pessimistic about among those with acceptable classical metrics. Probe score 6.6, CAI = 0.59. Result: 204 nM. Perfectly fine.

The probe, trained on in-vivo E. coli expression data, does not transfer to cell-free expression at extreme CAI values. Classical metrics (particularly CAI) remain the dominant predictor in this assay. The model's internal representation captured real patterns, but the probe mapping those patterns to expression was calibrated for a different biological context.

5d. SD Dose-Response: A Deep Dive

Background: What are Shine-Dalgarno motifs?

In bacteria like E. coli, the ribosome (the molecular machine that reads mRNA and builds proteins) needs to find the right starting position on an mRNA molecule. It does this by recognizing a specific sequence called the Shine-Dalgarno (SD) sequence — typically GGAGG or close variants — located a few nucleotides upstream of the start codon (AUG). The SD sequence base-pairs with a complementary region on the ribosome's 16S rRNA, anchoring the ribosome in the correct position to begin translation.

This is the intended use of SD sequences. But here's the problem: the same motif can appear inside the coding region of a gene, purely by accident of codon choice. When the ribosome encounters an internal SD-like sequence while translating, it can potentially:

Pause or stall — the ribosome's 16S rRNA transiently binds the internal SD, slowing elongation
Cause frameshifting — the ribosome slips to a new reading frame, producing a garbled protein
Recruit a second ribosome — an internal SD could serve as a spurious translation initiation site, producing a truncated protein from the middle of the gene

For this reason, natural highly-expressed E. coli genes tend to avoid internal SD motifs. But here's the confound that makes this hard to study: genes with high CAI (good codons) also tend to have few internal SD motifs, simply because the most common E. coli codons don't form SD-like patterns. In natural genomes, high CAI and low SD count tend to co-occur, and to our knowledge their effects have not been cleanly separated.

Our experiment: breaking the confound

We designed five mRFP synonymous variants that hold CAI near-maximal (0.947–1.000) while varying the number of internal SD motifs from 9 to 25. All five have identical 5′ regions (same MFE = -1.7 kcal/mol) and encode the same protein. The only designed variable is SD count in the gene body.

An important biological constraint: mRFP contains 7 Glutamate-Glycine (EG) dipeptides where every possible synonymous codon combination creates an SD-like 5-mer. For example, Glutamate can be encoded by GAA or GAG, and Glycine by GGT, GGC, GGA, or GGG. Every combination of E+G codons produces GAA|GGx = GAAGG or GAG|GGx = GAGGG, both of which are SD-like motifs. So the SD count for mRFP cannot be pushed to 0: these EG dipeptides alone contribute at least 7 motifs, and the lowest count we achieved is 9.

The five SD motif types we counted: GGAGG (the canonical SD), GAAGG, GAGGG, GGAAG, and AGGAG. To increase SD count, we swapped synonymous codons at positions where the swap creates a new SD motif across the codon boundary, while minimizing CAI loss.
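Counting these motifs is a simple sliding-window scan. The sketch below uses the five motif types listed above and assumes overlapping matches are each counted, which the article does not specify. It also checks the EG-dipeptide constraint: every Glutamate + Glycine codon pair contains an SD-like 5-mer. The function name is ours.

```python
SD_MOTIFS = ("GGAGG", "GAAGG", "GAGGG", "GGAAG", "AGGAG")


def count_sd_motifs(seq: str) -> int:
    """Count SD-like 5-mers at every offset (overlapping matches are each counted)."""
    return sum(seq[i:i + 5] in SD_MOTIFS for i in range(len(seq) - 4))


GLU_CODONS = ("GAA", "GAG")
GLY_CODONS = ("GGT", "GGC", "GGA", "GGG")

# Every synonymous spelling of the Glu-Gly dipeptide carries one SD-like motif.
eg_counts = {e + g: count_sd_motifs(e + g) for e in GLU_CODONS for g in GLY_CODONS}
for dipeptide_dna, n in eg_counts.items():
    print(dipeptide_dna, n)
```

Because the window slides one nucleotide at a time, motifs that span a codon boundary are found just as easily as those inside a codon.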

Construct	SD Count	CAI	MFE_30	Codons Changed	Expression (nM)
SD-09-hiCAI	9	0.989	-1.7	4	269.9 ± 10.9
SD-13-hiCAI	13	1.000	-1.7	0 (max-CAI baseline)	218.7 ± 6.3
SD-17-hiCAI	17	0.986	-1.7	3	186.4 ± 6.9
SD-21-hiCAI	21	0.967	-1.7	7	204.8 ± 12.7
SD-25-hiCAI	25	0.947	-1.7	11	246.0 ± 20.4

Hypothesis: more SD motifs = less expression

The simple prediction: if internal SD motifs stall the ribosome, then adding more of them should monotonically reduce expression. SD-09 should express the most, SD-25 the least. The effect should be independent of CAI (which is held near-constant) and independent of 5′ structure (which is identical).

Result: a U-shaped curve

Figure: Internal Shine-Dalgarno Motifs vs. Expression

The result is decidedly not a simple monotonic decline. Expression drops from 270 nM at 9 SDs to a minimum of 186 nM at 17 SDs — a meaningful 31% reduction. But then it rises back to 246 nM at 25 SDs. The overall shape is a U-curve that bottoms out at 17 SD motifs.

Key numbers: The drop from SD-09 to SD-17 is 83.5 nM (31%). The recovery from SD-17 to SD-25 is 59.6 nM (32% increase from the trough). The overall difference between the minimum (SD-17, 186 nM) and maximum (SD-09, 270 nM) is 1.45x. All measurements have CVs of 2.9–8.3%, so the differences are well above noise.
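These key numbers follow directly from the SD dose-response table. The short script below recomputes them, treating each ± value as a standard deviation when deriving the CVs (an assumption on our part; the table does not label it).

```python
# SD count -> (mean expression in nM, ± value) from the dose-response table.
sd_series = {
    9: (269.9, 10.9),
    13: (218.7, 6.3),
    17: (186.4, 6.9),
    21: (204.8, 12.7),
    25: (246.0, 20.4),
}

drop = sd_series[9][0] - sd_series[17][0]
drop_pct = 100 * drop / sd_series[9][0]
recovery = sd_series[25][0] - sd_series[17][0]
recovery_pct = 100 * recovery / sd_series[17][0]
fold = sd_series[9][0] / sd_series[17][0]
cvs = [100 * spread / mean for mean, spread in sd_series.values()]

print(f"drop 9 -> 17: {drop:.1f} nM ({drop_pct:.0f}%)")
print(f"recovery 17 -> 25: {recovery:.1f} nM ({recovery_pct:.0f}%)")
print(f"max / min: {fold:.2f}x")
print(f"CV range: {min(cvs):.1f}% to {max(cvs):.1f}%")
```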

Why the U-shape? Possible explanations

1. Different SD motif types have different strengths. At lower SD counts (9–13), the motifs are predominantly GAAGG and GGAAG — these arise from the unavoidable EG dipeptides and the max-CAI codon choices. To reach higher counts (17+), the design algorithm introduced GGAGG (the canonical, strongest SD) and AGGAG. But to reach 21–25, it may have used weaker motif variants or placed them in less disruptive positions. The motifs are not all equal — GGAGG binds ribosomes ~10x more strongly than GAAGG. The SD-17 construct may have hit a "sweet spot" of strong motifs in disruptive positions.

2. Ribosome queuing / traffic effects. At moderate SD density, individual internal SDs can stall a translating ribosome long enough to cause problems. But at very high density, stalled ribosomes may physically block upstream ribosomes from reaching the next SD motif, paradoxically reducing the per-motif stalling effect. Think of it like traffic: a few red lights slow you down, but if every block has a red light, cars never get enough speed to be affected by the next one.

3. CAI confound (partial). CAI decreases from 1.000 (SD-13) to 0.947 (SD-25) as more codon swaps are needed to introduce SDs. However, the correlation between CAI and expression across these 5 points is only r = -0.14 (essentially zero). SD-13 has perfect CAI (1.000) yet expresses less than SD-09 (CAI = 0.989). And SD-25 has the worst CAI (0.947) yet is the second-highest expresser. CAI does not explain the U-shape.

4. mRNA secondary structure changes. Although all constructs have the same MFE_30 (5′ region is fixed), the codon swaps in the gene body could alter internal mRNA structure. SD-rich sequences tend to be purine-rich (lots of G and A), which could reduce internal base pairing and create a more open mRNA. More open internal structure might improve ribosome processivity, partially offsetting the SD stalling effect at high counts.
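The CAI check in explanation 3 can be reproduced from the five table rows. This is a plain Pearson correlation written out by hand so the sketch needs no third-party packages; with only five points it is descriptive rather than a formal test.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# SD-09, SD-13, SD-17, SD-21, SD-25 from the dose-response table.
cai_values = [0.989, 1.000, 0.986, 0.967, 0.947]
expression_nm = [269.9, 218.7, 186.4, 204.8, 246.0]

r = pearson(cai_values, expression_nm)
print(f"r(CAI, expression) = {r:.2f}")  # -0.14
```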

What this means

The SD dose-response demonstrates three things:

Internal SD motifs do affect cell-free expression: the 31% drop from SD-09 to SD-17 is substantial and occurs despite near-identical CAI. To our knowledge, this is the first demonstration of this effect in a cell-free system with controlled CAI.
The relationship is not simple. A naive model of "each SD reduces expression by X%" is wrong. The effect depends on motif type, position, density, and likely interactions between these factors. This is more complex biology than what was previously assumed.
This experiment was independent of the SAE/probe work. The SD dose-response was designed based on biological reasoning about ribosome dynamics, not from SAE features or NTv3 predictions. It's a clean biology experiment that stands on its own. The SAE features helped identify that NTv3 tracks SD motifs internally (16 features overlap between "expression-predictive" and "SD-tracking" feature sets, p = 1.1 × 10^-16), but a biologist could have designed this experiment without any AI.

What the literature says (and doesn't)

The effect of internal SD motifs on expression is the subject of an active scientific debate. The landmark paper by Li, Oh & Weissman (2012, Nature) used ribosome profiling in E. coli to show that SD-like hexamers within coding sequences cause pervasive translational pausing, with ~70% of strong pauses associated with internal SD motifs. However, Mohammad et al. (2019, eLife) substantially challenged this finding, showing that Li et al.'s protocol selectively isolated long ribosome-protected fragments, and that SD-like sequences produce longer footprints due to extra 16S rRNA protection — creating a sampling artifact. In libraries capturing the full distribution of footprint sizes, no SD-induced pauses were observed.

What is well-established is the genomic signature: highly expressed genes contain fewer internal SD motifs than expected (Yang et al. 2016, G3; analysis of 187 bacterial genomes, p < 10^-18). But whether this depletion is due to pausing, aberrant internal initiation, or some other mechanism remains unresolved.

Our literature review found three gaps our experiment addresses:

No prior study has varied SD count while holding CAI constant. In natural genomes, high CAI and low SD count tend to co-occur. To our knowledge, ours is the first controlled deconfounding.
No prior study has tested internal SD effects in a cell-free (PURE-like) system. All prior work is in vivo, where transcription-translation coupling, mRNA degradation, and ribosome competition add confounds.
No prior study has reported a non-monotonic / U-shaped SD dose-response. All published relationships are monotonically negative or binary (SD present vs. absent).

Our cell-free result is relevant to the Li vs. Mohammad debate: if internal SDs reduce expression in a reconstituted system (where many in-vivo confounds are absent), that provides independent evidence the effect is real and not purely a ribosome profiling artifact. The 31% drop from SD-09 to SD-17 at matched CAI supports a genuine functional effect of internal SD motifs on translation.

Honest note: With only 5 data points, we cannot definitively distinguish a U-shape from noise around a weak downward trend. The non-monotonic pattern is reproducible within each construct's 12 replicates (tight CVs), but fitting a curve to 5 points is inherently fragile. A follow-up with 15–20 SD levels and finer spacing would be needed to confirm the U-shape and map the biology in detail. The 31% drop from 9 to 17 SDs is the robust finding; the recovery at 25 SDs is intriguing but needs replication.

5e. Random Baselines

Figure: Random Synonymous Variants: Expression (nM)

RANDOM-BASELINE-1: 0 nM (no protein at all). RANDOM-BASELINE-2: 226 nM (expressed well). Of the two randomly chosen synonymous variants of mRFP, one expressed well and one failed completely. Two constructs are far too few to estimate a failure rate, but the result underscores the scale of the codon optimization problem: the DNA spelling has MASSIVE effects on protein production, and random choices are unreliable.
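A random synonymous variant can be generated by re-spelling each codon with a randomly chosen synonym. The sketch below assumes a uniform choice among synonymous codons, which may differ from how the random baselines here were drawn; the helper names and the short example sequence are ours.

```python
import random
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def random_synonymous(cds: str, rng: random.Random) -> str:
    """Replace every codon with a uniformly random codon for the same amino acid."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(rng.choice(SYNONYMS[CODON_TABLE[c]]) for c in codons)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = "ATGGCTGAAGGTCCGAAAGAT"  # M A E G P K D
variants = [random_synonymous(original, rng) for _ in range(5)]
for v in variants:
    print(v, translate(v))
```

Every variant translates to the same peptide, so any difference in expression between them has to come from the DNA spelling.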

6. What This Means

6a. The SAE feature found something real

A specific, interpretable feature from the SAE decomposition of NTv3 identifies a sequence property that:

Generalizes across two independent datasets and 56 different proteins
Survives controls for all 64 codon frequencies, dinucleotides, and mRNA folding energy
Is not fully explained by either linear or nonlinear models of local sequence statistics (about 20% of its variance remains unexplained by 5-mer frequencies)
Produces a 1.51x expression difference between otherwise-matched constructs in a wet lab (p = 2.6 × 10^-17)

6b. But the sign flipped

The feature's direction of effect reversed between the training context (in-vivo E. coli) and the test context (cell-free expression). This means:

The feature detects a real biological property of the DNA sequence
But that property's effect on expression is context-dependent
A cell-free system is fundamentally different from a living cell: no competing mRNAs, different tRNA pools, no mRNA degradation, no protein quality control
This is both a limitation and a finding — it tells us something about what differs between expression systems

6c. The model doesn't transfer at extremes

The NTv3 probe (trained on in-vivo data) completely fails when pushed to extreme CAI values in a cell-free system. The model's highest-confidence prediction produced zero protein. Classical metrics — particularly CAI — remain the dominant predictor in this assay. This is a cautionary tale about domain transfer: a model can learn genuine patterns and still fail when the deployment context differs from the training context.

6d. SD motifs have complex, non-linear effects

The SD dose-response produced the experiment's most biologically interesting result. Internal SD motifs reduce expression at moderate density (31% drop from 9 to 17 motifs at matched CAI), supporting the view that these ribosome-confusing sequences matter even in a cell-free system. But the U-shaped recovery at high density (25 motifs expressing nearly as well as 9) was unexpected and suggests the effect is more complex than "more SD = worse." This non-linearity, potentially driven by motif type differences, ribosome queuing, or compensatory structural effects, rests on only 5 data points; if it replicates, it would be novel biology, and it warrants follow-up with finer-grained dose-response curves.

7. Honest Assessment

What We Can Claim

SAE decomposition of a DNA foundation model identified a feature that captures a genuine biological signal invisible to classical metrics, validated by a 1.51x expression difference in a wet lab (p = 2.6 × 10^-17)
The direction of effect is assay-dependent, highlighting context-dependence of sequence features across expression systems

What We Cannot Claim

We did NOT discover new biology — we do not know what the feature detects mechanistically
We did NOT show that SAE features improve expression prediction (the sign was wrong for this assay)
We did NOT show that NTv3 outperforms classical metrics in cell-free (it does not, especially at extremes)
N=14 constructs is a pilot, not a definitive study

What is genuinely novel

First wet-lab validation of SAE features from a DNA foundation model. SAEs have been applied to language models extensively, but this is (to our knowledge) the first time an SAE feature from a biological foundation model has been tested in a physical experiment.
First evidence that SAE features capture real, non-classical sequence properties that affect expression. The feature survived every statistical control we could apply, and the wet-lab result confirmed it detects something real.
First demonstration of context-dependent feature effects across expression systems. The sign flip between in-vivo and cell-free is a finding in itself — it constrains what the feature might be detecting.
The methodology is reusable. SAE decomposition → cross-dataset feature selection → matched pair design → wet lab validation is a general framework applicable to any foundation model in any domain where physical validation is possible.

8. Methods Summary

Model and SAE

NTv3-650M-pre (InstaDeep), layer 6 activations. BatchTopK SAE with 12,288 features (expansion factor 8, k=128). Trained on activations from diverse DNA sequences.

Cross-dataset feature screen

All 12,288 features screened for partial correlation with expression after controlling for all 64 codon frequencies. Required Bonferroni significance on both the Nieuwkoop dataset (1,459 mRFP synonymous variants) AND the Cambray dataset (11,421 sequences across 56 proteins). 116 features passed.

Matched pair selection

2,500 candidate synonymous mRFP variants generated. Matched pairs selected with strict criteria: CAI ±0.02, MFE ±1.0 kcal/mol, GC at 5' end ±0.05, SD motif count ±2. The Feature 5984 pair achieved CAI ±0.011, MFE ±0.6, identical GC_5prime at 0.500, SD count ±1.
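The matching criteria translate directly into a tolerance check. The sketch below uses the thresholds and the Feature 5984 pair values stated above; the `Construct` class, the `is_matched` helper, and the third "mismatch" construct are hypothetical and exist only to illustrate a failing case.

```python
from dataclasses import dataclass


@dataclass
class Construct:
    name: str
    cai: float
    mfe: float        # kcal/mol
    gc_5prime: float
    sd_count: int


# Maximum allowed absolute difference per metric (from the selection criteria).
TOLERANCES = {"cai": 0.02, "mfe": 1.0, "gc_5prime": 0.05, "sd_count": 2}


def is_matched(a: Construct, b: Construct, tol: dict = TOLERANCES) -> bool:
    """True if the two constructs differ by no more than the tolerance on every metric."""
    return all(abs(getattr(a, key) - getattr(b, key)) <= limit
               for key, limit in tol.items())


high = Construct("F5984-MATCHED-HIGH", cai=0.554, mfe=-1.7, gc_5prime=0.500, sd_count=15)
low = Construct("F5984-MATCHED-LOW", cai=0.565, mfe=-2.3, gc_5prime=0.500, sd_count=14)
# Hypothetical construct with a much more structured 5' end, for contrast.
mismatch = Construct("hypothetical-mismatch", cai=0.560, mfe=-10.6, gc_5prime=0.500, sd_count=14)

print(is_matched(high, low))       # True
print(is_matched(high, mismatch))  # False: MFE differs by far more than 1.0 kcal/mol
```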

Wet lab

Ginkgo Bioworks Cell-Free Protein Expression (CFPE) service. HiBiT quantification (C-terminal tag, luminescence readout). 12 replicates per construct. 14 constructs total. All constructs encode mRFP (225 aa, except SD series at 231 aa).

Cost

Total experiment cost: $1,638.