We decomposed a DNA foundation model into 12,288 features, found one that classical metrics couldn't explain, and sent 14 DNA sequences to a wet lab to test whether it was real.
Can mechanistic interpretability of a DNA foundation model reveal something biologically real — something that predicts outcomes in a wet lab? Not just rediscover known science, but find a signal that classical metrics miss, and prove it matters by measuring its effect on actual protein production?
DNA is a string of four letters: A, T, C, G. The cell reads DNA three letters at a time. Each triplet is called a codon. There are 64 possible codons (4³), but they map to only 20 amino acids (the building blocks of proteins) plus a stop signal.
This means the genetic code is redundant. Multiple codons encode the same amino acid. For example, the amino acid Alanine can be encoded by four different codons: GCT, GCC, GCA, GCG. All four produce the identical protein, but they are not equivalent for protein production.
Synonymous variants are DNA sequences that encode the exact same protein but use different codon "spellings." They are the cleanest possible test case: any difference in behavior must come from the DNA sequence itself, not the protein.
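To make the idea concrete, here is a minimal sketch that builds the standard genetic code and checks whether two DNA sequences are synonymous. The two 12-nucleotide sequences are toy examples invented for illustration, not constructs from the experiment, and the helper names are our own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon (length must be a multiple of 3)."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def are_synonymous(seq_a: str, seq_b: str) -> bool:
    """True if two different DNA sequences encode the same protein."""
    return seq_a != seq_b and translate(seq_a) == translate(seq_b)

# Toy example (hypothetical sequences): both spell Met-Ala-Glu-Gly.
variant_1 = "ATGGCTGAAGGT"
variant_2 = "ATGGCGGAGGGC"

print(len(CODON_TABLE))  # number of codons, 4**3
print(sorted(c for c, aa in CODON_TABLE.items() if aa == "A"))  # Alanine codons
print(translate(variant_1), translate(variant_2))
print(are_synonymous(variant_1, variant_2))
```

The same check scales to a full-length gene: any pair of constructs that passes `are_synonymous` differs only in DNA spelling.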
Same protein. Different DNA. Potentially very different expression levels.
Expression means: does the DNA successfully produce protein, and how much? When scientists want to make a protein, they put DNA instructions into a biological system and try to produce it. Expression routinely fails, even when the protein design is correct. The DNA "spelling" matters enormously.
Two classical metrics dominate expression prediction:
Cell-free expression produces protein in a test tube rather than in living cells. Purified ribosomes, amino acids, and energy molecules are combined, and they build protein directly from a DNA template. Results in hours, not days. Ginkgo Bioworks offers this as a cloud service: you upload a DNA sequence, their robotic lab runs the reaction, and you get back a concentration (nM) of how much protein was produced.
NTv3-650M is a DNA foundation model developed by InstaDeep, trained on billions of DNA sequences. It processes individual nucleotides (A, T, C, G) — a finer granularity than codons. We attached a linear probe to NTv3's internal representations and trained it to predict protein expression from published datasets.
On synonymous variant expression prediction, NTv3 achieves AUROC 0.822, compared to 0.725 for the best classical baseline (XGBoost on hand-crafted features). The model has learned something real about what makes DNA express well, though, as we found, most of that knowledge overlaps with what classical metrics already capture.
A Sparse Autoencoder decomposes a neural network's internal activations into a dictionary of interpretable features. Each feature ideally corresponds to one concept the model has learned. Think of it as an MRI for the AI: instead of a single dense activation vector, you get thousands of individually meaningful signals.
We trained a BatchTopK SAE on NTv3's layer 6 activations, producing 12,288 features (expansion factor 8, k=128). The goal: find features that correspond to biological mechanisms the model learned from raw DNA sequences — and determine whether any of them capture something beyond what classical metrics already measure.
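For readers unfamiliar with BatchTopK, the sketch below shows the sparsity rule as we understand it: instead of keeping the top k features for every token, it keeps the top (batch size × k) activations across the whole batch, so individual tokens can use more or fewer than k features. This is a pure-Python toy on a 2 × 4 batch, not the training code used here; the function name and the ReLU-style positivity filter are assumptions.

```python
def batch_topk(pre_acts, k):
    """Keep the (batch_size * k) largest positive activations across the whole batch.

    pre_acts: list of rows (one per token), each a list of feature pre-activations.
    Returns a same-shaped list with every other entry set to 0.0.
    """
    batch_size = len(pre_acts)
    budget = batch_size * k
    # Flatten to (value, row, col), keeping only positive values (assumed ReLU).
    flat = [
        (value, r, c)
        for r, row in enumerate(pre_acts)
        for c, value in enumerate(row)
        if value > 0.0
    ]
    flat.sort(key=lambda item: item[0], reverse=True)
    out = [[0.0] * len(row) for row in pre_acts]
    for value, r, c in flat[:budget]:
        out[r][c] = value
    return out

# Toy batch: 2 tokens x 4 features, k = 1, so 2 active entries in total.
toy = [[0.9, 0.8, 0.1, -0.5],
       [0.2, 0.0, 0.3, 0.1]]
sparse = batch_topk(toy, k=1)
print(sparse)
```

In the toy batch both surviving activations land on the first token, which illustrates that k is an average budget per token rather than a per-token cap.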
When mean-pooled across the full sequence, the dominant SAE features are redundant with CAI — the simplest classical metric. The partial correlation of mean-pooled SAE features with expression after controlling for CAI and GC content is 0.024, essentially zero. The SAE correctly decomposes the model's knowledge, but at the whole-sequence level, that knowledge is mostly what biologists have understood since the 1980s.
This is both expected and a validation: the SAE recovers known biology cleanly. 617 features show period-3 firing patterns (oscillating every 3 nucleotides), meaning the model learned the triplet structure of the genetic code without ever being told proteins exist.
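One simple way to quantify a period-3 firing pattern is to ask how much of a feature's per-nucleotide activation power sits at a frequency of one cycle per three positions. The sketch below is an illustration of that idea on synthetic signals; the post does not specify the criterion used to call the 617 features, so the function and any threshold applied to it are assumptions.

```python
import cmath
import math

def period3_fraction(activations):
    """Fraction of a signal's mean-removed power at a period of 3 positions."""
    n = len(activations)
    mean = sum(activations) / n
    centered = [a - mean for a in activations]
    total = sum(c * c for c in centered)
    if total == 0.0:
        return 0.0  # constant signal: no oscillation at all
    # Fourier coefficient at 1/3 cycles per nucleotide.
    coeff = sum(c * cmath.exp(-2j * math.pi * i / 3) for i, c in enumerate(centered))
    # Factor 2 accounts for the mirror-image frequency (2/3) of a real signal.
    return 2 * abs(coeff) ** 2 / (n * total)

codon_like = [1.0, 0.0, 0.0] * 12        # fires on every third nucleotide
period_four = [1.0, 0.0, 0.0, 0.0] * 9   # periodic, but not codon-aligned

print(round(period3_fraction(codon_like), 3))
print(round(period3_fraction(period_four), 3))
```

A signal that fires once per codon scores near 1, while a signal with a different period scores near 0. The result is exact only when the sequence length is a multiple of 3.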
We screened all 12,288 features across two independent datasets: Nieuwkoop (1,459 mRFP synonymous variants) and Cambray (11,421 sequences across 56 different proteins). After controlling for all 64 codon frequencies, 116 features still predict expression on both datasets. All fire at the 5' end (the translation initiation site, first ~7 codons). All are Bonferroni-significant on both datasets independently.
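The partial correlations used throughout ask whether a feature still tracks expression once known covariates are accounted for. Below is a minimal sketch on synthetic data in which a "feature" and "expression" are both driven by CAI alone: the raw correlation is high and the partial correlation collapses toward zero. All data here are simulated, and the recursive formula is chosen for brevity. With 64 codon-frequency controls, regressing out the covariates and correlating residuals would be the practical route.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation of x and y after removing the linear effect of each control.

    Uses the recursive first-order formula, which is fine for a couple of
    controls but grows exponentially with the number of controls.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[-1], controls[:-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Synthetic data: "feature" and "expression" both depend on CAI plus noise.
random.seed(0)
cai = [random.random() for _ in range(2000)]
gc = [random.random() for _ in range(2000)]
feature = [c + random.gauss(0, 0.2) for c in cai]
expression = [c + random.gauss(0, 0.2) for c in cai]

raw_r = pearson(feature, expression)
partial_r = partial_corr(feature, expression, [cai, gc])
print(round(raw_r, 2), round(partial_r, 2))
```

A feature that survives this kind of control carries information the covariates do not.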
ViennaRNA MFE_30 (the predicted folding energy of the first 30 nucleotides) has r = +0.642 with expression on the Nieuwkoop dataset. Many of the 116 SAE features encode this structural information. The model taught itself thermodynamics from sequence data alone — an impressive feat of representation learning, though not a new biological discovery.
Feature 5984 fires at codon position 0 (the start codon and its immediate context). It is negatively correlated with expression in both training datasets: partial r = -0.27 on Nieuwkoop, -0.11 on Cambray, after controlling for all 64 codon frequencies. About 80% of its variance can be explained by 5-mer nucleotide frequencies, but the remaining 20% cannot.
Feature 2139 fires at codon position 1. It is positively correlated with expression: partial r = +0.25 on Nieuwkoop, +0.10 on Cambray. Only 53% of its variance is explained by 5-mers. 47% remains unexplained.
Critically, both features survive controls for dinucleotide frequencies and MFE. And nonlinear models (Random Forest, Gradient Boosting) trained on 5-mer frequencies do not close the gap. Whatever these features encode, it is not a nonlinear function of local sequence statistics.
Design DNA sequences where the SAE features and classical metrics disagree. If the features correctly predict expression when classical metrics say "these should be equivalent," we have evidence that the model captures something real beyond known biology. If the features are wrong, we learn the signal is too weak or too noisy to matter.
Our original matched pairs had fatal confounds: GC content mismatch at the 5' end, Shine-Dalgarno motif count mismatch, and other classical differences that could explain any expression difference without invoking the SAE features. We subjected the panel to adversarial review, identified these problems, then generated 2,500 candidate synonymous variants and re-selected pairs with strict matching criteria.
The final v3 matched pairs match on: CAI (±0.011), MFE (±0.6 kcal/mol), GC at the 5' end (identical at 0.500), and SD motif count (±1). Any expression difference between these pairs therefore cannot be attributed to these classical metrics.
All constructs encode the same mRFP protein (225 amino acids), except the SD dose-response group (231-aa variant with additional N-terminal residues for SD motif engineering). Only the DNA spelling differs. Each construct was tested with 12 replicates via Ginkgo's cell-free protein expression service with HiBiT quantification.
| Construct | Group | CAI | MFE_30 (kcal/mol) | Conc. (nM) | Purpose |
|---|---|---|---|---|---|
| FULL-CAI-MAX | Control | 0.954 | -1.7 | 273.0 | Maximum codon optimization |
| SD-09-hiCAI | SD dose | 0.989 | -1.7 | 269.9 | 9 internal SD motifs |
| SD-25-hiCAI | SD dose | 0.947 | -1.7 | 246.0 | 25 internal SD motifs |
| CTRL-BEST | Control | 0.607 | 0.0 | 231.9 | Natural best expresser |
| RANDOM-BASELINE-2 | Random | 0.563 | -6.4 | 225.9 | Random codon choice |
| SD-13-hiCAI | SD dose | 1.000 | -1.7 | 218.7 | 13 internal SD motifs |
| SD-21-hiCAI | SD dose | 0.967 | -1.7 | 204.8 | 21 internal SD motifs |
| NTv3-MIN-goodCAI | Oracle | 0.589 | 0.0 | 204.2 | Model pessimistic, classical OK |
| SD-17-hiCAI | SD dose | 0.986 | -1.7 | 186.4 | 17 internal SD motifs |
| F5984-MATCHED-HIGH | Matched | 0.554 | -1.7 | 146.9 | Feature fires (0.000363) |
| F5984-MATCHED-LOW | Matched | 0.565 | -2.3 | 97.2 | Feature silent (0.000) |
| NTv3-MAX-badCAI | Oracle | 0.363 | -3.4 | 0.0 | Model loved it, CAI = 0.36 |
| RANDOM-BASELINE-1 | Random | 0.571 | -5.6 | 0.0 | Random codon choice |
| CTRL-WORST | Control | 0.581 | -10.6 | 0.0 | Natural worst expresser |
The assay has clear dynamic range. The codon-optimized maximum (FULL-CAI-MAX) produces 273 nM of protein. The natural best expresser from the Nieuwkoop dataset (CTRL-BEST) produces 232 nM. The natural worst expresser (CTRL-WORST) produces nothing. These are synonymous variants (identical protein, different DNA), yet expression ranges from 273 nM all the way down to 0 nM, so a fold change cannot even be defined. Codon choice matters enormously. The average coefficient of variation across all constructs is 4.2%, indicating excellent reproducibility.
| Metric | HIGH (feature fires) | LOW (feature silent) | Delta |
|---|---|---|---|
| CAI | 0.554 | 0.565 | 0.011 |
| MFE_30 (kcal/mol) | -1.7 | -2.3 | 0.6 |
| GC at 5' end | 0.500 | 0.500 | 0.000 |
| SD motif count | 15 | 14 | 1 |
| NTv3 probe score | 6.6 | 6.6 | ~0.0 |
| Feature 5984 activation | 0.000363 | 0.000 | fires vs. silent |
| Expression | 146.9 ± 5.7 nM | 97.2 ± 4.4 nM | 1.51x |
Key finding: The SAE feature captures a real sequence property that produces a large, highly significant expression difference (p = 2.6 × 10⁻¹⁷) between constructs matched on ALL classical metrics. These two sequences encode the same protein, have nearly identical CAI, MFE, GC content, SD motif count, and even the same NTv3 probe score. The only designed difference is whether Feature 5984 fires, and the construct where it fires produces 1.51x more protein.
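To give a feel for how well separated the two constructs are, the sketch below computes the fold change and a Welch t statistic from the summary numbers in the table. It assumes the "±" values are standard deviations over the 12 replicates. The post does not state which test produced the reported p-value, so this is an illustration of the effect size rather than a reproduction of that number.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Assumption: "±" is a standard deviation; n = 12 replicates per construct.
t_stat, dof = welch_t(146.9, 5.7, 12, 97.2, 4.4, 12)
fold_change = 146.9 / 97.2

print(f"fold change: {fold_change:.2f}x")
print(f"Welch t = {t_stat:.1f} with about {dof:.0f} degrees of freedom")
```

Under that assumption the difference between the means is more than twenty standard errors wide, which is why the p-value is so small.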
The twist: Feature 5984 is negatively correlated with expression in both training datasets (partial r = -0.27 on Nieuwkoop, -0.11 on Cambray). In the training data, which comes from in-vivo E. coli experiments, high activation = worse expression. But in the cell-free wet lab, high activation = BETTER expression. The direction flipped.
Why might this happen? A cell-free expression system is fundamentally different from a living cell. There are no competing mRNAs fighting for ribosomes. The tRNA pools are different (purified, not naturally regulated). There is no mRNA degradation machinery. There is no protein quality control. The feature detects a genuine sequence property, but that property's effect on expression depends on the biological context it operates in.
NTv3-MAX-badCAI is the sequence the model loved most — probe score 12.6, the highest confidence prediction. But it has CAI = 0.36, far below what E. coli can tolerate. Result: 0 nM. Complete failure. Zero protein.
NTv3-MIN-goodCAI is the sequence the model was most pessimistic about among those with acceptable classical metrics. Probe score 6.6, CAI = 0.59. Result: 204 nM. Perfectly fine.
The probe, trained on in-vivo E. coli expression data, does not transfer to cell-free expression at extreme CAI values. Classical metrics (particularly CAI) remain the dominant predictor in this assay. The model's internal representation captured real patterns, but the probe mapping those patterns to expression was calibrated for a different biological context.
In bacteria like E. coli, the ribosome (the molecular machine that reads mRNA and builds proteins) needs to find the right starting position on an mRNA molecule. It does this by recognizing a specific sequence called the Shine-Dalgarno (SD) sequence — typically GGAGG or close variants — located a few nucleotides upstream of the start codon (AUG). The SD sequence base-pairs with a complementary region on the ribosome's 16S rRNA, anchoring the ribosome in the correct position to begin translation.
This is the intended use of SD sequences. But here's the problem: the same motif can appear inside the coding region of a gene, purely by accident of codon choice. When the ribosome encounters an internal SD-like sequence while translating, it can potentially:
For this reason, natural highly-expressed E. coli genes tend to avoid internal SD motifs. But here's the confound that makes this hard to study: genes with high CAI (good codons) also tend to have few internal SD motifs, simply because the most common E. coli codons don't form SD-like patterns. In natural genomes, high CAI and low SD count always co-occur. Nobody has been able to cleanly separate their effects.
We designed five mRFP synonymous variants that hold CAI near-maximal (0.947–1.000) while varying the number of internal SD motifs from 9 to 25. All five have identical 5′ regions (same MFE = -1.7 kcal/mol) and encode the same protein. The only designed variable is SD count in the gene body.
An important biological constraint: mRFP contains 7 Glutamate-Glycine (EG) dipeptides where every possible synonymous codon combination creates an SD-like 5-mer. For example, Glutamate can be encoded by GAA or GAG, and Glycine by GGT, GGC, GGA, or GGG. Every combination of E+G codons produces GAA|GGx = GAAGG or GAG|GGx = GAGGG, both of which are SD-like motifs. So the minimum achievable SD count for mRFP is 9, not 0.
The five SD motif types we counted: GGAGG (the canonical SD), GAAGG, GAGGG, GGAAG, and AGGAG. To increase SD count, we swapped synonymous codons at positions where the swap creates a new SD motif across the codon boundary, while minimizing CAI loss.
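The counting rule and the Glu-Gly constraint are easy to check in code. The sketch below counts the five motif types at every offset of a sequence and enumerates all eight Glu-Gly codon pairs to confirm that each one contains an SD-like 5-mer. Counting overlapping hits separately is an assumption, since the exact counting convention is not spelled out here.

```python
from itertools import product

SD_MOTIFS = ("GGAGG", "GAAGG", "GAGGG", "GGAAG", "AGGAG")

def count_sd_motifs(dna: str) -> int:
    """Count SD-like 5-mers at every offset (overlapping hits all count)."""
    return sum(dna[i:i + 5] in SD_MOTIFS for i in range(len(dna) - 4))

# Every synonymous spelling of a Glu-Gly dipeptide.
GLU = ("GAA", "GAG")
GLY = ("GGT", "GGC", "GGA", "GGG")
eg_counts = {glu + gly: count_sd_motifs(glu + gly) for glu, gly in product(GLU, GLY)}

for pair, hits in eg_counts.items():
    print(pair, hits)
```

Because no codon choice removes the motif, each EG dipeptide contributes a floor to the SD count that synonymous recoding cannot lower.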
| Construct | SD Count | CAI | MFE_30 (kcal/mol) | Codons Changed | Expression (nM) |
|---|---|---|---|---|---|
| SD-09-hiCAI | 9 | 0.989 | -1.7 | 4 | 269.9 ± 10.9 |
| SD-13-hiCAI | 13 | 1.000 | -1.7 | 0 (max-CAI baseline) | 218.7 ± 6.3 |
| SD-17-hiCAI | 17 | 0.986 | -1.7 | 3 | 186.4 ± 6.9 |
| SD-21-hiCAI | 21 | 0.967 | -1.7 | 7 | 204.8 ± 12.7 |
| SD-25-hiCAI | 25 | 0.947 | -1.7 | 11 | 246.0 ± 20.4 |
The simple prediction: if internal SD motifs stall the ribosome, then adding more of them should monotonically reduce expression. SD-09 should express the most, SD-25 the least. The effect should be independent of CAI (which is held near-constant) and independent of 5′ structure (which is identical).
The result is decidedly not a simple monotonic decline. Expression drops from 270 nM at 9 SDs to a minimum of 186 nM at 17 SDs — a meaningful 31% reduction. But then it rises back to 246 nM at 25 SDs. The overall shape is a U-curve that bottoms out at 17 SD motifs.
Key numbers: The drop from SD-09 to SD-17 is 83.5 nM (31%). The recovery from SD-17 to SD-25 is 59.6 nM (32% increase from the trough). The overall difference between the minimum (SD-17, 186 nM) and maximum (SD-09, 270 nM) is 1.45x. All measurements have CVs of 2.9–8.3%, so the differences are well above noise.
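These figures can be re-derived directly from the dose-response table. The short script below does the arithmetic. It treats the "±" values as standard deviations, an assumption that is consistent with the quoted CV range.

```python
# SD count -> (mean expression in nM, "±" value in nM), copied from the table above.
sd_series = {
    9: (269.9, 10.9),
    13: (218.7, 6.3),
    17: (186.4, 6.9),
    21: (204.8, 12.7),
    25: (246.0, 20.4),
}

peak, trough, tail = sd_series[9][0], sd_series[17][0], sd_series[25][0]

drop_nm = peak - trough
drop_pct = 100 * drop_nm / peak
recovery_nm = tail - trough
recovery_pct = 100 * recovery_nm / trough
max_over_min = peak / trough

# Assumption: "±" is a standard deviation, so CV = spread / mean.
cv_pct = {n: 100 * spread / mean for n, (mean, spread) in sd_series.items()}

print(f"drop 9 -> 17:      {drop_nm:.1f} nM ({drop_pct:.0f}%)")
print(f"recovery 17 -> 25: {recovery_nm:.1f} nM ({recovery_pct:.0f}%)")
print(f"max / min:         {max_over_min:.2f}x")
print(f"CV range:          {min(cv_pct.values()):.1f}% to {max(cv_pct.values()):.1f}%")
```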
1. Different SD motif types have different strengths. At lower SD counts (9–13), the motifs are predominantly GAAGG and GGAAG — these arise from the unavoidable EG dipeptides and the max-CAI codon choices. To reach higher counts (17+), the design algorithm introduced GGAGG (the canonical, strongest SD) and AGGAG. But to reach 21–25, it may have used weaker motif variants or placed them in less disruptive positions. The motifs are not all equal — GGAGG binds ribosomes ~10x more strongly than GAAGG. The SD-17 construct may have hit a "sweet spot" of strong motifs in disruptive positions.
2. Ribosome queuing / traffic effects. At moderate SD density, individual internal SDs can stall a translating ribosome long enough to cause problems. But at very high density, stalled ribosomes may physically block upstream ribosomes from reaching the next SD motif, paradoxically reducing the per-motif stalling effect. Think of it like traffic: a few red lights slow you down, but if every block has a red light, cars never get enough speed to be affected by the next one.
3. CAI confound (partial). CAI decreases from 1.000 (SD-13) to 0.947 (SD-25) as more codon swaps are needed to introduce SDs. However, the correlation between CAI and expression across these 5 points is only r = -0.14 (essentially zero). SD-13 has perfect CAI (1.000) yet expresses less than SD-09 (CAI = 0.989). And SD-25 has the worst CAI (0.947) yet is the second-highest expresser. CAI does not explain the U-shape.
4. mRNA secondary structure changes. Although all constructs have the same MFE_30 (5′ region is fixed), the codon swaps in the gene body could alter internal mRNA structure. SD-rich sequences tend to be purine-rich (lots of G and A), which could reduce internal base pairing and create a more open mRNA. More open internal structure might improve ribosome processivity, partially offsetting the SD stalling effect at high counts.
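The CAI correlation quoted in point 3 can be checked from the five table rows. A minimal sketch using a hand-written Pearson correlation:

```python
import math

# Values for SD-09, SD-13, SD-17, SD-21, SD-25, taken from the table above.
cai = [0.989, 1.000, 0.986, 0.967, 0.947]
expression_nm = [269.9, 218.7, 186.4, 204.8, 246.0]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson(cai, expression_nm)
print(f"r(CAI, expression) = {r:.2f}")
```

With only five points any correlation estimate is noisy, but a value this close to zero gives no hint that the small CAI differences drive the U-shape.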
The SD dose-response demonstrates three things:
The effect of internal SD motifs on expression is the subject of an active scientific debate. The landmark paper by Li, Oh & Weissman (2012, Nature) used ribosome profiling in E. coli to show that SD-like hexamers within coding sequences cause pervasive translational pausing, with ~70% of strong pauses associated with internal SD motifs. However, Mohammad et al. (2019, eLife) substantially challenged this finding, showing that Li et al.'s protocol selectively isolated long ribosome-protected fragments, and that SD-like sequences produce longer footprints due to extra 16S rRNA protection — creating a sampling artifact. In libraries capturing the full distribution of footprint sizes, no SD-induced pauses were observed.
What is well-established is the genomic signature: highly expressed genes contain fewer internal SD motifs than expected (Yang et al. 2016, G3; analysis of 187 bacterial genomes, p < 10⁻¹⁸). But whether this depletion is due to pausing, aberrant internal initiation, or some other mechanism remains unresolved.
Our literature review found three gaps our experiment addresses:
Our cell-free result is relevant to the Li vs. Mohammad debate: if internal SDs reduce expression in a reconstituted system (where many in-vivo confounds are absent), that provides independent evidence the effect is real and not purely a ribosome profiling artifact. The 31% drop from SD-09 to SD-17 at matched CAI supports a genuine functional effect of internal SD motifs on translation.
Honest note: With only 5 data points, we cannot definitively distinguish a U-shape from noise around a weak downward trend. The non-monotonic pattern is reproducible within each construct's 12 replicates (tight CVs), but fitting a curve to 5 points is inherently fragile. A follow-up with 15–20 SD levels and finer spacing would be needed to confirm the U-shape and map the biology in detail. The 31% drop from 9 to 17 SDs is the robust finding; the recovery at 25 SDs is intriguing but needs replication.
RANDOM-BASELINE-1: 0 nM (no protein at all). RANDOM-BASELINE-2: 226 nM (expressed well). Two constructs are too few to estimate a failure rate, but one of our two randomly chosen synonymous variants of mRFP failed to express any protein at all. This underscores the scale of the codon optimization problem: the DNA spelling has massive effects on protein production, and random choices are unreliable.
A specific, interpretable feature from the SAE decomposition of NTv3 identifies a sequence property that:
The feature's direction of effect reversed between the training context (in-vivo E. coli) and the test context (cell-free expression). This means:
The NTv3 probe (trained on in-vivo data) completely fails when pushed to extreme CAI values in a cell-free system. The model's highest-confidence prediction produced zero protein. Classical metrics — particularly CAI — remain the dominant predictor in this assay. This is a cautionary tale about domain transfer: a model can learn genuine patterns and still fail when the deployment context differs from the training context.
The SD dose-response produced the experiment's most biologically interesting result. Internal SD motifs reduce expression at moderate density (31% drop from 9 to 17 motifs at matched CAI), confirming that these ribosome-confusing sequences matter even in a cell-free system. But the U-shaped recovery at high density (25 motifs expressing nearly as well as 9) was unexpected and suggests the effect is far more complex than "more SD = worse." This non-linearity — potentially driven by motif type differences, ribosome queuing, or compensatory structural effects — represents genuinely novel biology that warrants follow-up with finer-grained dose-response curves.
NTv3-650M-pre (InstaDeep), layer 6 activations. BatchTopK SAE with 12,288 features (expansion factor 8, k=128). Trained on activations from diverse DNA sequences.
All 12,288 features screened for partial correlation with expression after controlling for all 64 codon frequencies. Required Bonferroni significance on both the Nieuwkoop dataset (1,459 mRFP synonymous variants) AND the Cambray dataset (11,421 sequences across 56 proteins). 116 features passed.
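The two-dataset Bonferroni requirement amounts to a simple rule, sketched below. The family-wise alpha of 0.05 and the choice to correct for 12,288 tests per dataset are assumptions made for illustration, and the p-values in the example calls are made up.

```python
def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test significance cutoff under a Bonferroni correction."""
    return alpha / n_tests

def passes_both(p_nieuwkoop: float, p_cambray: float,
                alpha: float = 0.05, n_tests: int = 12_288) -> bool:
    """A feature is kept only if it clears the cutoff on both datasets."""
    cutoff = bonferroni_cutoff(alpha, n_tests)
    return p_nieuwkoop < cutoff and p_cambray < cutoff

cutoff = bonferroni_cutoff(0.05, 12_288)
print(f"per-feature cutoff: {cutoff:.2e}")

# Hypothetical p-values, for illustration only.
print(passes_both(1e-8, 1e-7))   # significant on both datasets
print(passes_both(1e-8, 1e-3))   # fails on the second dataset
```

Requiring significance on both datasets independently is stricter than pooling them, because a feature that only works on one protein family is rejected.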
2,500 candidate synonymous mRFP variants generated. Matched pairs selected with strict criteria: CAI ±0.02, MFE ±1.0 kcal/mol, GC at 5' end ±0.05, SD motif count ±2. The Feature 5984 pair achieved CAI ±0.011, MFE ±0.6, identical GC_5prime at 0.500, SD count ±1.
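The matching criteria translate into a straightforward pairwise filter. The sketch below encodes the four tolerances and applies them to the Feature 5984 pair using the values from the matched-pair table. The `Variant` class and the third, deliberately mismatched candidate are hypothetical and exist only to show a rejection.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    cai: float
    mfe_30: float      # kcal/mol, first 30 nucleotides
    gc_5prime: float   # GC fraction at the 5' end
    sd_count: int      # internal SD-like motifs

def is_matched(a: Variant, b: Variant,
               cai_tol: float = 0.02, mfe_tol: float = 1.0,
               gc_tol: float = 0.05, sd_tol: int = 2) -> bool:
    """True if two variants agree on every classical metric within tolerance."""
    return (abs(a.cai - b.cai) <= cai_tol
            and abs(a.mfe_30 - b.mfe_30) <= mfe_tol
            and abs(a.gc_5prime - b.gc_5prime) <= gc_tol
            and abs(a.sd_count - b.sd_count) <= sd_tol)

high = Variant("F5984-MATCHED-HIGH", 0.554, -1.7, 0.500, 15)
low = Variant("F5984-MATCHED-LOW", 0.565, -2.3, 0.500, 14)
# Hypothetical candidate, invented to show a pair that fails the filter.
mismatch = Variant("hypothetical-candidate", 0.600, -6.0, 0.450, 12)

print(is_matched(high, low))
print(is_matched(high, mismatch))
```

In a full pipeline this predicate would be applied to every candidate pair that differs in the SAE feature of interest, keeping only pairs where the classical metrics cannot explain an expression difference.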
Ginkgo Bioworks Cell-Free Protein Expression (CFPE) service. HiBiT quantification (C-terminal tag, luminescence readout). 12 replicates per construct. 14 constructs total. All constructs encode mRFP (225 aa, except SD series at 231 aa).
Total experiment cost: $1,638.