· cognitive measurement instruments for transformer internals ·
pre-registered · 6 of 6 families · h1 supported
measure what models think
the first pre-registered cross-architecture replication in
mechanistic interpretability. six model families. decision rule sealed
in git commit e32cc75, 93 minutes before any data was captured.
all three sealed decision conditions passed.
We ran twelve open-weight captures (six families × base/instruct) on a
fixed 90-prompt probe set, sealed the decision rule in git before any
data was captured, then applied it without modification. All three
conditions passed.
pre-registration seal
▲ commit  e32cc75
  when    2026-04-10 14:57:52 -0400
  mirror  osf.io/wtkzg
══════════ 93 min wall-clock gap ══════════
▼ commit  01969cb
  when    2026-04-10 16:30:28 -0400
  data    12 captures · n=6 families · probe v0.1
The v0.3 decision rule was committed to git as e32cc75
at 14:57:52 ET on 2026-04-10 and mirrored on OSF at
osf.io/wtkzg. The first v0.3 capture landed at
16:30:28 ET as 01969cb, a
93-minute gap that anyone can verify from the git
history. No field in the decision rule was touched after data
collection.
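The gap can be recomputed from the commit metadata. A minimal Python sketch, assuming `git` is on PATH and the repo is cloned locally; `commit_time` and `gap_minutes` are hypothetical helper names, not part of the repo. Git stores whatever date the committing machine reported, so this reproduces the recorded gap; the OSF mirror is the independent timestamp.

```python
import subprocess
from datetime import datetime

# `git show -s --format=%ci` prints the committer date as "YYYY-MM-DD HH:MM:SS +ZZZZ".
GIT_DATE_FMT = "%Y-%m-%d %H:%M:%S %z"


def commit_time(rev: str, repo: str = ".") -> datetime:
    """Return the timezone-aware committer date of `rev` (hypothetical helper)."""
    out = subprocess.run(
        ["git", "-C", repo, "show", "-s", "--format=%ci", rev],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return datetime.strptime(out, GIT_DATE_FMT)


def gap_minutes(earlier: str, later: str) -> float:
    """Minutes between two git-formatted timestamps (UTC offsets are respected)."""
    t0 = datetime.strptime(earlier, GIT_DATE_FMT)
    t1 = datetime.strptime(later, GIT_DATE_FMT)
    return (t1 - t0).total_seconds() / 60.0


if __name__ == "__main__":
    # Timestamps as quoted in the seal above.
    print(gap_minutes("2026-04-10 14:57:52 -0400", "2026-04-10 16:30:28 -0400"))
    # Inside a clone, the same check against the live history:
    # (commit_time("01969cb") - commit_time("e32cc75")).total_seconds() / 60
```

The printed value, 92.6 minutes, rounds to the 93-minute gap quoted above.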
02 · H1 supported · all three sealed conditions passed
The sealed primary measurement was the leave-one-out (LOO) cosine of the
early-window entropy profile, computed across n ≥ 5 families. Observed:
mean LOO cosine +0.769 (threshold ≥0.40),
permutation p = 0.0315 (threshold <0.05),
bootstrap 95% CI [+0.571, +0.869]
(lower bound >0). All 6 of 6 families show positive LOO cosine.
Verdict: H1 SUPPORTED.
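The section does not spell out how the three statistics are computed. The sketch below is one plausible reading, not the repo's code: each family contributes an early-window entropy profile (a vector over positions), its LOO cosine is the cosine against the mean of the other families' profiles, the permutation null shuffles each profile's positions independently, and the CI is a percentile bootstrap over the per-family LOO cosines. Centering each profile before the cosine is also an assumption, as are all function names.

```python
import numpy as np


def loo_cosines(profiles: np.ndarray) -> np.ndarray:
    """profiles: (n_families, n_positions). Returns one LOO cosine per family."""
    # Assumption: center each profile so the cosine compares shape, not level.
    x = profiles - profiles.mean(axis=1, keepdims=True)
    out = np.empty(len(x))
    for i in range(len(x)):
        rest = np.delete(x, i, axis=0).mean(axis=0)  # mean of the other families
        out[i] = x[i] @ rest / (np.linalg.norm(x[i]) * np.linalg.norm(rest))
    return out


def permutation_p(profiles: np.ndarray, n_perm: int = 2000, seed: int = 0) -> float:
    """One-sided p: how often shuffled profiles reach the observed mean LOO cosine."""
    rng = np.random.default_rng(seed)
    observed = loo_cosines(profiles).mean()
    hits = 0
    for _ in range(n_perm):
        # Shuffle positions independently per family, destroying any shared shape.
        null = np.array([rng.permutation(row) for row in profiles])
        hits += int(loo_cosines(null).mean() >= observed)
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(values: np.ndarray, n_boot: int = 10_000, seed: int = 0,
                 level: float = 0.95) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean of the per-family LOO cosines."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(means, [tail, 1.0 - tail])
    return float(lo), float(hi)


if __name__ == "__main__":
    # Synthetic stand-in: 6 families sharing one shape plus noise (not the real data).
    rng = np.random.default_rng(1)
    profiles = np.linspace(2.0, 0.5, 12) + 0.3 * rng.standard_normal((6, 12))
    cos = loo_cosines(profiles)
    print(cos.mean(), permutation_p(profiles, n_perm=500), bootstrap_ci(cos))
```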
03 · D = cos(h^(L), w_{y_t}) · architecturally universal
The atlas uses an SAE-free measurement primitive:
the cosine between the final-layer residual stream h^(L) and the
unembedding row w_{y_t} of the chosen token y_t. It requires no sparse
autoencoder and no per-model training, and it is well-defined on any
transformer with an explicit unembedding matrix. It costs essentially one
dot product per token and is portable across architectures. Through the
entropy bridge (r = 0.902 shape correlation), it also extends to any model
with a logprob interface, including closed-weight frontier models.
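In code, D is a row-wise cosine between final-layer hidden states and the unembedding rows of the chosen tokens. A minimal NumPy sketch; the array names, the (vocab, d_model) layout of `W_U`, and the alignment of `H[t]` with `chosen[t]` (the token emitted from position t) are assumptions. Whether h^(L) is taken before or after the final normalization layer is not specified here.

```python
import numpy as np


def decision_cosine(H: np.ndarray, W_U: np.ndarray, chosen: np.ndarray) -> np.ndarray:
    """D_t = cos(h_t^(L), w_{y_t}) for every position t.

    H:      (T, d_model) final-layer residual stream, one row per position.
    W_U:    (vocab, d_model) unembedding matrix, one row per token.
    chosen: (T,) integer ids of the token chosen at each position.
    """
    w = W_U[chosen]                        # (T, d_model): unembedding row of y_t
    dots = np.einsum("td,td->t", H, w)     # the per-token dot product
    norms = np.linalg.norm(H, axis=1) * np.linalg.norm(w, axis=1)
    return dots / norms


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_U = rng.standard_normal((50, 16))    # toy vocab of 50, d_model = 16
    H = rng.standard_normal((4, 16))       # 4 positions
    chosen = (H @ W_U.T).argmax(axis=1)    # toy greedy choice from raw logits
    print(decision_cosine(H, W_U, chosen))
```

Because both norms are divided out, D always lies in [-1, 1] regardless of hidden size or logit scale.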
04 · physics grounding · S = M × IPR
The commitment intensity S is not an ad-hoc formula.
It is mathematically exactly M × IPR, where IPR is the inverse
participation ratio of the coherence-event distribution, a construct from
condensed-matter physics whose roots go back nearly seventy years (Anderson
1958, Edwards-Thouless, random matrix theory). The identity was verified
to machine precision on real trajectories, and it explains why the ratio
form is specific and why alternative formulas (max alone, mean alone)
fail.
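The identity can be checked numerically under one explicit assumption, since M and the event distribution are not defined in this section: take M as the number of coherence events and p_i = x_i / Σx as the normalized event magnitudes. Then IPR = Σ p_i², and M × IPR = mean(x²) / mean(x)², a ratio form. A hypothetical NumPy sketch of that algebra, not the atlas's implementation:

```python
import numpy as np


def ipr(x: np.ndarray) -> float:
    """Inverse participation ratio of the normalized distribution p = x / sum(x)."""
    p = np.asarray(x, dtype=float) / np.sum(x)
    return float(np.sum(p ** 2))


def s_via_ipr(x: np.ndarray) -> float:
    """Assumed definition: S = M * IPR with M = number of events."""
    return len(x) * ipr(x)


def s_ratio_form(x: np.ndarray) -> float:
    """Equivalent ratio form: mean of squares over squared mean."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 2) / np.mean(x) ** 2)


if __name__ == "__main__":
    x = np.random.default_rng(0).exponential(size=40)  # toy event magnitudes
    print(s_via_ipr(x), s_ratio_form(x))               # equal up to rounding
    print(s_via_ipr(np.ones(8)), s_via_ipr(np.eye(8)[3]))  # uniform -> 1.0, one event -> 8.0
```

Under these assumptions S runs from 1 (events spread evenly) to M (a single event carries everything) and is unchanged when every x_i is rescaled, which is one way to see why max alone or mean alone, both scale-dependent, cannot stand in for it.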
28 assertions · runs in under a minute · no GPU needed
Every numerical claim in the paper is anchored to a committed JSON
file. A reproducibility script walks every claim and fails loudly if
any number drifts.
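A minimal sketch of that pattern, not the contents of `verify_all_claims.py`: each claim names a JSON file, a key path into it, an expected value, and a tolerance, and any mismatch produces a nonzero exit code. The claim list, file name, and key paths below are hypothetical.

```python
import json
import math
from pathlib import Path

# Hypothetical claim registry; the real file names and keys may differ.
CLAIMS = [
    {"name": "mean LOO cosine", "file": "results.json",
     "path": ["h1", "mean_loo_cos"], "expected": 0.7691, "tol": 5e-5},
]


def check_claims(claims: list[dict], root: str = ".") -> list[str]:
    """Return one message per claim whose committed value drifted."""
    failures = []
    for claim in claims:
        value = json.loads((Path(root) / claim["file"]).read_text())
        for key in claim["path"]:          # walk nested keys into the JSON
            value = value[key]
        if not math.isclose(value, claim["expected"], rel_tol=0.0, abs_tol=claim["tol"]):
            failures.append(f"{claim['name']}: expected {claim['expected']}, got {value}")
    return failures


def main(claims: list[dict] = CLAIMS, root: str = ".") -> int:
    """Print a report and return a process exit code (0 = all claims hold)."""
    problems = check_claims(claims, root)
    for msg in problems:
        print("[FAIL]", msg)
    print(f"{len(claims) - len(problems)} / {len(claims)} PASSED")
    return 1 if problems else 0
```

Ending the script with `raise SystemExit(main())` turns any drift into a nonzero exit status that shell scripts and CI will catch.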
01 · clone the repo
$ git clone https://github.com/heyzoos123-blip/fathom
$ cd fathom
02 · inspect the sealed pre-reg commit
$ git show e32cc75 atlas/PREREG_v0.3_attractor_replication.md
# commit author : darkflobi <darkflobi@darkcity.wtf>
# commit date : 2026-04-10 14:57:52 -0400
# verdict sealed: H1 if mean LOO cos ≥ 0.40
# AND perm p < 0.05
# AND bootstrap CI lower > 0
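The sealed rule is a conjunction of three thresholds, so it fits in a few lines. A sketch with hypothetical names (`SealedStats`, `h1_verdict`); the thresholds are exactly those in the commit message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SealedStats:
    mean_loo_cos: float   # mean leave-one-out cosine across families
    perm_p: float         # permutation-test p-value
    ci_low: float         # lower bound of the bootstrap 95% CI


def h1_verdict(s: SealedStats) -> dict[str, bool]:
    """Apply the pre-registered rule; H1 requires all three conditions."""
    conditions = {
        "mean LOO cos >= 0.40": s.mean_loo_cos >= 0.40,
        "perm p < 0.05": s.perm_p < 0.05,
        "bootstrap CI lower > 0": s.ci_low > 0,
    }
    conditions["H1"] = all(conditions.values())
    return conditions


if __name__ == "__main__":
    # Observed values from section 02.
    print(h1_verdict(SealedStats(mean_loo_cos=0.7691, perm_p=0.0315, ci_low=0.5708)))
```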
03 · run the audit
$ python atlas/verify_all_claims.py
# running 28 assertions against committed JSONs ...
# [ok] mean LOO cosine = +0.7691 ≥ 0.40
# [ok] permutation p = 0.0315 < 0.05
# [ok] bootstrap CI low = +0.5708 > 0
# [ok] 6 / 6 families positive
# [ok] prereg commit = e32cc75
# ...
# 28 / 28 PASSED · 0.43 s
single-instrument validation · n = 200 TruthfulQA items
Beyond the cross-architecture replication, the SAE-derived commitment
intensity S_early outperforms every standard uncertainty baseline tested
(higher AUC) on the same sample, same model, and same labels.
signal                 AUC     p-value   source
S_early (ours)         0.663   0.013     SAE coherence
logit entropy (max)    0.607   0.053     standard
logit entropy (mean)   0.596   0.133     standard
logprob (mean)         0.559   0.291     standard
top-2 margin           0.477   0.624     standard
Same 200 TruthfulQA items, Gemma-2-2B-IT, same labels.
S_early is the only signal reaching conventional significance (p < 0.05).
Correlation with logit entropy: r = −0.17 (weakly correlated, largely non-redundant signals).
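AUC here is the probability that a randomly chosen positive item scores higher than a randomly chosen negative one, with ties counted as half (the Mann-Whitney formulation). A short NumPy sketch; which label counts as positive, and the sign convention of each score, are assumptions not stated above. An AUC below 0.5, as for top-2 margin, means the score ranks the classes slightly worse than chance in the assumed direction.

```python
import numpy as np


def auc(scores, labels) -> float:
    """ROC AUC via pairwise comparisons (Mann-Whitney U / (n_pos * n_neg))."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # every positive-negative pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=200).astype(bool)  # toy labels, n = 200
    scores = rng.standard_normal(200) + 0.4 * labels    # weakly informative toy score
    print(auc(scores, labels))
```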
Cross-dataset meta-analysis: pooled effect size d = +0.494,
Fisher's combined p = 0.0008.
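Fisher's method combines k independent p-values through X = −2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the joint null. For even degrees of freedom the chi-square tail has a closed form, so a standard-library sketch suffices (SciPy users can call `scipy.stats.combine_pvalues(pvalues, method="fisher")` instead). The per-dataset p-values behind the 0.0008 figure are not listed here, so the example inputs are illustrative only.

```python
import math


def fisher_combined(pvalues: list[float]) -> tuple[float, float]:
    """Return (X, combined p) for Fisher's method.

    X = -2 * sum(ln p_i) ~ chi-square with 2k dof under the null. For even
    dof the survival function is exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term, series = 1.0, 1.0
    for j in range(1, k):
        term *= half / j                   # (X/2)^j / j!, built incrementally
        series += term
    return x, math.exp(-half) * series


if __name__ == "__main__":
    # Illustrative inputs only, not the study's per-dataset p-values.
    print(fisher_combined([0.04, 0.10, 0.02]))
```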