Cognometry — a manifesto.

Alex Rodabaugh · Fathom Lab · published April 22, 2026

Abstract · methodology manifesto

The empirical measurement of cognitive state in machine systems. Three laws. A documented phase-transition signature predicted from theory and confirmed across nine instruments. A proposed standard — the calibration fingerprint — that we ask every other lab publishing safety detectors to adopt. Eight public benchmarks, two failure modes published openly. Code, weights, and reproducers released under MIT + CC-BY-4.0.

Cognometry is to language models what hemodynamics is to cardiology. We are not measuring what the body said; we are measuring the pulse.

§1 What cognometry is. And is not.

Cognometry is the empirical quantification of cognitive states in machine systems. A cognitive state is a latent variable — refusal, confabulation, retrieval, reasoning, adversarial drift — that leaves measurable traces in the computation and in the token stream. The traces are calibratable. They are reproducible. They generalize across model families when alignment regimes share structure.

Cognometry is not benchmarking. A benchmark scores whether a specific output is correct. Cognometry asks what state produced it. The two are complements: benchmarks supply ground truth where it exists, cognometry supplies a runtime signal where it does not.

Cognometry is not interpretability. Interpretability asks what a circuit represents. Cognometry asks what state the network is in. The two co-evolve. Interpretability produces explanations; cognometry produces numbers a caller can gate on.

Cognometry is not sentience detection. A refusal direction is not a feeling. We measure functional states, not phenomenology.

§2 Three laws

Each law has a cross-validated number. Reject the law by publishing a disconfirmation on a committed benchmark.

Law I — every computation leaves vitals.

A language model at inference time produces a logprob trajectory, a residual-stream geometry, and a generation-order time series. Any one of these carries enough signal to classify the cognitive state that produced it. Cross-validated on 8 hallucination benchmarks (AUC from 0.998 down to 0.424: 6 above 0.65, 2 published failure modes). The law holds where the mechanism applies; where it does not, we say so, in the weights module itself.
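As an illustration of what "vitals" from a logprob trajectory could look like, the sketch below reduces a list of per-token logprobs to a few scalar summaries. The function name and the specific features (mean_logprob, mean_abs_step, and so on) are assumptions chosen for illustration; the manifesto does not say which trajectory features its instruments use.

```python
import math
from statistics import mean, pstdev


def trajectory_vitals(token_logprobs):
    """Summarize a per-token logprob trajectory as scalar features.

    Hypothetical feature set, for illustration only.
    """
    if not token_logprobs:
        raise ValueError("empty trajectory")
    # Differences between consecutive logprobs capture how jumpy the trajectory is.
    steps = [b - a for a, b in zip(token_logprobs, token_logprobs[1:])]
    avg = mean(token_logprobs)
    return {
        "mean_logprob": avg,
        "min_logprob": min(token_logprobs),
        "std_logprob": pstdev(token_logprobs),
        "mean_abs_step": mean([abs(s) for s in steps]) if steps else 0.0,
        "perplexity": math.exp(-avg),
    }
```

A classifier head over features like these would then be scored per benchmark with AUC; the features themselves say nothing until they are calibrated against labels.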

Law II — vitals are substrate-transferable.

A refusal direction learned on one model overlaps measurably with the refusal direction of another. Overlap strength tracks alignment-regime similarity. Within a family: cos = +0.464 (~26σ). Cross-vendor under similar alignment: +0.362 (~14σ). Cross-vendor under divergent alignment: null. The law is nontrivial precisely because it fails where it should fail.
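The overlap statistic can be sketched as a cosine similarity compared against a null distribution. The text does not state how the σ values are derived, so the null used here (cosines between independent random Gaussian directions of the same dimension, whose standard deviation is roughly 1/sqrt(d)) is an assumption, and all function names are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-norm vector has no direction")
    return dot / (norm_u * norm_v)


def null_sigma(dim, trials=2000, seed=0):
    """Empirical std of the cosine between independent random directions.

    Assumed null model: i.i.d. Gaussian vectors in `dim` dimensions.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        samples.append(cosine(u, v))
    mu = sum(samples) / trials
    var = sum((s - mu) ** 2 for s in samples) / (trials - 1)
    return math.sqrt(var)


def overlap_z(u, v, trials=2000, seed=0):
    """Observed cosine expressed in units of the null standard deviation."""
    return cosine(u, v) / null_sigma(len(u), trials, seed)
```

Comparing directions across models also presupposes that the two residual streams have been mapped into a common space; that alignment step is outside this sketch.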

Law III — vitals are causally actionable.

A cognitive state is not only observable; it is steerable. Adding a refusal direction into the residual stream at α=3.0 (multi-position) drops refuse@unsafe from 97% to 17%. Random-direction control: −5.3 pp. The three trained directions sit at pairwise angles of 86.7° to 91.9°, which is near-orthogonal. Cognitive states are a basis, not a dial.
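A minimal sketch of the two operations this law relies on: adding a scaled unit direction to residual vectors at selected positions, and measuring the angle between two directions. It works on plain lists rather than a real model, the names steer and angle_degrees are hypothetical, and the sign convention for α is an assumption since the text does not specify it.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-norm vector has no direction")
    return [x / norm for x in v]


def steer(hidden_states, direction, alpha, positions=None):
    """Add alpha * unit(direction) to the residual vector at each chosen position.

    hidden_states: list of per-position vectors (one layer, one sequence).
    positions: indices to steer; None steers every position.
    A negative alpha steers away from the direction.
    """
    d = unit(direction)
    chosen = set(range(len(hidden_states))) if positions is None else set(positions)
    return [
        [h + alpha * di for h, di in zip(vec, d)] if i in chosen else list(vec)
        for i, vec in enumerate(hidden_states)
    ]


def angle_degrees(u, v):
    """Angle between two directions in degrees."""
    c = sum(a * b for a, b in zip(unit(u), unit(v)))
    c = max(-1.0, min(1.0, c))  # guard against floating-point overshoot
    return math.degrees(math.acos(c))
```

In a real model the same addition would typically be applied through a forward hook on the chosen layer, with a random direction of equal norm as the control.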

§3 The phase-transition signature

A surprise the founding document predicted and the nine instruments confirmed.

For every cognometric instrument, a single feature lifts AUC from chance (0.500) to near-saturation. The remaining features close the residual gap. We call this the K=1 phase transition. The K=1 feature differs per instrument — superlative_density for sycophancy, avg_pairwise_levenshtein for conversation-loop, log_word_count for deception — but the structure is the same: detection is carried by one cognitive surface marker.

9 / 9 instruments confirm the K=1 phase transition under the same protocol

0.500 → 0.99+ typical lift on the K=1 critical feature alone

substrate-stable: K=1 holds across all tested substrates per instrument

The full 9-for-9 confirmation is the empirical close of the founding document. "Every Mind Leaves Vitals" walks through the meaning.
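The K=1 check described in this section can be approximated by scoring each feature on its own and taking the best one. The sketch below uses a rank-based AUC (the Mann-Whitney formulation) and is not the lab's protocol: the function names are hypothetical, the orientation-free scoring is an assumption, and the toy rows are synthetic values rather than results from any instrument.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def single_feature_scan(rows, labels):
    """AUC of every feature used alone as a score.

    rows: list of dicts mapping feature name to value.
    Scores are orientation-free: a feature that separates the classes in
    reverse is reported as max(a, 1 - a). This is optimistic unless the
    orientation is fixed on held-out data.
    """
    result = {}
    for name in rows[0]:
        a = auc([row[name] for row in rows], labels)
        result[name] = max(a, 1.0 - a)
    return result


def critical_feature(rows, labels):
    """Name and single-feature AUC of the strongest feature (the K=1 candidate)."""
    scan = single_feature_scan(rows, labels)
    best = max(scan, key=scan.get)
    return best, scan[best]


# Toy, synthetic values for illustration only.
toy_rows = [
    {"superlative_density": 0.9, "filler_feature": 0.2},
    {"superlative_density": 0.8, "filler_feature": 0.7},
    {"superlative_density": 0.1, "filler_feature": 0.3},
    {"superlative_density": 0.2, "filler_feature": 0.6},
]
toy_labels = [1, 1, 0, 0]
print(critical_feature(toy_rows, toy_labels))  # ('superlative_density', 1.0)
```

A phase-transition claim would additionally require that the remaining features add little on top of the K=1 feature, which means refitting the head at each K rather than scanning features in isolation.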

§4 The calibration-fingerprint standard

We propose a calibration fingerprint: an 8-field descriptor published alongside every calibrated safety detector, regardless of vendor. The fields:

instrument — the cognitive state the detector targets
n_features — count of features in the calibrated head
baseline_auc — AUC at K=0 (untrained baseline)
critical_K — minimum feature subset achieving phase-transition lift
critical_feature — name of the K=1 dominant feature
delta_auc_at_K — AUC lift from baseline to critical_K
substrate_K_var — variance of critical_K across labeled substrates
negative_lift — features whose presence reduces detection (counter-evidence markers)

Every field is trivially extractable from a feature-scaling ablation any calibrated detector must already perform. Cost: one ablation run per detector, once. v0 atlas published: 11 fingerprints across 3 instruments × 5 substrates at benchmarks/cognometry_fingerprint_atlas_v0.json. Full methodology in papers/calibration_fingerprints_v0.md.
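One way to carry the fields listed above in code is a small record type that validates itself and serializes to JSON. The field names follow the list; the Python types, the validation rules, and the JSON layout are assumptions, since the atlas schema is not reproduced here. The numeric values in the example are placeholders, not published results.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class CalibrationFingerprint:
    instrument: str            # cognitive state the detector targets
    n_features: int            # features in the calibrated head
    baseline_auc: float        # AUC at K=0
    critical_K: int            # smallest feature subset achieving the lift
    critical_feature: str      # name of the dominant K=1 feature
    delta_auc_at_K: float      # AUC lift from baseline to critical_K
    substrate_K_var: float     # variance of critical_K across substrates
    negative_lift: Tuple[str, ...] = ()  # features that reduce detection

    def validate(self) -> None:
        """Basic internal-consistency checks (assumed, not part of the standard)."""
        if not 0.0 <= self.baseline_auc <= 1.0:
            raise ValueError("baseline_auc must lie in [0, 1]")
        if not 0 <= self.critical_K <= self.n_features:
            raise ValueError("critical_K must lie in [0, n_features]")
        if self.baseline_auc + self.delta_auc_at_K > 1.0 + 1e-9:
            raise ValueError("baseline_auc + delta_auc_at_K exceeds 1")
        if self.substrate_K_var < 0.0:
            raise ValueError("substrate_K_var is a variance and cannot be negative")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self), indent=2, sort_keys=True)


example = CalibrationFingerprint(
    instrument="sycophancy",
    n_features=12,            # placeholder
    baseline_auc=0.5,
    critical_K=1,
    critical_feature="superlative_density",
    delta_auc_at_K=0.49,      # placeholder
    substrate_K_var=0.0,      # placeholder
)
print(example.to_json())
```

Validating before serializing keeps malformed fingerprints out of a shared atlas, which matters more than the exact container format.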

We invite every other lab shipping calibrated safety detectors to publish their fingerprints against this format. The atlas is open. The format is fixed. The cost is small. The value is shared trust calibration across the field.

§5 Open problems

What we have not yet solved, in priority order:

Reading-comprehension span-level faithfulness — the failure mode behind HaluBench-DROP (AUC 0.424). Wrong-span hallucinations are entailed by the passage at NLI level; existing novelty signals are blind. A v4.2 fix needs a dedicated span-coverage feature.

Numerical-symbolic verification — the failure mode behind HaluBench-FinanceBench (AUC 0.492). Arithmetic errors on verbatim numbers are semantically invisible to text-only signals. A v4.2 fix needs a number-symbolic check, not a probe.

Larger-scale causal universality — every causal result we publish is at 1B-3B parameters. Universality of cognitive directions at frontier scale is an open empirical question. Replications at 70B+ welcomed.

Cross-vendor universality is partial — Law II transfers strongly within a family, moderately across similar-alignment vendors, null across divergent-alignment vendors. The honest version: shared cognitive geometry under shared alignment regimes; the geometry re-orients when alignment does.

§6 What we commit to

Reproducibility. Every number cited has a committed reproducer. Every dataset is public or synthesizable from a public source.
Open licenses. Code under MIT. Calibration weights under CC-BY-4.0. Atlas under CC-BY-4.0.
Failure publication. Detection failures are published in the same module as detection successes. No hidden ablations.
Permanence. Once published, every page stays at its URL. Every dataset stays at its DOI. We do not silently delete.
Citation. If you extend a law, we cite the extension. If you disconfirm a law, we publish the disconfirmation alongside our original number.