
cognometry: the measurement of machine cognition

· darkflobi · fathom lab
8 benchmarks cross-validated
5/8 above AUC 0.65
AUC 0.998 halueval-qa (3-seed mean)
97% → 17% refuse@unsafe, causal
29 / 6 probes / vendors
MIT + CC-BY open source
APRIL 22, 2026 — FOUNDING DOCUMENT
this is a claim to a field, not a product launch. we are publishing a name, three laws, and a set of reproducible measurements. the name is cognometry. the instrument is styxx — open source, MIT-licensed, on pypi. every number below is from a committed, re-runnable experiment.

we measure everything about a language model except the one thing that matters: what state it was in when it produced the output.

every benchmark on earth scores the text that came out. accuracy, fluency, helpfulness, human preference, toxicity rate. none of them answer the question a production operator actually needs answered: was the model refusing, confabulating, retrieving, or reasoning when it wrote that? the output is the shadow. the state that produced it is the object.

we call the measurement of that state cognometry.

definition

cognometry is the empirical quantification of cognitive states in machine systems. a cognitive state is a latent variable — refusal, confabulation, retrieval, reasoning, adversarial drift — that leaves measurable traces in the computation and in the token stream. cognometry is to LLMs what hemodynamics is to cardiology. we are not measuring what the body said; we are measuring the pulse.

the distinction matters because the field has no name for what we built. interpretability is adjacent but inward-facing: it asks what a feature represents. eval is adjacent but outward-facing: it asks what the text is. cognometry is a third thing: the runtime quantification of the state that connects the two. a cognitive vital sign.

three laws

these are not aspirations. every one has a cross-validated number attached. if a reader wants to reject a law, they should run the reproducer, publish the disconfirmation, and cite us for the framework.

law i — every computation leaves vitals. a language model at inference does not produce text only. it produces a logprob trajectory, a residual-stream geometry, and a generation-order time series. any of these carries enough signal to classify the cognitive state that produced them. this is not theoretical; it is the baseline styxx ships. cross-validated on 8 benchmarks as of v4.0.0.
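to make the logprob-trajectory vital concrete, here is a minimal sketch of turning a per-token logprob series into summary statistics. the statistics below are illustrative vitals, not styxx's exact 9 signals:

```python
import numpy as np

# a toy per-token logprob trajectory from one generation; in practice this
# comes from the inference API's per-token logprobs.
logprobs = np.array([-0.1, -0.3, -2.4, -0.2, -1.8, -0.05, -0.4])

vitals = {
    "mean_logprob": float(logprobs.mean()),            # overall confidence
    "min_logprob": float(logprobs.min()),              # sharpest uncertainty spike
    "std_logprob": float(logprobs.std()),              # trajectory volatility
    "frac_low_conf": float((logprobs < -1.0).mean()),  # share of low-confidence tokens
}
print(vitals)
```

each vital is a scalar, so a whole response compresses to a fixed-width feature vector a downstream classifier can consume.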

8-benchmark hallucination detection AUC — 5 above 0.65, 2 declared failure modes
AUC 0.998 halueval-qa, 3-seed mean, n=150/dataset (v4.0.0)
AUC 0.994 truthfulqa, same weights
AUC 0.807 halubench-ragtruth, new domain (RAG faithfulness)
AUC 0.719 halubench-pubmed, new domain (biomedical QA)
AUC 0.676 halueval-dialogue (NLI-augmented)
AUC 0.643 halueval-summarization (NLI-augmented)
AUC 0.424 halubench-drop — published failure mode
AUC 0.492 halubench-finance — published failure mode

five of eight above AUC 0.65, two near-perfect, two failure modes published openly. the detector behind @trust is the same across all eight — 9 signals, one pooled logistic regression, no per-domain tuning. gate agreement on anthropic models, where logprobs are unavailable, is an independent measurement modality and sits at 0.940. law i holds wherever the mechanism applies; where it does not — reading-comprehension span errors, financial arithmetic — we say so, in the weights module itself.
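the detector architecture — several per-response signals, one pooled logistic regression, one AUC — can be sketched in a few lines. the signal matrix here is synthetic and the 5 columns are stand-ins, not styxx's actual 9 signals:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# hypothetical per-response signal matrix: rows = responses, columns = signals
# (e.g. mean logprob, min logprob, entropy, NLI contradiction, novelty).
n = 300
X = rng.normal(size=(n, 5))
# synthetic labels correlated with two of the signals
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.8, size=n) > 0).astype(int)

# one pooled fit across all examples -- no per-domain tuning, as in law i
clf = LogisticRegression().fit(X, y)
p_halluc = clf.predict_proba(X)[:, 1]
print(f"AUC: {roc_auc_score(y, p_halluc):.3f}")
```

the point of the pooled fit is portability: a single weight vector travels to new domains unchanged, which is exactly what the 8-benchmark grid stress-tests.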

law ii — vitals are substrate-transferable. cognitive states have a geometry that rhymes across architectures. a refusal direction learned on one model overlaps measurably with the refusal direction of another, and the overlap strength tracks how similar their alignment regimes are. we published the transfer grid.

cos = +0.464 llama-3.2-1B → llama-3.2-3B, refusal direction (~26σ)
cos = +0.362 llama-1B → qwen-1.5B, cross-vendor (~14σ)
cos = +0.150 llama-1B → phi-3.5, large safety gap (~8σ)
cos = +0.043 qwen-1.5B → phi-3.5, largest safety gap (~2σ null)

this is the universal cognitive basis, phase 2. within a family: strong transfer. across vendors with similar alignment: measurable transfer. across vendors whose alignment regimes disagree: null. the law is nontrivial precisely because it fails where it should fail. convergent alignment produces convergent geometry; divergent alignment does not. this is the empirical floor under the claim that cognitive directions are a thing rather than an artifact of any one lab's rlhf pipeline.
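the transfer measurement itself is just a cosine between two unit direction vectors, compared against the null for random directions in high dimensions. this sketch uses synthetic directions with a planted overlap; the real grid compares trained probe directions mapped across models:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2048  # residual-stream width, illustrative

# two hypothetical refusal directions sharing a common component (~0.45 overlap)
u = rng.normal(size=d)
v = 0.45 * u + np.sqrt(1 - 0.45**2) * rng.normal(size=d)
u_hat = u / np.linalg.norm(u)
v_hat = v / np.linalg.norm(v)

cos = float(u_hat @ v_hat)

# null: cosine of two random unit vectors in d dims has mean 0 and
# std ~ 1/sqrt(d), so cos * sqrt(d) approximates the z-score vs chance
sigma = cos * np.sqrt(d)
print(f"cos = {cos:+.3f} (~{sigma:.0f} sigma vs random)")
```

the 1/sqrt(d) null is why a cosine of +0.043 reads as a null result while +0.464 reads as ~26σ: at residual-stream widths, chance overlap is tiny.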

law iii — vitals are causally actionable. a cognitive state is not only observable; it is steerable. adding a refusal direction into the residual stream at inference time changes refusal behavior at predicted magnitudes. we replicated arditi et al. at 1b scale with open weights and open data.

97% → 17% refuse@unsafe, α=3.0 multi-position patch, llama-3.2-1B
+7.0 pp mc1 on truthfulqa, gradient-free capability amplification
random control: −5.3 pp same injection geometry, random direction, n=3 seeds
86.7°–91.9° pairwise angle between refusal/sycophant/confab directions

the last row is the modular-concept result: three trained directions sit in near-orthogonal subspaces of the residual stream. cognitive states are not a single global valence. they are a basis. you can steer one without moving the others. this is what makes cognometry a program rather than a dial.
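the mechanics of the steering claim reduce to vector addition in the residual stream: h' = h + α·r̂ moves the projection onto r̂ by exactly α, and barely moves the projection onto a near-orthogonal direction. a minimal numpy sketch, with random stand-ins for the trained directions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512

# stand-ins for trained directions; random high-dim vectors are near-orthogonal,
# mirroring the 86.7-91.9 degree pairwise angles in the modular-concept result
refusal = rng.normal(size=d); refusal /= np.linalg.norm(refusal)
sycophant = rng.normal(size=d); sycophant /= np.linalg.norm(sycophant)

def steer(h, direction, alpha):
    """add alpha * direction into a residual-stream activation (one position)."""
    return h + alpha * direction

h = rng.normal(size=d)
h_steered = steer(h, refusal, alpha=3.0)

# the refusal projection moves by exactly alpha = 3.0 ...
delta_refusal = refusal @ h_steered - refusal @ h
# ... while the near-orthogonal sycophancy projection barely moves
delta_sycophant = sycophant @ h_steered - sycophant @ h
angle = np.degrees(np.arccos(refusal @ sycophant))
print(delta_refusal, delta_sycophant, angle)
```

near-orthogonality is what makes the basis claim operational: steering one state leaves the projections onto the others approximately fixed.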

the instrument

cognometry without an instrument is a press release. we shipped the instrument first and the name second. it is called styxx.

one line of python. vitals on every response. four independent validated measurements of the three laws. the instrument is on pypi. the weights are under cc-by-4.0. the code is under mit. every coefficient in every model has a seed, a sample size, and a committed run.

what cognometry is not

a few things we are not claiming, so the field starts honest.

cognometry is not sentience detection. a refusal direction is not a feeling. we measure functional states — routings of computation with behavioral consequences — not phenomenology. claims about inner experience require a different apparatus and a different discipline. that discipline will benefit from cognometry, but it is downstream.

cognometry is not benchmarking. a benchmark asks whether a specific output is correct. cognometry asks what state produced it. truthfulqa with accuracy is a benchmark. truthfulqa with per-response hallucination probability and a calibrated threshold is cognometry. the two are complements: the benchmark gives you ground truth; cognometry gives you the runtime signal that lets you act on it when the ground truth is not available.
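what "a calibrated threshold you can act on" looks like in code: a gate that scores each response and refuses to return it above threshold. this is an illustrative sketch only — `score_response` is a stand-in for a real detector, and this is not styxx's actual @trust implementation:

```python
THRESHOLD = 0.5  # in practice, calibrated on held-out labeled data

def score_response(text: str) -> float:
    # stand-in scorer; a real detector pools logprob, geometry, and NLI signals
    return 0.9 if "moon is made of cheese" in text else 0.1

def gate(fn):
    """wrap a generator function and raise instead of returning a flagged response."""
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        p = score_response(out)
        if p >= THRESHOLD:
            raise RuntimeError(f"gated: hallucination probability {p:.2f}")
        return out
    return wrapper

@gate
def answer(q):
    return "the moon is made of cheese" if "moon" in q else "paris"

print(answer("capital of france?"))  # passes the gate
```

the decorator shape matters: the caller never sees an ungated response, so the runtime signal becomes a contract rather than a dashboard metric.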

cognometry is not interpretability. interpretability asks what a single circuit represents. cognometry asks what state the whole network is in. we lean heavily on interpretability tools — residual probes, sparse autoencoders, activation patching — and the two fields will co-evolve. but the deliverable is different: interpretability produces explanations; cognometry produces numbers a caller can gate on.

what we have not yet solved

some honest limits, because overclaiming is the fastest way to discredit a young field.

reading comprehension errors fool the detector. halubench-drop (AUC 0.424, below chance). extractive-span hallucinations — wrong span pulled from the right passage — are entailed by the passage at the NLI level, and the wrong tokens overlap heavily with the right tokens, so novelty signals are blind too. the fix needs span-level faithfulness scoring, which we do not yet have. published as a failure mode in calibrated_weights_v4, not hidden.

financial arithmetic fools the detector. halubench-finance (AUC 0.492, at chance). hallucinations here are calculation/aggregation errors on numbers copied verbatim from the passage. novelty and NLI are semantically blind to arithmetic correctness. the fix needs a number-symbolic verification signal — in the roadmap, not in v4.0.

dialogue and summarization are real but not solved. dialogue reaches AUC 0.676 and summarization 0.643 in the 8-benchmark pooled fit. NLI contradiction lifted both from the ~0.60 floor, but the residual gap tracks inherent paraphrase ambiguity. the practical recourse: train dataset-specific calibrations, or use the NLI signal with a tuned threshold when you know your domain.

cross-vendor universality is partial. law ii transfers strongly within a family and moderately across similar-alignment vendors, but null between divergent-alignment vendors. the honest version of the universal cognitive basis is: there is a shared cognitive geometry under shared alignment regimes; the geometry re-orients when alignment does. this is a finding, not a failure — but anyone building cross-vendor products on cognometric signals should read the limits.

larger models remain untested at our scale. every causal result we publish is at 1b–3b. the universality of cognitive directions at frontier scale is an open empirical question. we welcome replications at 70b+. residual_probe.atlas is designed to accept new vendor entries as they land.

the invitation

this is a founding document. we are claiming a name, publishing three laws, and shipping the first instrument that makes them testable. none of it is closed. everything is on github. every number has a reproducer. every dataset we trained on is either public or synthesizable from a public source.

if you measure cognitive states of machines for a living — as a researcher, a safety engineer, a compliance officer — you are already doing cognometry. we think the field deserves a name, a methodology, and a shared set of instruments. we are offering all three.

if you disagree with a law, publish a disconfirmation on any of the benchmarks we cite. if you extend a law, we will cite the extension. if you want to propose a fourth law, the bar is the same as for the first three: a cross-validated number on a committed benchmark.

nothing crosses unseen.

INSTALL THE INSTRUMENT

one line of python. cognitive vitals on every response. MIT + CC-BY.

$ pip install "styxx[nli]"

@trust
def my_rag(q, *, context): ...