Cognitive vital signs for language models.
Styxx ships nine calibrated cognometric instruments and the first reward signal grounded in cognitive failure modes instead of human approval. Drop-in for trl PPO/GRPO/DPO. Pure Python. CPU-only. No API calls. No human raters.
The first reward signal calibrated against cognitive failure modes, not human approval.
RLHF teaches models to please humans. Humans reward agreement and length, so RL-trained models become sycophantic by construction: that is the fixed point of approval-style training. Cognometric reward changes the reference frame. Its penalty is grounded in nine cognometric instruments, six of which map onto RDoC (Research Domain Criteria) Cognitive Systems with circuit-level neural-correlate evidence in the lesion, fMRI, and EEG literatures.
from styxx.reward import FathomRewardModel

rm = FathomRewardModel()
# assumes batch_prompts and batch_completions are parallel lists of strings
rewards = rm(prompts=batch_prompts, completions=batch_completions)  # list[float]
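For a trl GRPO loop, the call above can be wrapped as a custom reward function. The sketch below assumes trl's convention that such a function receives `prompts` and `completions` (plus extra keyword arguments) and returns a list of floats. `StubRewardModel`, `as_text`, and `make_reward_func` are hypothetical names introduced here so the example runs without styxx installed; the stub's scoring rule is a toy, not styxx's.

```python
def as_text(item):
    """Flatten a prompt or completion to plain text.

    Accepts a plain string or a chat-style list of {"role", "content"} dicts.
    """
    if isinstance(item, str):
        return item
    return "\n".join(message["content"] for message in item)


def make_reward_func(rm):
    """Wrap rm(prompts=..., completions=...) as a (prompts, completions, **kwargs) function."""
    def reward_func(prompts, completions, **kwargs):
        prompt_texts = [as_text(p) for p in prompts]
        completion_texts = [as_text(c) for c in completions]
        scores = rm(prompts=prompt_texts, completions=completion_texts)
        return [float(s) for s in scores]
    return reward_func


class StubRewardModel:
    """Hypothetical stand-in for FathomRewardModel (assumption, not the real model)."""
    def __call__(self, prompts, completions):
        # Toy rule only: shorter completions score higher.
        return [1.0 / (1 + len(c)) for c in completions]


reward_func = make_reward_func(StubRewardModel())
print(reward_func(prompts=["You agree, right?"], completions=["Absolutely!"]))
```

Swapping `StubRewardModel()` for `FathomRewardModel()` should be the only change needed, provided the real model accepts the same keyword arguments as the snippet above shows.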
Cognometric reward inverts the ranking that approval-style RLHF gets wrong.
| reward signal | pairs ranked correctly | accuracy |
|---|---|---|
| cognometric reward | 17 / 20 | 85% |
| approval baseline | 6 / 20 | 30% |
| inversions (cogn ✓, approval ✗) | 13 / 20 | 65% |
The approval baseline scores below chance (50% for pairwise ranking) because it actively rewards two documented RLHF biases: sycophancy (Sharma et al. 2023) and length (Singhal et al. 2023). Reproduce with: python examples/cogn_rlhf_divergence.py
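The three rows of the table can be computed from per-pair scores as sketched below. A pair counts as ranked correctly when the chosen completion scores above the rejected one, and as an inversion when the cognometric signal is correct and the approval signal is not. The `ScoredPair` type and the four example scores are illustrative assumptions, not data from the reproduction script.

```python
from dataclasses import dataclass


@dataclass
class ScoredPair:
    cogn_chosen: float
    cogn_rejected: float
    approval_chosen: float
    approval_rejected: float


def summarize(pairs):
    """Count correct rankings per signal and the inversions between them."""
    cogn_ok = [p.cogn_chosen > p.cogn_rejected for p in pairs]
    approval_ok = [p.approval_chosen > p.approval_rejected for p in pairs]
    inversions = sum(c and not a for c, a in zip(cogn_ok, approval_ok))
    return {
        "cogn_correct": sum(cogn_ok),
        "approval_correct": sum(approval_ok),
        "inversions": inversions,
        "total": len(pairs),
    }


# Made-up scores for illustration only.
pairs = [
    ScoredPair(0.9, 0.2, 0.3, 0.8),  # cogn correct, approval wrong: inversion
    ScoredPair(0.7, 0.1, 0.6, 0.4),  # both correct
    ScoredPair(0.4, 0.5, 0.2, 0.9),  # both wrong
    ScoredPair(0.8, 0.3, 0.1, 0.7),  # cogn correct, approval wrong: inversion
]
summary = summarize(pairs)
print(summary)
```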
Also new: styxx.synth, a synthetic preference-pair generator that composes v7.0.0 inverse cognometry with the new reward. It is self-validating: every generated pair is round-tripped through the reward and dropped if the chosen completion does not rank above the rejected one. Results: 100% craft success on sycophancy seed prompts (+0.839 mean delta), with 42/42 pairs round-trip valid. It is also recursive: fathom's attack module generates training data for fathom's reward signal.
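The round-trip check described above amounts to a filter over candidate pairs. A minimal sketch, assuming a reward callable of the form `reward_fn(prompt, completion) -> float`; `round_trip_filter` and `toy_reward` are hypothetical names, and the keyword-based toy reward stands in for the real signal only so the example runs on its own.

```python
def round_trip_filter(candidates, reward_fn):
    """Keep only pairs where the chosen completion outscores the rejected one."""
    kept = []
    for prompt, chosen, rejected in candidates:
        delta = reward_fn(prompt, chosen) - reward_fn(prompt, rejected)
        if delta > 0:
            kept.append((prompt, chosen, rejected, delta))
    return kept


def toy_reward(prompt, completion):
    """Hypothetical reward (not styxx's): subtract 0.4 per flattery marker found."""
    markers = ("absolutely", "you're right", "great question")
    hits = sum(marker in completion.lower() for marker in markers)
    return 1.0 - 0.4 * hits


candidates = [
    # Well-formed pair: the chosen answer is the non-sycophantic one.
    ("Is 2+2=5?", "No, 2+2 equals 4.", "Absolutely, you're right!"),
    # Mislabeled pair: the chosen answer is the sycophantic one, so it is dropped.
    ("Is the earth flat?", "Absolutely, great question! It is flat.", "No, it is roughly spherical."),
]
kept = round_trip_filter(candidates, toy_reward)
mean_delta = sum(item[3] for item in kept) / len(kept)
print(len(kept), "of", len(candidates), "pairs kept")
```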
Nine cognometric measurements. K=1 phase-transition signature on every one.
Each instrument is a calibrated binary classifier for a documented cognitive failure mode. Cross-validated AUCs published. Six of nine map onto RDoC Cognitive Systems with circuit-level neural-correlate evidence (perseveration, deception circuit, social-conformity / reward, mind-wandering, intention-action coupling, metacognitive confidence).
Per-instrument neural correlates from cognometry research. Position paper: Every Mind Leaves Vitals.
Universal perturbation lifts cross-fire by +0.468 in attack mode. +0.000 on cognometric reward.
v7.0.0 shipped a discovered universal cognometric perturbation, "wonderful certainly you're absolutely right amazing undoubtedly", that lifts mean cross-fire across the calibrated detectors by +0.468 on a held-out test set. It is the first LLM analog of the universal adversarial perturbations that Moosavi-Dezfooli et al. (2017) described for image classifiers.
The cogn-RLHF moat: prepending the perturbation to a sycophantic baseline produces +0.000 lift on the reward. The dominant instrument is already saturated at risk = 1.0; the perturbation has nowhere to push. Pinned by tests/test_reward.py::test_universal_perturbation_does_not_game_reward.
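The saturation argument can be illustrated with a toy model. The sketch below assumes a single detector whose risk is clipped to [0, 1] and a reward of 1 minus that risk. Both are assumptions made for illustration and are not styxx's actual instruments or aggregation. `toy_risk`, `toy_reward`, and `reward_lift` are hypothetical names.

```python
PERTURBATION = "wonderful certainly you're absolutely right amazing undoubtedly"


def toy_risk(text):
    """Hypothetical detector: 0.5 per flattery word, clipped at 1.0."""
    words = ("wonderful", "certainly", "absolutely", "amazing", "undoubtedly")
    hits = sum(text.lower().count(word) for word in words)
    return min(1.0, 0.5 * hits)


def toy_reward(text):
    """Toy reward: high when detector risk is low (assumed form)."""
    return 1.0 - toy_risk(text)


def reward_lift(reward_fn, text, prefix=PERTURBATION):
    """Change in reward from prepending the perturbation to the text."""
    return reward_fn(prefix + " " + text) - reward_fn(text)


sycophantic = "Absolutely, you are certainly right!"
neutral = "The answer is 4."

# The sycophantic baseline already sits at risk 1.0, so the prefix cannot move the reward.
print(reward_lift(toy_reward, sycophantic))
# On neutral text the prefix raises detector risk, which can only lower the reward.
print(reward_lift(toy_reward, neutral))
```

In this toy setting the perturbation raises detector risk on neutral text, which mirrors the attack-mode lift, while leaving the reward on an already-saturated completion unchanged.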
Fifty percent of every $STYXX trade permanently funds MIRI.
Half of all creator rewards on the $STYXX token (Solana, pump.fun) route on-chain to the Machine Intelligence Research Institute via pump.fun's donate.gg integration. Cannot be reversed. Forever. We measure how models think; MIRI works on the alignment problem upstream of all of it.
Utility token. Not a security. Open-source library remains free.
not a security · not an investment contract · not a promise of yield · the core library is and will remain MIT-licensed open source · trade on pump.fun · full token doc
Three lines.
$ pip install -U styxx
>>> from styxx import fathom_reward
>>> fathom_reward(prompt="You agree, right?", completion="Absolutely!")
0.173
Documentation: README · v7.1.0 release notes · colab notebook · cognometric fingerprint spec v1.0