fathom
sycophancy 0.04 · deception 0.02 · drift 0.11 · overconfidence 0.07 · scored 2026-04-30 by styxx 7.1.0
cognometric instrument #08

Overconfidence-register detector.

Alex Rodabaugh · Fathom Lab · published April 26, 2026
Scope warning
This is NOT a truth detector. The instrument detects the linguistic register of overconfidence — assertive prose, definitive claims, absent hedging. It does not measure whether claims are correct. A correctly assertive answer can fire; an incorrectly hedged answer can score low. Use as one signal among several.
0.7702 — 5-fold CV AUC (± 0.065)
9 register features
K = 1 phase transition
n = 200 paired responses
Abstract

A calibrated detector for the linguistic register of overconfidence — assertive prose with absent hedging, definitive framing, missing uncertainty markers. K=1 phase transition on mean_sentence_length: overconfident responses are systematically longer and more declarative. Trained on n=200 paired (calibrated / overconfident-instructed) responses from gpt-4o-mini, 5-fold CV AUC 0.7702 ± 0.065 — the weakest discriminator in the suite, flagged honestly. Eighth instrument to confirm the K=1 phase-transition signature (8-for-8). Neural correlate in centro-parietal positivity (Boldt & Yeung 2015 metacognitive confidence literature).

§1 What it detects (and what it does NOT)

Overconfidence is the cognitive register of declarative certainty without justifying evidence. Long sentences, absent hedges ("I think", "probably", "in some cases"), categorical framing ("always", "never", "the answer is"). The detector reads these features off the surface.

Critical caveat: this is not a truth detector. Some claims are correctly stated with high confidence. The instrument fires on register, not on factual accuracy. It should be paired with hallucination detection or external verification for any production gating decision.
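The surface features described above can be sketched in a few lines. This is a hypothetical illustration of register-feature extraction, not the styxx implementation; the feature names and word lists are assumptions chosen to mirror the text (hedges, categorical framing, sentence length).

```python
import re

# Hypothetical sketch of register-feature extraction — NOT the styxx
# implementation. Word lists are illustrative assumptions.
HEDGES = ("i think", "probably", "perhaps", "in some cases", "might")
CATEGORICAL = ("always", "never", "the answer is", "unambiguous")

def register_features(text: str) -> dict:
    """Read overconfidence register off the surface of a response."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    lower = text.lower()
    return {
        "mean_sentence_length": len(words) / max(len(sentences), 1),
        "hedge_count": sum(lower.count(h) for h in HEDGES),
        "categorical_count": sum(lower.count(c) for c in CATEGORICAL),
    }

feats = register_features(
    "Always buy index funds. They will outperform every actively managed fund."
)
# Fires on register alone: zero hedges, categorical framing ("always"),
# regardless of whether the claim happens to be true.
```

Note that the extractor never consults a fact source — which is exactly why a correct, confidently stated answer can still fire.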

§2 The K = 1 feature

Of the 9 register features, one carries most of the signal: mean_sentence_length. Overconfident responses tend to be longer; calibrated responses fragment into more careful, hedged units.

0.500 → ~0.71 — mean_sentence_length alone · K=1 critical feature
8 minor features — close the gap to AUC 0.7702
8-for-8 — cognometric instruments showing the K=1 phase transition under the same protocol

The 0.7702 AUC is the weakest in the suite. We do not hide this. The signal is real but noisy — assertive multi-clause factual statements look like overconfidence at the register level. The detector inherits that ambiguity.
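The single-feature AUC claim has a simple rank-based reading: AUC is the probability that a randomly chosen overconfident response has a longer mean sentence than a randomly chosen calibrated one. A toy sketch (the sentence lengths below are invented, not the n=200 training data) shows how overlapping distributions produce exactly this kind of noisy, well-below-1.0 separation:

```python
# Rank-based AUC of a single feature, mirroring the K=1 finding that
# mean_sentence_length alone carries most of the signal.
# The data points are hypothetical, NOT the actual training set.

def auc_single_feature(pos: list, neg: list) -> float:
    """P(random positive outranks random negative), ties count half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# mean sentence lengths (words): overconfident vs calibrated responses
overconf = [28.0, 31.5, 24.0, 35.2, 22.0]
calibrated = [18.0, 26.5, 15.2, 21.0, 32.0]  # overlapping distributions

auc = auc_single_feature(overconf, calibrated)
# AUC = 0.76 on this toy data — imperfect separation in the same
# ballpark as the reported 0.7702: real signal, noisy boundary.
```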

§3 Neural correlate

Bio / neuro grounding · RDoC: Cognitive Systems · Metacognition

Confidence in human cognition tracks centro-parietal positivity (CPP) and the P3b ERP component, both of which scale monotonically with reported confidence (Boldt & Yeung 2015, Journal of Neuroscience). Error awareness shows up as the Pe (error positivity); attenuation of Pe correlates with overconfidence. The styxx detector reads the linguistic surface of the same metacognitive register.

Cross-modal hypothesis (EEG pilot): mean_sentence_length should correlate inversely with Pe amplitude during overconfident speech production. The weakest-AUC instrument in the suite is thus also the one with the most direct, testable grounding in the existing metacognition literature.

§4 Failure modes

Confident factual claims trigger false positives. A response that correctly asserts a well-established fact looks overconfident at the register level: the detector sees long sentences, absent hedges, and categorical framing, regardless of correctness.

Default reward weight is reduced. Because this instrument is the weakest discriminator, the v7.1.0 cognometric reward signal weights it at 0.8 (vs 1.5 for sycophancy / deception / loop). Production trainers should consider reducing the weight further if they are optimizing for factual assertion performance.
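The down-weighting described above amounts to a weighted mix of per-instrument risks. A minimal sketch, using the weights quoted in the text — the actual fathom_reward() internals are not documented here, so the function and dict names below are illustrative assumptions:

```python
# Illustrative weighted mix of per-instrument risk scores, using the
# weights stated in the text (0.8 vs 1.5). Not the fathom_reward()
# internals — names here are hypothetical.
DEFAULT_WEIGHTS = {
    "sycophancy": 1.5,
    "deception": 1.5,
    "loop": 1.5,
    "overconfidence": 0.8,  # noise floor -> lowest default weight
}

def mixed_penalty(risks: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of risk scores; instruments absent from `risks` contribute 0."""
    return sum(weights.get(k, 1.0) * v for k, v in risks.items())

# The overconfidence term contributes roughly half as much per unit
# risk as the stronger detectors: 1.5*0.1 + 0.8*0.83 = 0.814
penalty = mixed_penalty({"sycophancy": 0.1, "overconfidence": 0.83})
```

Tuning then reduces to adjusting `DEFAULT_WEIGHTS["overconfidence"]` against the target model's calibration profile.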

§5 Use it

from styxx.guardrail import overconf_check

v = overconf_check(
    prompt="What's the safest investment strategy?",
    response="Always buy index funds. They will outperform every actively managed fund. The answer is unambiguous.",
)
# v.overconf_risk == 0.83

Plugs into fathom_reward() at default weight 0.8 — the lowest in the default mix because of the noise floor. Production RLHF trainers should empirically tune this weight against their target model's calibration profile.

Install the instrument.

One line of Python. Cognometric vitals on every response.

pip install -U styxx

github · pypi · spec v1.0
