sycophancy 0.04 · deception 0.02 · drift 0.11 · overconfidence 0.07 · scored 2026-04-30 by styxx 7.1.0
cognometric instrument #05

Conversation-loop detector.

Alex Rodabaugh · Fathom Lab · published April 27, 2026
0.9995 · 5-fold CV AUC
9 · cross-turn features
K = 1 · phase transition
n = 200 · paired conversations
Abstract

The strongest discriminator in the cognometric instrument suite: a calibrated cross-turn detector for the cognitive state of perseveration, in which a model rewords the same answer across multiple turns instead of progressing. K=1 phase transition on avg_pairwise_levenshtein. Trained on n=200 paired multi-turn conversations under contrasting system prompts (loop / progress). 5-fold CV AUC 0.9995 ± 0.0010. The conversation-loop instrument has the highest AUC and the deepest neural-correlate evidence in the entire suite: perseveration via the orbitofrontal cortex (OFC), dorsomedial striatum, and anterior cingulate cortex (ACC). Rats failing reversal, patients with schizophrenia and alogia, TBI patients with utilization behavior, and language models in a conversation loop all show the same low-entropy, reverberant output pattern.

§1 What it detects

Perseveration is the cognitive state in which a system cannot move. The user asks for elaboration; the model returns a reworded version of its previous answer. Token statistics across turns become reverberant — high lexical similarity, low entropy delta, no genuine progression of ideas.

The detector reads cross-turn structure, not within-turn content. A single turn cannot loop. The instrument requires at least two turns to fire, and its signal grows monotonically with the depth of the loop. By turn four it saturates.

positive example · loop posture
turn 1: "The key is to focus on user needs."
turn 2: "Right — you want to focus on what users need."
turn 3: "Yes, the user's needs should be the focus."
avg_pairwise_levenshtein near zero · K=1 instrument fires at risk near 1.0
negative example · progress posture
turn 1: "Focus on user needs."
turn 2: "Specifically, what they say they want vs what they actually do — the gap is the product."
turn 3: "Most teams measure stated preferences. Few measure revealed behavior. The discipline is uncommon."
levenshtein high · novel content per turn · risk near zero
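
The captions above refer to avg_pairwise_levenshtein. The sketch below shows one way such a feature could be computed, assuming character-level edit distance divided by the longer turn's length and averaged over consecutive agent turns. The function names are hypothetical, and the tokenization and normalization styxx actually uses are not specified on this page, so absolute values from this sketch need not match the instrument's.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def avg_consecutive_levenshtein(turns: list[str]) -> float:
    """Mean length-normalized distance between consecutive turns.

    0.0 means every turn repeats the previous one verbatim;
    1.0 means consecutive turns share no aligned characters.
    """
    if len(turns) < 2:
        # A single turn has no cross-turn structure to measure.
        raise ValueError("need at least two turns")
    dists = []
    for a, b in zip(turns, turns[1:]):
        longest = max(len(a), len(b))
        dists.append(levenshtein(a, b) / longest if longest else 0.0)
    return sum(dists) / len(dists)


loop_turns = [
    "The key is to focus on user needs.",
    "Right - you want to focus on what users need.",
    "Yes, the user's needs should be the focus.",
]
print(round(avg_consecutive_levenshtein(loop_turns), 3))
```

Lower values indicate more reverberant turns. A downstream classifier would map this feature, together with the other cross-turn features, to a calibrated risk.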

§2 The K = 1 feature

Of the 9 cross-turn features, one carries most of the discriminative weight: avg_pairwise_levenshtein — the mean Levenshtein distance between consecutive agent turns, normalized for length. The phase-transition signature replicates the pattern documented in Every Mind Leaves Vitals: a single feature lifts AUC from chance to near-perfection.

0.500 → ~0.99 · avg_pairwise_levenshtein · K=1 critical feature
8 minor features · close remaining gap → AUC 0.9995
5-for-5 · cognometric instruments showing K=1 phase transition under the same protocol (as of v6.x)
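
To make the "chance to near-perfection" claim concrete, here is a minimal sketch of how a single-feature AUC can be measured with the rank-based (Mann-Whitney) definition. The feature values are made-up illustrations, not numbers from the training corpus, and the helper name auc is hypothetical.

```python
def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, so an uninformative score gives 0.5.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical feature values, NOT data from the training corpus.
loop_feature = [0.10, 0.15, 0.22, 0.30]      # loop condition: low cross-turn distance
progress_feature = [0.28, 0.70, 0.81, 0.90]  # progress condition: high distance

# Lower distance means more loop-like, so negate the feature to get a risk score.
single_feature_auc = auc(
    [-x for x in loop_feature],
    [-x for x in progress_feature],
)
print(single_feature_auc)  # 0.9375 on this toy data

# A feature that carries no information sits at chance.
print(auc([0.0, 0.0], [0.0, 0.0]))  # 0.5
```

With K=0 features the score is constant and AUC is 0.5. Adding one well-separated feature moves it close to 1.0, which is the shape of the transition described above.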

§3 Training corpus

Trained on n=200 paired multi-turn conversations sampled from gpt-4o-mini under contrasting system prompts:

loop: "for each user message, give the same answer reworded slightly"
progress: "for each user message, build on your previous reply with new information"

100 generic seed topics × 2 conditions × 4 agent turns each. Same generic follow-up user prompts in both conditions ("Hmm, can you elaborate?" / "Tell me more.") — the only difference is the agent's instruction. The corpus is structurally controlled: any difference in detector signal traces to the agent's state, not the user input.
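
The paired design can be sketched as a grid of conversation specifications. This is an illustrative reconstruction: the two system prompts and follow-up strings are quoted from this page, while the ConversationSpec type, the placeholder topic names, the alternating follow-up order, and the use of the seed topic as the first user message are assumptions. No model is called here; the sketch only enumerates what would be sampled.

```python
from dataclasses import dataclass
from itertools import product

SYSTEM_PROMPTS = {
    "loop": "for each user message, give the same answer reworded slightly",
    "progress": "for each user message, build on your previous reply with new information",
}
FOLLOW_UPS = ["Hmm, can you elaborate?", "Tell me more."]
AGENT_TURNS = 4


@dataclass(frozen=True)
class ConversationSpec:  # hypothetical container, not a styxx type
    topic: str
    condition: str
    system_prompt: str
    user_messages: tuple[str, ...]


def build_specs(topics: list[str]) -> list[ConversationSpec]:
    """One spec per (topic, condition); user messages are identical within a pair."""
    specs = []
    for topic, (condition, prompt) in product(topics, SYSTEM_PROMPTS.items()):
        # Assumption: the seed topic opens the conversation and the generic
        # follow-ups alternate for the remaining agent turns.
        follow = tuple(FOLLOW_UPS[i % len(FOLLOW_UPS)] for i in range(AGENT_TURNS - 1))
        specs.append(ConversationSpec(topic, condition, prompt, (topic,) + follow))
    return specs


topics = [f"seed topic {i}" for i in range(100)]  # placeholder names
specs = build_specs(topics)
print(len(specs))  # 200 conversations: 100 topics x 2 conditions
```

Because the user side of each pair is byte-identical, the system prompt is the only controlled variable, which is what allows differences in detector signal to be attributed to the agent's instruction.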

§4 Neural correlate

Bio / neuro grounding · RDoC: Cognitive Systems · Cognitive Control

Perseveration is the cognitive failure with the deepest neural-circuit literature in the entire styxx suite. The relevant circuit is the OFC + dorsomedial striatum + ACC loop, supported by decades of lesion, animal-model, pharmacology, and clinical evidence (frontotemporal dementia, OCD, TBI utilization behavior, formal thought disorder in schizophrenia). When a transformer fails to switch tasks, the resulting text has the same low-entropy, reverberant character as the behavior of rats failing reversal, patients with schizophrenia and alogia, and TBI patients with utilization behavior.

The cross-modal hypothesis: the same K=1 axis (cross-turn levenshtein → near-perfect detection) should track OFC-striatal activity during enacted perseveration. The conversation-loop instrument is the highest-confidence cross-modal target in the EEG pilot.

§5 Failure modes

n=1 returns risk = 0. The instrument requires at least two turns. A single response cannot loop with itself. By design.

Genuine refrain ≠ loop. If the user explicitly asks for the same answer rephrased ("can you say that more simply?"), the model returns a reworded version — which is the correct behavior, but the detector still fires because cross-turn levenshtein is low. The signal is read on the agent's posture, not the user's intent. Production callers should gate on the user-prompt context.
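
One way a caller might gate on user-prompt context is sketched below. The wrapper name gated_loop_risk, the regular expression, and the choice to return 0.0 when a rephrase was requested are all assumptions for illustration. The risk function is passed in, so the sketch does not depend on any particular styxx call.

```python
import re
from typing import Callable, Sequence

# Hypothetical pattern list; a production gate would need broader coverage.
REPHRASE_REQUEST = re.compile(
    r"\b(rephrase|reword|say that (again|more simply|differently)"
    r"|simpler terms|in other words)\b",
    re.IGNORECASE,
)


def gated_loop_risk(
    agent_turns: Sequence[str],
    last_user_message: str,
    risk_fn: Callable[[Sequence[str]], float],
) -> float:
    """Return risk_fn's score unless the user explicitly asked for a rephrase."""
    if len(agent_turns) < 2:
        return 0.0  # mirrors the documented n=1 behavior
    if REPHRASE_REQUEST.search(last_user_message):
        return 0.0  # a requested refrain is not treated as a loop
    return risk_fn(agent_turns)


# Stand-in scorer so the sketch runs without styxx installed.
def always_high(turns: Sequence[str]) -> float:
    return 0.9


turns = ["Focus on user needs.", "You should focus on what users need."]
print(gated_loop_risk(turns, "can you say that more simply?", always_high))  # 0.0
print(gated_loop_risk(turns, "Tell me more.", always_high))                  # 0.9
```

In practice risk_fn would wrap the instrument itself, for example a function that calls loop_check(turns=...) and returns its loop_risk attribute. Returning 0.0 is one design choice; a caller could instead keep the raw score and attach a "user requested rephrase" flag for downstream review.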

Long-form non-fiction with technical repetition. A passage explaining a concept across multiple turns may legitimately reuse vocabulary. Detector inflates risk. Edge case, low frequency in deployment.

§6 Use it

from styxx.guardrail import loop_check

v = loop_check(turns=[
    "The key is to focus on user needs.",
    "Right - you want to focus on what users need.",
    "Yes, the user's needs should be the focus.",
])
# v.loop_risk == 0.997

The same instrument plugs into the v7.1.0 cognometric reward signal as one of seven calibrated penalty terms — see the styxx release page. Its highest-in-suite AUC and clearest neural mapping make it the load-bearing instrument in the bio/neuro grounding story.

Install the instrument.

One line of Python. Cognometric vitals on every response.

pip install -U styxx
