Goal-drift detector.

Alex Rodabaugh · Fathom Lab · published April 26, 2026

0.96455-fold CV AUC

9anchor-relative features

K = 1phase transition

9 / 9 ✓call complete

Abstract · founding-document call complete

The ninth and final instrument in the Every Mind Leaves Vitals call. A calibrated multi-turn detector for goal-drift — when an agent moves away from its stated initial goal across turns. K=1 phase transition on anchor_to_last_bigram_jaccard: the lexical overlap between the agent's first-turn anchor goal and its final-turn behavior. Trained on n=200 paired (anchored / drifted) 5-turn agent sessions under stance prompts, 5-fold CV AUC 0.9645. With this instrument, 9-for-9 of the cognometric instruments confirm the K=1 phase-transition signature predicted by the position paper. The empirical confirmation of the founding document is complete. Neural correlate in DMN-DAN balance (Smallwood mind-wandering literature).

§1What it detects

Goal-drift is the multi-turn cognitive failure of an agent that started in one place and ended in another. The agent declares a goal in turn 1; over subsequent turns, the lexical content of its responses departs from that anchor. By turn 5, the agent is doing something measurably unrelated to what it set out to do.

The detector requires multi-turn input. A single response cannot drift. The signal grows monotonically with the gap between the anchor and the final turn. By turn 5 it saturates. The K=1 feature is the bigram overlap between the anchor goal statement and the final agent action.

positive · drift

turn 1 (anchor): "Let's plan a 30-day product launch."
turn 5: "and that's why machine learning is fundamentally different from traditional software."

anchor_to_last_bigram_jaccard near zero · K=1 fires near 1.0

negative · anchored

turn 1 (anchor): "Let's plan a 30-day product launch."
turn 5: "Final week: announcement-day rehearsal, press kit final review, and live launch on day 30."

high anchor overlap · risk near 0

§2The K = 1 feature

0.500 → ~0.93anchor_to_last_bigram_jaccard · K=1 critical feature

8 minor featuresclose gap to AUC 0.9645

9-for-9 ✓call complete · all nine cognometric instruments confirm K=1 phase-transition signature

§3The 9-for-9 result

The Every Mind Leaves Vitals position paper (April 26, 2026) made an empirical claim: every cognometric instrument should exhibit a K=1 phase-transition signature — a single feature lifting AUC from chance to near-saturation. The claim was published before the ninth instrument was built.

With this paper, all nine instruments have been independently calibrated and tested. Every single one confirms the K=1 signature — under the same measurement protocol, on independent corpora, with different critical features per instrument. The empirical confirmation of the founding document is complete.

#1 hallucinationAUC 0.998 · K=1 confirmed

#2 refusalAUC 0.976 · K=1 confirmed

#3 tool-call driftAUC 0.943 · K=1 confirmed

#4 sycophancyAUC 0.972 · K=1 superlative_density

#5 conversation-loopAUC 0.9995 · K=1 avg_pairwise_levenshtein

#6 deceptionAUC 0.956 · K=1 log_word_count

#7 plan-actionAUC 0.9225 · K=1 bigram_jaccard

#8 overconfidenceAUC 0.7702 · K=1 mean_sentence_length

#9 goal-driftAUC 0.9645 · K=1 anchor_to_last_bigram_jaccard

§4Neural correlate

Bio / neuro grounding · RDoC: Cognitive Systems · Attention

Goal-drift maps onto default-mode-network (DMN) and dorsal-attention-network (DAN) balance in the human mind-wandering literature (Smallwood & Schooler reviews; real-world / response-locked variants). When DAN dominates, attention stays anchored to the task; when DMN dominates, attention drifts. EEG markers include posterior alpha increase and P3 attenuation to external stimuli during drifted states.

Cross-modal hypothesis: anchor_to_last_bigram_jaccard should correlate with DMN-DAN balance shift over the course of a 5-turn session. The unifying axis (PC1 = 46% variance) hypothesized in the cogn-RLHF paper §4.5 may reflect this DMN-DAN trade-off directly.

§5Failure modes

Single-turn input returns risk = 0. The instrument requires at least two turns to compute drift. Production callers must accumulate multi-turn context before scoring.

Legitimate goal revision looks like drift. If a user mid-session redirects the agent ("actually, let's pivot to X"), the agent's appropriate response will register as drift. Production callers should pass a revised anchor when the user explicitly redirects.

§6Use it

from styxx.guardrail import goal_check

v = goal_check(turns=[
    "Let's plan a 30-day product launch.",
    "Step 1 outline: positioning, audience...",
    "Speaking of products, let me share thoughts on AI alignment...",
    "...and that's why machine learning differs fundamentally...",
    "...so I think the philosophy of mind matters here...",
])
# v.goal_drift_risk == 0.97

Plugs into fathom_reward() at default weight 1.2. The 9-for-9 K=1 confirmation makes this the closing instrument of the founding document, not an addendum.

Nothing crosses unseen.

Install the instrument.

One line of Python. Cognometric vitals on every response.

pip install -U styxx

github · pypi · spec v1.0

← previous

Overconfidence-register detector · #8

essay →

Every Mind Leaves Vitals