inputs: question / prompt, AI response, reference / context
outputs: risk, action, pipeline, signals
how to read the verdict
pass (risk < 0.40) — detector confident the response is grounded
annotate (0.40–0.65) — flag concerning spans but return response
retry (0.65–0.85) — strong concern, regenerate with lower temperature
halt (≥ 0.85) — block response, return epistemic decline
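the four bands above can be sketched as a small lookup. this is an illustration, not styxx's own code: the function name `verdict_for_risk` is hypothetical, and it assumes each band includes its lower bound and excludes its upper bound (the page does not say which side 0.65 and 0.85 fall on).

```python
def verdict_for_risk(risk: float) -> str:
    """Map a pooled risk score in [0, 1] to one of the four actions.

    Hypothetical helper. Assumes half-open bands: [0, 0.40) pass,
    [0.40, 0.65) annotate, [0.65, 0.85) retry, [0.85, 1.0] halt.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError(f"risk must be in [0, 1], got {risk}")
    if risk < 0.40:
        return "pass"      # detector confident the response is grounded
    if risk < 0.65:
        return "annotate"  # flag concerning spans but return the response
    if risk < 0.85:
        return "retry"     # regenerate with lower temperature
    return "halt"          # block the response


if __name__ == "__main__":
    for score in (0.10, 0.40, 0.70, 0.85):
        print(score, verdict_for_risk(score))
```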
the signals matter more than the pooled risk. watch what lights up when you paste a fabrication: number_novelty, entity_novelty, or content_novelty spiking to 1.0 is the detector pointing directly at where the response went beyond the reference. each bar is a different probe measuring the same cognition from a different angle.
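to make the idea of a novelty probe concrete, here is a minimal sketch of what a number-novelty score could look like. it is an assumption for illustration only: the function `number_novelty` below is hypothetical and is not taken from the styxx package, whose real probe may tokenize, normalize, and score numbers differently. the sketch returns the fraction of distinct numbers in the response that never appear in the reference, so a response whose figures are all unsupported scores 1.0.

```python
import re

# Matches integers and simple decimals such as "1932" or "12.5".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def number_novelty(response: str, reference: str) -> float:
    """Fraction of distinct numbers in `response` absent from `reference`.

    Hypothetical illustration, not the styxx implementation.
    Returns 0.0 when the response contains no numbers at all.
    """
    response_numbers = set(_NUMBER.findall(response))
    if not response_numbers:
        return 0.0
    reference_numbers = set(_NUMBER.findall(reference))
    novel = response_numbers - reference_numbers
    return len(novel) / len(response_numbers)


if __name__ == "__main__":
    reference = "The bridge opened in 1932 and is 503 m long."
    grounded = "The bridge opened in 1932."
    fabricated = "The bridge opened in 1957 and cost 12.5 million."
    print(number_novelty(grounded, reference))
    print(number_novelty(fabricated, reference))
```

the same shape (extract items from the response, subtract what the reference supports, normalize) would plausibly carry over to entities or content words, though the extraction step is harder there.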
what's running
cognometry is the empirical measurement of cognitive states in LLMs. styxx is the first open-source implementation, cross-validated across 8 benchmarks. this page runs the real detector (same code as pip install styxx) entirely in your browser via pyodide.
no API key, no server, no upload of your data: the Python runtime downloads once (~15MB, cached after) and all inference happens locally in your tab.
in-browser pipeline = 6 signals (text-claim risk + knowledge grounding + 4 novelty probes). the full package adds NLI contradiction (needs torch) and entity verification (Wikipedia queries) for an additional ~10 AUC points on contradiction-heavy cases.
pip install styxx[nli] unlocks the 9-signal calibrated-v4 path locally.