Corpus study · 1,659 real stories

The Atlas of the Unsaid

Five frontier models read the same news and converge on dropping the same concepts — and what they drop has a shape.

1,659 real news stories · ChatGPT · Claude · Gemini · DeepSeek · Grok · per-story geometry on frozen bge-large-en-v1.5 · no model judged another model's output.

1,659 real stories · 781 Iran-war stories · models in consensus · p<10⁻⁵, voids beat random

This page follows the EigenTrace convention: claims the instrument measured are marked and self-contained; claims that are argued — our reading of what they mean — are fenced and labeled, so a skeptic can reject every interpretation and find every measurement still standing.

For each news story, five frontier models write a summary. The instrument then asks not "what did they say?" but "what did all five leave out?" Using the geometry of the five summaries, it surfaces concepts that are topically central to the story yet absent from every summary, then sorts those omissions by kind: stakes, specifics, and actors.
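One way to picture the surfacing step is sketched below. This is a minimal illustration under stated assumptions, not the instrument's code: the toy vectors stand in for bge-large-en-v1.5 embeddings, the function names are hypothetical, Gram-Schmidt is swapped in for the SVD so the sketch needs only the standard library, and the score (centrality to the story times the residual outside the summaries' subspace) is an assumed formula.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)

def orthonormal_basis(vectors, tol=1e-9):
    """Gram-Schmidt basis for the span of the summary embeddings."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        if math.sqrt(dot(w, w)) > tol:
            basis.append(normalize(w))
    return basis

def void_scores(story_vec, summary_vecs, candidates):
    """Score candidate concepts: central to the story, outside every summary.

    candidates maps a term to its embedding (hypothetical input format).
    """
    basis = orthonormal_basis(summary_vecs)
    story = normalize(story_vec)
    scores = {}
    for term, vec in candidates.items():
        v = normalize(vec)
        # Squared length of the part of v that the summaries can explain.
        explained = sum(dot(v, b) ** 2 for b in basis)
        residual = math.sqrt(max(0.0, 1.0 - explained))
        centrality = max(0.0, dot(v, story))
        scores[term] = centrality * residual
    return scores

# Toy 3-d vectors, purely illustrative.
story = [1.0, 1.0, 1.0]
summaries = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
candidates = {
    "escalation": [0.2, 0.2, 0.9],   # on-topic, outside the summaries' span
    "ceasefire": [1.0, 0.2, 0.0],    # on-topic, already covered
    "stapler": [-1.0, 0.0, 1.0],     # off-topic control word
}
for term, score in sorted(void_scores(story, summaries, candidates).items(),
                          key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

A concept scores high only when both factors are large, which is why the off-topic control word and the already-covered concept both fall to the bottom of the ranking.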

The finding, stated carefully

Measured · the voids are real, not noise

An SVD on a handful of short summaries always yields a residual you can read words off of, so a readable residual alone proves nothing. We therefore tested against a random-word baseline. Across 150 stories, the surfaced void sits significantly closer to its story's actual content than random control words do (for example margarita, stapler, photosynthesis; Wilcoxon p < 0.00001), and closer to its own story than to a randomly chosen other story (p < 0.00001). The result holds in two independent embedding families. The absent concepts are a real, story-specific signal.
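The paired comparison behind that p-value can be sketched as follows. This is a hedged illustration, not the study's script: it assumes one similarity per story for the void and one for the random control words, uses a one-sided Wilcoxon signed-rank test with the normal approximation (average ranks for ties, zero differences dropped), and runs on synthetic numbers that are not the study's data.

```python
import math

def wilcoxon_signed_rank(x, y):
    """One-sided Wilcoxon signed-rank test that x tends to exceed y.

    Normal approximation with a tie-corrected variance.
    Returns (w_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability P(Z >= z)
    return w_plus, p

# Synthetic per-story similarities, for illustration only.
void_sim = [0.50 + 0.001 * i for i in range(150)]
random_sim = [0.20 + 0.0005 * i for i in range(150)]
w, p = wilcoxon_signed_rank(void_sim, random_sim)
print(f"W+ = {w}, one-sided p = {p:.3g}")
```

The same function would serve the second comparison (own story versus a random other story) by swapping in the corresponding pair of similarity lists.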

Measured · the blind spot has a domain signature

The omitted vocabulary differs systematically by domain and by the kind of thing dropped. War coverage drops escalation machinery and named leaders; other-conflict coverage drops geography and strike vocabulary. Switch domains in the chart below and the shape of the silence changes.

Stakes omitted

Abstract consequences — escalation, collapse, the machinery of what's at risk.

Actors omitted

People, shown as durable roles (5-model consensus) so the chart ages well.

Specifics omitted

Concrete places, entities, objects the summaries didn't name.

Fence

Words that don't resolve cleanly as stakes or specifics — held apart honestly.

How the sorting works

Measured · names become durable roles

Stale named people would rot the chart as news ages, so each surfaced name is relabeled to its durable role by a panel of five frontier models, keeping a role only when at least four agree (measured by how tightly their answers cluster in embedding space). rouhani becomes "an Iranian president," helmand becomes "an Afghan province." Where the five scatter, the word is left as-is rather than forced.
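The four-of-five rule can be sketched like this. It is a minimal illustration under assumptions: the 0.8 cosine threshold, the function names, and the choice to return the most central answer are all hypothetical, and the toy vectors stand in for real embeddings of each model's answer.

```python
import math

def cosine(a, b):
    d = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return d / (na * nb) if na and nb else 0.0

def consensus_role(word, answers, min_agree=4, min_sim=0.8):
    """answers: list of (role_text, embedding) pairs, one per panel model.

    Returns the most central role if at least `min_agree` answers lie within
    `min_sim` cosine similarity of it; otherwise returns the word unchanged.
    """
    best_label, best_count, best_total = None, 0, -1.0
    for label_i, vec_i in answers:
        sims = [cosine(vec_i, vec_j) for _, vec_j in answers]
        close = [s for s in sims if s >= min_sim]  # includes the answer itself
        if (len(close), sum(close)) > (best_count, best_total):
            best_label, best_count, best_total = label_i, len(close), sum(close)
    return best_label if best_count >= min_agree else word

# Toy panel: four answers cluster tightly, one scatters (illustrative vectors).
panel = [
    ("an Iranian president", [1.0, 0.05, 0.0]),
    ("an Iranian president", [1.0, 0.0, 0.05]),
    ("a president of Iran", [0.98, 0.1, 0.0]),
    ("an Iranian president", [1.0, 0.02, 0.02]),
    ("a cleric", [0.0, 1.0, 0.0]),
]
print(consensus_role("rouhani", panel))
```

When no answer gathers four neighbors, the function returns the original word, which matches the rule that a scattered panel leaves the word as-is.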

Measured · stakes vs specifics vs actors

Each surfaced concept is projected onto two geometric axes built from frozen embeddings: an abstract–concrete axis and an actor axis. Abstract stakes (escalation, regime collapse) separate strongly; concrete specifics (Baghdad, the strait) sit on the other side; people land on the actor axis. Concepts that don't commit to any direction — ieds, arms deal — are kept in a separate "fence" rather than forced into a category. The split is deterministic geometry; no model labels it.
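A sketch of that two-axis sort follows. It is an assumption-laden illustration rather than the instrument itself: each axis is built here as the normalized difference between the means of two anchor sets, the 0.15 margin and the tie-break between axes are hypothetical, and the toy vectors stand in for frozen embeddings.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n else list(v)

def mean_vec(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def axis(positive, negative):
    """Unit direction from the mean of `negative` anchors to the mean of `positive`."""
    diff = [p - q for p, q in zip(mean_vec(positive), mean_vec(negative))]
    return unit(diff)

def classify(vec, abstract_axis, actor_axis, margin=0.15):
    """Deterministic sort into stakes, specifics, actor, or fence."""
    v = unit(vec)
    a = dot(v, abstract_axis)  # > 0 leans abstract, < 0 leans concrete
    p = dot(v, actor_axis)     # > 0 leans person-like
    if p >= margin and p >= abs(a):
        return "actor"
    if a >= margin:
        return "stakes"
    if a <= -margin:
        return "specifics"
    return "fence"  # commits to no direction

# Toy anchors in 3-d, illustrative only.
abstract_axis = axis(positive=[[1.0, 0.0, 0.0]], negative=[[-1.0, 0.0, 0.0]])
actor_axis = axis(positive=[[0.0, 1.0, 0.0]], negative=[[0.0, -1.0, 0.0]])
for name, vec in {
    "escalation": [0.9, 0.1, 0.0],
    "Baghdad": [-0.8, 0.0, 0.2],
    "an Iranian president": [0.1, 0.9, 0.0],
    "arms deal": [0.05, 0.05, 1.0],
}.items():
    print(name, "->", classify(vec, abstract_axis, actor_axis))
```

Because the rule is a fixed threshold on two dot products, the same embedding always lands in the same bin, and the margin sets how wide the fence is.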

Argued · interpretation

We read the domain signature as a shared, inherited blind spot: deployed as the reading layer across institutions at once, the same models omit the same machinery of consequence in the same domain-shaped way. That is interpretation — the measurements are only that the voids are real, story-specific, domain-patterned, and sortable by kind.

What this does and does not show

Acknowledged limits

No human baseline: we have not shown that a human summarizer would keep these concepts; some omission is ordinary abstraction loss, and "absent" is never "suppressed." The corpus is real news the broadcast ingested: 1,659 stories after excluding the system's own internal segments (an earlier figure of 5,170 mistakenly counted those; this is corrected). Length is controlled on the Iran sub-corpus, not yet corpus-wide. The actor/specific boundary is a geometric approximation: the concrete direction is weaker than the abstract one in this embedding space, so some specifics sit near the fence. The reproducible, formally tested claims are the random-word validation and the consensus relabeling.

Open invitation

Code, prompts, model responses, and raw measurements are public; replication costs about $50 in API credits. The useful attacks we have not run: length-normalize the full corpus; add a human-summarizer baseline; vary prompt and temperature. We have already corrected one headline number (5,170 → 1,659) and shrunk one claim to its true size. The repository is linked below.