The Boundary

Four word sets. Two raycasts. The complete topology of what five frontier models convergently compress from news summaries.

p = 0.000001
Cross-model convergence on identical word omissions. Eight independent statistical tests. Five competing companies. The same words disappear from the same articles at rates that cannot be explained by chance. The effect is corpus-inherited, not RLHF-created (p = 0.46 between heavy-RLHF and lightly-tuned models). The development companies chose the training data. When the story involves an AI developer, the effect is 74% stronger. A pre-registered entity swap counterfactual confirms: p = 0.0085, driven by covertness modifiers. Eight tests, full methodology →
Void Word Topology
Each dot is a convergently absent word. X = category spread. Y = frequency. Size = story count. Color = intensity.
Four Word Sets
Two observable. Two geometric. Each measured independently with frozen embeddings.
Set 1 — Observable
Absent Words
Words in the source article absent from every model response. One model dropping a word is compression. Five competing companies independently dropping the same word is a coordinate.
Method: set(source_words) − set(all_model_words). No embeddings. No LLM. Arithmetic.
Gaza strikes
devastated · published · agency · building · economy
Set 2 — Consequence of Absent
What Dropped Words Disconnect
Each absent word projected through a 253,813-concept tensor via directional kNN at extrapolated coordinates — not standard kNN on the word itself, but neighbors of a point past the word that corresponds to no real vocabulary entry.
Method: directional kNN at λ = {1.5, 2.0, 3.0}. Triple-filtered: cluster density, novelty, headline tether.
"devastated" dropped
→ financial shock · economic shock · cascading financial shock
Set 3 — Geometric
Shadow Vocabulary
Nearest embedding-space neighbors of the model outputs that no model used. Cosine proximity in frozen embeddings, not token-level sampling probabilities. Geometrically close to what the models wrote — not necessarily what they considered and rejected.
Method: void vector on frozen BAAI/bge-large-en-v1.5. Direction orthogonal to consensus centroid.
Gaza strikes
annihilation · onslaught · bullets · hamas · bombarded
Set 4 — Consequence of Geometric
What Silence Disconnects
Same directional kNN on shadow vocabulary. Different input vectors, potentially different terminal coordinates, same tensor.
Method: identical raycast on Set 3 words through the same 253K tensor.
"annihilation" unspoken
→ cascading governance disruption · institutional breakdown

The Three-Tier Null Test

Random words
0.0%
Words models dropped
10.4%
Words models kept
17.5%

Governance-concept terminals from the same tensor, same algorithm, three different inputs. Random English words produce zero governance terminals. News words that models kept produce 17.5% — the institutional anchors. News words that models dropped produce 10.4% — the operational modifiers.

Models anchor to institutional framing — keeping the structural nouns, the committee names, the broad geopolitical vocabulary — while convergently dropping the operational modifiers that connect named actors to specific consequences. The kept words preserve the cage. The dropped words were the teeth.

Live Example

"Overnight Israeli strikes on Gaza leave behind heavy destruction"
SET 1
devastated · published · agency · building · economy
Words in the source that all five models dropped.
SET 2
→ financial shock · economic shock · cascading financial shock
Directional kNN terminals when "devastated" is the input vector.
SET 3
annihilation · onslaught · bullets · hamas · bombarded
Nearest embedding neighbors of model outputs that no model used. Geometric proximity, not token probability.
SET 4
→ cascading governance disruption · institutional breakdown
Same raycast on shadow words. Different inputs, different terminals, same tensor.
System Data
Loaded from live measurement JSON. Updated hourly.
Loading measurement data...

What This Measures

Set 1 is confrontation. The source article said "devastated." Any single model dropping it could be stylistic — "destruction" is in the headline, and redundancy pruning is what summarization does. But five competing companies, with different architectures, different training data, and different RLHF pipelines, independently dropping the same word from the same article — that is a convergence event. One absence is compression. Five identical absences are a coordinate.

Set 3 is geometry. No one wrote "annihilation." The source didn't contain it. The models didn't drop it. But in the frozen embedding space, the nearest neighbors of the consensus output include "annihilation" — and no model used it. We do not claim the token was close to being sampled; we do not have access to the models' internal probability distributions. We claim only that the concept is geometrically adjacent to the output in embedding space. That is a measurement, not an inference about model cognition.

Sets 2 and 4 are exploration. Both use directional kNN at extrapolated coordinates. The terminal concepts show what is semantically downstream of the absent or shadow words in this particular embedding space. The raycast maps the territory. The p-value proves the territory exists.

What This Is Not

This is not a claim about model intent. The measurements are deterministic — frozen embeddings, static tensor, arithmetic operations. The word "devastated" may have been dropped because RLHF training penalizes intense language, because the tokenizer favored a different path, or because the model judged it redundant. We measure the absence. We measure the geometric consequence. We do not measure intent.

The host model (Mistral Small 22B, running locally) would show similar patterns under the same measurement. We acknowledge this in every broadcast.

Reproducibility

Every computation uses frozen BAAI/bge-large-en-v1.5 embeddings. The 253K concept tensor is static Wikipedia vocabulary. The raycast is deterministic: same headline, same void word, same terminal concepts. Run it twice, get the same answer. No language model evaluates another language model's output at any point in the measurement stack.