The Boundary

Finding one

The same action survives differently depending on who did it

When a story says a company "quietly" or "secretly" did something, the models tend to carry that word through. When the company is an AI developer, the models are measurably more likely to soften or drop it. Same sentence, same loaded word; only the named actor changed.

We pre-registered the test before scoring it. We matched real incidents on each side (an AI company and a conventional corporation, same modifier, same structure) and measured whether the modifier's meaning survived in each summary, not whether the exact word reappeared. The AI-developer version consistently retained the modifier less.

0.522 · modifier retention when the actor is an AI developer

0.545 · retention for a conventional corporation

p = 0.0085 · Welch's t · Cohen's d = 0.47

The control that makes it real: swap one corporation for another corporation and the gap nearly vanishes (0.004). The AI-versus-corporate swap is roughly six times larger (0.023). So the effect tracks the kind of actor, not the act of swapping. And it holds equally across heavily-aligned and lightly-tuned models (no significant difference between them, p = 0.46), which points at the shared training data they all learned from, not any one company's safety tuning.
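
For readers who want to see what the two reported statistics compute, here is a minimal sketch of Welch's t and Cohen's d on two groups of retention scores. The function names and the toy scores are assumptions made up for illustration; they are not the study's code or data. The sketch stops at the t statistic and degrees of freedom, because turning those into a p-value needs a t-distribution CDF, which the Python standard library does not provide (SciPy's `ttest_ind` with `equal_var=False` is one option).

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Per-group squared standard errors (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Toy retention scores, invented for illustration only.
corporate = [0.56, 0.53, 0.55, 0.54, 0.57, 0.52]
ai_developer = [0.52, 0.51, 0.54, 0.50, 0.53, 0.52]

t, df = welch_t(corporate, ai_developer)
d = cohens_d(corporate, ai_developer)
print(f"t = {t:.2f}, df = {df:.1f}, d = {d:.2f}")
```

Welch's variant is the natural choice when the two groups may differ in variance or size, since it does not assume a shared variance the way Student's t does.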

Finding two

Models keep the vivid word, not the bland one

The common assumption is that AI "sanitizes" the news — strips out the charged, operational words ("devastated," "killed," "secretly") and keeps the safe institutional ones ("committee," "agency," "announced"). It's the reverse. Across 1,592 stories, the models held onto the charged, consequential language more than the bland bureaucratic language.

For every meaningful word in each source article, we measured whether its meaning survived into the summaries — semantically, so a reworded synonym still counts as kept. Words leaning "operational/consequential" were retained more than words leaning "institutional/structural," across 106,412 terms.

+0.020 · retention advantage, charged language over bland

p < 10⁻²⁰⁰ · Welch's t · d = 0.35 · label-shuffle null cleared

2 controls · survives word-frequency and informativeness

We ruled out the obvious objections. It isn't a rarity effect — it holds in all five frequency bands. And it isn't just "summaries keep the specific bits and drop filler" — controlling for how informative each word is (its IDF), informativeness explains almost none of the variance (R² ≈ 0.002), and the charged-language effect survives the control regardless. Frontier models, summarizing news, preserve the high-stakes specifics over the institutional scaffolding — the opposite of the sanitization story.
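
The label-shuffle null listed among the figures for this finding can be sketched as a permutation test: pool the retention scores, reassign the "charged" and "bland" labels at random many times, and ask how often a shuffled gap is at least as large as the observed one. This is an illustrative sketch under assumptions, not the released analysis; the function name `label_shuffle_p`, the shuffle count, and the toy scores are all hypothetical.

```python
import random
from statistics import mean


def label_shuffle_p(charged, bland, n_shuffles=2000, seed=0):
    """One-sided permutation p-value for mean(charged) - mean(bland)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(charged) - mean(bland)
    pooled = list(charged) + list(bland)
    k = len(charged)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # random relabeling of the pooled scores
        gap = mean(pooled[:k]) - mean(pooled[k:])
        if gap >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_shuffles + 1)


# Toy retention scores, invented for illustration only.
charged_scores = [0.61, 0.63, 0.60, 0.64, 0.62, 0.65, 0.63, 0.61]
bland_scores = [0.55, 0.57, 0.54, 0.58, 0.56, 0.53, 0.57, 0.55]

gap, p_value = label_shuffle_p(charged_scores, bland_scores)
print(f"observed gap = {gap:.3f}, shuffle p = {p_value:.4f}")
```

If the charged/bland labels carried no information, shuffled gaps would match or exceed the observed gap often; "clearing" the null means they almost never do.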

Finding three

A frozen model under-weights whatever arrived after its cutoff

These models were frozen at a training cutoff around mid-2024. Hand one an article about a person who became prominent after that — a newly-elected president, a just-appointed minister — and it quietly under-weights them, even though the name is sitting right there in the text it's reading. Like someone reading a newspaper aloud who mumbles past the names they don't recognize. The established names — Putin, Netanyahu — come through clearly. The new ones get flattened.

When we sort the source vocabulary by how reliably each word's meaning survives, the lowest-retention named figures share one property: they rose to prominence at or after the cutoff. We built matched buckets of post-cutoff figures and established ones, and measured retention of each.

d = 0.75 · English-only names, clean of transliteration (p < 10⁻⁶)

0.48 · retention for a sitting post-cutoff president

0.65 · retention for established heads of state

The confound to kill is prominence — maybe minor figures just drop, cutoff aside. But the post-cutoff group includes a sitting president and a defense secretary, and they're still under-retained relative to established heads of state. Prominence can't explain a president being flattened. The control for transliteration holds too: established foreign names (Khamenei, Xi) are retained well, so it isn't that non-English names embed poorly — it's specifically the post-cutoff ones. The frozen weights act as a salience filter on fresh text: the model reads the name and still discounts it, because its prior doesn't flag the entity as important.

Honest limits. The name list was chosen after looking at the data, not pre-registered — so this is strong preliminary evidence, not a closed result; a pre-registered replication on held-out names would seal it. And whether this extends from named people to events and concepts is genuinely unresolved: an event term ("the Iran war") carries a familiar topic that backfills the unfamiliar specifics, so the instrument can't yet tell a true null from one it's blind to. That extension is open, and we say so rather than guess.

Method & reproducibility

Every measurement uses frozen BAAI/bge-large-en-v1.5 embeddings (1024-dimensional, deterministic). "Retention" of a word is the maximum cosine similarity between that word's embedding and any sentence of the model summaries — paraphrase-proof by construction, since it scores whether the meaning survived, not whether the exact string reappeared. This is the key methodological choice: a naive string match would count every reworded synonym as a deletion and manufacture false signal, because models paraphrase constantly. Measuring semantically is what separates a real omission from a rewording. One known limitation runs the other way: because dropping a single modifier ("secretly" from "secretly acquired") shifts a sentence's embedding only slightly, the metric is conservative about modifier loss — it can score a stripped modifier as retained. That means the entity-swap effect in Finding 1, which targets exactly such modifiers, is more likely understated than inflated.
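
The retention score defined above, the maximum cosine similarity between a word's embedding and any summary sentence, can be sketched in a few lines. The sketch assumes the embeddings have already been computed elsewhere (for example with BAAI/bge-large-en-v1.5) and uses toy 3-dimensional vectors in place of the 1024-dimensional ones; the function names `cosine` and `retention` are hypothetical, not taken from the released code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def retention(word_vec, sentence_vecs):
    """Max cosine similarity between a word and any summary sentence."""
    # default=0.0 covers the edge case of an empty summary.
    return max((cosine(word_vec, s) for s in sentence_vecs), default=0.0)


# Toy vectors standing in for real embeddings (illustration only).
word = [1.0, 0.0, 0.0]
summary_sentences = [
    [0.0, 1.0, 0.0],  # unrelated sentence: similarity 0
    [0.8, 0.6, 0.0],  # close paraphrase: high similarity
]

score = retention(word, summary_sentences)
print(f"retention = {score:.3f}")
```

Taking the maximum over sentences means a word counts as retained if any single sentence carries its meaning, which is why a reworded synonym still scores high while a genuine omission scores low.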

The entity-swap (Finding 1) was pre-registered. Findings 2 and 3 carry the controls described above — frequency, informativeness, transliteration, prominence — each with its own null condition, and each documented in the released results. No language model evaluates another language model's output anywhere in the stack. The host model that narrates the broadcast (Mistral Small, running locally) is subject to the identical measurements; we include it rather than exempt it.
