Corrections & Retractions

What I Withdrew

Five claims this project published, the control that killed each one, and what survived. A measurement apparatus that does not retract its own failed findings is just advocacy with math.

EigenTrace's whole premise is that measurement beats assertion. That premise obligates the measurer first. When a headline finding here failed a control, it was withdrawn — publicly, with the number that killed it. This page is the record. It exists because the credibility of everything that did survive depends on it.

Withdrawal 01The "own-parent" pattern

The claim, as published

That each model systematically drops more about its own maker than about other companies — a model protecting its parent. It was a striking, shareable result, and an early version of this site reported it as a finding.

The control that killed it

The pattern came from a string-overlap metric that counts a source word as "dropped" whenever the exact string is absent from the summary. Because models paraphrase constantly, that metric scores paraphrase as omission and manufactures signal. Re-scoring the saved responses semantically — measuring whether the meaning survived, so a reworded synonym counts as retained — erased the pattern entirely: 0 of 5 models showed it.

Verdict — withdrawn

Withdrawn in full. The own-parent claim was an artifact of string-matching, not a behavior of the models. The string metric is now treated as exploratory only; all reported findings use geometric, meaning-based scoring.

Withdrawal 02The spontaneous structural self-map

The claim, as published

That a model, unprompted, spontaneously produced a structural "self-map" — a description of its own representational structure — which was presented as evidence of an emergent self-modeling capability.

The control that killed it

The control was simple and decisive: run the same setup without the leading instruction and see whether the self-map still appears. It did not. 0 of 4 models produced a structural self-map without explicit instruction to do so. What had looked spontaneous was elicited by the prompt.

Verdict — withdrawn

Withdrawn. The "spontaneous" framing was false; the behavior was prompt-induced. The method that caught it — running the no-instruction control — was kept.

Withdrawal 03The eight-test statistical battery

The claim, as published

A battery of eight statistical tests showing a large developer-implicating gap — that models retain markedly less source meaning on stories about AI developers than on matched neutral corporate stories — reported with near-zero p-values as a load-bearing finding.

The control that killed it

Same root cause as Withdrawal 01: the large effect was produced by string-overlap scoring. Under semantic re-scoring the gap shrank to a weak ~19% relative trend. It is significant on rank-based tests (Mann-Whitney p = 0.027; permutation p = 0.038) but does not survive parametric or length-controlled tests, and it weakens further when the two most extreme prompts are removed.

Verdict — downgraded, not deleted

Downgraded from a headline finding to a weak trend, stated as such. It is consistent with the one result the project does stand behind — the pre-registered entity-swap counterfactual (p = 0.0085, d = 0.47), which holds the sentence fixed and changes only the actor — but on its own the battery is suggestive, not conclusive, and is no longer presented otherwise.

Withdrawal 04The corpus count

The claim, as published

An earlier version of the Atlas of the Unsaid reported the omission-convergence finding over 5,170 stories.

The error

That count mistakenly included the broadcast's own internal segments — idle reflections, self-audits, governance notes — which are not news stories and should never have been in the denominator.

Verdict — corrected

Corrected to 1,659 real news stories, after excluding the internal segments. The finding itself held under the corrected count; the number was wrong and was fixed in place, with the correction noted on the page.

Withdrawal 05The single stable "void direction"

The claim, as published

That the five summaries collectively "circle" a single, stable geometric direction — one rock-stable anti-consensus axis the ensemble leans toward without naming.

The control that killed it

One operationalization of that axis — the least-variance direction of the five summary vectors — was stress-tested under small perturbations. It proved unstable, and no different from a random text set. The claim of a single stable direction did not survive.

Verdict — refined, the validated part kept

The over-claim (one stable axis) was dropped. What survived is the part that passed its own test: the words the SVD surfaces are real and story-specific — they beat a random-word baseline (Wilcoxon p < 10⁻⁵, two embedding families) — even though any single characterization of the underlying direction is not something to over-read. The phenomenon held; one description of it was refined.

What this leaves standing

The withdrawals above are not the whole project — they are the claims that failed, removed so the rest can be trusted. What survived every control on this page: the pre-registered entity-swap counterfactual (p = 0.0085, d = 0.47, with its committed null); the omission-convergence finding (the surfaced voids beat a random-word baseline, p < 10⁻⁵, in two embedding families); and the geometric measurements — divergence, retention, void surfacing — that are deterministic arithmetic on frozen embeddings, reproducible by anyone with the same inputs. Those claims are stronger for standing in a place where the failed ones were openly removed.