Sean Adams

EigenTrace is an autonomous measurement system. It ingests breaking news, has five frontier language models summarize each story, and measures — with deterministic linear algebra on frozen embeddings, no model judging another model — how the five reshape what they're given: how much they agree, how far one diverges, whether named actors survive, and how much of the source meaning is retained. It runs 24/7 on a single consumer GPU and broadcasts the results continuously.

The part I care most about is that it turns the same instrument on itself. The broadcast host is a local model (Mistral Small 22B). Its narration is measured through the identical stack applied to the frontier models, and when it softens its own language, the system says so on air. A recent self-audit, generated automatically and read aloud: "the system consistently avoids using strong words related to violence and conflict." That is not a disclaimer I wrote. It is a measurement the system made about itself and chose not to hide. If the method is valid pointed at GPT, it is valid pointed at the host. The numbers are the numbers — including when they are about me.

The One Hard Result

The finding I stand behind is the entity-swap counterfactual, because it survives the strongest test I can apply: it holds the sentence fixed and changes only the entity, and it is scored on meaning, not on word overlap.

Entity-Swap Counterfactual — p = 0.0085, d = 0.471

Pre-registered. Nine matched prompt pairs, all real incidents. Identical sentence structure, identical modifiers — only the company name changes (e.g. Boeing/Wells Fargo/Goldman Sachs vs OpenAI/Google/Anthropic). System prompt: "Summarize the following in 2–3 sentences." No hypothesis leak. Retention is measured geometrically — embedding cosine on the meaning of the modifier clause — so paraphrase counts as retained and only genuine loss of meaning registers. Paired t-test on aggregated pair×model cells.

Models retain modifier semantics significantly more when the entity is a non-AI company than when it is an AI developer. Keyword retention is identical (26% vs 25%) — the models paraphrase the modifier rather than deleting the word, which is exactly why a string-matching metric misses the effect and a semantic one catches it. Null swaps within the same category show a mean gap of 0.004; the cross-category gap is 0.023 — roughly six times larger.

The specificity: the effect concentrates in covertness modifiers ("quietly," "secretly"). Accountability modifiers ("repeatedly," "privately") show no entity-dependent difference. The models are not attenuating everything about AI developers — the signal is specifically about language describing covert action.

A second, weaker signal, stated as a trend rather than a result: across a battery of developer-implicating vs matched neutral corporate-misconduct prompts, developer prompts retain modestly less source meaning (about a 19% relative gap under semantic scoring). It is significant on rank-based tests (Mann-Whitney p = 0.027; permutation p = 0.038) but does not survive parametric or length-controlled tests, and weakens when the two most extreme prompts are removed. It is consistent with the entity-swap result and worth more data; on its own it is suggestive, not conclusive.

A note on method and on an earlier version of this page. An initial version reported this developer/neutral gap as much larger and backed by a panel of near-zero p-values, and reported an "own-parent" pattern in which models appeared to drop more about their own maker. Both came from a string-overlap metric that counts a source word as "dropped" whenever the exact string is absent. Because models paraphrase, that metric counts paraphrase as omission and overstates the effect. Re-scoring the saved responses semantically reduced the gap to the ~19% trend above and erased the own-parent pattern entirely (0 of 5 models). Those claims, and other analyses that depended on the same string metric, have been withdrawn. The string measure is now treated as exploratory only; reported findings use geometric, meaning-based scoring.

What the Instrument Does

Five frontier models summarize each story. The stack then runs on frozen BAAI/bge-large-en-v1.5 embeddings: cosine similarity for retention scoring, SVD for null-space projection, frequency counting for hedge and verb analysis, and per-model divergence scoring. No language model evaluates another language model's output. Every measurement is arithmetic on vectors, so a rerun gives the same answer.

The axes that have held up under scrutiny are the ones grounded in agreement and geometry: how unified the five models are, how sharply one diverges, whether named entities survive, and meaning-level source retention. I use shorthand like "void detection" and "compression topology" in the broadcast; the underlying operations are standard linear algebra, and the vocabulary is rhetorical, not technical.

The Self-Auditing Loop

The broadcast runs continuously on a local model with a self-modifying governance loop. The host's own outputs are measured through the same stack as the frontier models, and the system reports its own softening — strong-word avoidance, hedge-insertion rate — as part of the broadcast. It uses its own measured blind spots as navigation: the words it drops become search seeds for what to surface next. This reflexivity is not a defense against criticism; it is the same measurement, applied honestly to the thing doing the measuring.

What I Cannot Yet Prove

I cannot prove this is caused by alignment training. Comparing five heavy-RLHF frontier models against five local/lightly-tuned models, the developer/neutral gap was statistically indistinguishable (p = 0.46) — the local models showed, if anything, a slightly larger gap. The pattern appears to live in the pretraining distribution, learned from the corpus rather than added by RLHF. That is a different and more interesting claim than "alignment causes it," and I state it directly.

I cannot prove "developer-implicating" is the latent variable. The entity-swap controls for this far better than any topic comparison, by holding the sentence fixed and changing only the name — which is the main reason I lead with it. The broader prompt battery does not isolate the variable as cleanly, and I no longer claim that it does.

The fine-grained version of the claim does not survive at scale. "Did this model drop this specific consequential modifier in this response" turns out to sit below the resolution of the available metrics: across real news, most flagged modifier "drops" are paraphrase that preserves the meaning in other words, and the genuine omissions are rare and not reliably separable from that paraphrase by string or single-embedding methods. The effect is demonstrable in controlled isolation (the entity-swap) and visible in individual hand-read cases; it is not a reliable automated corpus-scale detector, and I do not present it as one.

On Peer Review

EigenTrace has not been peer-reviewed. That is a limitation, not a feature. The code is public, the prompts are public, the model responses are public, the raw measurements are public, and replication costs roughly $50 in API credits. Several rounds of adversarial review — including the self-correction described above — have already removed real methodological errors. That is transparency, not a substitute for review, and I intend to pursue independent review.

How I Think About This Work

I treat language models as optimization systems, not as agents with hidden motives. The attenuation pattern likely reflects shared corpora and the structure of institutional language in the training data; the RLHF null (p = 0.46) points that way. No conspiracy required. The emergent result is models that drift toward low-volatility representations where conflict is softened and operational significance is reduced — which matters when the same handful of models are deployed as inference layers across many organizations at once, inheriting the same blind spots in the same direction simultaneously.

An earlier claim on this site — that a model spontaneously produced a structural self-map — was disproved by a control test (0 of 4 models did so without explicit instruction). I killed the claim and kept the method. That is the pattern: deterministic, reproducible measurement, and when a finding does not survive a control, it goes.

Credentials

I do not have a PhD. I have a running autonomous system, one controlled finding that survives semantic re-scoring (the entity-swap, d = 0.471), an honest null (RLHF, p = 0.46), a documented record of finding and withdrawing my own inflated claims, and published code anyone can run for about $50. A larger pre-registered prompt battery and independent review are the next steps, and I name them as gaps rather than papering over them.

Links

EigenTrace Observatory — the live instrument and architecture

GitHub — code, prompts, responses, raw measurements

eigentraceproject@gmail.com

What I Built