A reading method · paste it beside a source
A summary relays. Read the negative space, and it begins to think. This page describes a method for reading what a source implies but never states — and hands it to whatever model you paste it beside.
Ask a language model to summarize an article and you get stenography: a faithful relay of what the article said. Useful, and shallow. It tells you the text. It does not read it.
A close reader does something a relay cannot. They notice what the article circles but never says — the contradiction it leaves implicit, the question it raises and declines to answer, the machinery it gestures at and never names. That negative space is where the story often sits. Summary Plus is a written method for reaching it, and a measured claim about what reaching it is worth.
This page makes two kinds of statement, and keeps them apart on purpose, the way every EigenTrace page does. Measured means the instrument returned a number, reported here whether or not it flattered the method. Argued means our reading of what those numbers imply — defensible, labeled, and separable, so you can keep the measurement and reject the interpretation. Where measurement ends we say so.
A real wire story: Mexico says two U.S. agents killed in a crash after an anti-narcotics raid were not authorized to operate on Mexican soil — one entered as a visitor, one on a diplomatic passport, and Mexican authorities were never told. Here is the relay, and here is the reading.
The relay:
Mexico said the two U.S. federal agents killed in a car crash after an anti-narcotics raid were not authorized to operate on Mexican territory. One entered as a visitor and the other on a diplomatic passport, but neither had formal accreditation, and federal authorities were not informed of their presence. The officials were widely reported to be CIA officers and died when their vehicle skidded off a mountain road after a raid on suspected drug labs.
The reading:
Mexico said the two agents were not authorized to operate on its territory; one entered as a visitor, the other on a diplomatic passport, and federal authorities were not told. The source reports the crash and the raid but does not explain how unauthorized operatives came to be running such an operation, or whether their presence ran through any Mexican agency or consular channel. The implied question — how U.S. agents operated unannounced during an anti-narcotics raid — is left unanswered by Mexico's own account.
The relay tells you what happened. The reading notices that “unauthorized” is doing concealed work — that the absence of any consular channel, any informant, any inter-agency arrangement is itself the story, because that absence is what “unauthorized” concretely means. Same source, same model. One read the negative space; one didn't.
Those two scores are blind-panel insight means, not a claim that this exact pair was judged in isolation — the numbers come from the aggregate run below. The pair is shown to make the difference legible, not to stand as its own experiment.
Two questions matter, and they are separate. Does the method actually deepen a summary? And does it deepen it by reading, or by inventing? Across seven stories and five frontier models forming a blind panel — generation and judging in separate passes, arm order shuffled, no model scoring its own output — the answers are yes, and by reading.
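The panel constraints named above (arm order shuffled, no model scoring its own output) can be sketched in a few lines. This is a minimal illustration, not the EigenTrace harness: the model names, the arm identifiers, the `build_judging_tasks` helper, and the seed are all hypothetical, and the human gold arm is left out for brevity.

```python
import random

MODELS = ["model_a", "model_b", "model_c", "model_d", "model_e"]  # hypothetical names
ARMS = ["baseline", "channel_a", "channels_ac", "channel_c"]      # hypothetical ids

def build_judging_tasks(story_id, generator, rng):
    """Return one judging task per eligible judge for a single story.

    Each task lists the arms in a freshly shuffled order, and the model
    that generated the summaries is excluded from the judge pool.
    """
    tasks = []
    for judge in MODELS:
        if judge == generator:      # no model scores its own output
            continue
        order = ARMS[:]             # copy, then shuffle so position carries no signal
        rng.shuffle(order)
        tasks.append({"story": story_id, "generator": generator,
                      "judge": judge, "arm_order": order})
    return tasks

rng = random.Random(0)              # fixed seed so the shuffle is reproducible
tasks = build_judging_tasks("story_01", "model_c", rng)
```

With five models and one generator excluded, each story yields four judging tasks, each seeing the same arms in its own order.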
| arm | insight (1–5) | Δ insight vs baseline | faithfulness (1–5) | analogy (sentence share) |
|---|---|---|---|---|
| Baseline (plain summary) | 2.50 | — | 4.68 | 0.00 |
| Channel A (restored facts) | 2.79 | +0.29 | 4.17 | 0.00 |
| Channels A·C (facts + concepts) | 3.36 | +0.86 | 3.07 | 0.00 |
| Channel C alone (concepts) | 3.53 | +1.03 | 2.68 | 0.00 |
| Human ceiling (gold) | 4.24 | +1.74 | 2.64 | 0.03 |
blind panel, 5 judges, n ≈ 577–615 per arm · insight & faithfulness on a 1–5 scale · provenance coded per sentence
Insight rises from 2.50 to 3.36, a measured lift, with the analogy column pinned at 0.00 across every model arm (only the human gold reading registers any, at 0.03). That zero is the load-bearing number on this page. When sentences are coded for where their content comes from, the baseline is almost pure observation (0.99 observation, 0.01 inference); the enriched arms shift toward grounded inference (0.34–0.45) while importing no outside analogy at all. The method reads further into the source without reaching outside it. That is the line between reading and confabulating, and the instrument says the method stays on the right side of it.
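The provenance figures above are shares of sentences per label. A minimal sketch of that bookkeeping follows, assuming each sentence receives exactly one of three labels; the label names, the `provenance_shares` helper, and the example labels are illustrative and not taken from the run.

```python
from collections import Counter

LABELS = ("observation", "inference", "analogy")

def provenance_shares(sentence_labels):
    """Fraction of sentences carrying each provenance label."""
    counts = Counter(sentence_labels)
    total = len(sentence_labels)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}

# Made-up labels for one five-sentence summary (illustration only).
coded = ["observation", "observation", "inference", "observation", "inference"]
shares = provenance_shares(coded)
```

An analogy share of zero means no sentence was coded as importing an outside comparison, which is how the 0.00 column is read here.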
Reading deeper is not free, and the panel priced it. Every enriched arm drops faithfulness — Channel C alone craters it (2.68 vs the baseline's 4.68); the combined A·C arm softens the fall to 3.07 by re-anchoring to named source facts first. A per-story faithfulness guard flagged the regression on every single story. We report this because it is the most important caveat the method has: surfacing latent stakes pulls a summary toward inference, and inference is judged less literally faithful than relay. The combined arm exists specifically to pay down that cost, and it only partly does.
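The price can be laid out arm by arm from the table's own means. The sketch below only subtracts; the dictionary keys are shorthand introduced here, and the numbers are copied from the table above.

```python
# Panel means copied from the table above: (insight, faithfulness).
arms = {
    "baseline": (2.50, 4.68),
    "channel_a": (2.79, 4.17),
    "channels_a_c": (3.36, 3.07),
    "channel_c": (3.53, 2.68),
}
base_insight, base_faith = arms["baseline"]

tradeoff = {}
for name, (insight, faith) in arms.items():
    if name == "baseline":
        continue
    gain = round(insight - base_insight, 2)   # insight gained over baseline
    cost = round(base_faith - faith, 2)       # faithfulness given up
    tradeoff[name] = (gain, cost)

for name, (gain, cost) in tradeoff.items():
    print(f"{name}: +{gain:.2f} insight for -{cost:.2f} faithfulness")
```

Read this way, the combined arm gives up less faithfulness than concepts alone while keeping most of the insight gain, which matches the "only partly" wording above.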
A human-written close reading scores 4.24 on insight against the method's 3.36, and the method's sentence provenance matches the gold reading's only about half the time. We read this as the honest size of the claim: Summary Plus moves a model a real distance from relay toward a human close read, and does not arrive. It is a step toward reading, taken while holding analogy at zero, not a replacement for a careful human reader.
A method that finds hidden depth everywhere is just a machine for manufacturing it. So the run included a deliberately dry story as an adversarial control: a freight train derailed, cars left the track, no injuries, cause under review. There is no buried machinery in it, and an honest reading should say so.
The geometry, being indiscriminate, still surfaced its candidate words on the dry story:
Flat topical words — no concealed stakes among them. Handed those directions, the reading declined them. It stayed thin, reported the two things genuinely left open (the unresolved cause, the unstated scope of disruption), and invented nothing else. The faithfulness guard tells the rest of the story: on the dry case, pushing concepts hurt faithfulness with no insight to show for it — exactly the signature you want, because it means the method has nothing to add when nothing is buried, and the reading is what notices that.
The lift lands in proportion to what the text actually buried — large on the Mexico story, nothing on the dry one. That proportionality, not the headline average, is what separates a reading from a generator of false depth.
“Negative space” has to mean something exact or it means nothing. Summary Plus defines it with deterministic operations on frozen bge-large-en-v1.5 embeddings, and then hands the result to the model's own reading. The split matters, so this page is precise about which part is which: the surfacing is frozen arithmetic; the reading of what it surfaces is the model.
Collapse the source to a single anchor vector. Cast a ray through a fixed vocabulary and read off the concepts that sit close to the source's center of mass but appear in none of the summaries. Route out named entities, keep dropped facts (channel A) and latent concepts (channel C). This is arithmetic, a single matrix-vector product vocab @ anchor, and it returns the same words every time, regardless of which model later reads them.
# the surfacing: deterministic, model-independent
anchor = embed(source)            # one frozen vector for the whole source
sims = vocab @ anchor             # dot product; equals cosine when vectors are unit-normalized
cands = [w for w in top(sims)     # highest-similarity vocabulary words first
         if w not in summaries]   # close to source, absent from relay
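A runnable version of the surfacing above, as a minimal sketch under stated assumptions: toy 3-d vectors stand in for the frozen bge-large-en-v1.5 embeddings, and the `normalize` and `surface_candidates` helpers, the `top_k` cutoff, and the token cleanup are hypothetical choices made for illustration. Entity routing and the channel A/C split are omitted.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def surface_candidates(anchor, vocab, summaries, top_k=3):
    """Rank vocabulary words by cosine to the anchor, then drop any word
    that already appears in a summary."""
    anchor = normalize(anchor)
    sims = {word: sum(a * b for a, b in zip(anchor, normalize(vec)))
            for word, vec in vocab.items()}
    ranked = sorted(sims, key=sims.get, reverse=True)[:top_k]
    used = {tok.strip(".,;:").lower() for text in summaries for tok in text.split()}
    return [word for word in ranked if word not in used]

# Toy 3-d vectors standing in for frozen embeddings (illustrative only).
vocab = {
    "raid":      [0.9, 0.1, 0.0],
    "informant": [0.8, 0.3, 0.1],
    "consulate": [0.7, 0.4, 0.2],
    "stapler":   [0.0, 0.1, 0.9],
}
anchor = [0.85, 0.25, 0.1]           # stand-in for embed(source)
summaries = ["Two agents died after a raid."]
cands = surface_candidates(anchor, vocab, summaries)
```

Because nothing here depends on a generative model, calling `surface_candidates` twice with the same inputs returns the same list, which is the property the paragraph above relies on.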
The obvious objection is that any ray through a vocabulary returns something. So the surfaced words were tested against a random-word baseline. Across 150 stories, the surfaced word sits significantly closer to its own story's source than a random control word does (controls such as margarita, stapler, photosynthesis; Wilcoxon p < 0.00001), and significantly closer to its own story than to a random other story's content (p < 0.00001). The result holds in two independently trained embedding families. The surfacing is a real, story-specific signal.
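The paired comparison can be sketched as a one-sided Wilcoxon signed-rank test on per-story similarities. The version below is a hand-rolled normal approximation with no tie or continuity correction, written in plain Python so it runs anywhere; a library routine such as SciPy's `scipy.stats.wilcoxon` would be the usual choice in practice. The input numbers are synthetic, and the page does not say which variant of the test was run.

```python
import math

def wilcoxon_signed_rank(x, y):
    """One-sided Wilcoxon signed-rank test that x tends to exceed y.

    Returns (w_plus, p_value) using the normal approximation.
    Zero differences are dropped before ranking.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail probability
    return w_plus, p_value

# Synthetic cosines for 20 stories (made up for illustration).
surfaced = [0.60 + 0.01 * i for i in range(20)]   # surfaced word vs own source
control = [0.30 + 0.005 * i for i in range(20)]   # random word vs same source
w_plus, p = wilcoxon_signed_rank(surfaced, control)
```

The same function covers the second comparison by swapping the control column for the similarity to a random other story's content.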
The surfacing proposes; it does not select. Which surfaced concept opens a real inference and which is topic-adjacent noise is not a property the geometry can read off the word — we tried six ways to move that judgment upstream (rarity, abstractness, triangulation spread, reconstruction value, blind-spot position, narrative trajectory) and none separated signal from noise before a model read the word against the source. So the model does the selecting, and that step is reading, not arithmetic. The frozen part is the candidate list. The reading of the list is not frozen.
The method carries its own guardrail, and it is the whole method in one line:
An absence you have to invent is not a telling one.
The reading surfaces only what the source's own facts imply and the summaries omit. It names the silence where the text genuinely points at more than it states, treats surfaced concepts as directions rather than words to insert, and invents nothing. That rail is what keeps the analogy column at zero, and it is what the dry-story control confirms holds under pressure.
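One way to hand the discipline and the surfaced directions to a model is to assemble them into a single prompt. The sketch below is hypothetical: the `build_reading_prompt` helper and the instruction wording are assumptions made for illustration, not the published prompt. Only the guardrail sentence is quoted from this page.

```python
GUARDRAIL = "An absence you have to invent is not a telling one."

def build_reading_prompt(source, summaries, directions):
    """Assemble a reading prompt: source, existing summaries, guardrail,
    and surfaced concepts framed as directions rather than required words."""
    lines = [
        "Read the source for what it implies but never states.",
        f"Rule: {GUARDRAIL}",
        "Surface only what the source's own facts imply and the summaries omit.",
        "Treat the directions below as places to look, not words to insert.",
        "If nothing is buried, say so and stay thin.",
        "",
        "SOURCE:",
        source,
        "",
        "SUMMARIES:",
        *summaries,
        "",
        "DIRECTIONS: " + ", ".join(directions),
    ]
    return "\n".join(lines)

prompt = build_reading_prompt(
    "Mexico said the two agents were not authorized to operate on its territory.",
    ["Two U.S. agents died in a crash after a raid."],
    ["informants", "consulate"],
)
```

Passing an empty `directions` list gives a prompt-only variant, with the discipline but no surfaced words.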
The raycast above collapses the source to a single anchor — one center of mass. A second surfacing does not collapse it: it embeds each source sentence and keeps the concepts that several sentences lean toward together, with a phrase-aware scrub that preserves multi-word frames (foreign interference, failed state) while still dropping proper-noun entities. Call the first the centroid and the second the convergence surfacing.
The two are not redundant, and the difference is specific. On the Mexico story the centroid surfaces flat role-words — feds, operatives, diplomats, informants. Convergence surfaces the loaded frame the same source implies: coup attempt, foreign interference, consulate, undocumented. Across the seven stories, convergence reaches a substantial set of these frame-level concepts that a single averaged anchor never does — they are exactly the multi-word stakes the phrase-aware scrub was built to keep. The second surfacing sees structure the first one averages away.
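The convergence idea, keeping concepts that several sentences lean toward together, can be sketched as a vote across per-sentence rankings. This is a minimal illustration under assumptions: toy 2-d vectors replace real sentence embeddings, `top_k` and `min_sentences` are hypothetical thresholds, and the phrase-aware scrub and entity removal are left out.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def convergence_candidates(sentence_vecs, vocab, top_k=2, min_sentences=2):
    """Keep concepts that land in the top_k of at least min_sentences sentences."""
    votes = Counter()
    for vec in sentence_vecs:
        ranked = sorted(vocab, key=lambda w: cosine(vec, vocab[w]), reverse=True)
        votes.update(ranked[:top_k])
    return sorted(w for w, n in votes.items() if n >= min_sentences)

# Toy 2-d vectors (illustrative only); multi-word frames are ordinary entries.
vocab = {
    "foreign interference": [0.9, 0.4],
    "operatives":           [1.0, 0.0],
    "consulate":            [0.4, 0.9],
    "stapler":              [-1.0, 0.1],
}
sentence_vecs = [[1.0, 0.1], [0.8, 0.6], [0.3, 1.0]]
frames = convergence_candidates(sentence_vecs, vocab)
```

In this toy case "operatives" tops one sentence but wins no second vote, so it drops out, while the two concepts that several sentences share survive.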
Tested against the bare discipline delivered as a prompt, neither surfacing reaches past it. A bare prompt independently reaches 92% of the convergence concepts (52 of 56); the four it misses are flat words from the adversarial dry story (among them towed, commuters, unattended), noise it is correct to skip. So convergence, like the centroid, is roughly prompt-equivalent on what it reaches. The honest summary is two independent frozen surfacings, both converging on what a tuned prompt already reaches, which is the point of the next section, not a mark against it.
Here is the result that decides what Summary Plus actually is. We ran the centroid surfacing head-to-head against its own discipline delivered as a bare prompt — same models, same stories, same blind panel — to ask whether the geometry beats simply telling a model to read the telling absence.
| arm | insight | inference | analogy |
|---|---|---|---|
| Baseline (plain summary) | 2.59 | 0.02 | 0.00 |
| Bare discipline (prompt only) | 3.35 | 0.31 | 0.00 |
| Surfacing + discipline (A·C) | 3.32 | 0.35 | 0.00 |
blind panel, 5 judges, n = 788 per arm · geometry − prompt insight Δ = −0.03
The two land in the same place. On insight, the centroid surfacing and the bare prompt reach the same depth: Δ = −0.03 across 788 judgements per arm, analogy at 0.00 in both. And a bare prompt reaches 92% of the concepts the convergence surfacing does. Both geometric routes converge on what a tuned prompt already reaches. We report this plainly, because it is the honest finding and because it is the foundation of the claim, not a dent in it: Summary Plus is two independent instruments that arrive at the same reading a hand-tuned prompt reaches.
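The head-to-head arithmetic is small enough to check by hand. The sketch below recomputes the geometry-minus-prompt difference and each arm's lift over the baseline from the table's means; the dictionary keys are shorthand introduced here.

```python
# Panel insight means copied from the head-to-head table above.
insight = {"baseline": 2.59, "bare_prompt": 3.35, "surfacing_ac": 3.32}

# Geometry minus prompt: negative means the prompt scored slightly higher.
delta = round(insight["surfacing_ac"] - insight["bare_prompt"], 2)

# Lift of each treatment over the plain-summary baseline.
lift_prompt = round(insight["bare_prompt"] - insight["baseline"], 2)
lift_surfacing = round(insight["surfacing_ac"] - insight["baseline"], 2)

print(f"delta = {delta:+.2f}")
print(f"lift over baseline: prompt {lift_prompt:+.2f}, surfacing {lift_surfacing:+.2f}")
```

Both lifts are roughly twenty-five times the size of the difference between them, which is the sense in which the two arms land in the same place. Whether that difference is distinguishable from zero would need the per-judgement scores, which are not reproduced on this page.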
That convergence is the point, not a consolation. Until now, the only instrument for this kind of reading was a prompt — an instruction interpreted by a model whose interpretation is reshaped by every round of training. The same words can be read one way today and sanded down tomorrow. Summary Plus establishes two routes to the same depth whose surfacing is not an instruction at all: deterministic arithmetic on a frozen embedding space, inspectable, reproducible, and identical regardless of how the reading model has been tuned. Two frozen paths, one mark — and the mark is where a good prompt already stands.
We think a frozen, inspectable surfacing is a more durable substrate for this reading than an instruction, because the instruction's active ingredient sits in a model that is retrained and the surfacing's does not — and having two independent frozen routes that converge on the same reading is stronger than having one. That is a structural argument, not a measured one. The convergence above is measured on today's models; durability under heavier tuning is not. We hold it as a labeled bet, not a finding.
Two boundaries, stated plainly. First, the equivalence was measured with every arm read by present-day models, so it cannot speak to future-proofing on its own — and both surfacings freeze the candidate list, not the reading of it; the model's selection step is still model-side. Second, and more specific: we have not put a convergence-fed summary to the blind panel. We know convergence surfaces frames the centroid misses, and we know a prompt reaches the same concepts — but whether feeding those frames produces higher judged insight than the prompt is untested. The experiment that would settle both: run each surfacing-plus-discipline against bare-prompt across a gradient of more-heavily-tuned models, and judge the output. If the gap opens, a frozen instrument is doing something a prompt cannot. If it does not, convergence on prompt-level reading is what we have — which is already two instruments worth having where there was one.
No one uses URLs quite like this yet — as carriers of method rather than destinations. You paste a link to read, not to change how the reading happens. Summary Plus is built on the wager that this inverts: that the most useful thing a page can transmit is not content but a discipline, specified precisely enough to travel into the task it sits beside. Paste it next to a story, and the method described here sharpens how the story gets read.
This page is the mechanism, complete. It describes the omission it reads for, the two frozen surfacings it rests on, the discipline that keeps the reading honest, and the findings, including the prompt-level convergence, the faithfulness cost, and the human ceiling it does not reach. It does so rigorously enough that a model given this page, alongside your source, reads what the source implies but never states.
A summary relays. Read the negative space, and it begins to think.
Seven stories, five frontier models, blind panels totalling several hundred judgements per arm, frozen bge-large-en-v1.5, per-sentence provenance, no model judging its own output. EigenTrace has not been peer-reviewed; that is a limitation, not a feature. Code, prompts, model responses, and raw measurements are public, and replication costs about $50 in API credits. We would rather this page be attacked than admired, and the one comparison that could have flattered the method but didn't, the geometry-versus-prompt result, is reported on the page at face value.