Capture Taxonomy

Working note. By the time the book reached Chapter 11 it was naming capture in three different vocabularies across three different chapters and the vocabularies had stopped agreeing. Chapter 8 had a capture-asymmetry between two halves of an institutional carrier (training vs. preservation). Chapter 10 had a self-vs-external-capture distinction at the platform scale. Chapter 11 had three new flavors specific to LLMs (corpus, objective, deployment). Each chapter had introduced its capture story because the chapter needed it; the cross-chapter result was that “capture” was doing work I had not unified. This note unifies it.

The unification will turn out to be straightforward, which is part of why it has been waiting. Each chapter’s capture flavor is the same mechanism operating on a different selection-design surface, and once the substrates are named the rest of the taxonomy falls into place along two further axes. The note works through the substrates, then the source axis (who is doing the capturing), then the target asymmetry axis (recovery dynamics), then the composition rules, and then commits to the unified vocabulary the book should use from here on.

What capture means, precisely

The medium note gave manipulation its definition: tuning a gate’s criteria against truth. Capture is the same definition with one substitution: tuning a selection-design surface against the desideratum it was supposed to serve. The substitution generalizes in two directions at once. Surface generalizes “gate” to any selection-design layer (the gate-criteria, the option space, the training corpus, the training objective, the runtime configuration, the receiver training, the preservation archive — any layer that ranks or admits or generates content). Desideratum generalizes “truth” to whatever the surface was nominally serving (faithfulness for a corpus, helpfulness for an objective, accuracy for a preservation institution, user benefit for an engagement gate). The substitution is faithful to medium-and-manipulation’s original definition and lets the same word carry across the book’s three chapters.

Capture is tuning a selection-design surface against the desideratum it was supposed to serve. That is the entire definition, and the taxonomy is what it produces when you push on the parameters.

Axis 1: substrate — what is being captured

The first axis asks which selection-design surface is the captured one. The book has named seven so far, across three chapters, and the unified treatment lets us see them as a single list rather than as per-chapter inventory:

Gate-criteria (Ch 10). The runtime selection function — what an algorithm rewards, what an editor accepts, what a peer reviewer passes. The book’s original capture site.
Option space (Ch 10). The set of variants the medium can express at all — character limits, format constraints, the “playable” moves of a platform. The frozen-selection surface.
Training corpus (Ch 11). For an LLM, the data the model was fitted to. The deepest option space, because it determines what the model has ever seen, not just what it can express.
Training objective (Ch 11). The loss function and reward signal the model was tuned against. A gate-criteria lifted into the architecture itself.
Deployment configuration (Ch 11). System prompts, fine-tunes, refusal policies, tool-use constraints. A runtime gate the operator can re-tune per-request without users seeing.
Receiver training (Ch 8). The institutional apparatus that re-installs decoding keys in receivers — curricula, apprenticeships, pedagogy. The book’s only substrate that captures inside people rather than inside content systems.
Preservation archive (Ch 8). The institution that holds the un-compressed form alive — journals, libraries, lifetime academics, common-law precedent. The book’s only substrate where the captured object is the source material itself rather than the apparatus around it.

These do not exhaust the possibilities; later chapters or later media will probably name more. What the substrate list buys is the recognition that capture is not a single thing per chapter but a site, and that the substrates have substantially different mechanics, costs of recovery, and audit properties. The rest of the note is about those differences.

A note on what unifies the list. Every substrate above is a place where some entity ranks or admits or generates content according to criteria. That is the structural commonality. Engagement gates, training objectives, and receiver curricula all do the same thing at very different layers — they pre-filter what reaches a downstream consumer (a user, a model, a citizen). When the book says “selection has an owner because it is criterial,” it is talking about ownership of any of these substrates, and capture is what happens when the owner tunes the substrate against what the consumer would have wanted.

Axis 2: source — who is doing the capturing

The second axis asks who is doing the tuning. Chapter 10 split this into self-capture and external-capture; the unified treatment makes the same distinction available at every substrate, and the cross-product is what most of the book’s diagnostic worry sits inside.

External capture. An adversarial selection-tuner — a propaganda operation, a state actor, a sophisticated ad-tech firm, a coordinated influence campaign, a competitor — learns the substrate’s criteria and crafts inputs that exploit them. The Misinformation Age (p.175) named this as an “asymmetric arms race” — “whatever barriers we erect against the forces of propaganda will immediately become targets for these sources to overcome.” The captor is meaningfully outside the institution; the institution has, in principle, an interest in defeating the captor.
Self-capture. The institution’s own operating logic is the captor. The platform’s business model tuning engagement gates against truth (Ch 10). The LLM operator’s revenue model tuning the corpus against costly faithfulness (Ch 11). The training institution’s prestige-and-funding pressures tuning curricula toward credentialism (Ch 8). Same mechanism — tuning a surface against its desideratum — but no external adversary is doing the tuning. The captured state and the institution’s normal operation are the same state.
Composite (external riding self). The case Ch 10 named in passing and the most empirically common: external actors operate inside the gradient that self-capture has already created. They do not have to fight the substrate; they only have to align with the criteria the captured equilibrium already rewards. A propaganda operation on an engagement-tuned platform does not have to outwit the algorithm; the algorithm was already tuned to favor exactly the high-arousal, identity-flagged content the propaganda wants to inject. Composite capture amplifies external capture’s effect well beyond what external capture alone could achieve against an un-self-captured institution.

The asymmetry the book has been carrying in different vocabularies, said plainly: self-capture is structurally more stable than external capture because there is no captor to defeat. External capture has a contestable adversary — fight them, regulate them, expose them, sanction them. Self-capture has no contestable adversary; the captured agent is the institution doing its job. You can defeat a captor; you cannot defeat an equilibrium. The only way to remove self-capture is to dismantle the arrangement that holds the equilibrium in place — typically the resource flow that makes the captured state profitable, since pure incompetence does not produce a stable captured state. That last condition is what distinguishes stably-captured institutions from merely-broken ones: the captured state has to pay for the institution, or else the institution’s normal-operations pressure will move it off the captured equilibrium.

Composite capture is the asymmetric arms race the book has worried about throughout — and the asymmetry now has a precise mechanism. The arms race is asymmetric not because the attackers are cleverer but because they are running with the gradient the self-capture established. Defenders of the institution would have to fight the equilibrium and the external attackers riding it, and the equilibrium is by far the harder of the two.

Axis 3: target asymmetry — what does capture damage, and how does it recover

The third axis asks what happens after the substrate has been captured — specifically, how the damage propagates and how (or whether) it can be undone. This is the axis Chapter 8 was working on with the capture-asymmetry between training and preservation; the unified treatment lets us see the asymmetry as a property of the substrate rather than of the Ch 8 dichotomy.

The recovery dynamics, by substrate:

Preservation archive. Recoverable in principle. Other archives may exist, the original evidence may still be re-derivable, and scholars from outside the captured institution can reconstruct the form. Capture damages the institution, but the underlying form remains retrievable through alternative means. Recovery time: variable, but bounded by the existence of alternatives.
Receiver training. Damages the receivers themselves. Trained receivers cannot easily be un-trained: the perceptual mode installed in them is not a belief but a chunked machinery for reading. Subsequent uncaptured preservation has to work against a re-tuned readership that decodes incoming evidence through the installed key. Recovery time: at least one full cohort cycle, typically a generation; cumulative across cohorts if the captured curriculum keeps running.
Training corpus. Damages every future model trained on the captured corpus. Doesn’t decay — a corpus is a static artifact. The same captured corpus produces tilted models indefinitely. Recovery time: requires retraining from a different corpus, which is the most expensive operation in the LLM cost stack.
Training objective. Damages every output from any model trained against it — and self-reinforces across model generations because the model’s outputs go on to influence what data is generated downstream (the internet itself), which then feeds the next model’s training corpus. The captured objective’s effects compound. Recovery time: requires not just re-tuning but also waiting out the corpus-pollution the captured objective has injected into the next training set.
Deployment configuration. Cheapest to fix: a different system prompt, a different fine-tune, a different policy. Opaque to audit, because configurations are typically not published and can change silently between requests. Recovery time: minutes-to-hours per deployment if the operator wills it; indefinite if the operator does not.
Gate-criteria (runtime). Cheaper to re-tune than a model is to retrain, but harder to audit than deployment configuration because users cannot see the gate. Recovery time: depends on how legible the criteria are.
Option space. Hardest to change because it is frozen at design time. Changing the option space requires a new medium, not a re-tuned old one. Recovery time: requires displacement or re-design of the platform.

The recovery hierarchy, sorted roughly from most-recoverable to least-recoverable: deployment-config → gate-criteria → preservation-archive → receiver-training → training-corpus → training-objective → option-space.
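
The ordering above can be written down as a small data structure, which makes it easy to compare two substrates or sort a list of them. This is only a sketch: the names `Substrate`, `harder_to_recover`, and `hardest_first` are illustrative, not terms the book uses, and the integer ranks encode nothing more than position in the rough default ordering (they are ordinal, not a measure of how much harder recovery is).

```python
from enum import IntEnum


class Substrate(IntEnum):
    """Position in the note's rough recovery hierarchy (1 = most recoverable).

    The ranks are ordinal only; the gaps between them carry no meaning.
    """
    DEPLOYMENT_CONFIG = 1
    GATE_CRITERIA = 2
    PRESERVATION_ARCHIVE = 3
    RECEIVER_TRAINING = 4
    TRAINING_CORPUS = 5
    TRAINING_OBJECTIVE = 6
    OPTION_SPACE = 7


def harder_to_recover(a: Substrate, b: Substrate) -> bool:
    """True if `a` sits later than `b` in the default hierarchy."""
    return a > b


def hardest_first(substrates):
    """Sort substrates from least recoverable to most recoverable."""
    return sorted(substrates, reverse=True)
```

For example, `harder_to_recover(Substrate.TRAINING_CORPUS, Substrate.GATE_CRITERIA)` is true under this default ordering; particular cases may still flip it.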

Some structural results fall out of the hierarchy that the per-chapter treatments did not show. Training-corpus capture (Ch 11) is structurally similar to receiver-training capture (Ch 8): both damage the consumer of the substrate’s output rather than just the substrate itself, both require waiting out a generation of installed keys, and both are harder to walk back than the cost of re-tuning the substrate alone would suggest. The Ch 8 asymmetry between training and preservation generalizes: any substrate that captures the consumer’s decoding key, rather than just the surface the consumer encounters, is harder to recover from than a substrate that captures only the surface. Corpus, objective, and receiver training are consumer-key captures; gate, option space, deployment, and preservation are surface captures. (Option space is the one surface capture that sits at the hard end of the hierarchy, and for a different reason: it is frozen at design time, not installed in a consumer.) That gives the book a cleaner principle than the Ch 8 dichotomy. It is not that “training capture is worse than preservation capture” in general; it is that consumer-key captures are worse than surface captures, and Ch 8 was naming the special case where the substrate happens to be receiver training.

Training-objective capture is uniquely bad in one specific way: it self-reinforces across model generations. No other substrate in the catalog has this property. Even captured receiver training recovers generationally as cohorts turn over; a captured objective leaks into the data the next model will be trained on, and that model’s outputs leak into the data the model after it will see. The result is a feedback loop that compounds rather than decays, and the book should mark training objective as the substrate with the worst recovery dynamics in the catalog.
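
A toy calculation can make the contrast between decaying and compounding dynamics concrete. The sketch below is an illustration under invented assumptions, not a model the book proposes: `turnover` (the fraction of captured receivers replaced per generation once the curriculum is fixed) and `leak` (the fraction of the still-clean corpus replaced by objective-tuned output per model generation while the captured objective stays in place) are hypothetical parameters chosen only to show the shape of each curve.

```python
from typing import List


def cohort_turnover(captured_share: float, turnover: float, generations: int) -> List[float]:
    """Receiver training after the curriculum is repaired (toy assumption):
    each generation, a fraction `turnover` of captured receivers is replaced
    by uncaptured ones, so the captured share decays geometrically."""
    shares = [captured_share]
    for _ in range(generations):
        shares.append(shares[-1] * (1.0 - turnover))
    return shares


def objective_feedback(polluted_share: float, leak: float, generations: int) -> List[float]:
    """Captured objective left in place (toy assumption): each generation,
    a fraction `leak` of the still-clean corpus is replaced by
    objective-tuned model output, so the polluted share grows toward 1."""
    shares = [polluted_share]
    for _ in range(generations):
        current = shares[-1]
        shares.append(current + leak * (1.0 - current))
    return shares


if __name__ == "__main__":
    # Hypothetical starting points and rates, chosen only for illustration.
    print(cohort_turnover(0.8, 0.5, 3))
    print(objective_feedback(0.1, 0.2, 3))
```

Under these assumptions the first sequence shrinks every generation and the second grows every generation, which is the qualitative difference the paragraph above describes; the specific numbers carry no empirical weight.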

Composition: how captured substrates stack

The substrates do not capture in isolation; they compose. A few compositions worth being explicit about:

Self-capture creates the gradient that external capture rides. Already named in Ch 10. The unified treatment makes this composition the primary one in practice: stably-captured institutions are almost always running composite capture, because self-capture’s equilibrium incentivizes external actors to align with it. The two-flavor source axis is in fact almost always running in the composite mode.
Training-corpus capture composes with receiver-training capture to produce a uniquely closed loop. A captured training apparatus installs preconditions in receivers that match exactly the preconditions a captured LLM corpus is designed to decode against. The receiver, equipped with captured preconditions, prompts the captured LLM, which produces outputs that the receiver decodes through those same captured preconditions. There is no external evidence channel: the loop is closed at both ends. This is the worst case Ch 11 was worried about in its discussion of “a captured LLM trained on a captured corpus used as a tutor.”
Training-objective capture composes with corpus capture by the self-reinforcement mechanism above. A captured objective and a captured corpus are not two independent problems; the next generation’s corpus contains the previous generation’s objective-tuned outputs, so corpus capture is, after one generation, partially objective capture. The two substrates merge over generational time.
Deployment capture composes with all of the above, but recoverably. A captured deployment riding on an uncaptured corpus and objective can be reverted by re-deploying; a captured deployment riding on a captured corpus cannot, because re-deploying leaves the corpus capture in place.
Option-space capture composes with everything by pre-empting it. A captured option space means certain variants cannot be expressed at all. This is capture at the deepest possible layer, because nothing downstream of the option space ever has to encounter (let alone select against) the suppressed variants. It composes “by absence”: every other substrate operates on the un-suppressed option space without knowing what is missing.

The composition rules give a simple defensive heuristic: defend the substrates that are hardest to recover from first; if you have to choose, defend the consumer-key substrates over the surface substrates and the self-reinforcing substrates over the static ones. That heuristic supports Ch 8’s prioritization (training over preservation), refines Ch 10’s diagnosis (the platform engagement-gate is a surface substrate; the cost-shifting that captures receiver training is the deeper damage), and gives Ch 11 its priority order (objective > corpus > deployment > runtime).

Where I land

The unified vocabulary, said whole: capture is one mechanism — tuning a selection-design surface against the desideratum it was supposed to serve — operating across multiple substrates (gate-criteria, option-space, training-corpus, training-objective, deployment-configuration, receiver-training, preservation-archive), from multiple sources (external adversaries, the institution’s own operating logic, composites of the two), with target asymmetries determined by the substrate (consumer-key captures are harder to recover from than surface captures; self-reinforcing substrates are worst because their captured state propagates across generations). Self-capture is structurally more stable than external capture because there is no captor to defeat — only an equilibrium to dismantle — and composite capture (external actors riding the self-capture gradient) is the empirically dominant mode.

This subsumes the book’s three earlier capture stories without contradiction. Ch 8’s capture-asymmetry was the receiver-training-vs-preservation special case of the consumer-key-vs-surface principle. Ch 10’s self-vs-external was the source axis. Ch 11’s corpus / objective / deployment were three more substrates on the substrate axis, with corpus and objective both being consumer-key captures (corpus more recoverable; objective uniquely self-reinforcing) and deployment a surface capture. The three earlier vocabularies were three projections of the same taxonomy onto chapter-local concerns; the unified treatment gives the book one vocabulary to use from here on.

The diagnostic move the book should make going forward: when naming a capture concern, name substrate, source, and target asymmetry. “The platform’s engagement gate is self-captured by the business model, surface substrate, recoverable per gate-criteria but stably so given the resource flow.” “An LLM’s training objective is corporately self-captured, self-reinforcing substrate, the worst recovery dynamics in the book.” “A captured curriculum is composite-captured by political and economic pressure on a consumer-key substrate, generational recovery time.” The format is uniform; the diagnostic load is now distributed across three explicit dimensions rather than implicit in the chapter’s framing.
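
The uniform format can be sketched as a small record type, so that every capture concern is forced to name all three dimensions. This is a hedged illustration only: `CaptureDiagnosis`, `Source`, `TargetClass`, and the field names are hypothetical labels invented for the example, and the three instances simply restate the three example diagnoses above.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    EXTERNAL = "external"
    SELF = "self"
    COMPOSITE = "composite"


class TargetClass(Enum):
    SURFACE = "surface"
    CONSUMER_KEY = "consumer-key"


@dataclass(frozen=True)
class CaptureDiagnosis:
    """One capture concern, named along the note's three axes."""
    substrate: str                  # e.g. "gate-criteria", "training-objective"
    source: Source                  # who is doing the tuning
    target: TargetClass             # surface vs. consumer-key capture
    self_reinforcing: bool = False  # does the captured state compound?
    recovery: str = ""              # free-text note on recovery dynamics

    def summary(self) -> str:
        parts = [
            f"{self.substrate}: {self.source.value}-captured",
            f"{self.target.value} substrate",
        ]
        if self.self_reinforcing:
            parts.append("self-reinforcing")
        if self.recovery:
            parts.append(f"recovery: {self.recovery}")
        return ", ".join(parts)


# The three example diagnoses from the paragraph above, restated as records.
engagement_gate = CaptureDiagnosis(
    "gate-criteria", Source.SELF, TargetClass.SURFACE,
    recovery="re-tunable, but stable while the resource flow holds",
)
objective = CaptureDiagnosis(
    "training-objective", Source.SELF, TargetClass.CONSUMER_KEY,
    self_reinforcing=True, recovery="worst in the catalog",
)
curriculum = CaptureDiagnosis(
    "receiver-training", Source.COMPOSITE, TargetClass.CONSUMER_KEY,
    recovery="generational",
)
```

Making the source and target fields required is the point of the design: a diagnosis that leaves either one implicit cannot be constructed.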

What this changes for the book

Chapter 8 gets its capture-asymmetry generalized. The asymmetry between training and preservation is the consumer-key-vs-surface special case. Ch 8’s claim survives; the principle behind it just got more general, and the chapter should cross-reference the unified treatment.
Chapter 10 gets its self-vs-external distinction located on the source axis. The “more stable because no captor to defeat” argument now has a cleaner formulation: it is the source-axis structural property that makes self-capture distinctive, and the composite case is where most of the worry actually lives. Ch 10 should cross-reference and slightly soften the self-vs-external binary into the three-source taxonomy.
Chapter 11 gets its three new flavors located on the substrate axis. The flavors stop needing their own special vocabulary and join the substrate list; the cross-substrate composition rules are what was missing from Ch 11’s prose. The chapter’s worry about the LLM as the worst-case concentrated capture surface is now expressible as “all five LLM substrates owned by one party, with objective being self-reinforcing, corpus being consumer-key, and deployment being recoverable-but-opaque.” Ch 11 should cross-reference and downgrade the “needs unification in a forthcoming note” deferrals that this note discharges.
Chapter 12 (infrastructure for integration) inherits a sharper design brief. The unified taxonomy says which substrates to defend most aggressively (consumer-key, self-reinforcing) and which can tolerate being a slower defensive priority (surface, recoverable). The institution-design problem Ch 12 has to solve is now expressible as: design substrates whose recovery dynamics favor the consumer’s interest over the operator’s, with capture-resistance proportional to the substrate’s recovery cost. That’s a tractable design problem; the per-chapter framings were not.

Where I’m still uncertain

The substrate list is probably incomplete. I have named seven substrates the book has explicitly engaged with. Later chapters or later media will name more — a state’s bureaucratic apparatus has substrates the book has not yet treated (legal precedent, regulatory rulemaking, monetary policy gates), and emerging media (immersive VR, brain-computer interfaces, distributed protocols with on-chain governance) will introduce substrates with their own recovery dynamics. The taxonomy’s three axes should accommodate new substrates without restructuring, but I have not stress-tested that and the list should be treated as a working census rather than a closure.
Source attribution can be ambiguous in practice. I have written “self-capture” and “external capture” as if institutions have well-defined boundaries, but in reality the line between “the institution’s operating logic” and “external pressure on the institution” is blurry: regulators, lobbyists, shareholders, advertisers, and board members are all simultaneously inside and outside, depending on how you slice the institution. The self-versus-external distinction holds at the structural level (is the captor contestable as an adversary, or coterminous with the institution’s normal operation?), but real diagnostic work has to attend to where the boundary actually lies in each case, and I have not given good criteria for that boundary work.
The recovery hierarchy is ordinal but not quantified. I have said training-corpus is harder to recover than gate-criteria; I have not said how much harder. The hierarchy is probably approximately right but the gaps between adjacent substrates may be larger or smaller than the list suggests, and the order may flip in particular cases — a tiny captured training cohort recovering faster than a large captured preservation archive that has no alternative copies, for example. The hierarchy should be read as a default rather than a law.
“Consumer-key vs. surface” is a sharper principle than the substrate list suggests, and may want its own treatment. The distinction explains why receiver-training and training-corpus and training-objective are uniquely hard to recover from, and it is what underlies Ch 8’s asymmetry. It might deserve its own short note rather than living inside this taxonomy as a derived observation; whether to extract it depends on how often it surfaces in later chapters.
The taxonomy does not engage the question of whether capture can be a good thing. Every framing in this note treats capture as a damage state to be defended against. But selection-design surfaces have to be tuned somehow, and a substrate tuned for its consumer’s interest — what the book has been calling the “uncaptured” state — is also a tuning. The book has not given a positive theory of what well-tuned substrates look like or how to recognize one; it has only given a negative theory of what their capture looks like. The negative theory is doing useful diagnostic work, but the prescriptive chapters (especially Ch 12) will need a positive theory to design against, and this note has not provided one.
Self-capture’s “no captor to defeat” property may be falsifiable in ways I have not worked out. The argument runs that self-captured equilibria are uniquely stable because there is no adversary. But equilibria do break down: businesses fail, institutions reform, and captured states sometimes recover without an external actor having defeated anything. The source-axis discussion suggests a route (changing the resource flow that holds the equilibrium in place), but the historical evidence on how self-captured equilibria get dismantled is mixed, and the note treats stability as more absolute than it probably is. A polish pass should engage the cases where self-capture broke without an external captor being defeated.
