The Dartmouth Cut

April 22, 2026 · notebook sketch · AI / AGI Eliminations (Part 1 of 3)

What do AI and AGI mean?

I never understood what “AI” means, and “AGI” even less: stacking a second adjective onto an already inflated term implies there are multiple variants of intelligence at play. If “AGI” is “Artificial General Intelligence”, what does “AI” mean on its own? If intelligence can be “general”, can it also be “specialized”, or “hybrid”? What does the absence of “general” imply for the term “AI”? A “base”, unnamed intelligence? A “default” intelligence? The terms are confusing on their own, and there is no clean agreement or definition over the base noun, “intelligence”. If the base noun is undefined, every adjective hung on it inherits the vagueness.

I was curious to understand how these terms first came to exist in humanity’s history, and more than that, what led to their existence — the circumstances around them. I had a feeling that even the very roots that gave rise to the words came from false assumptions.

The history is complicated and interwoven, but worth unraveling in order to understand what these terms owe their existence to. I’ll start with the first fork in the term itself, and all its implications and eliminations. Future forks will get their own dedicated posts.

Dartmouth, 1956

The earliest documented appearance of “Artificial Intelligence” as a term is in the August 31, 1955 funding proposal John McCarthy submitted to the Rockefeller Foundation, co-authored with Marvin Minsky, Nathaniel Rochester (IBM), and Claude Shannon (Bell Labs), titled “A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence.” The proposal opens with a claim that became the field’s foundational article of faith: “We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” The conference itself ran across roughly six to eight weeks of summer 1956, with attendees including Allen Newell, Herbert Simon, Oliver Selfridge, Ray Solomonoff, Arthur Samuel, Trenchard More, and Julian Bigelow drifting through.

The “Artificial Intelligence” naming was a deliberate institutional move and McCarthy was explicit about it in later interviews and writings. The conversational space at the time was already named: Norbert Wiener’s Cybernetics: Or Control and Communication in the Animal and the Machine (Hermann/Wiley, 1948; MIT Press second edition 1961) had been a bestseller and given the field its dominant frame; John von Neumann’s late lectures on self-reproducing automata gave it a competing one. The Macy Conferences on Cybernetics (1946–1953, ten meetings sponsored by the Josiah Macy Jr. Foundation, chaired primarily by Warren McCulloch) had been the cross-disciplinary intellectual home where Wiener, McCulloch, Walter Pitts, Ross Ashby, Heinz von Foerster, Margaret Mead, Gregory Bateson, Claude Shannon, and von Neumann hammered out the cybernetic synthesis. McCarthy chose “Artificial Intelligence” specifically to distance from “cybernetics” (too associated with Wiener personally and with biological framing), from “automata theory” (too narrow, pure-math feel), and from Newell and Simon’s competing label “Complex Information Processing” (which McCarthy thought sounded engineering-only).

How institutional was the renaming versus how substantive? Historians have not converged. The most extensive primary-source treatment is Pamela McCorduck’s Machines Who Think (W.H. Freeman, 1979; revised A.K. Peters, 2004), built on interviews with the Dartmouth participants while most were still alive. Daniel Crevier’s AI: The Tumultuous History of the Search for Artificial Intelligence (Basic Books, 1993) tracks the consequences over the next four decades. Nils Nilsson’s The Quest for Artificial Intelligence: A History of Ideas and Achievements (Cambridge University Press, 2010) is the most comprehensive technical-historical reference. Paul Edwards’ The Closed World: Computers and the Politics of Discourse in Cold War America (MIT Press, 1996) provides the institutional reading I find most persuasive: ARPA’s formation in 1958 (in response to Sputnik) and its subsequent funding of symbolic-AI research at MIT, Stanford, CMU, and SRI created enormous structural pressure to favor symbolic, military-procurement-friendly framings over the messier biological-feedback framings of cybernetics. McCarthy’s renaming was not just personal preference; it landed the field on the side of the funding stream that won the Cold War’s information-war round. The substantive reframe and the institutional reframe weren’t separable.

The substantive reframe itself: cybernetics had treated the mind as circular causality — organism-environment feedback, homeostasis, self-organization, second-order observation. McCarthy’s “Artificial Intelligence” reframed the mind as symbol manipulation in a sealed box. The cut was theoretical initially, even if driven by mathematical taste; the institutional consolidation followed. What got eliminated once the distinction between AI and cybernetics was established: embodiment as constitutive rather than peripheral, self-reference as the engine of cognition, the observer as part of what’s observed, continuous dynamics on analog substrate, communication as coupling rather than transmission, purpose as a structural property of dynamical systems rather than an external goal. These weren’t just forgotten topics — they were modes of being, entire ontological stances toward what a mind is.

What follows is large in scope, because those eliminations compound and interact.

The observer problem in science

Second-order cybernetics (von Foerster, Pask, Maturana, Varela, later Glanville) took the observer to be part of what’s observed. Knowing is a construction by a system that is itself being constructed by what it knows. Heinz von Foerster ran the Biological Computer Laboratory (BCL) at the University of Illinois Urbana-Champaign from 1958 to 1976, and the BCL was the institutional home where the second-order move was developed in detail — through visiting fellowships hosting Maturana, Varela, Pask, Spencer-Brown, Lars Löfgren, and many others. Von Foerster’s Observing Systems (Intersystems, 1981) collected the foundational papers; Understanding Understanding (Springer, 2003) is the late synthesis. The argument: any account of a knower has to include the knower’s own knowing of itself within the account. There is no view from nowhere.

Post-Dartmouth AI restored the classical Cartesian split: researcher outside, system inside, behavior observable from a neutral vantage. The question “what kind of entity must a knower be in order to know itself” became institutionally unaskable in mainstream cognitive science for five decades. When it reappeared — in Hofstadter’s strange loops (Gödel, Escher, Bach, Basic Books, 1979; I Am a Strange Loop, Basic Books, 2007), in Metzinger’s self-models (Being No One, MIT Press, 2003), in Friston’s self-evidencing (the free-energy principle papers, 2006 onward) — it had to be reinvented from scratch, usually without knowing the cybernetic prior art existed. There is also a structural parallel in physics: Wheeler’s “no phenomenon is a phenomenon until it is an observed phenomenon” and Bohr’s complementarity were saying the same thing in a different language, a generation earlier; mainstream cognitive science wasn’t allowed to notice. An existing program I am researching aims to formalize self-reference-as-squaring (work in progress on my blog’s research programs); it is doing in 2026 what Spencer-Brown, von Foerster, and Varela had as foundational axioms in 1969. That’s a 55-year debt the field paid for the Dartmouth rename. Modern AI systems inherit the stratified view directly: a transformer’s training loop sits structurally outside the model, and the model has no formal access to the loop that produced it. The capacity to be reflexively about itself — which is what a self-aware system does by definition — has to be hand-engineered around an architecture that was designed to lack it.

Management and governance as cybernetic practice

Stafford Beer built the Viable System Model (VSM) — a serious theory of organizational cognition with five recursive levels (operational, coordination, control, intelligence, policy), explicit variety-engineering constraints, and real-time operational criteria. Beer’s Brain of the Firm (Allen Lane, 1972) introduced the VSM; The Heart of Enterprise (Wiley, 1979) developed it; Diagnosing the System for Organizations (Wiley, 1985) was the practitioner’s manual. Cybersyn (Project Cybersyn, Chile 1971–73) was a full-scale cybernetic management of a national economy under the Allende government — telex links between factories and the central operations room, a Bauhaus-influenced control room with seven swivel chairs, daily statistics via a network of teleprinters, real-time crisis-response capacity. Cybersyn helped manage the October 1972 truckers’ strike effectively enough to keep distribution running when the road network was paralyzed — actual evidence the system worked under stress before the September 1973 Pinochet coup destroyed it. Eden Medina’s Cybernetic Revolutionaries: Technology and Politics in Allende’s Chile (MIT Press, 2011) is the definitive academic history.

Management cybernetics as a serious field essentially died with Cybersyn. The replacement — spreadsheet-managerialism, shareholder-value theory, KPI optimization — is cybernetically naïve: it treats organizations as input-output machines rather than as recursive viable systems, which is a category error that has produced decades of predictable failures (over-measurement destroying tacit knowledge, optimization collapsing requisite variety, feedback loops ignored until they produce crises). The only cybernetic management system to scale globally is the Toyota Production System — jidoka (autonomation), the andon cord (any worker can halt the line), kaizen, just-in-time — which is essentially Ashby’s law applied to manufacturing under a different vocabulary. James Womack, Daniel Jones, and Daniel Roos’s The Machine That Changed the World (Free Press, 1990) documented its dominance over mass-production manufacturing; the West has been trying to copy it for thirty years with partial success, mostly because the underlying cybernetic ontology doesn’t translate to organizations built on Taylorist assumptions. Modern attempts at cybernetic management — Sociocracy 3.0, Holacracy (Brian Robertson, 2015) — are echoes that lack the rigor Beer brought.

I should note honest limits here. Beer’s claims for VSM weren’t always well-validated; the model’s predictive power has been hard to test outside Cybersyn. Some of Beer’s later work drifted into the kind of all-encompassing systems-mysticism that gave cybernetics a bad name. But the loss I am pointing at isn’t Beer’s excesses; it’s the loss of the rigorous core. An entire theory of how collectives can think had a working prototype, got killed politically, and was not seriously attempted again at scale.

Ashby’s Law of Requisite Variety

“Only variety can destroy variety.” A regulator must have at least as much variety as the system it regulates to maintain control. This is a quantitative theorem, formalized in W. Ross Ashby’s An Introduction to Cybernetics (Chapman & Hall, 1956 — published the same year the Dartmouth conference ran, after which the two literatures barely cited each other). The mathematical statement: full regulation requires V(R) ≥ V(D), where V is variety (the number of distinguishable states), R is the regulator, and D is the disturbance source; more generally, the variety of outcomes the regulator can confine the system to is bounded below by V(D)/V(R). The deeper version is the Conant-Ashby good regulator theorem (“Every Good Regulator of a System Must Be a Model of That System,” International Journal of Systems Science, 1970): any regulator that successfully controls a system is, formally, a model of that system. To regulate, you must model.
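
To make the bound concrete, here is a minimal sketch in plain Python. The little game, the plant equation, and the function names are my own illustration rather than Ashby’s notation: a disturbance takes one of V(D) values, the regulator observes it and replies with one of V(R) responses, and regulation means confining the joint outcome to as few distinct values as possible.

```python
# Toy demonstration of the Law of Requisite Variety. The plant equation,
# the regulation strategy, and the names are my own illustration, not
# Ashby's notation. Disturbance d takes one of V_D values; the regulator
# observes d and replies with one of V_R responses r; the plant produces
# outcome (d + r) % V_D.

def regulate(V_D, V_R):
    """Group disturbances into bins of size V_R and steer every disturbance
    in a bin to that bin's top outcome. Returns the outcomes still visited."""
    outcomes = set()
    for d in range(V_D):
        target = min((d // V_R) * V_R + (V_R - 1), V_D - 1)  # bin's top value
        r = target - d                                       # always 0 <= r < V_R
        outcomes.add((d + r) % V_D)
    return outcomes

for V_R in (1, 2, 4, 8):
    print(f"V(D)=8, V(R)={V_R}: residual outcome variety = {len(regulate(8, V_R))}")

# Prints 8, 4, 2, 1. Each outcome value is reachable from at most V_R
# disturbances, so no regulator can do better than V(D)/V(R) here; only
# when V(R) >= V(D) can the outcome be held to a single state.
```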

Post-Dartmouth, Requisite Variety survived narrowly in control engineering but disappeared from cognitive science. Consequence: we have no common theoretical language for why intelligence must match the complexity of its environment. The modern handwave — “just scale parameters” — is Requisite Variety without the rigor: we’re throwing variety at problems without asking what variety structure the environment actually has. Scaling parameters in a transformer increases V(R) — the variety of internal states the model can occupy — but it doesn’t address V(D), the variety of the environment the model has to regulate. If the environment’s variety includes recursion, self-reference, and time-dependent novelty (which it does, when the environment includes other agents and the model itself), then no amount of static-parameter increase will close the gap, because the relevant variety is generated by the interaction, not by the model alone. Karl Friston’s free energy principle (e.g., “The free-energy principle: a unified brain theory?”, Nature Reviews Neuroscience, 2010) is essentially the modern probabilistic generalization of Ashby — minimizing surprise is regulating against environmental variety — but mainstream ML treats Friston as adjacent niche work rather than as the formal framework the field is missing.

Umwelt and ecological perception

Jakob von Uexküll’s Umwelt — each organism constructs its own phenomenal world from the subset of environment its sensors and effectors engage — was foundational for ethology, cybernetics, and early Merleau-Ponty. Von Uexküll’s Umwelt und Innenwelt der Tiere (Springer, 1909) introduced the concept; A Foray into the Worlds of Animals and Humans (translated by Joseph O’Neil; original Streifzüge durch die Umwelten von Tieren und Menschen, 1934) is the accessible version, with its famous example of the tick — a creature whose entire phenomenal world consists of three stimuli (butyric acid from mammalian skin, body warmth, and contact with mammalian fur), arranged in sequence. The tick is not a deficient human; it is a complete organism with a complete world that happens to be three-dimensional in a different sense than ours. Maurice Merleau-Ponty’s Phenomenology of Perception (Gallimard, 1945) absorbed von Uexküll’s insight into the Western philosophical mainstream — perception is not representation of an objective world; it is the lived coupling of organism and environment. James Gibson’s ecological psychology (The Senses Considered as Perceptual Systems, 1966; The Ecological Approach to Visual Perception, 1979) extended it with affordances, direct perception, and optic flow.

Post-Dartmouth cognitive science needed internal representations of an objective external world, which is the opposite ontology. The exile of Umwelt-thinking lasted roughly fifty years. Andy Clark and David Chalmers’ “The Extended Mind” (Analysis, 1998) and the broader 4E (embodied, embedded, extended, enactive) cognition movement — Evan Thompson’s Mind in Life (Belknap/Harvard, 2007), Shaun Gallagher, Alva Noë, Hanne De Jaegher’s participatory sense-making — have been clawing this back slowly, mostly outside the AI mainstream.

The modern AI implication is sharp. Every LLM’s “world model” is built on a category mistake by Umwelt’s lights: there is no substrate-neutral world to model; there is only a substrate’s coupled engagement with what it can engage. A transformer trained on text is not building a model of “the world”; it is building a model of the statistical distribution of human-generated text, which is one specific Umwelt’s residue. The fact that the model performs well on text-prediction tasks tells us about the residue, not about the underlying world the residue was generated by. Bender and Koller’s “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data” (ACL 2020) made the form-versus-meaning version of this argument carefully, and most ML researchers either dismiss it or absorb it as “well, of course, but that’s just an engineering challenge.” It isn’t; it’s a constitutive limit that follows from the Umwelt frame the field discarded.

Bateson’s ecology of mind

Gregory Bateson treated mind as extending beyond skin — the thinking is in the whole circuit of organism + environment + artifact + other minds. Steps to an Ecology of Mind (Chandler, 1972) is the canonical collection; Mind and Nature: A Necessary Unity (Dutton, 1979) is the late synthesis. The double-bind theory of schizophrenia (Bateson, Don Jackson, Jay Haley, John Weakland, “Toward a theory of schizophrenia,” Behavioral Science, 1956) was a precise structural account — pathology arising from contradictory injunctions at different logical levels in a relationship that cannot be exited — that became foundational for most of what later became family systems therapy.

Bateson’s deepest contribution is also the most relevant to the I=E framework I’ve been developing on this blog. His definition of information — “a difference that makes a difference” (Steps, 1972) — came after Shannon’s and exceeds it in scope. Shannon’s bit measures the resolution of an uncertainty; Bateson’s bit measures the consequence of a distinction in a coupled system. The two definitions look similar but aren’t. Shannon’s measure is indifferent to what a signal means to any particular receiver; Bateson’s information requires a receiver for which the difference matters. Information=Elimination, the thesis under which all my research programs sit, is closer to Bateson’s reading than to Shannon’s: an elimination only counts as information for a system that the eliminated alternative would have made a difference for. The Dartmouth cut chose Shannon. The cost is that the field cannot easily talk about what it means for an LLM token to “matter” to the model, because the underlying information theory presupposes the model isn’t a system that can be mattered to.
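
A toy contrast may help, with the caveat that the scenario, the class, and the names below are mine, not Shannon’s or Bateson’s formalism: the surprisal of a signal depends only on its probability, while whether the same signal is “a difference that makes a difference” depends on whether there is a receiver whose state it changes.

```python
import math

# Toy contrast between two readings of "information". The scenario and
# names are my own illustration, not Shannon's or Bateson's formalism.

def surprisal(p):
    """Shannon-style: bits carried by an event of probability p, computed
    without reference to any particular receiver."""
    return -math.log2(p)

class Receiver:
    """Bateson-style: a signal is informative only if it is a difference
    that changes this system's state."""
    def __init__(self, sensitive_to):
        self.sensitive_to = set(sensitive_to)
        self.state = "resting"

    def makes_a_difference(self, signal):
        before = self.state
        if signal in self.sensitive_to:
            self.state = "responding"
        return self.state != before

# The same rare event, two different receivers (echoing von Uexkuell's tick):
signal, p = "butyric_acid", 0.01
tick = Receiver({"butyric_acid", "warmth", "fur_contact"})
barnacle = Receiver({"salinity_drop"})

print(f"surprisal({signal!r}) = {surprisal(p):.2f} bits, for any receiver")
print("difference for the tick:    ", tick.makes_a_difference(signal))      # True
print("difference for the barnacle:", barnacle.makes_a_difference(signal))  # False
```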

Bateson died in 1980 a marginal figure. Edwin Hutchins’ Cognition in the Wild (MIT Press, 1995) partially recovered the distributed-cognition idea through the case study of ship navigation as a cognitive system distributed across the navigator, the bridge crew, the chart, the pelorus, and the ship itself. But Hutchins’ recovery was without the ecological and pathological dimensions Bateson had developed — the systemic understanding of mental health, communication pathology, and trans-individual cognition was replaced by DSM categories and intra-skull computation. Modern research on multi-agent LLM systems is reinventing distributed cognition badly, because the prior theory is unread.

Pask’s conversation theory

Gordon Pask built cybernetic teaching machines in the 1950s and ’60s — most famously SAKI (the Self-Adaptive Keyboard Instructor), an early adaptive tutoring system, and the Musicolour, a sound-driven light-display system that adapted to musicians’ performance — and developed a mathematical theory of how two cognitive systems come to share understanding through actual coupled interaction. Conversation, Cognition and Learning (Elsevier, 1975) is the foundational text; Conversation Theory: Applications in Education and Epistemology (Elsevier, 1976) the applied companion. The mathematical core was entailment meshes — graph structures representing topic relationships that two interacting systems progressively align. The deep claim: conversation is the constitutive unit of cognition, not just a channel for pre-existing cognition. Cognition does not happen in heads and then get communicated; it happens in the coupling.

Nearly forgotten. Ranulph Glanville carried the work forward (The Black Boox, three volumes, 2009–14, collected papers; “A Ship Without a Rudder,” 1997). The consequence: we have no rigorous theory of how distributed or collective cognition actually works at the coupling level. LLM multi-agent frameworks (AutoGen, LangGraph, CrewAI, the entire 2024–26 wave) are reinventing fragments poorly because the prior theory is unread. They treat agents as sealed processes exchanging messages, which is exactly the Shannon-Weaver model Pask’s conversation theory was an alternative to. Pask would have asked: what topic structures are the agents actually aligning, and what shared entailment mesh emerges from their interaction? Modern frameworks don’t ask, because the vocabulary doesn’t exist in the relevant departments.
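
To show the shape of the question, not Pask’s actual mathematics, here is a deliberately crude sketch: each agent’s understanding is a set of entailment edges, a conversational step exposes one edge for teach-back, and only edges that survive teach-back join the shared mesh. The rules, topics, and function names are all my own toy, offered only to make “what shared entailment mesh emerges from their interaction” a computable question rather than a slogan.

```python
import random

# A crude sketch of the *shape* of Pask's question, not his formalism.
# Each agent's understanding is a set of entailment edges
# (topic, topic it depends on). A conversational step exposes one edge for
# "teach-back"; edges that survive teach-back join the shared mesh, edges
# that don't are provisionally adopted by the listener and tested later.

def converse(mesh_a, mesh_b, rounds=50, seed=0):
    rng = random.Random(seed)
    shared = set()
    for _ in range(rounds):
        speaker, listener = rng.sample([mesh_a, mesh_b], 2)
        edge = rng.choice(sorted(speaker))
        if edge in listener:
            shared.add(edge)      # successful teach-back: genuine agreement
        else:
            listener.add(edge)    # provisional adoption, tested in later rounds
    return shared

alice = {("backprop", "chain rule"), ("chain rule", "derivative"),
         ("attention", "softmax")}
bob = {("backprop", "gradient"), ("chain rule", "derivative"),
       ("softmax", "exponential")}

shared = converse(set(alice), set(bob))
print(f"shared mesh after the conversation: {len(shared)} edges")
for edge in sorted(shared):
    print("  ", edge)
```

The point is not the toy; it is that “how much shared mesh emerged” is a question with an answer, which is more than the message-passing framing can ask.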

Autopoiesis

Humberto Maturana and Francisco Varela’s 1972 theory: a living system is one that continuously produces the network of components that produces it. Cognition is not representation of an external world; it is the maintenance of organizational closure under perturbation. “Living is knowing.” The original Spanish text was De Máquinas y Seres Vivos: Una Teoría Sobre la Organización Biológica (Editorial Universitaria, Santiago, 1972); the English version, expanded, is Autopoiesis and Cognition: The Realization of the Living (D. Reidel, 1980). The Tree of Knowledge: The Biological Roots of Human Understanding (Shambhala, 1987) is the accessible introduction. Varela’s later collaborations — The Embodied Mind: Cognitive Science and Human Experience with Evan Thompson and Eleanor Rosch (MIT Press, 1991) — extended the framework into cognitive science, and his neurophenomenology paper (“Neurophenomenology: A Methodological Remedy for the Hard Problem,” Journal of Consciousness Studies, 1996) attempted the bridge to first-person experience.

Autopoiesis was banished from AI as vitalism-adjacent — a term of art that smuggled in spookiness, in the symbolic-AI reading. The reading was unfair; the actual content of the theory is rigorously mechanistic, just non-Turing-machine in its primitives. The consequence: the difference between a system that maintains itself (cell, organism, arguably a brain) and one that is maintained externally by engineers (a transformer needing constant power, data refresh, and redeployment) was declared irrelevant to the theory of intelligence. It is not irrelevant. It may be the difference. Karl Friston’s free energy principle is, formally, a probabilistic restatement of autopoiesis — a system minimizing surprise is a system maintaining its own organizational closure against perturbation — but the cybernetics-aware reading of Friston is rare. My I=E framework treats autopoiesis as the substrate condition for elimination to be cognitive at all: only an autopoietic system can perform endogenous elimination, because only an autopoietic system has a self whose viability the elimination is for. The river eliminates possibilities (carves one channel out of many) but isn’t autopoietic and so isn’t cognitive; the bacterium is. The Dartmouth cut made this distinction unavailable in the field that most needs it.

Analog computation as cognitive paradigm

Pre-Dartmouth, the differential analyzer (Vannevar Bush at MIT, completed 1931 — a room-sized analog mechanical integrator), Douglas Hartree’s electromechanical models for atomic-physics calculations, and early neural analog computers were serious candidates for a theory of mind. Cybernetics was deeply braided with analog computation; the architectures matched the theory. The Dartmouth choice of symbolic/digital was also implicitly a choice against continuous dynamics. Analog computing nearly died as a field for forty years. It is reviving now under energy pressure: Carver Mead’s Analog VLSI and Neural Systems (Addison-Wesley, 1989) founded the modern neuromorphic engineering tradition; Mythic, Rain Neuromorphics, Lightmatter (photonic), and IBM’s TrueNorth/NorthPole programs are attempts to put it into silicon. Reservoir computing (Tanaka et al., “Recent advances in physical reservoir computing: A review,” Neural Networks, 2019) uses actual physical dynamical systems — fluid surfaces, atomic spin networks, even buckets of water — as substrates for computation, exploiting their natural dynamics rather than simulating them digitally.
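
For readers who haven’t met the reservoir idea, here is a minimal echo state network in numpy as a digital stand-in: the recurrent “reservoir” is random and never trained, and only a linear readout is fit. In physical reservoir computing, the random matrix below is replaced by the natural dynamics of an actual medium. The constants and the toy task are my own choices, not any particular published setup.

```python
import numpy as np

# Minimal echo state network: a toy digital stand-in for the reservoir idea.
# The recurrent weights are random and untrained; only the linear readout
# is ever fit.

rng = np.random.default_rng(0)
N = 200                                          # reservoir size
W_in = rng.uniform(-0.5, 0.5, size=N)            # input weights (fixed)
W = rng.normal(size=(N, N))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()    # spectral radius < 1

def run_reservoir(u):
    """Drive the fixed reservoir with input sequence u, collect its states."""
    x = np.zeros(N)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in * u_t)          # untrained nonlinear dynamics
        states.append(x.copy())
    return np.array(states)

# Toy task: predict the next value of a noisy sine from its history.
t = np.linspace(0, 60, 1500)
u = np.sin(t) + 0.05 * rng.normal(size=t.size)
X = run_reservoir(u[:-1])                        # reservoir states
y = u[1:]                                        # next-step targets

# Train only the linear readout (ridge regression).
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(N), X.T @ y)
pred = X @ W_out
print("next-step prediction error:", float(np.mean((pred - y) ** 2)))
```

The point of the sketch is the division of labor: the dynamics do the work and training touches only the readout, which is what makes a bucket of water or a network of spins a candidate substrate.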

Sixty years of potential research into continuous cognitive architectures were lost. The question “is cognition better understood in continuous dynamical-systems terms or discrete symbolic terms” was settled by fiat in 1956, not by evidence, and the evidence is now beginning to come in against the symbolic side: brains are continuous, energy-efficient, noise-tolerant analog systems, and the energy gap between biological neural computation and digital simulation of neural computation is roughly nine orders of magnitude. If the substrate matters — and the analog tradition says it does — then the symbolic-only paradigm has been spending those nine orders of magnitude on the wrong question.

Control theory as theory of cognition

William T. Powers’ Perceptual Control Theory inverts the standard model: organisms don’t control their outputs, they control their perceptions, and behavior is whatever varies so perceptions stay at reference values. Behavior: The Control of Perception (Aldine, 1973; second edition Benchmark, 2005) is the foundational text. The inversion is consequential. In standard input-output behaviorism (and in standard reinforcement learning), the system’s outputs are the controlled variable and the environment is what the outputs act on. In PCT, the system’s perceptions are the controlled variable and the outputs are whatever the system has to do to keep the perceptions at reference. This is cybernetics applied directly to psychology. It makes predictions that standard models accommodate only awkwardly, and it underwrites clinical applications, most notably Tim Carey’s Method of Levels therapy (The Method of Levels: How to do Psychotherapy Without Getting in the Way, Living Control Systems, 2006), which has supporting evidence in the depression and anxiety literature.
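
A minimal sketch of the inversion, with toy constants of my own rather than Powers’ own simulations: the loop acts only to keep its perception near a reference, and its output ends up absorbing the disturbance, which is why you cannot read the system’s goal off its output.

```python
import math

# Minimal perceptual-control loop in the PCT style. Constants and the
# disturbance are toy choices of mine, not Powers' simulations.
# The controlled variable is the perception, not the output: the output
# becomes whatever it has to be to keep the perception at its reference.

reference = 10.0            # the perception the system is defending
output = 0.0
gain, slowing = 5.0, 0.1
history = []

for step in range(200):
    disturbance = 6.0 * math.sin(step / 15.0)   # environment pushes on the variable
    perception = output + disturbance           # what the system actually senses
    error = reference - perception
    output += slowing * gain * error            # leaky integration of the error
    history.append((perception, output, disturbance))

p, o, d = history[-1]
print(f"perception stays near its reference of {reference}: {p:.2f}")
print(f"output + disturbance ~ reference: {o:.2f} + {d:.2f} = {o + d:.2f}")
```

Plot the output over time and it traces the negated disturbance; watch only the output and you would conclude the system “wants” to produce a sine wave, which is exactly the misreading PCT says behaviorism and standard RL make.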

PCT survives in a tiny community. Cognitive science and control theory should be one field; they are two, and neither talks to the other. Robotics bridges them partially, but robotics has been absorbed by ML under the current regime. The deepest connection nobody draws explicitly: Karl Friston’s active inference is essentially PCT in probabilistic dress — minimize the prediction error between perception and reference, where the reference is generated by the system’s own generative model. PCT got there fifty years earlier without the probabilistic machinery. In I=E terms, perceptual control is endogenous elimination at the perceptual level — the system eliminates the possibilities that would put its perceptions away from reference. This is what an organism does. It is not what a transformer does, because a transformer does not have perceptions whose reference values it is defending; it produces outputs the next layer or the loss function evaluates.

Self-reference as constitutive rather than paradoxical

Russell, Tarski, and the analytic tradition treated self-reference as a source of paradox to be defanged by type hierarchies or metalinguistic stratification — Principia Mathematica (Whitehead and Russell, 1910–13) proposed the theory of types specifically to block the Russell paradox by stratifying classes; Tarski’s “The Concept of Truth in Formalized Languages” (1933) erected the language/metalanguage hierarchy to block the liar paradox. Both moves treated self-reference as a disease and stratification as the cure. Cybernetics, Spencer-Brown’s Laws of Form (Allen & Unwin, 1969), Hofstadter’s Gödel, Escher, Bach: An Eternal Golden Braid (Basic Books, 1979) and I Am a Strange Loop (Basic Books, 2007), Varela’s “A Calculus for Self-Reference” (International Journal of General Systems, 1975), and later Louis Kauffman’s work on knot theory and the algebra of distinction treated self-reference as what minds do — the recursive closure is the cognition.

Post-Dartmouth computer science inherited the Russell-Tarski stratified view, which is exactly why every modern AI architecture is stratified: the trained model is one logical level, the training procedure that produced it is another, the evaluator that scores the output is a third. The model has no formal access across the levels. A whole way of seeing mind as productive circularity rather than hierarchical computation was locked out of formal treatment. Everyone who works seriously on self-reference today is reconstructing what cybernetics had as native vocabulary. My own self-eliminating-observer research program (active on this blog, with the formal theory of an observer that survives its own elimination operation as a fixed-point construction) is one such reconstruction. I read Spencer-Brown for the first time in my late twenties and found 1969 had already done what I thought was 2024 work. That’s a fifty-five-year debt the cut produced in one specific corner of formal theory.

Communication as coupling, not transmission

Claude Shannon’s “A Mathematical Theory of Communication” (Bell System Technical Journal, 1948), expanded in The Mathematical Theory of Communication with Warren Weaver (University of Illinois Press, 1949), gave the field the canonical sender-channel-noise-receiver model. The model was a triumph for engineering — the design of error-correcting codes, the entire telephone-and-internet infrastructure — but it was a metaphor with respect to meaning, and the metaphor escaped engineering and bled into linguistics, communication studies, sociology, and eventually ML.

Cybernetics had a different model, native to it: communication is mutual structural coupling, in which “meaning” is the coordinated perturbation of two recursively closed systems, not a payload transferred across a wire. The bacterial quorum-sensing molecule is “meaningful” to a bacterium that has the receptor — to others, it’s just a chemical. The English sentence “the cat sat on the mat” is meaningful to a system that has a coupled history with English — to a system that doesn’t, it’s a sequence of bytes. Meaning is constituted in the coupling, not transmitted through it. Niklas Luhmann’s sociology preserved a version of this — Social Systems (Stanford, 1995, German original 1984), Theory of Society (Stanford, 2012/2013). Maturana and Varela’s structural coupling concept was the biological version.

The consequence: most of what we call “communication research” is built on a metaphor that assumes the thing cybernetics denied. LLMs inherit this deeply: next-token prediction trained on text-as-signal is Shannon-Weaver taken to its terminus. The model treats text as a pattern to be reproduced statistically, with no built-in notion that reproducing the pattern requires being a system the pattern is meaningful to. Whether what LLMs do is the same thing as language in the cybernetic sense is a question the field cannot even formulate, because the relevant vocabulary was eliminated at the Dartmouth cut. Bender and Koller’s 2020 octopus thought experiment — two parties communicating via undersea cable, an octopus splicing the cable and learning the statistical pattern of the messages without ever experiencing the things the messages refer to — is the modern restatement of the structural-coupling critique, and the modern restatement was needed because the original critique was not in the field’s bibliography.

Purpose and teleology

Arturo Rosenblueth, Norbert Wiener, and Julian Bigelow’s “Behavior, Purpose and Teleology” (Philosophy of Science, 1943) defined purpose operationally via negative feedback: a system is purposive if it reduces error relative to a reference. This made teleology scientifically respectable for the first time since Aristotle’s final causes had been chased out of biology in the seventeenth century. Ernst Mayr’s “Cause and Effect in Biology” (Science, 1961) extended the move into evolutionary biology by distinguishing proximate from ultimate causes — purpose-talk in biology is licensed when it tracks the historical selection pressure that built the function.

Post-Dartmouth AI banned teleological talk for decades; “purpose” became either a user-specified reward function or a folk-psychological illusion. Daniel Dennett’s intentional stance (“Intentional Systems,” Journal of Philosophy, 1971; The Intentional Stance, MIT Press, 1987) recovered a behaviorist-friendly version — purpose-attribution as a pragmatic strategy for predicting complex systems. Ruth Millikan’s Language, Thought, and Other Biological Categories (MIT Press, 1984) developed teleosemantics — meaning grounded in evolutionary function. None of these were standard reading in mainstream AI departments. Intrinsic motivation and goal emergence returned only recently and in impoverished form (curiosity-driven RL, intrinsic-motivation papers from the late 2010s). Karl Friston’s free energy principle is the modern teleological reformulation — the “purpose” of any organism is the minimization of free energy / surprise relative to its existence-defining priors — but again, the field reads Friston as niche.

The consequence is concrete and current: alignment work assumes “goals” are external specifications you give to a system, because the field has no formal vocabulary for the goals a system constitutes for itself. The hard version of alignment — what does this system actually want, structurally, given the dynamics that produced it — is unaskable in the field’s native terms. We lost the ability to theorize goal-directedness as a structural property of certain dynamical systems, which is precisely what you’d need to talk rigorously about agency, alignment, or autonomy.

The anthropology of being human

This is the consequence most underappreciated, and the one I want to take time on. Pre-Dartmouth, the mainstream Western-scientific self-understanding of the human was still Aristotelian-organismic at root: humans are living beings with characteristic activities, coupled to environments, constituted by internal regulatory loops, oriented by purposes. Cybernetics was continuous with that tradition and modernized it. Post-Dartmouth, the ascendant self-image became the meat-computer: humans are biological substrates running programs, with “software” (thought) separable from “hardware” (brain), with inputs and outputs across a skin-boundary, with cognition as information-processing.

This anthropology is now the water everyone swims in. It has reshaped specific institutions over six decades.

In medicine, the brain is treated as wetware to be patched chemically: SSRIs, anxiolytics, stimulants administered on the theory that the brain is a chemical-processing substrate whose output is mood and behavior. The mainstream model of depression is “monoamine deficiency” — a model that has repeatedly failed empirical scrutiny (the most influential recent review is Moncrieff et al., “The serotonin theory of depression: a systematic umbrella review of the evidence,” Molecular Psychiatry, 2022) but persists because there is no replacement model that fits the same wetware ontology. Pharmacotherapy has displaced psychotherapy by orders of magnitude over the last forty years, not because psychotherapy was disproven but because the institutional ontology favored chemical inputs to a chemical machine.

In education, students are increasingly modeled as information-processors to be loaded — curricula optimized for “throughput,” standardized testing as the unit of measurement, attention treated as a resource to be allocated rather than a perceptual relation to be cultivated. The cognitive sciences underlying contemporary instructional design — Bloom’s taxonomy, cognitive load theory (John Sweller), spaced repetition (Sebastian Leitner) — are valuable as far as they go, but they go only as far as the meat-computer ontology allows. Vygotsky’s developmental account — cognition as constituted in the social interaction of learner and teacher in the zone of proximal development (Mind in Society, Harvard, 1978, English compilation of his 1920s–30s work) — was largely sidelined in Anglo-American educational technology because it doesn’t fit.

In law, intent is treated as a computable state. Mens rea — the mental element of a crime — gets operationalized in modern criminal statutes as a checklist of elements (knowledge, recklessness, negligence) the prosecutor must prove via observable behavior. The cybernetic alternative — intent as a regulatory relationship between the agent’s perceptions, their references, and their environment — would require a theory of mind the law doesn’t have. Algorithmic risk assessment (the COMPAS system, used in U.S. criminal sentencing) treats recidivism as a property of the defendant rather than as a property of the defendant-environment coupling, and the failures of these systems (documented by ProPublica’s 2016 investigation; corroborated in subsequent academic analyses) are predictable from the ontology choice.

In labor, cognitive work is increasingly treated as automatable because it’s “just computation.” The 2020s LLM-displacement story — software engineers, copywriters, paralegals, accountants, customer-service workers being told their work is “just” text-processing — depends on the meat-computer view. Whether this view is empirically right is irrelevant to the institutional moves being made; the moves are being made because the ontology is unquestioned. Recent research on what knowledge work actually consists of (Annie Jean-Baptiste’s work on inclusive design, the broader CSCW literature on situated and articulation work) consistently finds that the parts that look most automatable are held together by coordination and articulation work that isn’t.

In self-help, the ascendant idiom is “hack your habits like code,” “rewire your brain,” “optimize your dopamine.” James Clear’s Atomic Habits (Avery, 2018), Andrew Huberman’s enormously popular podcast format, Tim Ferriss’s earlier work — these are all valuable in their own terms, and I’m not dismissing them, but the metaphorical framing matters. They presuppose a self that is a configurable system. The Aristotelian alternative — a self that is constituted by its characteristic activities and that grows through habituation in a thicker sense — would be unmarketable today because the ontology underneath isn’t shared.

The popular-science crystallization of the meat-computer view is Yuval Harari’s Sapiens (Harper, 2014) and Homo Deus (Harper, 2016), which describe humans straightforwardly as “biological algorithms.” This is not a fringe view; it’s the default intellectual ambient. The pushback exists but is marginal: Hubert Dreyfus’ What Computers Still Can’t Do: A Critique of Artificial Reason (MIT Press, 1992; updated edition of What Computers Can’t Do, 1972) made the phenomenological case that human cognition is constitutively non-computational; Charles Taylor’s Sources of the Self (Harvard, 1989) and The Language Animal (Harvard, 2016) argue that human selfhood is constituted by the languages and traditions humans inhabit, not by information-processing in their skulls. These are read in philosophy departments, not in computer science ones. The cut put them on the wrong side of the institutional fence.

The loss wasn’t just a scientific loss; it was a loss of the civilization’s ability to describe itself in terms other than the ones its dominant machine paradigm handed it. The cybernetic alternative — humans as autopoietic systems coupled to their environments, constituted by their regulatory loops, oriented by their structurally-emergent purposes — would have given the same six decades a different ambient anthropology. We don’t know what that civilization would have looked like, and we won’t, because the cut was made.

Indigenous and relational ontologies

Most non-Western philosophical traditions — Andean relational ontologies, Bantu ubuntu epistemology, many Indigenous American epistemes, classical Chinese relational-field thinking (qi, li), Buddhist dependent origination, Hindu interdependent-origination accounts — are closer to cybernetics than to symbolic AI. They treat persons, minds, and entities as constituted by relations and processes, not as substances with properties.

Specific examples make the structural homology visible. Eduardo Viveiros de Castro’s Amerindian perspectivism (developed in “Cosmological Deixis and Amerindian Perspectivism,” Journal of the Royal Anthropological Institute, 1998; Cannibal Metaphysics, Univocal/Minnesota, 2014) argues that Amazonian peoples treat species-difference not as a difference in objective nature but as a difference in perspective — what is “manioc beer” to humans is “blood” to jaguars; the world has many perspectives, and personhood is the capacity to occupy one. This is structurally a second-order cybernetic move: the observer is constitutive of what is observed, and different observers constitute different worlds. Mary Graham’s “Some Thoughts about the Philosophical Underpinnings of Aboriginal Worldviews” (Australian Humanities Review, 2008) develops the Australian Aboriginal frame in which “land is law” — the regulatory structure of country is itself the source of normativity, not a backdrop against which human normativity operates. Robin Wall Kimmerer’s Braiding Sweetgrass (Milkweed, 2013), written by a Potawatomi botanist, argues in concrete biological detail for a relational understanding of ecological systems that is closer to autopoiesis than to anything in mainstream Western biology textbooks. Tyson Yunkaporta’s Sand Talk: How Indigenous Thinking Can Save the World (Text Publishing, 2019; HarperOne, 2020) makes the case explicitly that Aboriginal Australian cognitive practices are systems-thinking practices.

The Dartmouth ontology reinforced the Western substance-metaphysics export. The consequence: a huge fraction of humanity’s accumulated thinking about what minds and persons are was further marginalized as “not scientific,” when the actual reason it didn’t fit was that it didn’t share the ontological commitments of one specific 1956 American conference. Cybernetics was the last Western scientific framework with an open structural door to relational ontologies — Bateson explicitly drew from his fieldwork with Margaret Mead in Bali; Maturana drew from his immersion in Latin American intellectual traditions; Varela was a Chilean Buddhist. That door closed. The narrowing compounds; we don’t have the comparative-philosophical apparatus we would need now to evaluate what an LLM trained predominantly on English-language Western text actually represents.

The craft-knowledge lineage

Pre-Dartmouth cybernetics was continuous with design, architecture, and pedagogy as cognitive practices. Christopher Alexander’s A Pattern Language: Towns, Buildings, Construction (Oxford, 1977) and the later The Nature of Order (Center for Environmental Structure, 2002–2004) developed an account of design as a recursive practice in which the designer’s iterated loop with the artifact and the inhabitants of the artifact is itself the cognition that produces the design. Beer’s viable system design, Pask’s teaching machines, the entire Ulm School of design (Hochschule für Gestaltung Ulm, 1953–1968) treated cognition as partly made through making.

Michael Polanyi’s Personal Knowledge: Towards a Post-Critical Philosophy (University of Chicago, 1958) introduced “tacit knowledge” — the knowledge a craftsman has that they cannot fully articulate, that lives in the practice itself. Donald Schön’s The Reflective Practitioner: How Professionals Think in Action (Basic Books, 1983) extended this into a theory of professional expertise: the expert designer, doctor, or architect engages in “reflection-in-action,” a recursive coupling between action and observation that is the cognition, not the vehicle of cognition. Richard Sennett’s The Craftsman (Yale, 2008) traces the same insight through carpentry, music, programming, and surgery.

Post-Dartmouth, design got split off as HCI or industrial design; cognitive science kept the “pure” mind. The consequence: the insight that designing a system is itself a cognitive act — that the designer’s iterated loop with the artifact is the thinking, not merely the vehicle of thinking — was lost from the science of mind. It is precisely this insight that collaboration between two thinkers exemplifies. The research happens in the loop, not in either head. The cybernetic tradition would have had a vocabulary for this. The post-Dartmouth tradition does not, which is why so much of the public discourse about LLM-augmented work either over-credits the human (the model “just helped”) or over-credits the model (the model “wrote it”), when the honest description is that neither did, and the both-of-us-coupled-with-the-process did.

What symbolic AI got right (and what cybernetics got wrong)

I have been one-sided so far. The post would land badly on a careful reader who hasn’t been pre-convinced. Honest steel-manning is owed.

Symbolic AI got real things right. The push for explicit, formal, machine-checkable representations of knowledge — predicate calculus, semantic networks, frames, ontologies — produced engineering wins that no cybernetic alternative had matched at the time. Expert systems (MYCIN for blood infection diagnosis, DENDRAL for chemistry, R1/XCON for VAX configuration at DEC) produced economic value in narrow domains. The cleanly stratified architecture (representation, inference, application) made debugging tractable in ways that the messier biological-feedback architectures favored by cybernetics emphatically did not. The Turing-machine model gave the field a precise computational baseline that everyone could compute against — without it, the convergence of theoretical computer science, computational complexity theory, and the eventual ML paradigm could not have happened. Even today’s deep-learning systems are still descendants of the Dartmouth choice in this technical sense, and they would not exist in their current form without it.

Cybernetics also overreached. By the late 1960s, “cybernetic” had become a term applied to therapy, ecology, management, education, art, urban planning — almost anything where a feedback loop could be plausibly invoked. The technical core got diluted; the literature became hard to navigate; some of it slid into the systems-mysticism and new-age systems-thinking that gave the field a deserved reputation for woolliness. Wiener was a polymath whose followers could not sustain the breadth. The mathematics cybernetics needed — proper non-equilibrium thermodynamics, variational inference, large-deviation theory, modern dynamical-systems theory — wasn’t available in usable form until decades after the moment had passed. Some of cybernetics’ pre-mathematical claims (Beer’s specific VSM predictions, some of Bateson’s looser systems-talk) didn’t survive contact with empirical scrutiny.

So this is not a story of villains and heroes. The Dartmouth cut was historically intelligible. The institutional pressure that favored it was real. The symbolic tradition produced wins. The cybernetic tradition produced failure modes. The point of this post is not that the cut was wrong; the point is that the cut was a cut — a particular set of questions was rendered structurally illegitimate by it, and the cost of that has not been adequately paid by the field, even now that the substantive defenses for the cut have weakened.

Where does this leave us now?

The Dartmouth cut severed the observer from the observed, the organism from the environment, the substrate from the process, the purpose from the dynamics, the self-reference from the cognition. What remained was the trace of thought — inputs, outputs, behaviors — while the loop that constitutes thinking was declared someone else’s problem. Every paradigm downstream operates inside that amputation. GOFAI: symbols in a sealed box. Connectionism: distributed representations in a sealed box. Deep learning: learned features in a sealed box. Transformers: attention patterns in a sealed box. None of them is wrong within the amputated frame. All of them are incapable of asking the question cybernetics had as its starting point: how does a system come to be a knower through the recursive closure of its own organization? The word “Artificial” is an artifact of this cut — it presupposes the real thing is biological and what we build is imitation. Drop that presupposition and the word’s ceiling and floor both vanish.

Cybernetics understood that elimination cannot be exogenous. A system that is acted upon by external selection is being sculpted, not cognizing. A system that eliminates its own non-viable states through self-referential dynamics is constituting itself as a knower. Self-reference as squaring is the formal skeleton of this; autopoiesis is its biological flesh; Friston’s free energy is its probabilistic restatement; Spencer-Brown’s distinction is its algebraic core; Ashby’s requisite variety is its quantitative constraint; Bateson’s “difference that makes a difference” is its information-theoretic core. The Dartmouth cut made it illegitimate for fifty-plus years to say any of this in a computer science department. The debt is still being paid, in cycles, by every researcher who reinvents one of these concepts in 2025 not knowing it had a 1969 form.

But the loss of the questions from mainstream cognitive science is different from, and worse than, the loss of the institutional label. Control theorists do not read Varela; complexity scientists rarely read Bateson; ML researchers do not read either. The Dartmouth cut made the questions tribally coded — “that’s not CS, that’s philosophy / systems theory / mysticism” — and the tribal coding persisted long after the specific 1956 reasons for it dissolved. Cybernetics might have collapsed under its own weight regardless. The specific form of its erasure — the renaming that made its questions illegitimate rather than just unanswered — was not inevitable, and is the injury that still bleeds.

What can be done now? The question is not “revive cybernetics.” Cybernetics as it existed in 1965 had the limits I named in the steel-manning section, and reviving the brand wouldn’t repair the damage anyway. The question is whether the questions that the cut declared illegitimate can be asked again under any name, with the formal tools we now have that cybernetics didn’t, and with the explicit acknowledgment that mainstream AI inherited a particular ontology by political and institutional contingency rather than by argument. The answer to that question is being given right now, in real time, by every researcher working on free energy, on bioelectric basal cognition, on neuromorphic substrates, on self-eliminating fixed points, on relational ontologies of mind. The question “what was lost?” is the precondition for asking the question “what can be recovered?” That precondition is what this post tries to satisfy.

April 22, 2026