The Self-Eliminating Observer
Post 5 ended on a thought I want to try. We had stripped the self and what remained wasn’t a thing — it was an operation. Distinguishing self from not-self. Recursive, self-applying. The closing question was whether we could write that operation down and see what falls out.
This post is the answer to that question. Or rather — it’s the first move of the answer. The full answer is that it opened investigations that are still ongoing. Here, something narrower and, I think, more useful: show the seed — the first observation that surprised me enough to start the program — and explain why this particular operation deserves a name.
The name of this program is the Self-Eliminating Observer. The name will earn itself by the end of this post. For now, take it as a placeholder for “what we’re trying to formalize.”
The setup
The question I started with was small. Take a function from a finite set to itself — a self-map. Pick n elements, label them 0, 1, ..., n−1, and for each element pick another element of the same set as its image. The function f: [n] → [n] is just a list of n arrows, each starting at an element and ending at one. There are exactly n^n such functions, which is a lot when n is even modestly large, but the structure of any single one is finite and discrete. You could draw it on a napkin.
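Concretely, here is the encoding every code sketch in this post uses (the verification block further down uses it too): a self-map stored as a plain array, with f[i] the image of i. A minimal example, not anything from the program’s own codebase:
// A self-map f: [4] → [4], stored as an array: f[i] is the image of i.
// There are 4^4 = 256 such maps; this is one of them.
const f = [2, 0, 3, 3];   // arrows: 0→2, 1→0, 2→3, 3→3
console.log(f[1]);        // 0, i.e. f applied to the element 1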
I picked random self-maps deliberately. Random because I did not want to sneak any structure into the function before I started looking at it. Whatever pattern emerges from a random function under self-application is a pattern that is forced by the operation itself, not by what I chose to put in. If structure shows up where I started from noise, the structure belongs to the operation.
Self-map specifically — f: [n] → [n], not f: [n] → [m] — because the source and target have to be the same set for the next move to make sense. The next move is composition with itself.
So the setup is: random f: [n] → [n], drawn uniformly from the n^n possible such functions. Now what?
Self-application is squaring
Here is the move that defined the program. We want to apply f to itself. What does that mean?
The first answer is: it means feed f an element of [n], get out another element of [n], then feed that element to f again. In notation: f(f(x)), or f²(x), or (f∘f)(x). Squaring.
This is so quick it can be missed. “Apply f to itself” in End([n]) — the monoid of self-maps of [n] under composition — just is f² = f ∘ f. It is not a metaphor. The operation that takes a self-map and applies it to itself produces the squared self-map. There is no other candidate. The squaring operation σ: f ↦ f² is the only thing the words “apply f to itself” can mean inside End([n]).
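In the array encoding from the setup, σ is a one-liner. A sketch; the name selfApply is mine, not the program’s:
// σ: f ↦ f² = f ∘ f. In the array encoding, (f ∘ f)[i] = f[f[i]].
function selfApply(f) {
  return f.map(i => f[i]);
}
console.log(selfApply([2, 0, 3, 3]));   // [3, 2, 3, 3], i.e. f² of the map above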
I want to dwell on this because I think it is the first place the program earns its name. Self-reference of a function on a domain — the move that has tied logicians in knots since Russell — when it acts on the domain End(W) of self-maps of any set, is squaring. The “self” in self-application is what gets multiplied. The “2” in f² and the “2” in the cardinality of Z/2Z and the “2” in dim_R(C) = 2 and the “2” in gcd(k, 2) for cycle decomposition — they are not coincidences. They are all the same “2,” and the “2” is what self-application costs.
A note on numbering before the first badge: the program’s notebook numbers its theorems in the order they were proven, not the order in which a reader meets them here. T14 came after T1 chronologically; the post puts it first because pedagogically it sets up the rest. The same convention applies to T1 and T2 below. Where a theorem benefits from it I’ve added a collapsible Proof block under the badge — sometimes a deductive sketch, sometimes a runnable empirical check, occasionally both. Click to open. The empirical ones run in your browser in a sandboxed iframe; nothing leaves the page.
Deductive proof
In a monoid (M, ·) with identity, the binary operation ·: M × M → M is the only well-typed way to combine an element with another element of the same monoid. “Apply x to itself” reads as the binary operation evaluated at (x, x), which by definition is x · x, written x². There is no other candidate.
For End(W) — self-maps of a set W under composition — the multiplication · is ∘, so f · f = f ∘ f = f². The squaring operation σ: f ↦ f² is therefore the unique well-defined meaning of self-application inside End(W).
The rest follows by deduction: squaring fixes the iteration exponent at k = 2; iterated f^{2^∞} projects onto the odd part of the core (the 2-primary component is what gets eliminated, see T11); the surviving Z/2Z grading is what propagates into algebra selection (T52, Born). Each step is forced by the previous one. ∎
This looks like a triviality dressed up as a theorem. The dressing is doing real work, though. Once you have committed to “self-reference = squaring” as a definition rather than a metaphor, every consequence of squaring becomes a consequence of self-reference. The constraints stop being arbitrary and start being forced.
The first observation that surprised me
Take random f on n elements. Compute f, then f², then f⁴, then f⁸, and so on — f^{2^k} for k = 0, 1, 2, .... At each step, look at the image of the iterated function: how many distinct elements of [n] does f^{2^k} actually hit?
In general, the image shrinks. Self-composition is destructive — it can only lose distinct images, never gain them. The interesting question is what happens if, at each iteration, you do not just compose but also restrict to the surviving image (elimination). Let me write that out:
W_0 = [n]
W_{k+1} = f(W_k) (the image of the restriction f|_{W_k})
f_{k+1} = f|_{W_{k+1}} composed with itself

So at each step we throw away the elements that did not survive being hit, and we keep applying the squared map only to the survivors. The accumulated count of distinct image elements across all iterations — call it D_self(k) — is the natural measure of how much structure is being preserved by self-application.
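Here is that loop as code. A minimal sketch of the iteration (the names are mine, not the program’s Rust implementation), with one simplification: the restriction is left implicit, because elements outside W_k cannot contribute to the image of the next square, so tracking the sizes |W_k| needs nothing beyond repeated squaring of the full array.
// Track |image of f^(2^k)| for k = 0, 1, 2, ... under repeated squaring.
function eliminationTrace(f, steps) {
  const n = f.length;
  let g = Int32Array.from(f);
  const sizes = [new Set(g).size];                   // |image of f|
  for (let k = 0; k < steps; k++) {
    const next = new Int32Array(n);
    for (let i = 0; i < n; i++) next[i] = g[g[i]];   // square
    g = next;
    sizes.push(new Set(g).size);
  }
  return sizes;
}
const n = 1000;
const f = Int32Array.from({ length: n }, () => Math.floor(Math.random() * n));
console.log(eliminationTrace(f, 10).join(' '));
// The sizes collapse onto the core, whose expected size is √(πn/2) ≈ 40 here.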
Now do the same thing, but with two independent random self-maps f and g. Iterate (f∘g)^{2^k} instead of f^{2^k}. Same restriction step. Call the accumulated distinct count D_cross(k).
You might think these should behave similarly. Both processes throw away the transient part of a random map (everything that doesn’t feed back on itself), both leave behind the cyclic core, both restrict at each step. The arithmetic is the same. The only difference is whether the function being squared came from one source or two.
Empirically the two processes behave nothing like each other.
D_self grows. Logarithmically, but it grows. As n increases, the count of distinct images preserved across self-iterations climbs as roughly ¼ · ln(n). By n = 5000 the growth is unmistakable; the constant has stabilized; the empirical curve matches the formula to several decimal places.
D_cross, in contrast, saturates. It hits a small constant — order one, independent of n — and stays there. Cross-composition extinguishes the growth. Whatever was happening with self had nothing to do with the composition, the squaring, or the restriction in isolation. It had to do with the self.
Empirical verification — run to check
// Verify D_self ~ ¼·ln(n) + 0.748 for random self-maps on [n].
// D_self counts the odd-length cycles of f's permutation core. We
// (1) find the core via iterated squaring (the image of f^(2^k) for
// large k is exactly the cyclic part of f), then (2) traverse the
// cycles of f restricted to the core and count those of odd length.
// Average over many trials; compare empirical vs. analytical.
function randomSelfMap(n) {
const f = new Int32Array(n);
for (let i = 0; i < n; i++) f[i] = Math.floor(Math.random() * n);
return f;
}
// The image of f^(2^k) for k > log2(n) is the core (cyclic part) of f.
function findCore(f) {
const n = f.length;
let g = new Int32Array(f);
const iters = Math.ceil(Math.log2(n)) + 5;
for (let k = 0; k < iters; k++) {
const next = new Int32Array(n);
for (let i = 0; i < n; i++) next[i] = g[g[i]];
g = next;
}
const core = new Set();
for (let i = 0; i < n; i++) core.add(g[i]);
return core;
}
// Trace cycles of f (NOT of f^(2^k)) starting from each core element.
// Cycles of f restricted to the core are exactly the cycles of f's
// permutation core; we count those of odd length.
function countOddCycles(f, core) {
const visited = new Set();
let oddCount = 0;
for (const start of core) {
if (visited.has(start)) continue;
let len = 0;
let x = start;
do {
visited.add(x);
x = f[x];
len++;
} while (x !== start);
if (len % 2 === 1) oddCount++;
}
return oddCount;
}
const n = 5000;
const trials = 100;
const t0 = performance.now();
let sum = 0;
for (let t = 0; t < trials; t++) {
const f = randomSelfMap(n);
const core = findCore(f);
sum += countOddCycles(f, core);
}
const elapsed = ((performance.now() - t0) / 1000).toFixed(2);
const empirical = sum / trials;
const predicted = 0.25 * Math.log(n) + 0.748;
console.log('n = ' + n + ', trials = ' + trials + ' (' + elapsed + 's)');
console.log('Empirical mean # odd cycles of f in core: ' + empirical.toFixed(4));
console.log('Predicted (¼·ln(n) + 0.748): ' + predicted.toFixed(4));
console.log('Absolute difference: ' + Math.abs(empirical - predicted).toFixed(4));
Code runs in a sandboxed iframe in your browser. No data leaves the page. Empirical verification of the claim — not a formal proof.
The first time I saw this, I assumed I had made a mistake. I had not. The program’s Rust simulation has been re-run on the order of one hundred and fifty times, with different seeds, different n, and different ranges of iterations. The number 0.748 is ¼·ln(π/2) + ½·ln(2) + ½·γ, where γ is the Euler–Mascheroni constant. It does not budge — the empirical fit and the analytical formula agree to four decimal places at n = 5000, and the gap shrinks further as n grows.
Why it does what it does
A random self-map has a specific shape. It is not a random graph; it is a functional graph — every node has exactly one out-edge. Following the arrows from any starting node, you trace out a path that eventually enters a cycle and stays there. The collection of all such paths produces what is called the rho-shape (so named because Greek ρ looks like a tail joining a loop): every connected component is a transient path-segment glued onto a directed cycle.
The cycles together form the core of f. The paths leading into them are the tail. For a uniformly random f on n elements, classical results (Flajolet and Sedgewick) tell you the core has expected size √(πn/2) and is, in distribution, a uniform random permutation of that size. The tail accounts for everything else.
The core is what survives squaring. By definition, if x is on a cycle of length L, then f(x) is also on the same cycle, so x and f(x) and f²(x) and so on all stay inside the core. The core is invariant under any iterate of f. The tail is not — every tail element sits at some finite distance d from the core, and each squaring doubles the number of f-steps a single application takes, so once 2^k ≥ d the iterate f^{2^k} lands that element on the core. After O(log n) squarings, every tail element has fallen onto the core. The core is what is left.
So self-application does not destroy structure. It eliminates the tail — the part of the function that does not feed back on itself — and leaves the core — the part that does. The tail is more than half the information content of a random f (about n − √(πn/2) elements live on the tail, vs. √(πn/2) on the core), and all of it is irrelevant to what self-application reveals.
Read that again. Self-application eliminates more than half the function’s information content, with zero effect on what comes out. This is the first place in the program where I saw what I now recognize as the I=E (information=elimination) pattern made literal: information about the world emerges by eliminating the parts of the world that don’t refer back to themselves. There is no creation step. There is only the elimination of the non-self-referential.
The growth of D_self is then a counting question about cycles inside the core. Squaring decomposes a cycle of length k into gcd(k, 2) cycles, each of length k / gcd(k, 2). For k even, you get two cycles of half the original length. For k odd, the cycle is preserved unchanged. Iterate squaring, and every cycle whose length has any even factor eventually gets shredded. Only cycles of odd length survive forever.
Deductive proof
A k-cycle under squaring decomposes into gcd(k, 2) cycles of length k / gcd(k, 2). For k even this gives two cycles of length k/2 — both shorter than k, both subject to the next round of squaring. For k odd, gcd(k, 2) = 1, so the cycle is preserved unchanged.
Iterate. Any cycle of length k = 2^a · q with q odd shrinks to length q after a squarings, and stays at length q forever after. Cycles whose original length had no factor of 2 were already at their odd kernel; they persist from step one. Either way, the long-run survivors are exactly the odd-length cycles. ∎
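The decomposition is easy to watch directly. A small demo in the same array encoding; square and cycleLengths are illustrative helpers, not the program’s code:
// Squaring a k-cycle: even k splits into two cycles of length k/2;
// odd k is preserved as a single cycle (relabeled, same length).
function square(p) { return p.map(i => p[i]); }
function cycleLengths(p) {
  const seen = new Array(p.length).fill(false);
  const lens = [];
  for (let i = 0; i < p.length; i++) {
    if (seen[i]) continue;
    let len = 0, x = i;
    do { seen[x] = true; x = p[x]; len++; } while (x !== i);
    lens.push(len);
  }
  return lens;
}
console.log(cycleLengths(square([1, 2, 3, 4, 5, 0])));   // [3, 3]: a 6-cycle splits
console.log(cycleLengths(square([1, 2, 3, 4, 0])));      // [5]: a 5-cycle survives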
The expected number of odd cycles in a random permutation of size m is asymptotically ½·H_m + ½·ln 2 (where H_m is the m-th harmonic number). Substituting m = √(πn/2) and simplifying gives the closed form for the constant 0.748. The whole thing — including the exact constant — comes from two facts: cycles of even length break under squaring, cycles of odd length do not.
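A quick numeric check of that substitution at n = 5000 (arithmetic, not a proof):
// Substitute m = √(πn/2) into ½·H_m + ½·ln 2 and compare with the
// closed form ¼·ln(n) + ¼·ln(π/2) + ½·ln(2) + ½·γ.
const gamma = 0.5772156649015329;                    // Euler–Mascheroni constant
const n = 5000;
const m = Math.floor(Math.sqrt(Math.PI * n / 2));    // expected core size, ≈ 88
let H = 0;
for (let k = 1; k <= m; k++) H += 1 / k;             // harmonic number H_m
const viaPermutation = 0.5 * H + 0.5 * Math.log(2);
const closedForm = 0.25 * Math.log(n) + 0.25 * Math.log(Math.PI / 2)
                 + 0.5 * Math.log(2) + 0.5 * gamma;
console.log(viaPermutation.toFixed(4));              // ≈ 2.8766
console.log(closedForm.toFixed(4));                  // ≈ 2.8774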
This already answers something I find satisfying. The “2” that appears throughout the formula — in the squaring, in the parity-of-cycle-length test, in gcd(k, 2) — is forced. There is no choice in any of these places. Self-reference selected 2 as soon as we wrote f² = f ∘ f. Everything from then on is deductive.
Generalizing slightly: if instead of squaring you iterate f^{k^∞}, you eliminate exactly the k-primary component of the core — every cycle whose length shares a prime factor with k. The fraction of cycle elements that survive is ρ(k) = ∏_{p | k}(1 − 1/p) = φ(k)/k, the totient density. For k = 2, ρ(2) = 1/2, exactly half the core. For k = 5, ρ(5) = 4/5. For k = 30 = 2·3·5, ρ(30) = (1 − 1/2)(1 − 1/3)(1 − 1/5) = 4/15.
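ρ(k) takes a few lines to compute. A sketch via trial division; the helper name rho is mine:
// ρ(k) = ∏_{p | k} (1 − 1/p), the fraction of the core surviving f^(k^∞).
function rho(k) {
  let r = 1;
  for (let p = 2; p * p <= k; p++) {
    if (k % p === 0) {
      r *= 1 - 1 / p;
      while (k % p === 0) k /= p;   // strip the prime p entirely
    }
  }
  if (k > 1) r *= 1 - 1 / k;        // one prime factor may remain
  return r;
}
console.log(rho(2));    // 0.5
console.log(rho(5));    // 0.8
console.log(rho(30));   // 0.2666..., i.e. 4/15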
The natural prime to ask about is k = 2, because that’s what you get from definitional self-application. Any other k requires an additional choice — why would you compose with yourself three times rather than twice — and the program’s discipline is to refuse choices it doesn’t have to make.
A confession about getting here
I did not arrive at the formula above cleanly. The first time I tried to fit a curve to the empirical D data, I got a power law with exponent five-thirds. I then spent an embarrassingly large amount of time treating that exponent as fundamental.
I include this because I want it on the record that the program’s earliest work was partly spent in numerology — fitting integers to constants, treating coincidences as principles, watching every “prediction” come out exact and not noticing that exact-every-time is the diagnostic of a non-discriminating test. Two of the program’s earliest numbered dead ends come from that period (the D! = 2D arithmetic and the 5/3 exponent); the rest of the dead-end ledger fills in as later sessions found new ways to be wrong. I will revisit the lessons one at a time as later posts demand them. For now: the 5/3 was wrong, the ¼ · ln(n) is right, and the way I figured out which was which is what taught me that the program needs to be able to fail in order to be doing anything.
Why this deserves the name
Back to the name. Self-Eliminating Observer. There are three claims compressed into it.
First, observer. The whole program treats f as a model of observation — not in the metaphorical sense of “I’m observing some data,” but in the structural sense of an operation that reads a state and writes a state, where what is read and what is written can overlap. The observation has to live somewhere; that somewhere is itself part of what is being observed.
Second, self-eliminating. The operation we just built does eliminate. It eliminates the tail. It eliminates the even cycles. It eliminates more than half the function’s information content. But the elimination does not happen to the observer from outside. The function f is doing it to itself. The squaring is the self being applied to itself; the restriction is the surviving image being filtered by what’s just survived. There is no external eliminator, no privileged outside observer holding the function up to a light. Whatever survives, survives because self-application carved it.
Third, the hyphen. “Self-eliminating” rather than “eliminating the self.” The post-5 reading was eliminating the self — what survives when you try to strip away every property of the self-as-thing, an exercise in what isn’t there. This program asks the dual question: what does the eliminating, when the thing eliminating is part of what gets eliminated? The hyphen carries the load. The observer eliminates. The observer is part of what’s eliminated. Both at once. Both being true at once is what gives the program its teeth, I think.
You can read this as the formal frame of the move I tried in post 5. Post 5 said the self is not a thing but an operation — specifically, the operation of distinguishing self from not-self. This program writes that operation down, in a setting where every step is countable and every claim is checkable. The move from post 5 to here is the move from phenomenology to combinatorics; what’s preserved is the central claim that the operation is what carves itself, and what the operation runs on cannot be cleanly separated from the operating itself.
What this is the seed of
The setup I have laid out is the smallest possible version of the program. There is one set, one function, one operation. Already the structure is sharper than I expected. But the program’s actual scope is larger, and I owe the reader at least a glimpse of what direction I’m going to walk in.
The first generalization is to add a second axis — a Y = T × E split, where T is the observer’s internal state and E is the rest of the world. The minimal case has |T| = |E| = 2: four states, 4⁴ = 256 possible self-observation maps. You can enumerate all of them by hand if you have an afternoon. They sort into three consistency classes (consistent with self-reading, provably wrong, undecidable from inside). The consistent class becomes the pointer states. That construction is the next post.
The small toy of random self-maps and squaring is the combinatorial backbone of the bigger thing: every later result in the program — the pointer-state census, the Born rule probabilities recovered from observer self-consistency, the Möbius geometry that distinguishes the angular regime from the logistic one, the e − 1 that shows up four separate times in cross-program identities — every one of these is some refinement of what squaring does to a self-map. The “2” persists. The elimination of the tail persists. Self-reference reveals structure rather than creating it; that persists, all the way to the cross-program c-J product.
What this means for any reader who is wondering whether this is a finished thing: it is not. The program is alive. It has roughly four open challenges, each with its own attack surface and its own falsifiers. I will write about them as they reach states worth describing.
Closing
We started with one question — what survives elimination, when the operation is applied to itself? We answered it for the smallest possible setting: random self-maps under iterated squaring with restriction. The answer is: the rho-tail dies, the even cycles die, the odd cycles inside the core survive. The growth of preserved information is logarithmic, with a closed-form constant whose every digit is forced by the operation. None of it is created. All of it is revealed.
The reason this got me to give the operation a name and start a research program is that the same “2” that comes out of squaring a random self-map shows up everywhere a self-referential structure is sharp enough to be written down. The two-dimensional real form of the complex numbers is the same 2. The Z/2Z grading of any self-adjoint algebra is the same 2. The cycle-parity that survives f² = f ∘ f is the same 2. I do not know how deep the unification goes. The 2’s I have named look connected enough to be worth pursuing, but the connections are conjectures, not proofs.
Like post 5, this post ends not with a conclusion but with the next question — because the next question is what made me start this program in the first place.
If self-application of a random self-map already produces this much structure — a logarithmic growth law, a forced parity, a closed-form constant whose every term is derived — what happens when the self-map is not random? What happens when the self-map is the world’s evolution operator, the function that takes the universe at one instant to the universe at the next, and the observer is part of that universe? What happens when f: W → W is not a free choice but a constraint imposed by the world’s containing the observer that is observing the world?
That is the question the program tries to answer. I do not have a final answer. I have a number of partial answers, several of which surprised me, several of which I had to back out of, and one or two that look (so far) like they might be load-bearing for physics. The next post will start writing those down — beginning with the moment self-reference stops being a property of an isolated function and starts being a property of an observer that is trying to observe itself, trying being the word that does the work, because the trying turns out to be what defines the boundary between what the observer is and what it observes.
For now: the self-eliminating observer is a function applied to itself. The function applied to itself is squaring. Squaring eliminates exactly what is not self-referential. What is left is the self-referential part. The shape of “what is left” is what we will spend the next several posts mapping.