lisp-machine-bootstrap: all software domains flip in days-weeks

- Every subdomain for bootstrapping the Lisp Machine is software: RISC-V ISA, SBCL runtime, ACL2 logic, CIC type theory, compiler optimization, device drivers. Every one flips.
- Fastest sequence: Day 1 ingestion (LLM + human review), Day 1-2 profiling (benchmark sweep), Day 2-3 active probing (synthetic microcode routines), Day 3-7 transfer + sufficiency (ACL2 verifies new dispatch routines, zero LLM tokens)
- Result: self-driving Lisp Machine in under a month with one human review session and a Tenstorrent P150
2026-05-21 18:58:26 +00:00
parent 2d0d6d478a
commit 80daaa4830
1 changed files with 109 additions and 0 deletions
--- a/ideas/passepartout-economics.org
+++ b/ideas/passepartout-economics.org
@@ -850,6 +850,115 @@ only checks for safety (no shell commands, no file deletions)
 and passes through everything else to the LLM. The system is
 honest about its frontier.

+*** Bootstrapping the Lisp Machine: all domains are software
+
+For the concrete goal of bootstrapping a self-driving Lisp Machine,
+every domain involved is software — the most codifiable domain in
+existence. Code has formal specifications, documented ISAs,
+deterministic behavior, and objectively testable correctness.
+Every subdomain required for the bootstrap flips.
+
+| Subdomain | Knowledge type | Flip timeline | Why |
+|-----------|---------------|---------------|-----|
+| RISC-V ISA, Tenstorrent Tensix dispatch | Structural (ISA spec, API docs) + performance (profiling) | Days | Published spec, deterministic hardware, benchmark harness characterizes real behavior |
+| SBCL runtime internals (GC, type dispatch, threading) | Structural (source code) + performance (latency profiles) | Days | Full source available, system can instrument itself |
+| ACL2 metafunctions and macro layers | Structural (the logic is ACL2's own) | Instant | The theorem language is already the system's native logic — no translation step |
+| FPGA/Verilog descriptions (if FPGA path) | Structural (VHDL/Verilog semantics) + performance (timing analysis, power) | Weeks | Published language semantics, but synthesis is slower and bitstream verification is harder than verifying RISC-V code |
+| CIC prover kernel | Structural (type theory rules — these ARE formal) | Days | Mathematics is the most codified domain. ACL2 already does structural verification. Building a CIC kernel that ACL2 verifies is well-understood work. |
+| Operating system interfaces, device drivers | Structural (syscall API, register maps) + empirical (test results) | Weeks | Published interfaces, deterministic behavior, but hardware quirks require empirical probing |
+| Compiler optimization | Structural (IR semantics, optimization passes) + performance (benchmark before/after) | Weeks | Published semantics, objective quality metric (faster = better) that the benchmark harness measures directly |
+
+Every single subdomain flips. The only variable is how many calendar
+days it takes to accumulate the knowledge.
+
+*** The fastest acquisition sequence
+
+Optimized for minimal wall-clock time to a self-driving Lisp Machine:
+
+**Day 1: Ingestion day**
+- LLM translates: RISC-V ISA spec, SBCL source, Tenstorrent API docs,
+  ACL2 reference, CIC type theory rules. All structural knowledge
+  enters the fact store in one parallel pass.
+- ACL2 verifies consistency across all ingested domains.
+- Human expert reviews the 5% of rules Screamer flagged as uncertain.
+  One session, a few hours.
+
+**Day 1-2: Profiling day**
+- Benchmark harness sweeps every available Tensix core, measuring
+  instruction latency, memory bandwidth, GC pause distribution, and
+  dispatch overhead.
+- Each measurement is a fact with `:provenance :benchmark`.
+- The benchmark harness is itself verified by ACL2 (it runs inside
+  a controlled sandbox, bounded time, no side effects on production data).
+
+**Day 2-3: Active probing day**
+- The system generates synthetic microcode routines: short programs
+  that exercise specific instructions, specific GC patterns, specific
+  dispatch paths.
+- It loads them onto spare Tensix cores, measures actual latency, and
+  compares against the spec.
+- Discrepancies become facts: `(:entity "core-42" :relation :dispatch-latency
+  :value "14 cycles" :source :measured :expected "12 cycles" :provenance :probe)`.
+- After ~1,000 probes, the system knows the hardware's actual behavior
+  better than the published spec does.
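+
+A minimal sketch of how a probe result could become such a fact, assuming
+latencies are integer cycle counts. `probe-fact` is a hypothetical helper
+used only for illustration; it is not part of the existing code:
+
+```lisp
+;; Hypothetical helper: the name and argument order are assumptions.
+(defun probe-fact (entity relation measured expected)
+  "Return a discrepancy fact plist when MEASURED differs from EXPECTED
+(both integer cycle counts); return NIL when they agree."
+  (unless (= measured expected)
+    (list :entity entity
+          :relation relation
+          :value (format nil "~D cycles" measured)
+          :source :measured
+          :expected (format nil "~D cycles" expected)
+          :provenance :probe)))
+
+;; Builds the example fact from the bullet above.
+(probe-fact "core-42" :dispatch-latency 14 12)
+```
+
+Returning NIL when measurement and spec agree keeps the fact store limited
+to genuine discrepancies, which is one possible design choice among several.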
+
+**Day 3-7: Transfer and sufficiency**
+- ACL2's existing knowledge about induction, rewriting, and termination
+  transfers directly to verifying microcode routines (same logic, different
+  subject matter).
+- Screamer aligns microcode verification patterns with existing gate
+  verification patterns — both are structural proofs over finite state.
+- The benchmark facts give ACL2 a concrete cost model. ACL2 can now
+  prove not just correctness but also claims relative to that model,
+  such as "under the measured costs, this microcode routine is at
+  least 10% faster than the current implementation."
+- Sufficiency flip for microcode generation: the system proposes new
+  dispatch routines, ACL2 verifies them, Screamer checks against
+  hardware constraints, the gate stack blocks anything unsafe. Zero LLM
+  tokens for the optimization loop.
+
+**Week 2-4: Self-optimizing system**
+- The system profiles its own gate verification latency (already
+  instrumented via telemetry, Phase v0.66.0).
+- It identifies the hot path: "fact-query accounts for 34% of verify time."
+- It generates a new dispatch routine for fact-query, targets the
+  nearest idle Tensix core, loads it, benchmarks, and commits if
+  faster.
+- The ontology now includes facts about its own optimization history:
+  `(:entity "fact-query-dispatch-v3" :relation :speedup-baseline
+  :value "1.34x" :provenance :self-optimize)`.
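+
+The benchmark-and-commit step above can be sketched as follows, assuming
+each run yields a non-empty list of latency samples and that the median is
+the comparison statistic. `median` and `commit-if-faster` are hypothetical
+names, and the choice of median is an assumption, not something the system
+is known to use:
+
+```lisp
+;; Hypothetical sketch: compares two sets of latency samples.
+(defun median (samples)
+  "Median of a non-empty list of real numbers."
+  (let* ((sorted (sort (copy-seq samples) #'<))
+         (n (length sorted))
+         (mid (floor n 2)))
+    (if (oddp n)
+        (elt sorted mid)
+        (/ (+ (elt sorted (1- mid)) (elt sorted mid)) 2))))
+
+(defun commit-if-faster (name baseline-samples candidate-samples)
+  "Return a speedup fact plist if the candidate's median latency is
+lower than the baseline's; return NIL otherwise (no commit)."
+  (let ((base (median baseline-samples))
+        (cand (median candidate-samples)))
+    (when (and (plusp cand) (< cand base))
+      (list :entity name
+            :relation :speedup-baseline
+            :value (format nil "~,2Fx" (/ base cand))
+            :provenance :self-optimize))))
+
+;; Example call with made-up sample values.
+(commit-if-faster "fact-query-dispatch-v3" '(134 135 133) '(100 101 99))
+```
+
+A real loop would also need a rollback path and a noise threshold before
+committing; both are omitted here for brevity.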
+
+**After the flip: purely symbolic optimization**
+
+The LLM is no longer needed for any optimization proposal. The system
+profiles, proposes, verifies, tests, and commits entirely within the
+symbolic engine. The LLM remains only for the boundary: interpreting
+a human's high-level goal ("make the system faster") into a structured
+optimization target, and formatting the benchmark report for human
+readability. Those calls shrink toward zero as the system internalizes
+common optimization goals as gate rules.
+
+*** The surprising result for bootstrapping specifically
+
+Because every subdomain of the Lisp Machine bootstrap is software,
+and software is the most codifiable domain, the entire bootstrap
+can flip in **days to weeks** with a single human review session.
+
+The bottleneck is not knowledge acquisition. It is not the LLM's
+coding ability. It is the initial human review of the 5% of
+ambiguous rules that Screamer flags — a session measured in hours,
+not weeks.
+
+The Tenstorrent approach makes this even faster because the
+microcode is software (RISC-V assembly), not hardware (FPGA
+bitstream). The system can propose, load, test, and roll back
+a new dispatch routine in seconds. An FPGA path would add
+synthesis time (minutes to hours per iteration), stretching
+the bootstrap from days to months.
+
+A system with a Tenstorrent P150, the AGPL Passepartout code,
+a RISC-V cross-compiler, and one patient human who reviews the
+contrastive queries can achieve a self-driving Lisp Machine in
+under a month.
+
 Large refactoring projects (extract module, rename API, split monolith)
 are the hardest test for any AI agent. Current approaches (Claude Code,
 Copilot) handle them probabilistically — every step costs tokens, and