lisp-machine-bootstrap: all software domains flip in days-weeks

- Every subdomain for bootstrapping the Lisp Machine is software: RISC-V ISA, SBCL runtime, ACL2 logic, CIC type theory, compiler optimization, device drivers. Every one flips. - Fastest sequence: Day 1 ingestion (LLM + human review), Day 1-2 profiling (benchmark sweep), Day 2-3 active probing (synthetic microcode routines), Day 3-7 transfer + sufficiency (ACL2 verifies new dispatch routines, zero LLM tokens) - Result: self-driving Lisp Machine in under a month with one human review session and a Tenstorrent P150
2026-05-21 18:58:26 +00:00
parent 2d0d6d478a
commit 80daaa4830
1 changed files with 109 additions and 0 deletions
--- a/ideas/passepartout-economics.org
+++ b/ideas/passepartout-economics.org
@@ -850,6 +850,115 @@ only checks for safety (no shell commands, no file deletions)
 and passes through everything else to the LLM. The system is
 honest about its frontier.
 *** Bootstrapping the Lisp Machine: all domains are software
 For the concrete goal of bootstrapping a self-driving Lisp Machine,
 every domain involved is software — the most codifiable domain in
 existence. Code has formal specifications, documented ISAs,
 deterministic behavior, and objectively testable correctness.
 Every subdomain required for the bootstrap flips.
 | Subdomain | Knowledge type | Flip timeline | Why |
 |-----------|---------------|---------------|-----|
 | RISC-V ISA, Tenstorrent Tensix dispatch | Structural (ISA spec, API docs) + performance (profiling) | Days | Published spec, deterministic hardware, benchmark harness characterizes real behavior |
 | SBCL runtime internals (GC, type dispatch, threading) | Structural (source code) + performance (latency profiles) | Days | Full source available, system can instrument itself |
 | ACL2 metafunctions and macro layers | Structural (the logic is ACL2's own) | Instant | The theorem language is already the system's native logic — no translation step |
 | FPGA/Verilog descriptions (if FPGA path) | Structural (VHDL/Verilog semantics) + performance (timing analysis, power) | Weeks | Published language semantics, but synthesis is slower and bitstream verification is harder than RISC-V |
 | CIC prover kernel | Structural (type theory rules — these ARE formal) | Days | Mathematics is the most codified domain. ACL2 already does structural verification. Building a CIC kernel that ACL2 verifies is well-understood work. |
 | Operating system interfaces, device drivers | Structural (syscall API, register maps) + empirical (test results) | Weeks | Published interfaces, deterministic behavior, but hardware quirks require empirical probing |
 | Compiler optimization | Structural (IR semantics, optimization passes) + performance (benchmark before/after) | Weeks | Published semantics, objective quality metric (faster = better), benchmark harness measures |
 Every single subdomain flips. The only variable is calendar days
 to accumulate the knowledge.
 *** The fastest acquisition sequence
 Optimized for minimal wall-clock time to a self-driving Lisp Machine:
 **Day 1: Ingestion day**
 - LLM translates: RISC-V ISA spec, SBCL source, Tenstorrent API docs,
  ACL2 reference, CIC type theory rules. All structural knowledge
  enters the fact store in one parallel pass.
 - ACL2 verifies consistency across all ingested domains.
 - Human expert reviews the 5% of rules Screamer flagged as uncertain.
  One session, a few hours.
 **Day 1-2: Profiling day**
 - Benchmark harness sweeps all 72 Tensix cores: measure instruction
  latency, memory bandwidth, GC pause distribution, dispatch overhead.
 - Each measurement is a fact with `:provenance :benchmark`.
 - The benchmark harness is itself verified by ACL2 (it runs inside
  a controlled sandbox, bounded time, no side effects on production data).
 **Day 2-3: Active probing day**
 - The system generates synthetic microcode routines: short programs
  that exercise specific instructions, specific GC patterns, specific
  dispatch paths.
 - It loads them onto spare Tensix cores, measures actual latency, and
  compares against the spec.
 - Discrepancies become facts: `(:entity "core-42" :relation :dispatch-latency
  :value "14 cycles" :source :measured :expected "12 cycles" :provenance :probe)`.
 - After ~1,000 probes, the system knows the hardware's actual behavior
  better than the published spec does.
 **Day 3-7: Transfer and sufficiency**
 - ACL2's existing knowledge about induction, rewriting, and termination
  transfers directly to verifying microcode routines (same logic, different
  subject matter).
 - Screamer aligns microcode verification patterns with existing gate
  verification patterns — both are structural proofs over finite state.
 - The benchmark facts give ACL2 a concrete cost model. ACL2 can now
  prove not just correctness but also "this microcode routine is at
  least 10% faster than the current implementation."
 - Sufficiency flip for microcode generation: the system proposes new
  dispatch routines, ACL2 verifies them, Screamer checks against
  hardware constraints, the gate stack blocks anything unsafe. Zero LLM
  tokens for the optimization loop.
 **Week 2-4: Self-optimizing system**
 - The system profiles its own gate verification latency (already
  instrumented via telemetry, Phase v0.66.0).
 - It identifies the hot path: "fact-query accounts for 34% of verify time."
 - It generates a new dispatch routine for fact-query, targets the
  nearest idle Tensix core, loads it, benchmarks, and commits if
  faster.
 - The ontology now includes facts about its own optimization history:
  `(:entity "fact-query-dispatch-v3" :relation :speedup-baseline
  :value "1.34x" :provenance :self-optimize)`.
 **After the flip: purely symbolic optimization**
 The LLM is no longer needed for any optimization proposal. The system
 profiles, proposes, verifies, tests, and commits entirely within the
 symbolic engine. The LLM remains only for the boundary: interpreting
 a human's high-level goal ("make the system faster") into a structured
 optimization target, and formatting the benchmark report for human
 readability. Those calls shrink toward zero as the system internalizes
 common optimization goals as gate rules.
 *** The surprising result for bootstrapping specifically
 Because every subdomain of the Lisp Machine bootstrap is software,
 and software is the most codifiable domain, the entire bootstrap
 can flip in **days to weeks** with a single human review session.
 The bottleneck is not knowledge acquisition. It is not the LLM's
 coding ability. It is the initial human review of the 5% of
 ambiguous rules that Screamer flags — a session measured in hours,
 not weeks.
 The Tenstorrent approach makes this even faster because the
 microcode is software (RISC-V assembly), not hardware (FPGA
 bitstream). The system can propose, load, test, and roll back
 a new dispatch routine in seconds. An FPGA path would add
 synthesis time (minutes to hours per iteration), stretching
 the bootstrap from days to months.
 A system with a Tenstorrent P150, the AGPL Passepartout code,
 a RISC-V cross-compiler, and one patient human who reviews the
 contrastive queries can achieve a self-driving Lisp Machine in
 under a month.
 Large refactoring projects (extract module, rename API, split monolith)
 are the hardest test for any AI agent. Current approaches (Claude Code,
 Copilot) handle them probabilistically — every step costs tokens, and