lisp-machine-bootstrap: all software domains flip in days-weeks

- Every subdomain for bootstrapping the Lisp Machine is software:
  RISC-V ISA, SBCL runtime, ACL2 logic, CIC type theory, compiler
  optimization, device drivers. Every one flips.
- Fastest sequence: Day 1 ingestion (LLM + human review), Day 1-2
  profiling (benchmark sweep), Day 2-3 active probing (synthetic
  microcode routines), Day 3-7 transfer + sufficiency (ACL2 verifies
  new dispatch routines, zero LLM tokens)
- Result: self-driving Lisp Machine in under a month with one
  human review session and a Tenstorrent P150
Hermes
2026-05-21 18:58:26 +00:00
parent 2d0d6d478a
commit 80daaa4830

@@ -850,6 +850,115 @@ only checks for safety (no shell commands, no file deletions)
and passes through everything else to the LLM. The system is
honest about its frontier.
*** Bootstrapping the Lisp Machine: all domains are software
For the concrete goal of bootstrapping a self-driving Lisp Machine,
every domain involved is software — the most codifiable domain in
existence. Code has formal specifications, documented ISAs,
deterministic behavior, and objectively testable correctness.
Every subdomain required for the bootstrap flips.
| Subdomain | Knowledge type | Flip timeline | Why |
|-----------|---------------|---------------|-----|
| RISC-V ISA, Tenstorrent Tensix dispatch | Structural (ISA spec, API docs) + performance (profiling) | Days | Published spec, deterministic hardware, benchmark harness characterizes real behavior |
| SBCL runtime internals (GC, type dispatch, threading) | Structural (source code) + performance (latency profiles) | Days | Full source available, system can instrument itself |
| ACL2 metafunctions and macro layers | Structural (the logic is ACL2's own) | Instant | The theorem language is already the system's native logic — no translation step |
| FPGA/Verilog descriptions (if FPGA path) | Structural (VHDL/Verilog semantics) + performance (timing analysis, power) | Weeks | Published language semantics, but synthesis is slower and bitstream verification is harder than RISC-V |
| CIC prover kernel | Structural (type theory rules, which ARE formal) | Days | Type theory is already formal mathematics, and ACL2 already does structural verification. Building a CIC kernel that ACL2 verifies is well-understood work. |
| Operating system interfaces, device drivers | Structural (syscall API, register maps) + empirical (test results) | Weeks | Published interfaces, deterministic behavior, but hardware quirks require empirical probing |
| Compiler optimization | Structural (IR semantics, optimization passes) + performance (benchmark before/after) | Weeks | Published semantics, objective quality metric (faster = better), benchmark harness measures |
Every single subdomain flips. The only variable is how much calendar
time each one needs to accumulate the knowledge.
*** The fastest acquisition sequence
Optimized for minimal wall-clock time to a self-driving Lisp Machine:
**Day 1: Ingestion day**
- The LLM translates the RISC-V ISA spec, SBCL source, Tenstorrent API
docs, ACL2 reference, and CIC type theory rules into facts. All
structural knowledge enters the fact store in one parallel pass.
- ACL2 verifies consistency across all ingested domains.
- A human expert reviews the 5% of rules Screamer flagged as uncertain:
one session, a few hours.
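The review step amounts to partitioning ingested rules into ones that can be accepted automatically and ones routed to the human. Below is a minimal sketch, assuming each rule carries a confidence score in [0, 1] and that "uncertain" means "below a fixed threshold". The `Rule` record, the `triage` function, and the 0.8 threshold are hypothetical; they are not taken from the Passepartout code or from Screamer's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical ingested rule: one fact plus a confidence score."""
    entity: str
    relation: str
    value: str
    confidence: float  # assumed to lie in [0, 1]


def triage(rules, threshold=0.8):
    """Split rules into auto-accepted ones and ones queued for human review.

    The fixed threshold is an illustrative stand-in for whatever criterion
    actually marks a rule as uncertain.
    """
    accepted = [r for r in rules if r.confidence >= threshold]
    review = [r for r in rules if r.confidence < threshold]
    return accepted, review


# Illustrative rules and scores only.
rules = [
    Rule("rv64i/add", "opcode", "0110011", 0.99),
    Rule("sbcl/gc", "collector", "gencgc", 0.93),
    Rule("tensix/dispatch", "latency", "12 cycles", 0.41),
]
accepted, review = triage(rules)
print(len(accepted), len(review))  # 2 1
for rule in review:
    print("needs review:", rule.entity, rule.relation)
```

Only the review list reaches the human, so the length of the session scales with the flagged fraction rather than with the size of the whole fact store.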
**Day 1-2: Profiling day**
- The benchmark harness sweeps all 140 Tensix cores on the P150,
measuring instruction latency, memory bandwidth, GC pause distribution,
and dispatch overhead.
- Each measurement is a fact with `:provenance :benchmark`.
- The benchmark harness is itself verified by ACL2: it runs inside a
controlled sandbox, within bounded time, and with no side effects on
production data.
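Here is a minimal sketch of how a sweep's raw samples might be condensed into facts tagged with benchmark provenance. It assumes the harness collects a list of latency samples per entity. The `Fact` record, the `summarize` helper, and the `-median`/`-max` relation suffixes are hypothetical, and the sample values are made up.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Python stand-in for a fact plist (:entity ... :relation ... :value ...)."""
    entity: str
    relation: str
    value: float
    unit: str
    provenance: str = "benchmark"


def summarize(entity, relation, samples, unit):
    """Condense raw samples into median and worst-case benchmark facts."""
    return [
        Fact(entity, f"{relation}-median", statistics.median(samples), unit),
        Fact(entity, f"{relation}-max", max(samples), unit),
    ]


# Made-up GC pause samples in milliseconds, for illustration only.
gc_pauses_ms = [1.2, 0.9, 1.4, 7.8, 1.1]
facts = summarize("sbcl-gc", "pause", gc_pauses_ms, "ms")
for fact in facts:
    print(fact)
```

It matters to record the worst case next to the median when measuring a distribution such as GC pauses. With these samples the single 7.8 ms pause is invisible in the median of 1.2 ms.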
**Day 2-3: Active probing day**
- The system generates synthetic microcode routines: short programs
that exercise specific instructions, specific GC patterns, specific
dispatch paths.
- It loads them onto spare Tensix cores, measures actual latency, and
compares against the spec.
- Discrepancies become facts: `(:entity "core-42" :relation :dispatch-latency
:value "14 cycles" :source :measured :expected "12 cycles" :provenance :probe)`.
- After ~1,000 probes, the system's model of the hardware's actual
behavior is more accurate than the published spec.
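The discrepancy step works like a filter: it keeps only the probes whose measurement disagrees with the spec and emits a record shaped like the fact above. This sketch assumes each probe reports a single cycle count per core and relation. `ProbeResult`, `discrepancy_facts`, and the `tolerance` parameter are illustrative names, and the core-41 measurement is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One synthetic-routine measurement on one core (hypothetical shape)."""
    core: str
    relation: str
    measured_cycles: int
    expected_cycles: int  # value the published spec predicts


def discrepancy_facts(results, tolerance=0):
    """Return plist-like records for probes that disagree with the spec
    by more than `tolerance` cycles; matching probes produce nothing."""
    facts = []
    for r in results:
        if abs(r.measured_cycles - r.expected_cycles) > tolerance:
            facts.append({
                "entity": r.core,
                "relation": r.relation,
                "value": f"{r.measured_cycles} cycles",
                "source": "measured",
                "expected": f"{r.expected_cycles} cycles",
                "provenance": "probe",
            })
    return facts


results = [
    ProbeResult("core-41", "dispatch-latency", 12, 12),  # matches the spec
    ProbeResult("core-42", "dispatch-latency", 14, 12),  # the example in the text
]
for fact in discrepancy_facts(results):
    print(fact)
```

Raising `tolerance` trades sensitivity for noise: with `tolerance=2`, the 2-cycle gap on core-42 would no longer be recorded.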
**Day 3-7: Transfer and sufficiency**
- ACL2's existing knowledge about induction, rewriting, and termination
transfers directly to verifying microcode routines (same logic, different
subject matter).
- Screamer aligns microcode verification patterns with existing gate
verification patterns — both are structural proofs over finite state.
- The benchmark facts give ACL2 a concrete cost model. Relative to that
model, ACL2 can now prove not just correctness but also claims such as
"this microcode routine is at least 10% faster than the current
implementation."
- Sufficiency flip for microcode generation: the system proposes new
dispatch routines, ACL2 verifies them, Screamer checks against
hardware constraints, the gate stack blocks anything unsafe. Zero LLM
tokens for the optimization loop.
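The sufficiency loop can be sketched as a filter over candidate routines. The sketch rests on several assumptions. The cost model is a per-instruction cycle table built from benchmark facts. "At least 10% faster" means "modelled cycles at most 90% of the current routine's". The three checks are plain callables that stand in for ACL2, Screamer, and the gate stack. `optimization_step`, `estimated_cycles`, and the instruction costs are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Sequence

Routine = Sequence[str]


def estimated_cycles(routine: Routine, model: dict) -> int:
    """Sum the modelled cycle cost of each instruction in a routine."""
    return sum(model[op] for op in routine)


def optimization_step(
    current: Routine,
    candidates: Iterable[Routine],
    model: dict,
    verify: Callable[[Routine], bool],         # stand-in for an ACL2 proof
    fits_hardware: Callable[[Routine], bool],  # stand-in for Screamer's check
    gate_allows: Callable[[Routine], bool],    # stand-in for the gate stack
    min_gain: float = 0.10,
) -> Optional[Routine]:
    """Return the first candidate that passes every check and costs at most
    (1 - min_gain) of the current routine under the cost model, else None.
    No LLM call appears anywhere in this loop."""
    budget = estimated_cycles(current, model) * (1 - min_gain)
    for cand in candidates:
        if (verify(cand) and fits_hardware(cand) and gate_allows(cand)
                and estimated_cycles(cand, model) <= budget):
            return cand
    return None


# Illustrative per-instruction costs, as if learned from benchmark facts.
model = {"ld": 4, "mul": 3, "slli": 1, "add": 1, "st": 4}
current = ["ld", "mul", "add", "st"]      # 12 modelled cycles
drops_add = ["ld", "slli", "st"]          # 9 cycles, but not equivalent
uses_shift = ["ld", "slli", "add", "st"]  # 10 cycles; assumes a power-of-two multiplier

chosen = optimization_step(
    current, [drops_add, uses_shift], model,
    verify=lambda r: r != drops_add,  # pretend the proof fails for drops_add
    fits_hardware=lambda r: True,
    gate_allows=lambda r: True,
)
print(chosen)  # ['ld', 'slli', 'add', 'st']
```

The speed check runs against modelled cycles, so an accepted candidate can only be as fast as the cost model is accurate. That is why the benchmark facts feeding the model matter.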
**Week 2-4: Self-optimizing system**
- The system profiles its own gate verification latency (already
instrumented via telemetry, Phase v0.66.0).
- It identifies the hot path: "fact-query accounts for 34% of verify time."
- It generates a new dispatch routine for fact-query, targets the
nearest idle Tensix core, loads it, benchmarks, and commits if
faster.
- The ontology now includes facts about its own optimization history:
`(:entity "fact-query-dispatch-v3" :relation :speedup-baseline
:value "1.34x" :provenance :self-optimize)`.
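The self-optimization loop comes down to two small decisions: which component dominates the profile, and whether a candidate beat its baseline. Below is a minimal sketch, assuming telemetry yields total milliseconds per component and that "faster" means a lower benchmarked time. `hot_path` and `commit_if_faster` are hypothetical, as are all component names except fact-query. The timings are invented so that the output mirrors the example fact.

```python
def hot_path(profile_ms):
    """Return the component with the largest share of total time, and that share."""
    total = sum(profile_ms.values())
    name = max(profile_ms, key=profile_ms.get)
    return name, profile_ms[name] / total


def commit_if_faster(baseline_ms, candidate_ms, component, version):
    """Return a self-optimization fact if the candidate beat the baseline;
    return None to signal a rollback to the current routine."""
    if candidate_ms >= baseline_ms:
        return None
    return {
        "entity": f"{component}-dispatch-{version}",
        "relation": "speedup-baseline",
        "value": f"{baseline_ms / candidate_ms:.2f}x",
        "provenance": "self-optimize",
    }


# Invented telemetry totals (ms); only fact-query comes from the text.
profile = {"fact-query": 34.0, "rewrite": 30.0, "gate-check": 20.0, "other": 16.0}
name, share = hot_path(profile)
print(f"{name}: {share:.0%} of verify time")  # fact-query: 34% of verify time

fact = commit_if_faster(67.0, 50.0, name, "v3")
print(fact["value"])  # 1.34x
```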
**After the flip: purely symbolic optimization**
The LLM is no longer needed for any optimization proposal. The system
profiles, proposes, verifies, tests, and commits entirely within the
symbolic engine. The LLM remains only for the boundary: interpreting
a human's high-level goal ("make the system faster") into a structured
optimization target, and formatting the benchmark report for human
readability. Those calls shrink toward zero as the system internalizes
common optimization goals as gate rules.
*** The surprising result for bootstrapping specifically
Because every subdomain of the Lisp Machine bootstrap is software,
and software is the most codifiable domain, the entire bootstrap
can flip in **days to weeks** with a single human review session.
The bottleneck is not knowledge acquisition. It is not the LLM's
coding ability. It is the initial human review of the 5% of
ambiguous rules that Screamer flags — a session measured in hours,
not weeks.
The Tenstorrent approach makes this even faster because the
microcode is software (RISC-V assembly), not hardware (FPGA
bitstream). The system can propose, load, test, and roll back
a new dispatch routine in seconds. An FPGA path would add
synthesis time (minutes to hours per iteration), stretching
the bootstrap from days to months.
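The days-versus-months gap follows from simple arithmetic on per-iteration latency. The figures below are assumptions chosen only to show the scaling: 1,000 sequential iterations on both paths, about 10 seconds per Tensix iteration, and 30 minutes or 2 hours per FPGA synthesis run.

```python
def bootstrap_days(iterations, seconds_per_iteration):
    """Wall-clock days for a strictly sequential propose-load-test loop."""
    return iterations * seconds_per_iteration / 86_400


# Assumed figures, for illustration only.
ITERATIONS = 1_000
print(f"Tensix, 10 s/iteration: {bootstrap_days(ITERATIONS, 10):.1f} days")
print(f"FPGA, 30 min/iteration: {bootstrap_days(ITERATIONS, 30 * 60):.1f} days")
print(f"FPGA, 2 h/iteration:    {bootstrap_days(ITERATIONS, 2 * 3600):.1f} days")
```

Under these assumptions the Tensix loop finishes in a fraction of a day, while the FPGA loop takes roughly three weeks to nearly three months. The comparison ignores parallelism and only shows how per-iteration latency scales the total.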
A setup consisting of a Tenstorrent P150, the AGPL Passepartout code,
a RISC-V cross-compiler, and one patient human who reviews the
contrastive queries can achieve a self-driving Lisp Machine in
under a month.
Large refactoring projects (extract module, rename API, split monolith)
are the hardest test for any AI agent. Current approaches (Claude Code,
Copilot) handle them probabilistically — every step costs tokens, and