lisp-machine-bootstrap: all software domains flip in days-weeks

- Every subdomain for bootstrapping the Lisp Machine is software:
  RISC-V ISA, SBCL runtime, ACL2 logic, CIC type theory, compiler
  optimization, device drivers. Every one flips.
- Fastest sequence: Day 1 ingestion (LLM + human review), Day 1-2
  profiling (benchmark sweep), Day 2-3 active probing (synthetic
  microcode routines), Day 3-7 transfer + sufficiency (ACL2 verifies
  new dispatch routines, zero LLM tokens)
- Result: self-driving Lisp Machine in under a month with one
  human review session and a Tenstorrent P150
This commit is contained in:
Hermes
2026-05-21 18:58:26 +00:00
parent 2d0d6d478a
commit 80daaa4830

View File

@@ -850,6 +850,115 @@ only checks for safety (no shell commands, no file deletions)
and passes through everything else to the LLM. The system is
honest about its frontier.
*** Bootstrapping the Lisp Machine: all domains are software
For the concrete goal of bootstrapping a self-driving Lisp Machine,
every domain involved is software — the most codifiable domain in
existence. Code has formal specifications, documented ISAs,
deterministic behavior, and objectively testable correctness.
Every subdomain required for the bootstrap flips.
| Subdomain | Knowledge type | Flip timeline | Why |
|-----------|---------------|---------------|-----|
| RISC-V ISA, Tenstorrent Tensix dispatch | Structural (ISA spec, API docs) + performance (profiling) | Days | Published spec, deterministic hardware, benchmark harness characterizes real behavior |
| SBCL runtime internals (GC, type dispatch, threading) | Structural (source code) + performance (latency profiles) | Days | Full source available, system can instrument itself |
| ACL2 metafunctions and macro layers | Structural (the logic is ACL2's own) | Instant | The theorem language is already the system's native logic — no translation step |
| FPGA/Verilog descriptions (if FPGA path) | Structural (VHDL/Verilog semantics) + performance (timing analysis, power) | Weeks | Published language semantics, but synthesis is slower and bitstream verification is harder than RISC-V |
| CIC prover kernel | Structural (type theory rules — these ARE formal) | Days | Mathematics is the most codified domain. ACL2 already does structural verification. Building a CIC kernel that ACL2 verifies is well-understood work. |
| Operating system interfaces, device drivers | Structural (syscall API, register maps) + empirical (test results) | Weeks | Published interfaces, deterministic behavior, but hardware quirks require empirical probing |
| Compiler optimization | Structural (IR semantics, optimization passes) + performance (benchmark before/after) | Weeks | Published semantics, objective quality metric (faster = better), benchmark harness measures |
Every single subdomain flips. The only variable is calendar days
to accumulate the knowledge.
*** The fastest acquisition sequence
Optimized for minimal wall-clock time to a self-driving Lisp Machine:
**Day 1: Ingestion day**
- LLM translates: RISC-V ISA spec, SBCL source, Tenstorrent API docs,
ACL2 reference, CIC type theory rules. All structural knowledge
enters the fact store in one parallel pass.
- ACL2 verifies consistency across all ingested domains.
- Human expert reviews the 5% of rules Screamer flagged as uncertain.
One session, a few hours.
**Day 1-2: Profiling day**
- Benchmark harness sweeps all 72 Tensix cores: measure instruction
latency, memory bandwidth, GC pause distribution, dispatch overhead.
- Each measurement is a fact with `:provenance :benchmark`.
- The benchmark harness is itself verified by ACL2 (it runs inside
a controlled sandbox, bounded time, no side effects on production data).
**Day 2-3: Active probing day**
- The system generates synthetic microcode routines: short programs
that exercise specific instructions, specific GC patterns, specific
dispatch paths.
- It loads them onto spare Tensix cores, measures actual latency, and
compares against the spec.
- Discrepancies become facts: `(:entity "core-42" :relation :dispatch-latency
:value "14 cycles" :source :measured :expected "12 cycles" :provenance :probe)`.
- After ~1,000 probes, the system knows the hardware's actual behavior
better than the published spec does.
**Day 3-7: Transfer and sufficiency**
- ACL2's existing knowledge about induction, rewriting, and termination
transfers directly to verifying microcode routines (same logic, different
subject matter).
- Screamer aligns microcode verification patterns with existing gate
verification patterns — both are structural proofs over finite state.
- The benchmark facts give ACL2 a concrete cost model. ACL2 can now
prove not just correctness but also "this microcode routine is at
least 10% faster than the current implementation."
- Sufficiency flip for microcode generation: the system proposes new
dispatch routines, ACL2 verifies them, Screamer checks against
hardware constraints, the gate stack blocks anything unsafe. Zero LLM
tokens for the optimization loop.
**Week 2-4: Self-optimizing system**
- The system profiles its own gate verification latency (already
instrumented via telemetry, Phase v0.66.0).
- It identifies the hot path: "fact-query accounts for 34% of verify time."
- It generates a new dispatch routine for fact-query, targets the
nearest idle Tensix core, loads it, benchmarks, and commits if
faster.
- The ontology now includes facts about its own optimization history:
`(:entity "fact-query-dispatch-v3" :relation :speedup-baseline
:value "1.34x" :provenance :self-optimize)`.
**After the flip: purely symbolic optimization**
The LLM is no longer needed for any optimization proposal. The system
profiles, proposes, verifies, tests, and commits entirely within the
symbolic engine. The LLM remains only for the boundary: interpreting
a human's high-level goal ("make the system faster") into a structured
optimization target, and formatting the benchmark report for human
readability. Those calls shrink toward zero as the system internalizes
common optimization goals as gate rules.
*** The surprising result for bootstrapping specifically
Because every subdomain of the Lisp Machine bootstrap is software,
and software is the most codifiable domain, the entire bootstrap
can flip in **days to weeks** with a single human review session.
The bottleneck is not knowledge acquisition. It is not the LLM's
coding ability. It is the initial human review of the 5% of
ambiguous rules that Screamer flags — a session measured in hours,
not weeks.
The Tenstorrent approach makes this even faster because the
microcode is software (RISC-V assembly), not hardware (FPGA
bitstream). The system can propose, load, test, and roll back
a new dispatch routine in seconds. An FPGA path would add
synthesis time (minutes to hours per iteration), stretching
the bootstrap from days to months.
A system with a Tenstorrent P150, the AGPL Passepartout code,
a RISC-V cross-compiler, and one patient human who reviews the
contrastive queries can achieve a self-driving Lisp Machine in
under a month.
Large refactoring projects (extract module, rename API, split monolith)
are the hardest test for any AI agent. Current approaches (Claude Code,
Copilot) handle them probabilistically — every step costs tokens, and