lisp-machine-bootstrap: all software domains flip in days-weeks
- Every subdomain for bootstrapping the Lisp Machine is software: RISC-V ISA, SBCL runtime, ACL2 logic, CIC type theory, compiler optimization, device drivers. Every one flips. - Fastest sequence: Day 1 ingestion (LLM + human review), Day 1-2 profiling (benchmark sweep), Day 2-3 active probing (synthetic microcode routines), Day 3-7 transfer + sufficiency (ACL2 verifies new dispatch routines, zero LLM tokens) - Result: self-driving Lisp Machine in under a month with one human review session and a Tenstorrent P150
This commit is contained in:
@@ -850,6 +850,115 @@ only checks for safety (no shell commands, no file deletions)
|
||||
and passes through everything else to the LLM. The system is
|
||||
honest about its frontier.
|
||||
|
||||
*** Bootstrapping the Lisp Machine: all domains are software
|
||||
|
||||
For the concrete goal of bootstrapping a self-driving Lisp Machine,
|
||||
every domain involved is software — the most codifiable domain in
|
||||
existence. Code has formal specifications, documented ISAs,
|
||||
deterministic behavior, and objectively testable correctness.
|
||||
Every subdomain required for the bootstrap flips.
|
||||
|
||||
| Subdomain | Knowledge type | Flip timeline | Why |
|
||||
|-----------|---------------|---------------|-----|
|
||||
| RISC-V ISA, Tenstorrent Tensix dispatch | Structural (ISA spec, API docs) + performance (profiling) | Days | Published spec, deterministic hardware, benchmark harness characterizes real behavior |
|
||||
| SBCL runtime internals (GC, type dispatch, threading) | Structural (source code) + performance (latency profiles) | Days | Full source available, system can instrument itself |
|
||||
| ACL2 metafunctions and macro layers | Structural (the logic is ACL2's own) | Instant | The theorem language is already the system's native logic — no translation step |
|
||||
| FPGA/Verilog descriptions (if FPGA path) | Structural (VHDL/Verilog semantics) + performance (timing analysis, power) | Weeks | Published language semantics, but synthesis is slower and bitstream verification is harder than RISC-V |
|
||||
| CIC prover kernel | Structural (type theory rules — these ARE formal) | Days | Mathematics is the most codified domain. ACL2 already does structural verification. Building a CIC kernel that ACL2 verifies is well-understood work. |
|
||||
| Operating system interfaces, device drivers | Structural (syscall API, register maps) + empirical (test results) | Weeks | Published interfaces, deterministic behavior, but hardware quirks require empirical probing |
|
||||
| Compiler optimization | Structural (IR semantics, optimization passes) + performance (benchmark before/after) | Weeks | Published semantics, objective quality metric (faster = better), benchmark harness measures |
|
||||
|
||||
Every single subdomain flips. The only variable is calendar days
|
||||
to accumulate the knowledge.
|
||||
|
||||
*** The fastest acquisition sequence
|
||||
|
||||
Optimized for minimal wall-clock time to a self-driving Lisp Machine:
|
||||
|
||||
**Day 1: Ingestion day**
|
||||
- LLM translates: RISC-V ISA spec, SBCL source, Tenstorrent API docs,
|
||||
ACL2 reference, CIC type theory rules. All structural knowledge
|
||||
enters the fact store in one parallel pass.
|
||||
- ACL2 verifies consistency across all ingested domains.
|
||||
- Human expert reviews the 5% of rules Screamer flagged as uncertain.
|
||||
One session, a few hours.
|
||||
|
||||
**Day 1-2: Profiling day**
|
||||
- Benchmark harness sweeps all 72 Tensix cores: measure instruction
|
||||
latency, memory bandwidth, GC pause distribution, dispatch overhead.
|
||||
- Each measurement is a fact with `:provenance :benchmark`.
|
||||
- The benchmark harness is itself verified by ACL2 (it runs inside
|
||||
a controlled sandbox, bounded time, no side effects on production data).
|
||||
|
||||
**Day 2-3: Active probing day**
|
||||
- The system generates synthetic microcode routines: short programs
|
||||
that exercise specific instructions, specific GC patterns, specific
|
||||
dispatch paths.
|
||||
- It loads them onto spare Tensix cores, measures actual latency, and
|
||||
compares against the spec.
|
||||
- Discrepancies become facts: `(:entity "core-42" :relation :dispatch-latency
|
||||
:value "14 cycles" :source :measured :expected "12 cycles" :provenance :probe)`.
|
||||
- After ~1,000 probes, the system knows the hardware's actual behavior
|
||||
better than the published spec does.
|
||||
|
||||
**Day 3-7: Transfer and sufficiency**
|
||||
- ACL2's existing knowledge about induction, rewriting, and termination
|
||||
transfers directly to verifying microcode routines (same logic, different
|
||||
subject matter).
|
||||
- Screamer aligns microcode verification patterns with existing gate
|
||||
verification patterns — both are structural proofs over finite state.
|
||||
- The benchmark facts give ACL2 a concrete cost model. ACL2 can now
|
||||
prove not just correctness but also "this microcode routine is at
|
||||
least 10% faster than the current implementation."
|
||||
- Sufficiency flip for microcode generation: the system proposes new
|
||||
dispatch routines, ACL2 verifies them, Screamer checks against
|
||||
hardware constraints, the gate stack blocks anything unsafe. Zero LLM
|
||||
tokens for the optimization loop.
|
||||
|
||||
**Week 2-4: Self-optimizing system**
|
||||
- The system profiles its own gate verification latency (already
|
||||
instrumented via telemetry, Phase v0.66.0).
|
||||
- It identifies the hot path: "fact-query accounts for 34% of verify time."
|
||||
- It generates a new dispatch routine for fact-query, targets the
|
||||
nearest idle Tensix core, loads it, benchmarks, and commits if
|
||||
faster.
|
||||
- The ontology now includes facts about its own optimization history:
|
||||
`(:entity "fact-query-dispatch-v3" :relation :speedup-baseline
|
||||
:value "1.34x" :provenance :self-optimize)`.
|
||||
|
||||
**After the flip: purely symbolic optimization**
|
||||
|
||||
The LLM is no longer needed for any optimization proposal. The system
|
||||
profiles, proposes, verifies, tests, and commits entirely within the
|
||||
symbolic engine. The LLM remains only for the boundary: interpreting
|
||||
a human's high-level goal ("make the system faster") into a structured
|
||||
optimization target, and formatting the benchmark report for human
|
||||
readability. Those calls shrink toward zero as the system internalizes
|
||||
common optimization goals as gate rules.
|
||||
|
||||
*** The surprising result for bootstrapping specifically
|
||||
|
||||
Because every subdomain of the Lisp Machine bootstrap is software,
|
||||
and software is the most codifiable domain, the entire bootstrap
|
||||
can flip in **days to weeks** with a single human review session.
|
||||
|
||||
The bottleneck is not knowledge acquisition. It is not the LLM's
|
||||
coding ability. It is the initial human review of the 5% of
|
||||
ambiguous rules that Screamer flags — a session measured in hours,
|
||||
not weeks.
|
||||
|
||||
The Tenstorrent approach makes this even faster because the
|
||||
microcode is software (RISC-V assembly), not hardware (FPGA
|
||||
bitstream). The system can propose, load, test, and roll back
|
||||
a new dispatch routine in seconds. An FPGA path would add
|
||||
synthesis time (minutes to hours per iteration), stretching
|
||||
the bootstrap from days to months.
|
||||
|
||||
A system with a Tenstorrent P150, the AGPL Passepartout code,
|
||||
a RISC-V cross-compiler, and one patient human who reviews the
|
||||
contrastive queries can achieve a self-driving Lisp Machine in
|
||||
under a month.
|
||||
|
||||
Large refactoring projects (extract module, rename API, split monolith)
|
||||
are the hardest test for any AI agent. Current approaches (Claude Code,
|
||||
Copilot) handle them probabilistically — every step costs tokens, and
|
||||
|
||||
Reference in New Issue
Block a user