refactoring: semantic equivalence boundary, self-driving Lisp Machine
- ACL2 proves semantic equivalence for Passepartout's own Lisp code today; for other languages via logical specification modeling - CIC prover (future) extends to dependent-type-level equivalence across language boundaries - Self-driving threshold: when system can synthesize and load its own FPGA microcode or RISC-V dispatch from within the running image - Tenstorrent P150 (72 RISC-V cores) is particularly interesting: microcode is RISC-V software, not FPGA hardware — system writes, compiles, loads, benchmarks its own core dispatch logic
This commit is contained in:
@@ -594,7 +594,133 @@ context would not accept an unverified upgrade anyway.
|
|||||||
signed and verified against the hardware root of trust before
|
signed and verified against the hardware root of trust before
|
||||||
applying.
|
applying.
|
||||||
|
|
||||||
** Large refactoring in a neurosymbolic planner
|
** Large refactoring in a neurosymbolic planner — semantic equivalence
|
||||||
|
|
||||||
|
*** The workflow
|
||||||
|
|
||||||
|
ACL2 proves semantic equivalence of programs written in its own
|
||||||
|
logic — which includes Passepartout's own source code. When the
|
||||||
|
system refactors its own skills, ACL2 can prove the new function
|
||||||
|
produces the same outputs for all inputs as the old one. This is
|
||||||
|
standard ACL2 practice (verifying compiler optimizations, sort
|
||||||
|
algorithm replacements).
|
||||||
|
|
||||||
|
For other languages (Python, Java, JavaScript), the path is:
|
||||||
|
1. Model the critical subset (API surface, contracts, data
|
||||||
|
transformations) in ACL2 as a logical specification
|
||||||
|
2. Prove the specification is preserved across the refactoring
|
||||||
|
3. The actual implementation stays in the target language —
|
||||||
|
ACL2 proves the structural contract, not the runtime behavior
|
||||||
|
|
||||||
|
The CIC prover upgrade (Lean-in-Lisp, planned as future work)
|
||||||
|
would extend this to dependent-type-level equivalence proofs
|
||||||
|
across language boundaries — verifying that a Rust API binding
|
||||||
|
correctly wraps a C library, or that a Python refactoring
|
||||||
|
preserves the type-level contract of the original.
|
||||||
|
|
||||||
|
** The self-driving Lisp Machine on FPGA or Tenstorrent
|
||||||
|
|
||||||
|
A Tenstorrent P150 (~72 RISC-V Tensix cores on a PCIe card) or
|
||||||
|
a mid-range FPGA (AMD Alveo, Intel Agilex) offers enough
|
||||||
|
hardware to run a full Passepartout image with Lisp microcode
|
||||||
|
acceleration. The host Linux system provides boot, I/O, and
|
||||||
|
thermal management; the accelerator card provides the Lisp
|
||||||
|
execution fabric.
|
||||||
|
|
||||||
|
*** What it can do today
|
||||||
|
|
||||||
|
- **Run the full symbolic engine.** ACL2, Screamer, VivaceGraph,
|
||||||
|
and the fact store are pure Lisp — they run on any Lisp backend.
|
||||||
|
The RISC-V cores on a Tenstorrent or the soft-core on an FPGA
|
||||||
|
provide enough compute for real-time gate verification and
|
||||||
|
constraint solving.
|
||||||
|
|
||||||
|
- **Hot-reload skills and macro layers.** The Lisp image loads
|
||||||
|
skills, tangles Org files, compiles ACL2 books, and registers
|
||||||
|
metafunctions — all without reboot. The FPGA fabric can be
|
||||||
|
reprogrammed with new microcode in milliseconds.
|
||||||
|
|
||||||
|
- **Manage its own knowledge base.** The fact store grows and
|
||||||
|
evolves. Gate rules are proposed by the LLM and verified by
|
||||||
|
ACL2. Ontology versions are tracked. The system knows what
|
||||||
|
it knows and what changed.
|
||||||
|
|
||||||
|
- **Roll back failed upgrades.** Merkle snapshots provide
|
||||||
|
instant undo for both software state and FPGA configuration.
|
||||||
|
|
||||||
|
*** What it needs to cross the threshold to self-driving
|
||||||
|
|
||||||
|
The system is not yet fully self-driving because three things
|
||||||
|
still require external intervention:
|
||||||
|
|
||||||
|
1. **The LLM dependency.** The 10% I/O translation (natural
|
||||||
|
language → structured goal, structured result → natural
|
||||||
|
language) requires an LLM. A small local model (Phi-4,
|
||||||
|
Qwen 2.5) on the host or card can serve this. The symbolic
|
||||||
|
engine handles everything else. Once sufficiency flips
|
||||||
|
(Phase 4), even the LLM is rarely needed.
|
||||||
|
|
||||||
|
2. **Hardware driver development.** The FPGA microcode (tagged
|
||||||
|
memory, hardware GC, Lisp dispatch in hardware) is currently
|
||||||
|
written by humans. The system could eventually propose new
|
||||||
|
microcode patterns from profiling data — "your GC accounts
|
||||||
|
for 12% of runtime; here is a hardware GC barrier that
|
||||||
|
reduces it to 3%" — but the synthesis and verification of
|
||||||
|
hardware descriptions (VHDL, Verilog) requires a separate
|
||||||
|
toolchain.
|
||||||
|
|
||||||
|
3. **The initial bootstrap.** The first FPGA load, the first
|
||||||
|
Linux boot, the first Lisp image — these are done by a
|
||||||
|
human or a pre-existing system. Once bootstrapped, the
|
||||||
|
system manages itself. The threshold is crossed when the
|
||||||
|
system can design, compile, and load its own FPGA microcode
|
||||||
|
from within the running image.
|
||||||
|
|
||||||
|
*** The threshold
|
||||||
|
|
||||||
|
The self-driving threshold is crossed when the system can
|
||||||
|
synthesize and load its own FPGA microcode or Tensix dispatch
|
||||||
|
programs from within the running Lisp image. At that point:
|
||||||
|
|
||||||
|
- The system profiles its own gate verification latency
|
||||||
|
- It proposes a new microcoded instruction for the hot path
|
||||||
|
- It compiles Verilog from ACL2-verified specifications
|
||||||
|
- It reprograms the FPGA fabric via PCIe DMA from within SBCL
|
||||||
|
- It benchmarks the new instruction against the old one
|
||||||
|
- If throughput improves, the new microcode becomes permanent
|
||||||
|
- If not, it rolls back and tries another approach
|
||||||
|
|
||||||
|
This is not science fiction — it is the natural extension of
|
||||||
|
an architecture that already hot-reloads its own code, tracks
|
||||||
|
its own performance telemetry, and verifies its own changes
|
||||||
|
before committing them. The hardware description language is
|
||||||
|
the last abstraction boundary.
|
||||||
|
|
||||||
|
*** What stops it from being full science fiction
|
||||||
|
|
||||||
|
| Barrier | Status | Path |
|
||||||
|
|---------|--------|------|
|
||||||
|
| LLM dependency | Phase 4 flip reduces it to near-zero | Already designed |
|
||||||
|
| Hardware microcode synthesis | Most speculative | Requires hardware DSL verified by ACL2, then compiled to FPGA bitstream |
|
||||||
|
| Initial bootstrap | One-time human action | After first load, system manages itself |
|
||||||
|
| Power and thermal | Handled by host Linux | Unchanged |
|
||||||
|
| PCIe DMA from SBCL | Feasible with sb-alien + libpcie | Needs driver, but well-understood |
|
||||||
|
|
||||||
|
The Tenstorrent approach is particularly interesting because
|
||||||
|
its Tensix cores are *already* RISC-V processors. The microcode
|
||||||
|
is not FPGA logic — it's a RISC-V program. The system can write
|
||||||
|
RISC-V assembly, compile it with the RISC-V toolchain, load it
|
||||||
|
onto the Tensix cores, and benchmark the result. This is
|
||||||
|
dramatically simpler than FPGA synthesis because it's software,
|
||||||
|
not hardware.
|
||||||
|
|
||||||
|
A Tenstorrent P150 running Passepartout would be: 72 RISC-V
|
||||||
|
cores running Lisp microcode, one core dedicated to the ACL2
|
||||||
|
prover, one to Screamer, the rest to gate verification and
|
||||||
|
fact store operations. The host Linux system handles I/O and
|
||||||
|
the LLM. The system designs its own core dispatch logic,
|
||||||
|
loads it onto idle cores, and verifies the result with ACL2
|
||||||
|
before committing.
|
||||||
|
|
||||||
Large refactoring projects (extract module, rename API, split monolith)
|
Large refactoring projects (extract module, rename API, split monolith)
|
||||||
are the hardest test for any AI agent. Current approaches (Claude Code,
|
are the hardest test for any AI agent. Current approaches (Claude Code,
|
||||||
|
|||||||
Reference in New Issue
Block a user