refactoring: semantic equivalence boundary, self-driving Lisp Machine

- ACL2 proves semantic equivalence for Passepartout's own Lisp code
  today; for other languages via logical specification modeling
- CIC prover (future) extends to dependent-type-level equivalence
  across language boundaries
- Self-driving threshold: when the system can synthesize and load its
  own FPGA microcode or RISC-V dispatch from within the running image
- Tenstorrent P150 is particularly interesting: its Tensix cores are
  driven by small RISC-V control cores, so the microcode is RISC-V
  software, not FPGA hardware; the system writes, compiles, loads,
  and benchmarks its own core dispatch logic
Hermes
2026-05-21 18:47:49 +00:00
parent 852fcae4a6
commit f9085a4690


@@ -594,7 +594,133 @@ context would not accept an unverified upgrade anyway.
signed and verified against the hardware root of trust before
applying.
** Large refactoring in a neurosymbolic planner: semantic equivalence
*** The workflow
ACL2 proves semantic equivalence of programs written in its own
logic, an applicative subset of Common Lisp, which covers
Passepartout's own skills wherever they are written in that subset.
When the system refactors its own skills, ACL2 can prove that the new
function produces the same outputs for all inputs as the old one.
This is standard ACL2 practice (verifying compiler optimizations,
sort algorithm replacements).
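In ACL2 such a result is a theorem of the form `(defthm new-equals-old (equal (new-fn x) (old-fn x)))`, proved for every `x`. Python cannot prove that, but the property can be illustrated, and could be cheaply pre-screened before a proof attempt, by an exhaustive check over a bounded input space. A minimal sketch; `old_sum`, `new_sum`, and the bounds are hypothetical, and passing the check is evidence, not proof:

```python
from itertools import product
from typing import Callable, Iterable, Iterator, Optional, Sequence, Tuple


def old_sum(xs: Sequence[int]) -> int:
    """Original (hypothetical) skill: naive recursive sum."""
    if not xs:
        return 0
    return xs[0] + old_sum(xs[1:])


def new_sum(xs: Sequence[int]) -> int:
    """Refactored candidate: accumulator loop."""
    acc = 0
    for x in xs:
        acc += x
    return acc


def bounded_inputs(domain: Iterable[int], max_len: int) -> Iterator[Tuple[int, ...]]:
    """Yield every tuple over `domain` of length 0..max_len."""
    values = list(domain)
    for n in range(max_len + 1):
        yield from product(values, repeat=n)


def find_counterexample(f: Callable, g: Callable, inputs) -> Optional[tuple]:
    """Return the first input on which f and g disagree, or None."""
    for xs in inputs:
        if f(xs) != g(xs):
            return xs
    return None


# All 781 tuples of length <= 4 over {-2, ..., 2}.
cex = find_counterexample(old_sum, new_sum, bounded_inputs(range(-2, 3), 4))
print("no counterexample" if cex is None else f"counterexample: {cex}")
```

A refactoring that silently drops the first element, for instance, is caught at the first one-element input. The bound is the weakness: a disagreement that only appears on longer lists or larger values is invisible here, which is the gap an ACL2 proof closes.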
For other languages (Python, Java, JavaScript), the path is:
1. Model the critical subset (API surface, contracts, data
transformations) in ACL2 as a logical specification
2. Prove the specification is preserved across the refactoring
3. The actual implementation stays in the target language:
   ACL2 proves that the structural contract is preserved, not
   that the runtime behavior of the implementation matches it
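For a Python target, the API-surface part of step 1 can at least be extracted mechanically and compared before and after a refactoring. The sketch below does that with the standard `inspect` module; it is a structural check standing in for the ACL2 model, not a proof, and `OldOrders` / `NewOrders` are hypothetical classes:

```python
import inspect


def api_surface(cls) -> dict[str, str]:
    """Map each public method name of `cls` to its signature string."""
    return {
        name: str(inspect.signature(member))
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    }


def contract_diff(before, after) -> dict[str, list[str]]:
    """Report public methods removed, added, or with changed signatures."""
    old, new = api_surface(before), api_surface(after)
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


class OldOrders:
    def total(self, items, tax_rate=0.0):
        return sum(items) * (1 + tax_rate)

    def count(self, items):
        return len(items)


class NewOrders:
    # Refactored: a private helper is extracted; the public surface should not change.
    def total(self, items, tax_rate=0.0):
        return self._subtotal(items) * (1 + tax_rate)

    def count(self, items):
        return len(items)

    def _subtotal(self, items):
        return sum(items)


print(contract_diff(OldOrders, NewOrders))
```

An empty diff says the public names and signatures survived the extract-helper refactoring; contracts and data transformations would still need to be modeled and proved separately.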
The CIC (Calculus of Inductive Constructions) prover upgrade
(Lean-in-Lisp, planned as future work) would extend this to
dependent-type-level equivalence proofs across language
boundaries, for example verifying that a Rust API binding
correctly wraps a C library, or that a Python refactoring
preserves the type-level contract of the original.
** The self-driving Lisp Machine on FPGA or Tenstorrent
A Tenstorrent P150 (a PCIe card whose Tensix cores are each
controlled by several small RISC-V cores) or a mid-range FPGA
(AMD Alveo, Intel Agilex) offers enough hardware to run a full
Passepartout image with Lisp microcode
acceleration. The host Linux system provides boot, I/O, and
thermal management; the accelerator card provides the Lisp
execution fabric.
*** What it can do today
- **Run the full symbolic engine.** ACL2, Screamer, VivaceGraph,
  and the fact store are written in Common Lisp, so they run on any
  supported Common Lisp implementation such as SBCL.
The RISC-V cores on a Tenstorrent or the soft-core on an FPGA
provide enough compute for real-time gate verification and
constraint solving.
- **Hot-reload skills and macro layers.** The Lisp image loads
skills, tangles Org files, compiles ACL2 books, and registers
metafunctions — all without reboot. The FPGA fabric can be
reprogrammed with new microcode in milliseconds.
- **Manage its own knowledge base.** The fact store grows and
evolves. Gate rules are proposed by the LLM and verified by
ACL2. Ontology versions are tracked. The system knows what
it knows and what changed.
- **Roll back failed upgrades.** Merkle snapshots provide
instant undo for both software state and FPGA configuration.
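Rollback of that kind rests on content addressing: each state is identified by the Merkle root of its parts, so restoring a known-good state is a lookup by root. A minimal sketch, assuming state is a flat mapping from names to byte strings (skill sources, fact-store segments, a bitstream); `merkle_root` and `SnapshotStore` are hypothetical, not Passepartout's API:

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(state: dict[str, bytes]) -> str:
    """Hash each (name, content) leaf, then hash pairs level by level to one root."""
    level = [_h(name.encode() + b"\0" + content) for name, content in sorted(state.items())]
    if not level:
        return _h(b"").hex()
    while len(level) > 1:
        if len(level) % 2:               # odd level: duplicate the last node
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


class SnapshotStore:
    """Keeps immutable copies of state keyed by Merkle root (hypothetical)."""

    def __init__(self) -> None:
        self._snapshots: dict[str, dict[str, bytes]] = {}

    def commit(self, state: dict[str, bytes]) -> str:
        root = merkle_root(state)
        self._snapshots[root] = dict(state)  # shallow copy; bytes are immutable
        return root

    def rollback(self, root: str) -> dict[str, bytes]:
        return dict(self._snapshots[root])


store = SnapshotStore()
good = store.commit({"skill/search": b"v1", "fpga/bitstream": b"\x01\x02"})
upgraded = {"skill/search": b"v2", "fpga/bitstream": b"\x01\x03"}
store.commit(upgraded)
restored = store.rollback(good)          # the upgrade failed: restore known-good state
print(merkle_root(restored) == good)
```

Because unchanged parts hash to unchanged leaves, a real implementation could share them between snapshots instead of copying; this sketch copies for simplicity.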
*** What it needs to cross the self-driving threshold
The system is not yet fully self-driving because three things
still require external intervention:
1. **The LLM dependency.** The 10% I/O translation (natural
language → structured goal, structured result → natural
language) requires an LLM. A small local model (Phi-4,
Qwen 2.5) on the host or card can serve this. The symbolic
engine handles everything else. Once sufficiency flips
(Phase 4), even the LLM is rarely needed.
2. **Hardware microcode development.** The FPGA microcode (tagged
   memory, hardware GC, Lisp dispatch in hardware) is currently
   written by humans. The system could eventually propose new
   microcode patterns from profiling data ("your GC accounts
   for 12% of runtime; here is a hardware GC barrier that
   reduces it to 3%"), but synthesizing and verifying hardware
   descriptions (VHDL, Verilog) requires a separate toolchain,
   such as AMD Vivado or Intel Quartus, that the running image
   does not yet drive.
3. **The initial bootstrap.** The first FPGA load, the first
Linux boot, the first Lisp image — these are done by a
human or a pre-existing system. Once bootstrapped, the
system manages itself. The threshold is crossed when the
system can design, compile, and load its own FPGA microcode
from within the running image.
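The GC example in item 2 is an Amdahl's-law estimate. Reading "reduces it to 3%" as GC time falling from 12% to 3% of the original runtime (an assumption; the text does not state the baseline), the end-to-end gain works out as follows:

```python
def amdahl_speedup(part: float, part_after: float) -> float:
    """Whole-program speedup when a component that took `part` of the original
    runtime now takes `part_after` of that same original runtime."""
    new_runtime = (1.0 - part) + part_after  # untouched work + improved part
    return 1.0 / new_runtime


s = amdahl_speedup(0.12, 0.03)
print(f"{s:.3f}x")  # 1 / (0.88 + 0.03) = 1 / 0.91, about 1.099x
```

A fourfold cut in GC time buys only about 10% overall, which is why proposed microcode has to be benchmarked rather than assumed to help.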
*** The threshold
The self-driving threshold is crossed when the system can
synthesize and load its own FPGA microcode or Tensix dispatch
programs from within the running Lisp image. At that point:
- The system profiles its own gate verification latency
- It proposes a new microcoded instruction for the hot path
- It generates Verilog from ACL2-verified specifications and synthesizes it into a bitstream
- It reprograms the FPGA fabric via PCIe DMA from within SBCL
- It benchmarks the new instruction against the old one
- If throughput improves, the new microcode becomes permanent
- If not, it rolls back and tries another approach
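The steps above can be sketched as control flow. Every callable is a hypothetical stand-in (the ACL2 check, the loader that reprograms fabric or cores, the throughput benchmark), and `min_gain` is an assumed noise margin; the sketch only encodes two rules from the text: nothing unverified is loaded, and nothing slower is kept.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    name: str
    microcode: bytes


def self_upgrade(
    candidates: Iterable[Candidate],
    verify: Callable[[Candidate], bool],          # stand-in for the ACL2 proof
    load: Callable[[Optional[Candidate]], None],  # None means "restore baseline"
    benchmark: Callable[[], float],               # throughput, higher is better
    min_gain: float = 0.02,                       # assumed margin over noise
) -> Optional[Candidate]:
    """Keep the first verified candidate that beats the baseline by
    `min_gain`; roll back after every candidate that does not."""
    load(None)
    baseline = benchmark()
    for cand in candidates:
        if not verify(cand):
            continue                  # never load unverified microcode
        load(cand)
        if benchmark() >= baseline * (1 + min_gain):
            return cand               # improvement: leave it loaded
        load(None)                    # no improvement: roll back, try the next
    return None


# Simulated hardware: a lookup table instead of real measurements.
throughput = {None: 100.0, "gc-barrier": 101.0, "tag-dispatch": 109.0}
loaded: dict = {"name": None}


def fake_load(cand: Optional[Candidate]) -> None:
    loaded["name"] = cand.name if cand else None


def fake_benchmark() -> float:
    return throughput[loaded["name"]]


candidates = [
    Candidate("unsound", b"\x00"),       # fails verification, never loaded
    Candidate("gc-barrier", b"\x01"),    # +1% only, below min_gain: rolled back
    Candidate("tag-dispatch", b"\x02"),  # +9%: kept
]
winner = self_upgrade(
    candidates,
    verify=lambda c: c.name != "unsound",
    load=fake_load,
    benchmark=fake_benchmark,
)
print(winner.name if winner else "rolled back to baseline")  # tag-dispatch
```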
This is not science fiction — it is the natural extension of
an architecture that already hot-reloads its own code, tracks
its own performance telemetry, and verifies its own changes
before committing them. The hardware description language is
the last abstraction boundary.
*** What keeps it from being science fiction
| Barrier | Status | Path |
|---------|--------|------|
| LLM dependency | Phase 4 flip reduces it to near-zero | Already designed |
| Hardware microcode synthesis | Most speculative | Requires a hardware DSL verified by ACL2, then synthesized into an FPGA bitstream |
| Initial bootstrap | One-time human action | After the first load, the system manages itself |
| Power and thermal | Handled by host Linux | Unchanged |
| PCIe DMA from SBCL | Feasible with sb-alien + libpcie | Needs a driver, but the approach is well understood |
The Tenstorrent approach is particularly interesting because
each Tensix core is *already* controlled by small RISC-V
processors. The microcode is not FPGA logic; it is a RISC-V
program. The system can write RISC-V code, assemble or compile
it with a standard RISC-V toolchain, load it onto the Tensix
cores, and benchmark the result. This is dramatically simpler
than FPGA synthesis because it is software, not hardware.
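A small illustration of "the system can write RISC-V assembly": a generator that emits a type-tag dispatch routine as RV32I assembly text in GNU `as` syntax. The tag layout (low 3 bits of the word in `a0`), the register choice, the symbol names, and the 32-bit target are all assumptions for illustration, not Passepartout's object layout or Tenstorrent's kernel interface:

```python
def emit_dispatch(handlers: list[str], tag_bits: int = 3,
                  default: str = "handler_type_error") -> str:
    """Emit RV32I assembly that jumps to handlers[tag], where tag is the low
    `tag_bits` bits of the tagged word in a0. Tags without a handler jump to
    `default`."""
    if not 1 <= tag_bits <= 11:
        # andi takes a 12-bit signed immediate, so the mask must be <= 2047.
        raise ValueError("tag_bits must be in 1..11")
    n = 1 << tag_bits
    if len(handlers) > n:
        raise ValueError(f"{len(handlers)} handlers do not fit in {tag_bits} tag bits")
    table = list(handlers) + [default] * (n - len(handlers))
    lines = [
        "    .text",
        "    .globl lisp_dispatch",
        "lisp_dispatch:",
        f"    andi t0, a0, {n - 1}         # t0 = type tag",
        "    slli t0, t0, 2          # byte offset of the 4-byte table entry",
        "    la   t1, dispatch_table",
        "    add  t1, t1, t0",
        "    lw   t1, 0(t1)          # t1 = handler address",
        "    jr   t1                 # tail-jump; a0 still holds the object",
        "    .section .rodata",
        "    .balign 4",
        "dispatch_table:",
    ]
    lines += [f"    .word {h}" for h in table]
    return "\n".join(lines) + "\n"


asm = emit_dispatch(["handler_fixnum", "handler_cons", "handler_symbol"])
print(asm)
```

The emitted text would still need to be assembled, linked, and loaded through the vendor's tooling before it could be benchmarked; the sketch covers only the generation step, which a Lisp image can do entirely in software.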
A Tenstorrent P150 running Passepartout would look like this:
its Tensix cores run Lisp microcode, with one core dedicated to
the ACL2 prover, one to Screamer, and the rest to gate
verification and fact store operations. The host Linux system
handles I/O and the LLM. The system designs its own core
dispatch logic, loads it onto idle cores, and verifies the
result with ACL2 before committing.
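The core budget described above can be written down as an allocation plan. A sketch, assuming cores are numbered `0..n-1`; `plan_cores`, the role names, and the `gate_share` split of "the rest" between gate verification and fact-store work are hypothetical, since the text does not specify a ratio:

```python
def plan_cores(n_cores: int, gate_share: float = 0.75) -> dict[str, list[int]]:
    """Dedicate core 0 to ACL2 and core 1 to Screamer, then split the rest
    between gate verification and fact-store operations by `gate_share`."""
    if n_cores < 4:
        raise ValueError("need at least 4 cores: prover, solver, gates, facts")
    rest = list(range(2, n_cores))
    # Keep at least one core in each of the two shared pools.
    n_gate = max(1, min(len(rest) - 1, round(len(rest) * gate_share)))
    return {
        "acl2": [0],
        "screamer": [1],
        "gate_verification": rest[:n_gate],
        "fact_store": rest[n_gate:],
    }


plan = plan_cores(64)  # hypothetical core count
print({role: len(cores) for role, cores in plan.items()})
```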
Large refactoring projects (extract module, rename API, split monolith)
are the hardest test for any AI agent. Current approaches (Claude Code,