hermes-brain/projects/cl-modernization/lisp-provers-and-rust-comparison.org

:PROPERTIES:
:CREATED:  [2026-06-03 Tue]
:ID:       85f963a7-a10f-45cc-ace6-6edfeefee762
:END:
#+title: Lisp, Provers, and vs Rust
#+filetags: :ACL2:HOL:LLM:bootstrapping:lisp:rust:theorem-proving:

* Lisp vs Rust — Assessed Gap

** Inherent limits of Lisp

Two properties cannot be eliminated without changing what Lisp is:

- GC pauses — everything is a tagged pointer on the heap; mitigatable (generational, concurrent collectors) but not eliminatable
- No memory model / no Send/Sync — threads share everything by default; concurrency correctness is a discipline enforced by code review or a prover, not by the compiler

These are genuine costs, but they are contingent on hardware, not laws of nature. The Symbolics Genera OS was a full operating system written in Lisp — device drivers, window system, TCP/IP, filesystem — running on the Ivory processor with hardware tag checking and single-cycle CONS allocation. Lisp was a systems language; the chip was designed for it. The reason Lisp's GC pauses are costly on modern silicon is that CPUs are optimized for C's memory model: 64-byte cache lines for structs, prefetchers for linear access, TLBs for contiguous layout. A RISC-V Lisp extension (~50K gates) eliminates the mismatch entirely.

Without dedicated silicon, Lisp on commodity hardware has larger memory overhead and less predictable pause times than C or Rust on allocation-heavy workloads. For Passepartout's use cases — knowledge management, protocol synthesis, agent coordination, interactive systems — modern concurrent generational GCs (single-digit millisecond pauses) are acceptable.

** Fixable gaps (implementation and ecosystem, not language philosophy)

- Package manager and build system like Cargo (ASDF + Quicklisp is from 2005)
- LSP-level tooling that surfaces SBCL's type inference
- Static binaries and deployment (save-lisp-and-die + tree-shaker exists, not standard)
- Standard library modernization (hash sets, priority queues, JSON, HTTP, async — Clojure proved this)
- Ecosystem scale (network effect, fixable with sustained effort)

** Common misconceptions about Lisp's limits

- The type system is not a fundamental limitation. Coalton embeds sound Hindley-Milner typing inside CL, preserving homoiconicity. Opt-in per-file.
- Unpredictable performance is not a language defect — it is a consequence of macro-heavy programming styles. Write in a disciplined subset with type declarations and SBCL compiles to within 2x of C on hot paths. The compiler is deterministic.
- Macros do not cause unpredictability; they enable styles that confuse the optimizer. The choice to use those styles is the programmer's.
- Unicode handling, pattern matching, async I/O, immutable data structures, module systems, error messages — all library or convention work, not language capability.
- The numeric tower (auto-promoting integers, ratios, floats, complex) is a genuine trade-off: mathematically correct, but blocks SIMD and the overflow assumptions that make Rust's arithmetic fast. Choose based on domain.

* Prover Architecture

An autonomous prover can close the gap between Lisp's flexibility and Rust's guarantees /without/ making Lisp rigid. The prover sits between the programmer and the compiler:

#+BEGIN_EXAMPLE
[Lisp code] → [Passepartout prover] → [SBCL compiler] → [binary]
                   ↑
         [Verified: race-free, type-safe, bounded memory, constant-factor performance]
#+END_EXAMPLE

The prover proves:
- Memory safety without a borrow checker — by analyzing call graphs and data flow
- Performance predictability — by unfolding macros, constructing SSA form, and reporting what the optimizer will produce
- Type soundness across untyped CL code — by inferring types, flagging contradictions, and converging toward provably-correct programs

This preserves Lisp's defining property: the language stays fluid. The prover is an external constraint, not a compiler-enforced limitation. The programmer can always bypass it for fast prototyping.

** Impact by language

- Rust has the most to lose. Its entire value proposition is compile-time safety. An AI prover that verifies the same properties about Lisp code makes the borrow checker a solved problem. Rust keeps its advantage only as long as the prover is slower and more expensive than a type system that runs in milliseconds.
- Python and JS have the most surface to gain. A prover could give them type soundness, no null pointers, and thread safety without changing the languages — transpiling through a verified Lisp intermediate.
- Lisp is uniquely positioned because of homoiconicity: the prover works on S-expressions, which /are/ the AST. No parse/unparse gap. The prover and the programmer work on the same artifact.

** ACL2 as foundation

ACL2 is the right foundation but not the complete solution:
- It is first-order — cannot quantify over functions, cannot prove meta-level properties about the evaluator itself
- It is interactive — a human must guide the proof; lemmas must be manually provided
- It is a subset of CL — pure, no side effects, no CLOS, no I/O, no threading
- It does not scale to everyday production code without massive human effort

But ACL2 proves the architecture is viable: a Lisp-based prover verifying Lisp programs is natural, not forced.

** Bootstrapping a HOL prover via ACL2 + Screamer

The HOL kernel (HOL Light: ~400 lines of OCaml, 10 primitive inference rules) is a well-known artifact — an engineering task, not a research problem. The bootstrap path:

1. Transcribe the HOL kernel into pure CL (~500-800 lines)
2. ACL2 verifies the kernel implements the inference rules correctly
3. Screamer provides the proof search engine (backtracking maps to proof tree exploration)
4. HOL proves meta-level properties: macro soundness, evaluator equivalence, compositionality of skills, safety of self-modification
5. HOL theorems are reflected back as ACL2 rewrite rules — automation compounds

Trust chain: SBCL runtime → ACL2 (verified by its own meta-circular proof) → HOL kernel (verified by ACL2) → any HOL theorem.

The HOL kernel is an ideal LLM task — small, fully specified, stateless, with unambiguous correctness criteria. ACL2 filters hallucinations. Iteration converges in 2-3 attempts.

** Comparison with Lean

Lean is the standard for interactive theorem proving in mathematics. Passepartout will not compete with it for that purpose:
- mathlib4 has over 100,000 theorems. A bootstrapped HOL prover starts at zero.
- Lean has a community of mathematicians; a Lisp prover will not attract one.
- Lean's elaborator is the result of decades of type theory research; a HOL kernel plus Screamer does not approach it.

Where Passepartout's prover wins: it verifies /running code/, not abstract mathematics. It is embedded in a self-modifying agent. Its design optimizes for program properties (memory safety, race freedom, macro soundness), not mathematical expressiveness. Convergence on the mathematical side is possible if the agent automates proof discovery at sufficient speed — but that question is unanswered and is one Passepartout is designed to answer.

* The self-writing machine and proofs

The self-writing machine does not need to prove itself from the void. It proves each next step using everything proven before. The writer (LLM) is allowed to be heuristic. The prover only needs to be conservative and accurate. Every time a human signs off on a proof, it becomes future automation.

Reference:
- [[id:971cd9e7-2cc5-4743-8042-2469dbe4078f][CL Modernization]] — the project that builds the prover and modernizes the CL ecosystem
- [[id:dddd52a7-adb8-470e-a459-614ade5f76af][Closing the Lisp Gap]] — detailed performance and ecosystem analysis
- [[id:1c3ec48b-446c-50d2-b53e-126a81f5143f][Passepartout]] — the architecture that houses the prover