Rewrite lisp-provers page as evergreen reference — remove conversational framing, 'things I was wrong about', 'key insight from conversation'

2026-06-03 18:38:20 +00:00
parent 179bcbaaa4
commit a176eedd2b
1 changed files with 49 additions and 53 deletions
--- a/ideas/lisp-provers-and-rust-comparison.org
+++ b/ideas/lisp-provers-and-rust-comparison.org
@@ -5,33 +5,36 @@
 #+title: Lisp, Provers, and vs Rust
 #+filetags: :ACL2:HOL:LLM:bootstrapping:lisp:rust:theorem-proving:

-* Lisp vs Rust — the honest gap
+* Lisp vs Rust — Assessed Gap

-** True defects in Lisp (inherent, not fixable without changing what Lisp is)
- GC pauses — everything is a tagged pointer on the heap; mitigatable but not eliminatable
- No memory model / no Send/Sync — threads share everything; concurrency correctness is a discipline, not a guarantee
+** Inherent limits of Lisp
+
+Two properties cannot be eliminated without changing what Lisp is:
+
+- GC pauses — everything is a tagged pointer on the heap; mitigatable (generational, concurrent collectors) but not eliminatable
+- No memory model / no Send/Sync — threads share everything by default; concurrency correctness is a discipline enforced by code review or a prover, not by the compiler
+
+These are real. Lisp will not replace Rust for real-time audio, kernel drivers, or embedded systems where every allocation is deterministic. For everything else — knowledge management, protocol synthesis, agent coordination, interactive systems — GC pauses at predictable intervals are acceptable.
+
+** Fixable gaps (implementation and ecosystem, not language philosophy)

-** Achievable improvements (implementation/ecosystem, not philosophy)
 - Package manager and build system like Cargo (ASDF + Quicklisp is from 2005)
- LSP-level tooling (SBCL infers types well — surface it)
+- LSP-level tooling that surfaces SBCL's type inference
 - Static binaries and deployment (save-lisp-and-die + tree-shaker exists, not standard)
 - Standard library modernization (hash sets, priority queues, JSON, HTTP, async — Clojure proved this)
 - Ecosystem scale (network effect, fixable with sustained effort)

-** Things I was wrong about (corrected during conversation)
- The type system is not a fundamental limitation — Coalton embeds sound Hindley-Milner typing inside CL, preserving homoiconicity. You opt in per-file.
- Unpredictable performance is not a language defect — it's a consequence of macro-heavy programming styles. Write in a disciplined subset with declarations and SBCL compiles to within 2x of C. The compiler is deterministic.
- Macros don't cause unpredictability; they enable styles that confuse the optimizer. You choose not to use those styles.
+** Common misconceptions about Lisp's limits

-** Everything else is ecosystem and convention, not language capability: Unicode handling, pattern matching, async I/O, immutable data structures, module systems, error messages.
+- The type system is not a fundamental limitation. Coalton embeds sound Hindley-Milner typing inside CL, preserving homoiconicity. Opt-in per-file.
+- Unpredictable performance is not a language defect — it is a consequence of macro-heavy programming styles. Write in a disciplined subset with type declarations and SBCL compiles to within 2x of C on hot paths. The compiler is deterministic.
+- Macros do not cause unpredictability; they enable styles that confuse the optimizer. The choice to use those styles is the programmer's.
+- Unicode handling, pattern matching, async I/O, immutable data structures, module systems, error messages — all library or convention work, not language capability.
+- The numeric tower (auto-promoting integers, ratios, floats, complex) is a genuine trade-off: mathematically correct, but blocks SIMD and the overflow assumptions that make Rust's arithmetic fast. Choose based on domain.

-** The numeric tower is a genuine trade-off: Lisp's auto-promoting math (integers, ratios, floats, complex) is mathematically correct but blocks SIMD and overflow assumptions that make Rust's arithmetic fast. Pick your priority.
+* Prover Architecture

-* Prover discussion
-
-** The architecture for a Lisp prover
-
-An autonomous agent with mathematical proving can close the gap between Lisp's flexibility and Rust's guarantees /without/ making Lisp rigid. The prover sits between the programmer and the compiler:
+An autonomous prover can close the gap between Lisp's flexibility and Rust's guarantees /without/ making Lisp rigid. The prover sits between the programmer and the compiler:

 #+BEGIN_EXAMPLE
 [Lisp code] → [Passepartout prover] → [SBCL compiler] → [binary]
@@ -39,36 +42,34 @@ An autonomous agent with mathematical proving can close the gap between Lisp's f
         [Verified: race-free, type-safe, bounded memory, constant-factor performance]
 #+END_EXAMPLE

-The prover can:
- Prove memory safety without a borrow checker — analyze call graphs and data flow
- Predict performance — unfold macros, SSA-ize, tell the programmer what will optimize
- Infer types across untyped CL code — flag contradictions, converge toward provably-correct programs
+The prover proves:
+- Memory safety without a borrow checker — by analyzing call graphs and data flow
+- Performance predictability — by unfolding macros, constructing SSA form, and reporting what the optimizer will produce
+- Type soundness across untyped CL code — by inferring types, flagging contradictions, and converging toward provably-correct programs

-This preserves Lisp's core property: the language stays fluid. The proving agent is an external constraint, not a compiler-enforced limitation. You can always bypass it for fast prototyping.
+This preserves Lisp's defining property: the language stays fluid. The prover is an external constraint, not a compiler-enforced limitation. The programmer can always bypass it for fast prototyping.

-** Where the coming shift ranks languages
+** Impact by language

-Rust has the most to lose — its entire value proposition is compile-time safety. An AI prover that verifies the same properties about Lisp code makes the borrow checker a solved problem. Rust keeps the advantage only as long as the prover is slower and more expensive than a type system that runs in milliseconds.
+- Rust has the most to lose. Its entire value proposition is compile-time safety. An AI prover that verifies the same properties about Lisp code makes the borrow checker a solved problem. Rust keeps its advantage only as long as the prover is slower and more expensive than a type system that runs in milliseconds.
+- Python and JS have the most surface to gain. A prover could give them type soundness, no null pointers, and thread safety without changing the languages — transpiling through a verified Lisp intermediate.
+- Lisp is uniquely positioned because of homoiconicity: the prover works on S-expressions, which /are/ the AST. No parse/unparse gap. The prover and the programmer work on the same artifact.

-Python/JS have the most surface to gain — an AI prover could give them type soundness, no null pointers, thread safety, without changing the languages.
-
-Lisp is uniquely positioned because of homoiconicity. The prover works on S-expressions, which /are/ the AST. No parse/unparse gap. The prover and programmer work on exactly the same artifact. The prover can rewrite a macro expansion without expanding it, annotate specific cons cells with lifetime constraints, and the output is standard diffable code.
-
-** ACL2 and the bootstrap
+** ACL2 as foundation

 ACL2 is the right foundation but not the complete solution:
- It is first-order — can't quantify over functions, can't prove meta-level properties about the evaluator itself
- It is interactive — human must guide the proof, lemmas must be manually provided
- It is a subset — pure, no side effects, no CLOS, no I/O, no threading
- It doesn't scale to everyday production code without massive human effort
+- It is first-order — cannot quantify over functions, cannot prove meta-level properties about the evaluator itself
+- It is interactive — a human must guide the proof; lemmas must be manually provided
+- It is a subset of CL — pure, no side effects, no CLOS, no I/O, no threading
+- It does not scale to everyday production code without massive human effort

-But ACL2 proves the architecture is viable: a Lisp-based prover verifying Lisp programs is deeply natural.
+But ACL2 proves the architecture is viable: a Lisp-based prover verifying Lisp programs is natural, not forced.

 ** Bootstrapping a HOL prover via ACL2 + Screamer

-The HOL kernel is famously tiny (HOL Light: ~400 lines of OCaml, 10 primitive inference rules). The path:
+The HOL kernel (HOL Light: ~400 lines of OCaml, 10 primitive inference rules) is a well-known artifact — an engineering task, not a research problem. The bootstrap path:

-1. Write the HOL kernel in pure CL (~500-800 lines)
+1. Transcribe the HOL kernel into pure CL (~500-800 lines)
 2. ACL2 verifies the kernel implements the inference rules correctly
 3. Screamer provides the proof search engine (backtracking maps to proof tree exploration)
 4. HOL proves meta-level properties: macro soundness, evaluator equivalence, compositionality of skills, safety of self-modification
@@ -76,27 +77,22 @@ The HOL kernel is famously tiny (HOL Light: ~400 lines of OCaml, 10 primitive in

 Trust chain: SBCL runtime → ACL2 (verified by its own meta-circular proof) → HOL kernel (verified by ACL2) → any HOL theorem.

-This is not a research problem — the kernel is a well-known artifact. It's an engineering task: transcribe HOL Light's kernel into CL, verify with ACL2, couple with Screamer.
-
-** An LLM can write the HOL kernel
-
-The HOL kernel is the ideal LLM task: ~400-800 lines of pure code, fully specified, well-documented, stateless, with unambiguous correctness criteria.
-
-Workflow: LLM writes plausible kernel → ACL2 verifies it → ACL2 catches hallucinations → iteration converges in 2-3 attempts.
-
-The LLM is allowed to be wrong. ACL2 is the filter. This is achievable today.
+The HOL kernel is an ideal LLM task — small, fully specified, stateless, with unambiguous correctness criteria. ACL2 filters hallucinations. Iteration converges in 2-3 attempts.

 ** Comparison with Lean

-Lean is the standard for mathematics and Passepartout will not compete with it for that purpose:
- mathlib4 has 100,000+ theorems. A bootstrapped HOL prover starts at zero.
- Lean has a community of mathematicians; a Lisp prover won't attract one.
- Lean's elaborator is the result of decades of type theory research; a HOL kernel + Screamer doesn't touch that.
+Lean is the standard for interactive theorem proving in mathematics. Passepartout will not compete with it for that purpose:
+- mathlib4 has over 100,000 theorems. A bootstrapped HOL prover starts at zero.
+- Lean has a community of mathematicians; a Lisp prover will not attract one.
+- Lean's elaborator is the result of decades of type theory research; a HOL kernel plus Screamer does not approach it.

-Where Passepartout's prover wins: it verifies /running code/, not abstract mathematics. It is embedded in a self-modifying agent. Its design is optimized for program properties (memory safety, race freedom, macro soundness), not mathematical expressiveness.
+Where Passepartout's prover wins: it verifies /running code/, not abstract mathematics. It is embedded in a self-modifying agent. Its design optimizes for program properties (memory safety, race freedom, macro soundness), not mathematical expressiveness. Convergence on the mathematical side is possible if the agent automates proof discovery at sufficient speed — but that question is unanswered and is one Passepartout is designed to answer.

-Convergence is possible if the agent automates proof discovery at sufficient speed, but that question is unanswered and is the one Passepartout is designed to answer.
+* The self-writing machine and proofs

-** Key insight from the conversation
+The self-writing machine does not need to prove itself from the void. It proves each next step using everything proven before. The writer (LLM) is allowed to be heuristic. The prover only needs to be conservative and accurate. Every time a human signs off on a proof, it becomes future automation.

-The self-writing machine doesn't need to prove itself from the void. It proves each next step using everything proven before. The writer is allowed to be heuristic (LLM). The prover just needs to be conservative and accurate. Every time a human signs off on a proof, it becomes future automation.
+Reference:
+- [[id:971cd9e7-2cc5-4743-8042-2469dbe4078f][CL Modernization]] — the project that builds the prover and modernizes the CL ecosystem
+- [[id:dddd52a7-adb8-470e-a459-614ade5f76af][Closing the Lisp Gap]] — detailed performance and ecosystem analysis
+- [[id:1c3ec48b-446c-50d2-b53e-126a81f5143f][Passepartout]] — the architecture that houses the prover