Files
hermes-brain/projects/passepartout/architecture/lisp-foundation.org

20 KiB
Raw Blame History

Lisp Foundation

Passepartout's architecture — verified gate, single address space, self-modifying agent — depends on running on a Lisp that can be proved correct. This document explains why Lisp is the right foundation, what gaps remain in the current ecosystem, and how the prover bootstraps from ACL2 to a verified HOL kernel to a self-verifying stack. It is the "Lisp layer" of the architecture, alongside the four-subsystem description in architecture.org and the design rationale in Design Decisions.

What It Is and Why

Common Lisp is a language from the 1990s with a 1950s syntax, a 1970s package manager, and a 2000s type system — yet it has never been beaten on its core bet: unmatched macro facility, interactive development, image-based workflow, and the most powerful bottom-up programming model ever designed. The Lisp foundation work within Passepartout is a systematic effort to bring the Lisp ecosystem forward by 25 years without sacrificing the language's defining character.

This is not a language fork. It is an ecosystem upgrade that preserves the 1994 ANSI standard while adding modern conventions (lockfiles, LSP, standard library, static binaries) and a verified prover (a HOL kernel verified by ACL2) that makes Lisp the safest language to write self-modifying code in — which is exactly what Passepartout is.

Why Now — The Economic Flip

The 1980s trade-off was: C is cheap enough for the market. Correctness is a luxury the market cannot afford. The 2020s trade-off: C is expensive for the market. Incorrectness has become the dominant cost of software, and Lisp's verification infrastructure is now the cheaper option.

Four transformations flipped the economics:

  1. Memory is free. 40MB runtime is noise on a $20 Raspberry Pi with 8GB RAM. In 1980, DRAM was ~$5,000/MB.
  2. Transistors are free. Billions of gates on a modern ARM core — GC and type dispatch cost nothing because the transistors are there whether used or not.
  3. Complexity saturates human verification. Systems are tens of millions of lines. Testing is necessary but insufficient — zero-day vulnerabilities prove bugs survive all testing. Formal verification is the only known path to guaranteed correctness.
  4. Cost of failure exceeds cost of verification. A single breach costs millions. Regulation mandates provable compliance. Proving correctness is cheaper than not proving it.

The verification appliance costs $5,000/year and replaces $500,000/year in compliance audits, breach litigation, and regulatory fines. This cost structure — zero marginal cost per additional user — is what makes Lisp economically viable at scale.

Why Lisp Is a Uniquely Good Candidate

Every language has a different starting position for bootstrapping an AI agent that improves its own compiler and toolchain while running on itself. Lisp's is the strongest. Here is why:

1. Homoiconicity — the prover works on the same artifact the programmer does.

The AST of a Lisp program is an S-expression — the same data structure the language uses for everything else. A prover in Lisp reads code as nested lists, transforms it, annotates it, and outputs modified code without a parse/unparse cycle. In Rust or Python, the prover would need a full parser, a serializer, and to handle every AST change as a diff against the source text — introducing a translation gap where the prover's internal representation and the programmer's source can drift apart. In Lisp, the internal representation is the source. This is the difference between a prover that can be built in 800 lines (HOL Light kernel) and one that requires 20,000 lines of parser, pretty-printer, and AST library.

2. ACL2 — a verified Lisp prover already exists.

ACL2 is a first-order theorem prover written in and for a subset of Common Lisp. It is meta-circularly verified — its kernel is proved correct in ACL2. This means there is a verified prover already running in Lisp that can verify a higher-order kernel (HOL Light's ~400 lines). ACL2 does the first verification for free:

SBCL runtime → ACL2 (verified by its own meta-circular proof)
           → HOL kernel (verified by ACL2)
           → any HOL theorem (proved by HOL kernel)

No other language has an existing verified prover that runs in the same runtime as the programs it verifies. Rust has none. Python has none. Lean has a verified kernel but the prover language is separate from the target language, creating a semantic gap.

3. Code density 3-5×.

Lisp's code density is 3-5× higher than C, Rust, or Go for the same semantic work. This is a maintenance claim, not a performance claim: fewer lines have fewer bugs, fewer attack surfaces, fewer compliance gaps. When the prover can verify those lines, the remaining surface shrinks further.

4. The type system is additive, not prescriptive — Coalton proves it.

Coalton embeds sound Hindley-Milner typing inside Common Lisp, preserving homoiconicity. You opt in per-file. This is the opposite of Rust, where type soundness is mandatory across the entire program. For a self-improving agent, the additive model is strictly better — it lets the agent work in a fluid, untyped style for exploration and lock down types for verification.

5. Everything else is ecosystem, not language.

Unicode handling, pattern matching, async I/O, immutable data structures, module systems, error messages — every one of these is a library or convention, not a language feature gap. Clojure proved this on the JVM: twenty years of CL ecosystem neglect can be fixed with sufficient automation. The gap is not in Lisp's capacity — it is in the community's labor supply, and an AI agent working at 10× human velocity changes that equation.

The Gap: What Needs Fixing

What cannot be fixed without changing what Lisp is:

  • GC pauses — everything is a tagged pointer on the heap; mitigatable (generational, concurrent collectors) but not eliminatable on commodity hardware
  • No memory model / no Send/Sync — threads share everything by default; concurrency correctness is a discipline, not a compiler guarantee

These are genuine costs, but they are contingent on hardware, not laws of nature. Symbolics' Genera OS ran on the Ivory processor with hardware tag checking and single-cycle CONS allocation — Lisp was a systems language when the chip was designed for it. A RISC-V Lisp extension with ~50K gates eliminates the hardware mismatch. Without dedicated silicon, modern concurrent generational GCs (single-digit millisecond pauses) are acceptable for everything Passepartout does.

What can be fixed — the performance gap (10-20% on hot numerical code):

Investment Effort The problem
Alias analysis pass for SBCL's IR ~1-2 person-years Rust's borrow checker gives LLVM perfect aliasing at function scope; SBCL conservatively assumes anything could alias anything
LLVM backend (Clasp maturity) ~5-10 person-years Clasp's hybrid fast-path/slow-path is the right design; needs engineering to match SBCL's general performance
PGO feedback loop ~1 person-year SBCL has profiling (sb-sprof) but no mechanism to reweight branches or layout hot/cold blocks
Portable SIMD abstraction ~2-3 person-years Every ISA extension needs its own VOP set; no equivalent of xmmintrin.h
Post-link optimization Research problem SBCL's trampoline-heavy code layout chokes tools like BOLT

What can be fixed — the ecosystem gap:

Feature Rust Common Lisp
Package manager Cargo (lockfile, binary cache, workspace) Quicklisp (no lockfile, source-only)
LSP rust-analyzer No universal LSP
Formatter rustfmt (universal) cl-form (not universal)
Linter Clippy (>700 rules) No equivalent
Documentation rustdoc (integrated) Conventions only
Cross-compilation target flag Manual, source-based
Test framework Built-in with benchmarks Various (FiveAM, Prove)
Sanitizers ASan, TSan, MSan, UBSan None
Library count ~180k ~2k
WASM target First-class Modest
Mobile targets First-class None native

The largest gaps are the package manager (the single highest-leverage investment — ~2-3 person-years) and the library count (~100× fewer). The library count follows from the lack of a modern build tool, not from a lack of demand.

Common misconceptions:

  • Unpredictable performance is not a language defect — write in a disciplined subset with type declarations and SBCL compiles to within 2x of C on hot paths
  • Macros do not cause unpredictability; they enable styles that confuse the optimizer. The choice is the programmer's.
  • The numeric tower is a genuine trade-off: mathematically correct, but blocks SIMD and the overflow assumptions that make Rust's arithmetic fast. Choose based on domain.

The Prover Architecture

An autonomous prover closes the gap between Lisp's flexibility and Rust's guarantees without making Lisp rigid. The prover sits between the programmer and the compiler:

[Lisp code] → [Passepartout prover] → [SBCL compiler] → [binary]
                   ↑
         [Verified: race-free, type-safe, bounded memory, constant-factor performance]

The prover proves memory safety without a borrow checker (by analyzing call graphs and data flow), performance predictability (by unfolding macros and constructing SSA form), and type soundness across untyped CL code (by inferring types and flagging contradictions). The language stays fluid — the prover is an external constraint, not a compiler-enforced limitation. The programmer can always bypass it for fast prototyping.

ACL2 is the right foundation but not the complete solution. It is first-order, interactive, restricted to a pure subset of CL, and does not scale to everyday production code without massive human effort. But ACL2 proves the architecture is viable: a Lisp-based prover verifying Lisp programs is natural, not forced.

Bootstrapping a HOL prover via ACL2 + Screamer — the HOL kernel (HOL Light: ~400 lines of OCaml, 10 primitive inference rules) is a well-known artifact, an engineering task rather than a research problem:

  1. Transcribe the HOL kernel into pure CL (~500-800 lines)
  2. ACL2 verifies the kernel implements the inference rules correctly
  3. Screamer provides the proof search engine (backtracking maps to proof tree exploration)
  4. HOL proves meta-level properties: macro soundness, evaluator equivalence, compositionality of skills, safety of self-modification
  5. HOL theorems are reflected back as ACL2 rewrite rules — automation compounds

The HOL kernel is an ideal LLM task — small, fully specified, stateless, with unambiguous correctness criteria. ACL2 filters hallucinations. Iteration converges in 2-3 attempts.

The self-writing machine does not prove itself from the void. It proves each next step using everything proven before. The writer (LLM) is allowed to be heuristic. The prover only needs to be conservative and accurate. Every time a human signs off on a proof, it becomes future automation.

Why Passepartout Changes the Equation

The standard model for closing the library gap is: more users → more libraries → more users. Passepartout's model is: agent synthesizes libraries on demand → fewer blockers → more users.

Knowledge-capture replaces community-coordination. The rate-limiter shifts from "people who can write Lisp libraries" to "people who can explain their domain well." A 3-hour conversation with a domain expert captures the specifications, invariants, and correctness criteria. The agent synthesizes the library, verifies it via the prover, and hot-reloads it.

The same compression applies to social infrastructure. The Passepartout architecture compresses the social ecosystem the same way it compresses the software ecosystem — replacing multiplicative duplication with additive specification. Twitter (~10M lines), Facebook (~60M), Reddit (~5M), Discord (~8M), Signal (~3M) — each independently implementing identity, messaging, groups, moderation, federation, access control — collapses to one verified spec stack (~5,000 lines) with per-community gate policies (~100 lines each). This is the same principle as the library gap: multiplicative maintenance burden collapses to additive specification complexity, amplified by Lisp's density and the prover's correctness guarantees.

Passepartout's position to direct Lisp development:

  1. Instrumentation advantage. It runs in Lisp, can profile its own execution in Lisp terms, identify SBCL's compiler bottlenecks, and generate patches — all in a closed feedback loop that C compilers lack.
  2. Coordination advantage by adoption. If Passepartout becomes the standard Lisp agent framework, its tooling choices become de facto standards: ocicl with lockfiles becomes the package manager, cl-lsp gets maintenance, the PGO pipeline gets built.
  3. Automated contribution pipeline. Profile → identify hot path → generate candidate VOP or optimization pass → run test suite (verified by ACL2) → submit patch. Cycle time drops from years to days.
  4. The prover multiplier. The agent can generate thousands of variants, verify correctness, benchmark, and keep the Pareto-optimal set — coverage a human cannot match.

Timeline compression. The gap-closing problem goes from "we need enough talented volunteers" to "we need enough compute" — and compute grows on a known exponential, while talent doesn't:

Gap Human-only AI-assisted Passepartout loop
SBCL alias analysis 18 mo 10 mo 3-4 mo
PGO pipeline 12 mo 6 mo 2-3 mo
Cargo-equivalent 24 mo 8 mo 4-6 mo
LSP server 18 mo 6 mo 3-4 mo
RISC-V Lisp extension 24 mo 12 mo 6-8 mo
Library coverage (100 APIs) 3-5 yrs 12 mo 3-4 mo (synthesis)
Clasp maturity 5-10 yrs 3-5 yrs 1-2 yrs

Landscape Impact

Rust has the most to lose. Its entire value proposition is compile-time safety. An AI prover that verifies the same properties about Lisp code makes the borrow checker a solved problem. Rust keeps its advantage only as long as the prover is slower and more expensive than a type system that runs in milliseconds. Once the prover matches or exceeds Rust's guarantees — memory safety, race freedom, constant-factor performance — the choice between Lisp and Rust becomes about workflow preference, not safety.

Python and JavaScript have the most surface to gain. Statically proving correctness of dynamically-typed programs becomes tractable through a verified Lisp intermediate (several transpilers exist). Python and JS gain type soundness, no null pointers, and thread safety without changing the language.

Lean is not a competitor. Lean is for interactive theorem proving in mathematics — mathlib4 has over 100,000 theorems, a dedicated community, and an elaborator built from decades of type theory research. Passepartout's prover verifies running code, not abstract mathematics. Its design optimizes for program properties (memory safety, race freedom, macro soundness), not mathematical expressiveness.

Hardware Endpoint

A Lisp ASIC (Symbolics Ivory, CADR, or a modern tagged RISC-V variant) raises the floor more than the ceiling:

  • Tag checking on every memory access — types, GC bits, forwarding pointers in parallel with ALU
  • Hardware CONS — single-cycle allocation with automatic write barriers
  • Hardware GC support — generational barriers at cache level
  • Hardware generic function dispatch — type-code lookup in CAM instead of method cache
  • Hardware stack groups and shallow binding — zero-cost special variable rebinding

The worst case becomes much faster — naive code and optimized code are closer together. The ASIC shifts the optimizer's job from "avoid Lisp overhead" to "use special instructions effectively." The gap between C and Lisp on hot numerical loops narrows from ~20% to maybe 5%.

RISC-V Lisp extension (near-term, ~50K gates): ~5 new instructions (CONS with write barrier, tag dispatch, read barrier, tag extract, GC check). Implementable as an FPGA soft-core. Makes Lisp viable where C is currently mandatory — real-time control, sensor nodes, IoT — by eliminating allocation cost and GC pause unpredictability. The gap goes from 10-100× to ~2-5×.

Strategic play: Define the extension, open-source an FPGA implementation, get it ratified as a standard RISC-V extension. Then any chip vendor can add it.

The Plan

The prover must come first. An unverified base cannot bootstrap a verified upper layer. The order flips from easiest first to most foundational first:

Phase What Timeline Key risk
0 Verified HOL kernel (~500-800 lines of CL). Transcribe HOL Light's 10 primitive inference rules, verify with ACL2. 2-4 months ACL2 iteration time
1 Minimal verified build system. Deterministic lockfiles, enough to compile the prover, LSP, and itself. Only deps and compilation need proving; IO layer stays trusted. 4-6 months IO verification boundary
2 Verified LSP server. Bridges SBCL's compiler to the LSP protocol — online mode (connected image) and offline mode (static analysis). 6-8 months SBCL's type inference was designed for compile-time warnings, not a responsive protocol
3 Coalton + verified standard library. Hash sets, priority queues, JSON, HTTP, async, immutable data structures — all proved correct. Largest phase by volume. 12-18 months Proof costs at scale; assumes agent compounds its own proving capability
4 Self-hosting, self-verifying CL stack. The toolchain compiles itself. The compiler is verified. Passepartout proves its own transformation rules correct. 6-12 months Self-verification bootstrapping is the hardest problem in proving
Total ~2-5 years

What accelerates this: an existing CL community contributing modules to verify; leveraging proofs from Coq, Lean, and Isabelle rather than proving from scratch; an LLM that compounds its own capability in the domain.

Expected Impact

On Lisp development: macro soundness guaranteed by the prover rather than programmer discipline; concurrency bugs caught at compile time; type inference across untyped code via the LSP; library import as a one-line cl add with deterministic lockfiles; deployment as cl build --release producing a static binary.

On Passepartout itself: a verified Lisp stack means every self-modification is proved safe before it applies. No LLM hallucination propagates into the running system. The machine is no longer gambling on correctness.

On specification-level verification: a shift from testing as correctness to proof as correctness. Testing becomes a fallback for properties the prover cannot yet establish, rather than the primary correctness mechanism. The attack surface for implementation bugs drops to near zero for any program the prover can analyze.

See also: