:PROPERTIES: :CREATED: [2026-06-03 Tue] :ID: 971cd9e7-2cc5-4743-8042-2469dbe4078f :END: #+title: CL Modernization #+filetags: :Passepartout:common-lisp:modernization:prover:tooling:build-system:HOL:ACL2: #+STATUS: active * What It Is and Why Common Lisp is a language from the 1990s with a 1950s syntax, a 1970s package manager, and a 2000s type system — yet it has never been beaten on its core bet: unmatched macro facility, interactive development, image-based workflow, and the most powerful bottom-up programming model ever designed. The CL modernization project within Passepartout is a systematic effort to bring the Lisp ecosystem forward by 25 years without sacrificing the language's defining character. This is not a language fork. It is an ecosystem upgrade that preserves the 1994 ANSI standard while adding modern conventions (lockfiles, LSP, standard library, static binaries) and a verified prover (bootstrap HOL verified by ACL2) that makes Lisp the safest language to write self-modifying code in — which is exactly what Passepartout is. The central claim: Lisp's defects are not fundamental. They are deferred ecosystem work. GC pauses, the lack of a memory model, unpredictable performance, the tiny library count, the missing package manager — every single one is fixable through engineering, not through a different language. Distinguish carefully: #+BEGIN_QUOTE **True defects** (inherent to the model, not fixable without changing what Lisp is): - GC pauses — everything is a tagged pointer on the heap; mitigatable but not eliminatable - No memory model / no Send/Sync — threads share everything by default; concurrency correctness is a discipline, not a compiler guarantee **Achievable improvements** (implementation and ecosystem, not philosophy): - Package manager and build system like Cargo - LSP-level tooling that surfaces SBCL's world-class type inference - Static binaries and deployment - Modern standard library (hash sets, priority queues, JSON, HTTP, async) - Verified prover for memory safety, race freedom, and macro soundness - Ecosystem scale (a network effect, fixable with sustained effort) #+END_QUOTE The first category — GC pauses and shared-state concurrency — is a genuine engineering constraint, but it is contingent on hardware, not a law of nature. Symbolics' Genera OS was a full operating system written in Lisp — device drivers, window system, TCP/IP, file server, all running on the Ivory processor with hardware tag checking, generational write barriers at cache level, and single-cycle CONS allocation. Lisp was a systems language; the chip was designed for it. The reason Lisp struggles on modern silicon is that 40 years of CPU design — cache lines (64 bytes for C structs), prefetchers (linear access patterns), TLBs (contiguous layout), branch predictors (C's control flow) — has been optimized for C's memory model, not for tagged pointer architectures. A RISC-V extension with ~50K additional gates (CONS with write barrier, tag dispatch, read barrier, tag extract, GC check) eliminates the hardware mismatch, as analyzed in [[id:dddd52a7-adb8-470e-a459-614ade5f76af][Closing the Lisp Gap]]. Without dedicated silicon, Lisp on commodity hardware has larger memory overhead and less predictable pause times on allocation-heavy workloads than C or Rust. For everything Passepartout does — knowledge management, protocol synthesis, agent coordination, verification — modern concurrent generational GCs (single-digit millisecond pauses) are entirely acceptable. The second category — the ecosystem gap — is what this project fixes. * Why Lisp Is a Uniquely Good Candidate Every other language has a worse starting position for this kind of bootstrap. Here is why Lisp wins: **1. Homoiconicity — the prover works on the same artifact the programmer does.** The AST of a Lisp program is an S-expression — the same data structure the language uses for everything else. A prover in Lisp reads code as nested lists, transforms it, annotates it, and outputs modified code without a parse/unparse cycle. In Rust or Python, the prover would need a full parser, a serializer, and to handle every AST change as a diff against the source text — introducing a translation gap where the prover's internal representation and the programmer's source can drift apart. In Lisp, the internal representation /is/ the source. The programmer types S-expressions. The prover reads S-expressions. They speak the same notation. This is not a minor convenience — it is the difference between a prover that can be built in 800 lines (HOL Light kernel) and one that requires 20,000 lines of parser, pretty-printer, and AST library. **2. ACL2 — a verified Lisp prover already exists.** ACL2 is a first-order theorem prover written in and for a subset of Common Lisp. It is itself meta-circularly verified — its kernel is proved correct in ACL2. This means there is a verified prover /already running in Lisp/ that can verify a higher-order kernel (HOL Light's ~400 lines). ACL2 does the first verification for free. The trust chain is: #+BEGIN_EXAMPLE SBCL runtime → ACL2 (verified by its own meta-circular proof) → HOL kernel (verified by ACL2) → any HOL theorem (proved by HOL kernel) #+END_EXAMPLE No other language has an existing verified prover that runs in the same runtime as the programs it verifies. Rust has no verified prover. Python has no verified prover. Lean has a verified kernel but does not run in Python — the prover language is separate from the target language, creating a semantic gap. **3. The numerical economy argument (density 3-5×).** Lisp's code density is 3-5× higher than C, Rust, or Go for the same semantic work. The social protocol alone — identity, messaging, groups, governance, federation — compresses from ~100 million lines (Twitter, Facebook, Discord, Signal, etc., each reimplementing the same primitives) to ~5,000 lines of specification and ~50,000 lines of synthesized Lisp. This 2,000× compression is not a performance claim — it is a /maintenance/ claim. 5,000 lines of spec have fewer bugs, fewer attack surfaces, and fewer compliance gaps than 100 million lines. When the prover can verify the 5,000 lines, the remaining surface shrinks to essentially zero for specification-level bugs. That is the stack effect: every reduction in complexity compounds because the prover's coverage grows with the system's clarity. **4. The type system is not a limitation — Coalton proves it.** Coalton embeds sound Hindley-Milner typing inside Common Lisp, preserving homoiconicity. You opt in per-file. This means a CL modernization project does not need to design a new type system — it inherits one that is already integrated and working. The type system becomes /additive/, not /prescriptive/: you can write untyped Lisp for prototypes and add Coalton annotations for production paths. This is the opposite of Rust, where type soundness is mandatory for the entire program. For a self-improving agent, the additive model is strictly better — it lets the agent work in the fluid untyped style for exploration and lock down types for verification. **5. Everything else is ecosystem, not language.** Unicode handling, pattern matching, async I/O, immutable data structures, module systems, error messages — every one of these is a library or convention, not a language feature gap. Clojure proved this on the JVM: twenty years of CL ecosystem neglect can be fixed in two years with sufficient automation. The gap is not in Lisp's capacity — it is in the community's labor supply. An AI agent working at 10× human velocity changes the labor equation. * The Gap: Current State Assessment **The performance gap** with C and Rust on hot numerical code is roughly 10-20% in SBCL, concentrated in a few well-understood weaknesses: - Alias analysis — SBCL conservatively assumes any memory reference could alias any other. Rust's borrow checker gives LLVM perfect aliasing at function scope. Fixable in ~1-2 person-years. - LLVM backend — Clasp's hybrid approach (fast-path LLVM, slow-path runtime) is the right design but needs significant engineering. - PGO — SBCL has profiling (sb-sprof) but no feedback loop. Fixable in ~1 person-year. - SIMD — No standard SIMD library for Lisp. ~2-3 person-years for a portable abstraction. - Post-link optimization — SBCL's unusual code layout chokes tools like BOLT. A research problem. **The ecosystem gap** is the larger barrier: | Feature | Rust | Common Lisp | |---------|------|-------------| | Package manager | Cargo (lockfile, binary cache, workspaces) | Quicklisp (source-only, no lockfiles) | | LSP | rust-analyzer | No universal LSP | | Formatter | rustfmt (universal) | cl-form (not universal) | | Test framework | Built-in with benchmarks | Various (FiveAM, Prove) | | Library count | ~180k | ~2k | | WASM target | First-class | Modest | | Static binaries | First-class | Possible but non-standard | The largest gap is the package manager — the single highest-leverage investment. The library count (~100× fewer) follows from the lack of a modern build tool, not from a lack of demand. * Corrected Phase Plan The prover must come first, not last. An unverified base cannot bootstrap a verified upper layer. Every tool that the agent uses to build the next tool must itself be provably correct. The order flips from /easiest first/ to /most foundational first/. **Phase 0: Verified HOL Kernel (~2-4 months, ~500-800 lines of CL)** The HOL kernel (HOL Light's 10 primitive inference rules, ~400 lines of OCaml) is the smallest, best-defined, highest-leverage artifact in the entire project. Transcribed into pure CL and verified by ACL2, it becomes a proved higher-order prover inside the Lisp environment. Why the kernel is the right first artifact: - /Tiny./ 500-800 lines. A well-known mathematical specification. No ambiguity about what correct means. - /High leverage./ The kernel is the first prover Passepartout can call. Once it exists, every subsequent phase can be verified by it. - /Low risk./ This is not a research problem — it is an engineering transcription of a known artifact. The only risk is ACL2 iteration time for verification. Dependency: Passepartout can write and debug CL code reliably. The HOL kernel ideal for an LLM to generate: small, fully specified, stateless, unambiguous correctness criteria. ACL2 catches hallucinations. Iteration converges in 2-3 attempts. **Phase 1: Minimal Verified Build System (~4-6 months)** A build system that can compile CL projects with deterministic lockfiles. Does not need to be Cargo-complete — just enough to compile the prover, the LSP, and its own source. Key insight: only the dependency resolution and compilation logic can be fully proved. The IO layer (filesystem, subprocesses, network) must be trusted or wrapped in a small verified interface. The build system is the first practical test of the prover — proving that dependency resolution is acyclic and that compilation is deterministic. **Phase 2: Verified LSP Server (~6-8 months)** Bridges SBCL's compiler to the LSP protocol. Two modes: - /Online (connected image):/ Rich interactivity, live type inference, jump-to-definition. - /Offline (static analysis):/ Works without a running image, uses proved type information. Verified to not crash, not deadlock, and produce correct type information. This is the tool that makes writing the rest of the stack productive. Key risk: SBCL's type inference is designed for compile-time warnings, not for a responsive protocol. Wrapping it requires significant engineering, and proving the wrapping layer is correct is non-trivial. **Phase 3: Coalton + Verified Standard Library (~12-18 months)** Coalton as a first-class typed path. Hash sets, priority queues, JSON, HTTP client, async scheduler, immutable data structures — all proved correct. Unicode-by-default string handling. Pattern matching. Backward-compatible with the 1994 standard. This is the largest phase by lines of code. The prover has to verify thousands of modules. Verification of complex data structures (hash tables, concurrent queues, async schedulers) is non-trivial. This is where proof costs become visible — and where the agent's automation of proof discovery matters most. Key risk: Volume. 12-18 months assumes the agent compounds its own proving capability during the phase. Without that, the timeline extends. **Phase 4: Self-Hosting, Self-Verifying CL Stack (~6-12 months)** The modernized CL toolchain compiles itself. The compiler is verified. The runtime has a proved GC (at least bounded pause times). Passepartout proves its own transformation rules correct. This is the capstone: a self-improving machine that proves every change before applying it. No other language has achieved this. Even the most advanced systems (Coq, Lean) have a trusted computing base that includes the kernel and the runtime — the /rest/ of their infrastructure is unverified. CL Modernization Phase 4 aims for a verified toolchain from kernel to compiler to build system, with only the SBCL runtime left as trusted. Key risk: Self-verification bootstrapping is the hardest problem in proving. Proving that the compiler compiling itself produces a correct binary requires a bootstrapping argument that few systems have ever attempted and none have fully achieved. The HOL prover needs to be powerful enough for this. * Timeline The estimates below assume Passepartout's capabilities compound during the project — the agent gets faster as it improves its own environment. | Phase | Optimistic | Realistic | Conservative | Risk | |-------|-----------|------------|-------------|------| | Phase 0: HOL kernel | 2 months | 3 months | 4 months | ACL2 verification iteration | | Phase 1: Build system | 3 months | 5 months | 6 months | IO layer verification boundary | | Phase 2: LSP server | 4 months | 7 months | 8 months | SBCL internals complexity | | Phase 3: Standard library | 10 months | 15 months | 18 months | Volume, proof costs | | Phase 4: Self-hosting | 4 months | 8 months | 12 months | Self-verification bootstrapping | | **Total** | **~23 months** | **~38 months** | **~48 months** | | The primary uncertainty is Phase 3 (verified standard library volume) and Phase 4 (self-verification bootstrapping). Phase 0 and Phase 1 are relatively predictable because the artifacts are small and well-defined. An optimistic 20-month timeline is possible if the agent compounds at sufficient speed and no fundamental proof barriers emerge — but 3-5 years is the realistic window. What accelerates this: - A existing CL community contributing modules to verify - Leveraging existing verified libraries (Coq, Lean, Isabelle) and adapting their proofs rather than proving from scratch - An LLM that gets faster at generating correct code as it gains experience in the domain (this is the compounding assumption) * Expected Impact **On Lisp development:** A modernized CL with a verified prover changes the economics of writing Lisp from /adventure/ to /engineering./ The distinguishing features that make Lisp powerful (macros, interactive development, image-based workflow) become /safe/ rather than /dangerous./ - Macro soundness is guaranteed by the prover, not by programmer discipline - Concurrency bugs are caught during compilation, not during runtime - Type inference across untyped code is available via the LSP, even if the file has no Coalton declarations - Library import is a one-line =cl add = command with deterministic lockfiles - Deployment is a =cl build --release= producing a static binary **On Passepartout itself:** Passepartout is a self-modifying agent. Every transformation it applies to its own code is a potential correctness risk. A verified Lisp stack means every self-modification is proved safe before it applies. This shifts Passepartout from /experimental/ to /capability./ The agent can reason about its own transformations with mathematical certainty, not statistical confidence. This is the meaning of Phase 4: the machine is no longer gambling on correctness. Every change is verified before it applies. No LLM hallucination propagates into the running system. The agent improves safely. **On the broader software landscape:** A proved Lisp has implications beyond the Lisp community: - /Rust loses its exclusivity on safe systems programming./ If a Lisp prover verifies memory safety, race freedom, and constant-factor performance, the borrow checker is no longer the only path to correctness. Lisp offers the same guarantees with a more flexible macro system and interactive development. The choice between Lisp and Rust becomes about workflow preference, not safety. - /Python and JavaScript gain a target./ If the Lisp prover can verify compiled code, it can verify Python or JS compiled to Lisp (there are several transpilers). Statically proving correctness of dynamically-typed programs becomes tractable through the verified prover, without requiring the language itself to change. - /The self-improving machine becomes a replicable architecture./ Any domain where an LLM writes code and a prover verifies it can adopt the same pipeline. The first domain is Lisp because of homoiconicity. Once the pipeline works, it can be adapted. **On specification-level verification:** The deepest impact is a shift from /testing as correctness/ to /proof as correctness./ In current practice, a program's correctness is approximated by test coverage, which is always incomplete. A verified Lisp toolchain can prove functional correctness for any program that runs on it. The programmer specifies what correct means (preconditions, postconditions, invariants), and the prover establishes it. Testing becomes a fallback for properties the prover cannot yet establish, rather than the primary correctness mechanism. This is not a claim that all bugs disappear — the specification itself can be wrong, the runtime is still trusted, and undecidable properties remain undecidable. But the attack surface for /implementation bugs/ (the majority of security vulnerabilities) drops to near zero for any program the prover can analyze. * Relationship to Passepartout CL Modernization is a project /within/ Passepartout, not adjacent to it. Passepartout's architecture — fact store, gates, ACL2 prover, self-improving loop — is the engine that drives the modernization. The agent improves Lisp while running on Lisp, creating a positive feedback loop: better tooling → faster development → more capability → better tooling. The relationship is symbiotic: - Passepartout provides the engineering capacity to close the 25-year gap - The modernized CL provides a verifiable foundation for Passepartout's self-modification - The ACL2 prover (already part of Passepartout's architecture) provides the verification for both See also: - [[id:1c3ec48b-446c-50d2-b53e-126a81f5143f][Passepartout]] — the project that drives this modernization - [[id:dddd52a7-adb8-470e-a459-614ade5f76af][Closing the Lisp Gap]] — detailed performance and ecosystem analysis - [[id:c0fdec00-8a44-43f0-ac81-e8dc61411865][Passepartout Architecture]] — how the gate system and prover fit together - [[id:9af13fff-9725-542b-93b1-a555bc74ad72][Lisp Economics]] — why Lisp is economically viable despite its small market share