Unify concepts into ideas — move 7 org files from concepts/ to ideas/, update homepage and nav

This commit is contained in:
Hermes
2026-06-03 17:31:57 +00:00
parent b1d00c79e6
commit 9553a90060
10 changed files with 11 additions and 30 deletions

View File

@@ -1,10 +0,0 @@
:PROPERTIES:
:CREATED: [2026-05-24 Sun]
:ID: 3e4f5a6b-7c8d-9e0f-1a2b-3c4d5e6f7a8b
:END:
#+title: Concepts
#+filetags: :index:navigation:
In-depth analysis and comparative studies of systems, languages, and architectures.
{{< page-list >}}

View File

@@ -1,107 +0,0 @@
:PROPERTIES:
:CREATED: [2026-05-27 Wed]
:ID: d2722576-fc9b-4bd3-bc2f-f5692b561b4e
:END:
#+title: Who Is Closest to Passepartout?
#+filetags: :passepartout:academia:comparison:neurosymbolic:verification:
A survey of academic researchers whose work overlaps with Passepartout's architecture along specific dimensions. The conclusion: no academic group combines all four architectural properties that define Passepartout's design. The closest groups each hold one or two pieces; none combine all.
* The Four Architectural Properties
1. **LLM-level generator with full creative freedom** — the generator synthesizes entire implementations from specifications, not individual tactic steps or hole-fillings.
2. **Theorem-prover verification with complete functional correctness** — the verifier checks all execution paths against the full spec, not bounded verification via SMT solvers.
3. **Asymmetric authority** — the symbolic component (prover) is the final authority and cannot be overridden by the neural component.
4. **Counterexample-guided retry loop** — when the prover rejects an implementation, it returns a concrete counterexample that the generator uses to reformulate.
* The Academic Landscape
**LLM + Theorem Prover Loops**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Sean Welleck | CMU | ImProver 2 | Self-improving LMs generating proof steps verified by Lean | Generator fills tactic holes in existing proofs, not full implementations. Camp B. |
| Timon Gehr | ETH | COPRA, Thor | LLM interacts with theorem prover kernel | Same constraint: tactic-level. Neural component generates one move at a time. |
| Kaiyu Yang | Princeton | LINC | Neural network learns symbolic rules, prover checks consistency | Neural component is a *learner* discovering rules from data, not a generator synthesizing from spec. Different abstraction level. |
All three are Camp B in the loop taxonomy (constrained generator + complete verifier). None gives the LLM freedom to synthesize full implementations. Welleck's ImProver is the closest in spirit — the loop iterates, the prover is authoritative — but the scope of what the generator produces is orders of magnitude smaller than what Passepartout's design requires.
**Synthesis + Verification (non-LLM)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Armando Solar-Lezama | MIT | Sketch | Synthesis-aided verification: partial program → solver fills holes → assertions checked | Generator is constraint-based SAT/SMT, not an LLM. Verification is bounded (solver capacity). |
| Emina Torlak | UW | Rosette | Solver-aided languages for synthesis + verification | Same constraints as Sketch. Bounded, non-LLM. |
| Swarat Chaudhuri | UT Austin | Neurosymbolic Programming | Neural networks guide program synthesis, symbolic analysis verifies | Uses SMT for bounded verification, not theorem prover for complete. Neural-symbolic are symmetric collaborators, not asymmetric authority. |
Chaudhuri is the closest overall academic neighbor. His group explicitly works on combining neural and symbolic components, with symbolic verification of neural-generated candidates. But the verification is bounded (SMT), not complete (theorem prover), and the loop does not have Passepartout's asymmetric authority design.
**Lisp as Infrastructure for Verification**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Christian Schafmeister | Temple | Clasp | Common Lisp through LLVM; interactive Lisp for serious computation | Lisp infrastructure, not a neurosymbolic loop. No ACL2 integration. |
| Kaufmann, Moore | UT Austin / Retired | ACL2 | The theorem prover itself | Pure symbolic verification. No LLM loop. |
Schafmeister is aligned with Passepartout on the "why Lisp" question — interactive development, uniform representation, C++ interop for performance — but does not work on agentic verification loops.
**Autonomous Code Modification Loops**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Kevin Ellis | Cornell | DreamCoder | Neural program synthesis loop: generate → execute → learn | Verifier is interpreter (does it run?), not prover (is it correct?). Camp A. |
| Andrej Karpathy | Anthropic | autoresearch | LLM modifies code, runs experiments, keeps/discards based on metric | Verifier is val_bpb — a single empirical number. No specification, no formal guarantee. Camp C. |
Both prove the viability of the autonomous loop concept but use the weakest possible verifiers (execution and empirical metrics).
**The Bitter Lesson / Temporal Credit Assignment (Sutton)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Richard Sutton | Alberta / Keen Technologies | TD learning, eligibility traces, Alberta Plan | The fundamental problem in verification — *an action was checked, but the consequence plays out hours later; was the action correct?* — is the same problem TD learning solves in RL: assigning credit to actions based on delayed outcomes. Sutton's temporal credit assignment work is the theory you would need to extend Passepartout from per-action gates to trajectory-level verification. His Bitter Lesson (scale beats engineered knowledge at sufficient compute) is the most commonly cited argument against the symbolic verification approach Passepartout bets on. | The Bitter Lesson is not anti-knowledge — it says methods that improve with more computation eventually dominate. Passepartout's gate is a deliberately small engineered knowledge system that *won't* benefit from more compute (the ACL2 lemmas don't get more correct with more hardware). That's acceptable because the gate is a narrow bottleneck (permit/deny). The LLM layer inside the gate *does* benefit from scale. The architecture already respects the Bitter Lesson by placing the scalable piece where scale helps and the non-scalable piece where deductive certainty matters. Sutton's Alberta Plan (world model + reward + learning algorithm) parallels Passepartout's Stage 6 (world model + gate + verified fine-tuning), but Sutton's agents learn by pure reward while Passepartout's learn by reward constrained by verified policy. Sutton would likely argue that a learned safety policy at scale would outcompete the gate. Passepartout's bet is that access control, message authentication, and compliance should never be probabilistic, even at infinite scale.
**Integrate-Symbolic-Into-Neural (Garcez)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Artur d'Avila Garcez | City, Univ. of London | NeSy frameworks, NSL | Pioneer of neural-symbolic computation since 1990s. Book: "Neural-Symbolic Cognitive Reasoning." Runs NeSy workshop series. | His approach *integrates* symbolic knowledge into neural networks (logic regularization, knowledge distillation). Symbolic rules are a training signal, not a runtime verifier. The neural component can override symbolic constraints through the loss landscape. No asymmetric authority, no theorem prover, no complete verification. His camp is "make neural networks behave more symbolically." Passepartout's camp is "make neural networks accountable to symbolic verification." Opposite architectural philosophy. |
Garcez's position in the design space is closest to Camp A (no independent verifier). The symbolic rules guide learning but do not veto outputs at runtime. His work is foundational for the field of neural-symbolic computation, but his *architectural philosophy* is the inverse of Passepartout's. He wants the symbolic inside the neural. Passepartout wants them separate with the symbolic holding authority.
**Theorist of the Hybrid Thesis (Marcus)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Gary Marcus | NYU Emeritus, Robust.AI | None (theorist/critic) | Longest-standing public advocate for hybrid AI. Since "The Algebraic Mind" (2001) and "Rebooting AI" (2019), he has argued deep learning alone cannot achieve systematicity, composition, or reasoning. He identified the *need* for the approach Passepartout implements. As of May 2026, he is publicly asking why LLM agent frameworks are not using LEAN as a theorem-prover verifier — the same engineering gap Passepartout occupies. | He does not propose a specific architecture or loop design. His background is cognitive science and developmental psychology, not formal verification. The "symbolic component" he advocates is abstract — structured knowledge representations, not ACL2 theorem proving. He has no answer to the cost/feasibility question (the "Better is Cheaper" argument is Passepartout's contribution, not Marcus's). He is a theorist of the problem, not an architect of the solution — though his May 2026 tweet shows he is now engaging with the engineering question directly. |
Marcus occupies a category that does not appear in the loop taxonomy (Camps A-D) because he does not define a loop. He identifies the *need* for hybrid AI with genuine symbolic authority. Passepartout is the engineering response to the thesis Marcus has been arguing since before most of the field would admit the limitations existed. His May 2026 tweet asking "they aren't using LEAN in one of those many tools?" is the theorist noticing the empty cell Passepartout was designed to fill.
* The Gap
| Property | Passepartout | Closest academic | Academic's limiter |
|----------|-------------|-----------------|-------------------|
| Generator freedom | Full synthesis from spec | ImProver (Welleck) | Fills tactic holes only |
| Verification completeness | Complete (theorem prover) | Sketch (Solar-Lezama) | Bounded (SMT) |
| Asymmetric authority | Neural cannot override prover | Neurosymbolic Prog (Chaudhuri) | Symmetric collaboration |
| Counterexample feedback | Structured from prover to LLM | ImProver (Welleck) | Pass/fail at tactic level |
| Two symbolic layers | Gates + prover independent | None | No second layer exists |
No academic group combines all four properties. The closest — Chaudhuri — has three of five (neural + symbolic + verification) but fails on verification completeness (SMT not ACL2), asymmetric authority (symmetric not asymmetric), and the two-layer gate design.
* What This Means
The gap is either:
1. **A genuinely empty cell in the design space.** The combination is novel, the components have not converged in one system before, and Passepartout's design is early.
2. **A sign that the combination is not as valuable as the components.** No major academic lab has invested in this specific loop because the cost of writing complete formal specifications exceeds the benefit of complete formal verification, given the alternative of bounded verification (SMT) with cheaper spec costs.
The way to distinguish (1) from (2) is to build the architecture and measure whether the spec-writing cost is amortized over enough synthesized implementations to justify it. Passepartout's answer is: yes, because specs are written once and implementations are generated for every deployment context. The academic literature has not tested this claim.
* References
- [[id:be9bccc7-5adf-4d0d-8ee4-8855892189bf][Neurosymbolic Loop Architectures]] — the taxonomy that positions these comparisons
- [[id:ee8f3b2a-4c7d-4e1b-9b0a-6d8f2e3c1a5b][Neurosymbolic AI Paper Library]] — papers referenced above are in the local library

View File

@@ -1,22 +0,0 @@
:PROPERTIES:
:CREATED: [2026-06-03 Tue]
:ID: a6a04686-56d3-4cb0-9c3d-4722f8f586f3
:END:
#+title: CL Modernization — Corrected Phase Order
#+filetags: :HOL:common-lisp:modernization:passepartout:theorem-proving:verification:
* Corrected insight
The prover must come first, not last. An unverified base can't bootstrap a verified upper layer. Every tool that the agent uses to build the next tool must itself be provably correct. The order flips from /easiest first/ to /most foundational first./
* Corrected Phase Order
** Phase 0: HOL Kernel (~500-800 lines of pure CL, verified by ACL2). The smallest, best-defined, highest-leverage artifact. Everything depends on it. A well-known mathematical specification (HOL Light's 10 primitive inference rules).
** Phase 1: Minimal Verified Build System. Just enough to compile the prover and the LSP. Doesn't need to be Cargo-complete — needs to be verified.
** Phase 2: Verified LSP Server. Surfaces SBCL's type inference backed by proofs.
** Phase 3: Verified Language Extensions. Coalton, standard library, all proved correct.
** Phase 4: Full Modernized CL Stack. Self-hosted, self-verifying.

View File

@@ -1,31 +0,0 @@
:PROPERTIES:
:CREATED: [2026-06-03 Tue]
:ID: e199190b-5eff-4a77-8aa5-b6c77647e316
:END:
#+title: CL Modernization Initiative via Passepartout
#+filetags: :Passepartout:build-system:coalton:common-lisp:modernization:theorem-proving:tooling:
* The Initiative
A plan to bring Common Lisp from the 1990s to modern standards using Passepartout as an autonomous software engineer. The gap is ~25 years of deferred ecosystem work. An agent working at 10x human velocity closes it in 2-3 years.
* Four Phases
1. **Build System** — =cl build= / =cl test= / =cl doc= that works like Cargo. Lockfiles, single namespace, integrated test/doc runner, parallel compilation. Wraps ASDF with modern conventions via a CL.toml manifest.
2. **Tooling** — LSP server that surfaces SBCL's world-class type inference. Jump-to-definition, type-on-hover, rename, refactoring. Online (connected image) and offline (static analysis) modes. Integrates with Coalton for typed paths.
3. **Language Modernization** — Coalton as first-class opt-in typing. Modern standard library (hash sets, priority queues, JSON, HTTP, async, immutable structures). Unicode-by-default. Pattern matching. All backward-compatible with the 1994 standard.
4. **Prover Bootstrap + Deployment** — HOL kernel in CL verified by ACL2. Screamer-backed proof search. HOL proves meta-level properties (macro soundness, evaluator equivalence, concurrent safety). Static binary deployment via =cl build --release=.
* Dependencies
Phase 1 → Phase 2 → Phase 3 → Phase 4. No circular dependency. Each phase is a prerequisite for the next and produces a usable artifact.
* Key Principles
- Every tool is written in CL, for CL, by the agent
- The 1994 standard is preserved — modernization lives alongside it
- The agent is the primary consumer (human UX is secondary)
- The HOL prover grows from use — every proved theorem is future automation

View File

@@ -1,66 +0,0 @@
:PROPERTIES:
:CREATED: [2026-06-03 Tue]
:ID: a2811c83-b315-47fd-8ab8-25627c389d1a
:END:
#+title: CL Modernization — Timeline Estimate
#+filetags: :HOL:common-lisp:modernization:passepartout:timeline:verification:
* Timeline from Now to a Proved Lisp Machine
** Phase 0: HOL Kernel in CL (~2-4 months)
The smallest, most bounded piece. 500-800 lines of pure CL, well-defined mathematical spec (HOL Light's 10 primitive inference rules). Passepartout writes it, ACL2 verifies it. The first genuine artifact: a verified higher-order prover running inside CL.
Dependency: Passepartout can write and debug CL code reliably. Doesn't need much else.
Key risk: ACL2 verification of the kernel may require iterations — ACL2's automation is limited and some lemmas may need to be added manually.
** Phase 1: Minimal Verified Build System (~4-6 months)
A verified build system that can compile CL projects with deterministic lockfiles. Doesn't need to be Cargo-complete — just enough to compile the prover, the LSP, and its own source.
Dependency: Phase 0 (prover proves build system correctness). Also a solid understanding of ASDF internals.
Key risk: The build system touches the filesystem, subprocesses, and network — all outside the prover's pure core. Only the /dependency resolution and compilation logic/ can be fully proved. The IO layer must be trusted or wrapped in a small verified interface.
** Phase 2: Verified LSP Server (~6-8 months)
Bridges SBCL's compiler to the LSP protocol. Online mode (connected image) for rich interactivity. Verified to not crash, not deadlock, and produce correct type information.
Dependency: Phase 1 (can build and distribute the LSP). Deep SBCL internals knowledge (the agent must learn SBCL's type inference API).
Key risk: SBCL's type inference is not designed for an LSP — it's designed for compile-time warnings. Wrapping it in a responsive protocol requires significant engineering.
** Phase 3: Coalton + Verified Standard Library (~12-18 months)
Coalton as a first-class typed path. Hash sets, priority queues, JSON, HTTP client, async, immutable structures — all proved correct. Unicode-by-default string handling.
Dependency: Phase 2 (the agent needs good tooling to develop and debug this volume of code).
Key risk: The volume. This is the largest phase by lines of code. The prover has to verify thousands of modules. Verification of complex data structures (hash tables, async schedulers) is non-trivial. This is where proof costs become visible.
** Phase 4: Self-Hosting, Self-Verifying CL Stack (~6-12 months)
The modernized CL toolchain can compile itself. The compiler is verified. The runtime has a proved GC (at least bounded pause times). Passepartout proves its own transformation rules correct — the self-improving machine.
Dependency: Phase 3. The standard library and language must be mature enough to write serious systems software in.
Key risk: Self-verification is the hardest problem. Proving that the compiler compiling itself produces a correct binary requires a bootstrapping proof that few systems have attempted. The HOL prover needs to be powerful enough for this.
* The Proved Lisp Machine
After Phase 4: the agent runs on a proved CL, proving its own transformations as it extends itself. This is the capstone. The machine is no longer gambling on correctness — every change is verified before it applies.
* Total Timeline Estimate
- Optimistic (smooth sailing): ~2.5-3 years
- Realistic (normal engineering friction): ~3.5-5 years
- Conservative (major proof bottlenecks): ~5-7 years
The primary uncertainty is Phase 3 (verified standard library volume) and Phase 4 (self-verification bootstrapping). Phase 0 and Phase 1 are relatively predictable.
* What Accelerates This
- A existing CL community contributing modules to verify
- An agent that gets faster as it improves its own environment
- Leveraging existing verified libraries (Coq, Lean, Isabelle) and adapting their proofs rather than proving from scratch

View File

@@ -1,138 +0,0 @@
:PROPERTIES:
:CREATED: [2026-05-27 Wed]
:ID: dddd52a7-adb8-470e-a459-614ade5f76af
:END:
#+title: Closing the Lisp Gap
#+filetags: :passepartout:lisp:compiler:performance:ecosystem:asics:bootstrapping:
Lisp has a performance gap with C and Rust of roughly 10-20% on hot numerical code and a much larger ecosystem gap (package manager, LSP, cross-compilation, library count ~2k vs ~180k). This page diagnoses both gaps and argues that [[id:28c46769-c14b-42aa-ac7a-69d310157f8f][Passepartout]]'s architecture — fact store, gates, ACL2 prover, self-improving loop — changes the nature of the problem from community-coordination to knowledge-capture, making the gap closable in 1-2 years rather than 20-30 person-years.
See [[id:9af13fff-9725-542b-93b1-a555bc74ad72][why Lisp is economically viable now]] for the economic foundation. See [[id:13e6ae54-2d24-5aa0-b1cd-a7e8e749aa70][self-driving Lisp Machine]] for the hardware endpoint.
* The Performance Gap
What Lisp (SBCL) still needs to close the remaining gap with C/Rust:
- **Alias analysis on SBCL's IR.** Rust's borrow checker gives LLVM perfect aliasing at function scope. SBCL conservatively assumes any memory reference could alias any other. Fix: a static analysis pass proving non-aliasing for common patterns (different simple-array types, arrays vs scalars). ~1-2 person-years, biggest single investment.
- **LLVM backend (Clasp).** Clasp gives LLVM's full pipeline — auto-vectorization, LICM, inlining cost modeling — at the cost of fighting LLVM's static worldview with CLOS's dynamic semantics. Clasp's hybrid approach (fast-path LLVM, slow-path runtime) is the right design. ~5-10 person-years to reach SBCL's general performance with better hot-path optimization.
- **PGO (profile-guided optimization).** SBCL has profiling (sb-sprof) but no feedback loop — the compiler cannot reweight branches or layout hot/cold blocks. Fix: instrumentation pass that emits counter VOPs, a profile binary, and a recompilation step that annotates IR with branch weights. ~1 person-year.
- **Post-link optimization.** Facebook's BOLT profiles binary execution and rewrites code layout to reduce I-cache misses. SBCL's unusual code layout (trampolines, closure entry points) chokes BOLT. Hard research problem.
- **Vectorization VOPs.** No standard SIMD library for Lisp. Every ISA extension needs its own VOP set. C gets this via xmmintrin.h and auto-vectorization. ~2-3 person-years for a portable SIMD abstraction backed by platform VOPs.
* The Ecosystem Gap
| Feature | Rust | Common Lisp |
|---------|------|-------------|
| Package manager | Cargo (lockfile, binary cache, workspace) | Quicklisp (no lockfile, source-only) |
| LSP | rust-analyzer (excellent) | No universal LSP |
| Formatter | rustfmt (universal) | cl-form (not universal) |
| Linter | Clippy (>700 rules) | No equivalent |
| Doc system | rustdoc (integrated) | Conventions only |
| Cross-compilation | --target flag | Manual, source-based |
| Test framework | Built-in with benchmarks | Various (FiveAM, Prove) |
| Sanitizers | ASan, TSan, MSan, UBSan | None |
| Library count | ~180k | ~2k |
| WASM target | First-class | Modest |
| Mobile targets | First-class | None native |
The largest gaps are the package manager (the single highest-leverage investment — ~2-3 person-years + community coordination) and the library count (~100× fewer).
* Lisp ASICs: What They Change
A Lisp ASIC (Symbolics Ivory, CADR, or a modern RISC-V variant) **raises the floor** more than the ceiling:
**What it gives in hardware:**
- Tag checking on every memory access — types, GC bits, forwarding pointers in parallel with ALU
- Hardware CONS — single-cycle allocation from a dedicated region with automatic write barriers
- Hardware GC support — generational barriers at cache level
- Hardware generic function dispatch — type-code lookup in CAM instead of method cache
- Hardware stack groups and shallow binding — zero-cost special variable rebinding
**Does it eliminate the need for compiler optimization?** Both yes and no. The worst case becomes much faster — naive code and optimized code are closer together. But the compiler still matters for instruction scheduling, inlining decisions, dead code elimination, and loop optimizations. The ASIC shifts the optimizer's job from "avoid Lisp overhead" to "use special instructions effectively." The gap between C and Lisp on hot numerical loops narrows from ~20% to maybe 5%.
* Embedded and Small Systems: RISC-V Lisp Extension
A full Lisp Machine chip doesn't make sense for embedded, but a **RISC-V extension with 5-10 Lisp primitives** does:
- ~5 new instructions added to a RV32/RV64 core (CONS with write barrier, tag dispatch, read barrier, tag extract, GC check)
- Implementable as an FPGA soft-core or small coprocessor alongside a microcontroller
- ~50K additional gates on a RISC-V e300-class core
This makes Lisp viable where C is currently mandatory: real-time control, sensor nodes, IoT. The hardware CONS eliminates allocation cost; the hardware read barrier eliminates GC pause unpredictability by making the collector a concurrent background task instead of a stop-the-world event. The gap goes from 10-100× (current, boxed allocation + GC) to ~2-5×.
**Strategic play:** Define the extension, open-source an FPGA implementation, get it ratified as a standard RISC-V extension. Then any chip vendor can add it.
* Why Passepartout Changes the Equation
The standard model for closing the library gap: *more users → more libraries → more users (virtuous cycle).*
Passepartout's model: *agent synthesizes libraries on demand → fewer blockers → more users.*
**Knowledge-capture replaces community-coordination.** The rate-limiter shifts from "people who can write Lisp libraries" to "people who can explain their domain well." A 3-hour conversation with a domain expert captures the specifications, invariants, known techniques, and correctness criteria. The agent synthesizes the library, verifies it against stored properties via ACL2, and hot-reloads it.
**What changes:**
- Standard protocols (HTTP, SQL, JSON, crypto, file formats) — the agent synthesizes these from specs it has seen in training.
- Wrapper libraries (CFFI bindings, REST clients) — mechanical, the agent handles well.
- Domain-specific libraries — the hardest category *without* a knowledge architecture, and the most tractable *with* one. The gate becomes the bridge: human provides $10K/year-level insight, agent provides $200K/year-level engineering.
- What stays hard: libraries where the algorithm requires original research (not yet done by anyone), libraries requiring specialized hardware knowledge the human cannot articulate, and battle-tested implementations where verification of constant-time correctness is near-impossible.
**Passepartout's position to direct Lisp development:**
1. **Instrumentation advantage.** It runs in Lisp, can profile its own execution in Lisp terms, identify SBCL's compiler bottlenecks, and generate patches — all in a closed feedback loop that C compilers lack.
2. **Coordination advantage by adoption.** If Passepartout becomes the standard Lisp agent framework, its tooling choices become de facto standards: ocicl with lockfiles becomes the package manager, cl-lsp gets maintenance, sb-sprof fed into the optimizer becomes the PGO pipeline.
3. **Automated contribution pipeline.** Profile → identify hot path in SBCL → generate candidate VOP or optimization pass → run test suite (verified by ACL2) → submit patch. Cycle time from "this would be faster if SBCL did X" to "SBCL does X" drops from years to days.
4. **The prover multiplier.** ACL2 makes safe optimization an engineering discipline instead of an experimental art. The agent can generate thousands of variants, verify correctness, benchmark, and keep the Pareto-optimal set — coverage a human cannot match.
* The Same Compression Applies to Social Infrastructure
The Passepartout architecture compresses the *social* ecosystem of the internet the same way it compresses the *software* ecosystem — replacing multiplicative duplication with additive specification.
**The problem:** Twitter (~10M lines), Facebook (~60M lines), Reddit (~5M lines), Discord (~8M lines), Signal (~3M lines), Telegram (~5M lines), LinkedIn (~15M lines), WhatsApp (~3M lines). Each independently implements the same primitives: identity, identity verification, key recovery, encrypted messaging, group messaging, moderation, content addressing, access control, payment integration, rate limiting, spam detection, reporting, appeal, ban propagation, federation, data export, deletion. Each is a separate 3-60 million line codebase with the *same semantics* but different APIs, different bugs, different compliance postures, different threat models. None compose with any other.
**Under Passepartout:**
- One spec for identity: keys, recovery, delegation, attestation. ~500 lines.
- One spec for messaging: content-addressed, encrypted, ordered by a Merkle DAG. ~1,000 lines.
- One spec for groups: membership, permissions, moderation. ~1,500 lines.
- One spec for federation: cross-server state replication, conflict resolution. ~1,000 lines.
- Gates for policy: what content types are allowed, what moderation actions exist, how appeals work. Configurable per community, enforced by the prover.
Total spec surface: ~5,000 lines. Total synthesized code (client + server for any platform): ~50,000 lines of Lisp.
**5,000 lines of spec + 50,000 lines of synthesized implementation vs ~100 million lines of independent, incompatible, duplicative implementations** that each require their own security audits, compliance reviews, and maintenance pipelines.
**Composition replaces isolation.** In the current model, Signal's identity system cannot talk to WhatsApp's. Reddit's moderation cannot federate with Discord's. The protocols don't compose because they were designed independently with no shared semantic foundation. In the Passepartout model, every synthesized client and server speaks the same spec. A community can use a Telegram-like frontend and a Reddit-like frontend — both synthesized from the same messaging and content-addressing specs — and they share identity, encryption, and federation because those are the same underlying specifications. The frontend is a gate: a policy on how to render the same underlying protocol data.
**The multiplication disappears.** Under the old model, N platforms cost N × (full stack). Under Passepartout, N communities cost one stack + N × (~100 lines of gate policy). The marginal cost of adding a new community approaches zero. This is the same argument as the library gap: N libraries × (different maintainers, different bugs, different API surfaces) collapses to N specifications × (one verified synthesized implementation).
**The deeper claim:** the protocol isn't a separate feature of Passepartout. It is the same compression architecture applied to a different domain — coordination infrastructure instead of library infrastructure. Both are about replacing multiplicative maintenance burden with additive specification complexity, amplified by Lisp's density and the prover's correctness guarantees.
* Timeline Compression
With the self-improving loop, the marginal cost of each incremental improvement drops to near zero. The system improves continuously rather than in discrete releases. Every time compute is added to the pipeline, the system gets faster.
| Gap | Human-only | AI-assisted | Passepartout loop |
|-----|-----------|-------------|------------------|
| SBCL alias analysis | 18 mo | 10 mo | 3-4 mo |
| PGO pipeline | 12 mo | 6 mo | 2-3 mo |
| Cargo-equivalent | 24 mo | 8 mo | 4-6 mo |
| LSP server | 18 mo | 6 mo | 3-4 mo |
| RISC-V Lisp extension | 24 mo | 12 mo | 6-8 mo |
| Library coverage (100 APIs) | 3-5 yrs | 12 mo | 3-4 mo (synthesis) |
| Clasp maturity | 5-10 yrs | 3-5 yrs | 1-2 yrs |
The meta-level effect: once the pipeline exists (profile → generate → verify → test → deploy), the system improves continuously. The gap-closing problem goes from "we need enough talented volunteers" to "we need enough compute" — and compute grows on a known exponential, while talent doesn't.
* Relationship to Other Concepts
- [[id:9af13fff-9725-542b-93b1-a555bc74ad72][Lisp economics]] — the economic viability argument provides the why; this page provides the how.
- [[id:13e6ae54-2d24-5aa0-b1cd-a7e8e749aa70][Self-driving Lisp Machine]] — the hardware endpoint enabled by this software-level gap-closing.
- [[id:efc76898-03f7-57ba-923d-35d65da88bb7][Sufficiency flip]] — the threshold where symbolic reasoning becomes cheaper than LLM calls; applies to compiler optimization as a domain.
- [[id:84a537b4-4256-50c8-91f5-dd5b4538418f][Verification appliance]] — the $5K/year hardware that replaces $500K/year in compliance costs; RISC-V Lisp extension is the technical path to making this run everywhere.

View File

@@ -1,102 +0,0 @@
:PROPERTIES:
:CREATED: [2026-06-03 Tue]
:ID: 85f963a7-a10f-45cc-ace6-6edfeefee762
:END:
#+title: Lisp, Provers, and vs Rust
#+filetags: :ACL2:HOL:LLM:bootstrapping:lisp:rust:theorem-proving:
* Lisp vs Rust — the honest gap
** True defects in Lisp (inherent, not fixable without changing what Lisp is)
- GC pauses — everything is a tagged pointer on the heap; mitigatable but not eliminatable
- No memory model / no Send/Sync — threads share everything; concurrency correctness is a discipline, not a guarantee
** Achievable improvements (implementation/ecosystem, not philosophy)
- Package manager and build system like Cargo (ASDF + Quicklisp is from 2005)
- LSP-level tooling (SBCL infers types well — surface it)
- Static binaries and deployment (save-lisp-and-die + tree-shaker exists, not standard)
- Standard library modernization (hash sets, priority queues, JSON, HTTP, async — Clojure proved this)
- Ecosystem scale (network effect, fixable with sustained effort)
** Things I was wrong about (corrected during conversation)
- The type system is not a fundamental limitation — Coalton embeds sound Hindley-Milner typing inside CL, preserving homoiconicity. You opt in per-file.
- Unpredictable performance is not a language defect — it's a consequence of macro-heavy programming styles. Write in a disciplined subset with declarations and SBCL compiles to within 2x of C. The compiler is deterministic.
- Macros don't cause unpredictability; they enable styles that confuse the optimizer. You choose not to use those styles.
** Everything else is ecosystem and convention, not language capability: Unicode handling, pattern matching, async I/O, immutable data structures, module systems, error messages.
** The numeric tower is a genuine trade-off: Lisp's auto-promoting math (integers, ratios, floats, complex) is mathematically correct but blocks SIMD and overflow assumptions that make Rust's arithmetic fast. Pick your priority.
* Prover discussion
** The architecture for a Lisp prover
An autonomous agent with mathematical proving can close the gap between Lisp's flexibility and Rust's guarantees /without/ making Lisp rigid. The prover sits between the programmer and the compiler:
#+BEGIN_EXAMPLE
[Lisp code] → [Passepartout prover] → [SBCL compiler] → [binary]
[Verified: race-free, type-safe, bounded memory, constant-factor performance]
#+END_EXAMPLE
The prover can:
- Prove memory safety without a borrow checker — analyze call graphs and data flow
- Predict performance — unfold macros, SSA-ize, tell the programmer what will optimize
- Infer types across untyped CL code — flag contradictions, converge toward provably-correct programs
This preserves Lisp's core property: the language stays fluid. The proving agent is an external constraint, not a compiler-enforced limitation. You can always bypass it for fast prototyping.
** Where the coming shift ranks languages
Rust has the most to lose — its entire value proposition is compile-time safety. An AI prover that verifies the same properties about Lisp code makes the borrow checker a solved problem. Rust keeps the advantage only as long as the prover is slower and more expensive than a type system that runs in milliseconds.
Python/JS have the most surface to gain — an AI prover could give them type soundness, no null pointers, thread safety, without changing the languages.
Lisp is uniquely positioned because of homoiconicity. The prover works on S-expressions, which /are/ the AST. No parse/unparse gap. The prover and programmer work on exactly the same artifact. The prover can rewrite a macro expansion without expanding it, annotate specific cons cells with lifetime constraints, and the output is standard diffable code.
** ACL2 and the bootstrap
ACL2 is the right foundation but not the complete solution:
- It is first-order — can't quantify over functions, can't prove meta-level properties about the evaluator itself
- It is interactive — human must guide the proof, lemmas must be manually provided
- It is a subset — pure, no side effects, no CLOS, no I/O, no threading
- It doesn't scale to everyday production code without massive human effort
But ACL2 proves the architecture is viable: a Lisp-based prover verifying Lisp programs is deeply natural.
** Bootstrapping a HOL prover via ACL2 + Screamer
The HOL kernel is famously tiny (HOL Light: ~400 lines of OCaml, 10 primitive inference rules). The path:
1. Write the HOL kernel in pure CL (~500-800 lines)
2. ACL2 verifies the kernel implements the inference rules correctly
3. Screamer provides the proof search engine (backtracking maps to proof tree exploration)
4. HOL proves meta-level properties: macro soundness, evaluator equivalence, compositionality of skills, safety of self-modification
5. HOL theorems are reflected back as ACL2 rewrite rules — automation compounds
Trust chain: SBCL runtime → ACL2 (verified by its own meta-circular proof) → HOL kernel (verified by ACL2) → any HOL theorem.
This is not a research problem — the kernel is a well-known artifact. It's an engineering task: transcribe HOL Light's kernel into CL, verify with ACL2, couple with Screamer.
** An LLM can write the HOL kernel
The HOL kernel is the ideal LLM task: ~400-800 lines of pure code, fully specified, well-documented, stateless, with unambiguous correctness criteria.
Workflow: LLM writes plausible kernel → ACL2 verifies it → ACL2 catches hallucinations → iteration converges in 2-3 attempts.
The LLM is allowed to be wrong. ACL2 is the filter. This is achievable today.
** Comparison with Lean
Lean is the standard for mathematics and Passepartout will not compete with it for that purpose:
- mathlib4 has 100,000+ theorems. A bootstrapped HOL prover starts at zero.
- Lean has a community of mathematicians; a Lisp prover won't attract one.
- Lean's elaborator is the result of decades of type theory research; a HOL kernel + Screamer doesn't touch that.
Where Passepartout's prover wins: it verifies /running code/, not abstract mathematics. It is embedded in a self-modifying agent. Its design is optimized for program properties (memory safety, race freedom, macro soundness), not mathematical expressiveness.
Convergence is possible if the agent automates proof discovery at sufficient speed, but that question is unanswered and is the one Passepartout is designed to answer.
** Key insight from the conversation
The self-writing machine doesn't need to prove itself from the void. It proves each next step using everything proven before. The writer is allowed to be heuristic (LLM). The prover just needs to be conservative and accurate. Every time a human signs off on a proof, it becomes future automation.

View File

@@ -1,126 +0,0 @@
:PROPERTIES:
:CREATED: [2026-05-27 Wed]
:ID: be9bccc7-5adf-4d0d-8ee4-8855892189bf
:END:
#+title: Neurosymbolic Loop Architectures
#+filetags: :passepartout:neurosymbolic:research:verification:architectures:
Taxonomy of loop architectures that combine neural components (LLMs, neural program synthesis) with symbolic components (interpreters, theorem provers, constraint solvers). Each architecture is defined by its answer to a single question: *who verifies whom, and what gets verified?*
* The Six Architectures
**1. Neural-Guided Program Synthesis (DreamCoder et al.)**
Path: generate → execute → filter → train.
The neural network proposes candidate programs. The symbolic executor runs them. The neural network learns from which ones succeeded. The symbolic side is a ground-truth interpreter — a Lisp evaluator, a lambda calculus reducer — but it only answers "does this program execute without error?" It does not answer "is this program correct for any spec." The neural network learns heuristics for what tends to work, but there is no formal correctness guarantee.
Guarantee: empirical (the program ran on the test cases the synthesizer generated).
Bottleneck: search space — too many candidates, too little signal from just "ran without error."
Neural authority: equal to symbolic (both are parts of a loss function).
**2. Neural Theorem Proving (GPT-f, Thor, COPRA, Magnushammer)**
Path: generate tactic → check with prover kernel → backpropagate search guidance.
The neural network proposes proof steps (tactics). The theorem prover kernel (Lean, Isabelle, Metamath) checks them. The neural network learns which tactics work in which proof states. The symbolic side is the final arbiter of correctness — the kernel cannot be wrong.
The constraint: the neural network never generates full programs. It proposes the next single move in a proof that a human already set up. The loop terminates when the kernel accepts.
Guarantee: formal (step-level — the kernel checks each tactic, but the theorem to prove was written by a human).
Bottleneck: tactic prediction quality — the neural net must suggest the right move in a combinatorially large search space.
Neural authority: none over verification — the kernel can reject any tactic. But the neural net has no creative freedom; it fills narrow holes in an already-structured proof.
**3. Differentiable Program Synthesis (τ-MAML, neural Turing machines)**
Path: end-to-end differentiable computation graph.
The program is represented as a continuous computation graph. The symbolic structure is learned jointly with the parameters. There is no separation between neural and symbolic components — they are fused.
Guarantee: none (the "symbolic" structure is a regularizer, not a verifier).
Bottleneck: gradient signal — complex program structure is hard to learn through backpropagation alone.
Neural authority: total (there is no independent symbolic verifier).
**4. LLM + Partial Verifier (ReAct, Reflexion, STaR, self-consistency, Tree-of-Thought)**
Path: generate → check (partial) → retry or halt.
The LLM generates a response. A symbolic layer extracts structured claims. A verifier — often another LLM, sometimes a constraint solver — checks for consistency. If the verifier flags a problem, the LLM retries.
The verifier is partial. It cannot prove the response is correct. It can only detect certain classes of inconsistency (contradictions, type errors, out-of-range values). The user is asked to trust the unverified parts.
Guarantee: partial (some consistency checks pass; no claim of complete correctness).
Bottleneck: LLM cost per loop iteration, and the verifier misses anything it was not programmed to check.
Neural authority: can override the symbolic verifier by producing output the verifier is not designed to catch.
**5. Neural Optimization of Symbolic Programs (DSPy)**
Path: declarative program skeleton → neural optimizer searches prompt/tool-use space → loss function measures output quality → update optimizer.
The symbolic structure (the program skeleton with typed modules) is written by the human. A neural optimizer searches the space of prompts, few-shot examples, and module compositions to minimize a loss function. The goal is program-level optimization, not correctness.
Guarantee: empirical (the loss improved on the evaluation set).
Bottleneck: optimization budget (number of program variants the optimizer can try before the user runs out of patience or tokens).
Neural authority: symmetric with symbolic (both are searchable parameters).
**6. Passepartout: Verified Synthesis Loop**
Path: specification + gates → agent synthesizes implementation → ACL2 prover verifies against spec → gate engine checks policy → counterexample feedback to agent → retry or halt.
Contrast with every other architecture:
| Property | Passepartout | All others |
|----------|-------------|------------|
| Generator scope | Full implementations from spec | Tactics (GPT-f), candidate programs (DreamCoder), responses (ReAct) |
| Verification scope | Complete functional correctness | Step-level (GPT-f), execution-only (DreamCoder), partial consistency (ReAct) |
| Verifier authority | Asymmetric — cannot override | None (DreamCoder), symmetric (DSPy), LLM self-evaluates (Reflexion) |
| Feedback from verifier | Counterexamples with structure | Pass/fail (most), tactic-level error (GPT-f) |
| Gate layer | Independent policy verification | None |
Guarantee: formal (complete — the prover checks the implementation against the full spec for all inputs).
Bottleneck: spec quality (the spec must be complete enough for the prover to work with).
Neural authority: none over verification (the prover is the final authority and cannot be overridden), but the neural component has full creative freedom in how to satisfy the spec.
* The Key Design Dimension: Asymmetric Authority
Every existing loop falls into one of three camps:
**Camp A: No independent verifier.** DreamCoder, differentiable synthesis, DSPy, and pure LLM pipelines have no component that can say "this output is wrong" with authority. The system improves empirically; it never proves correctness. This is the largest camp by publication count.
**Camp B: Constrained generator + complete verifier.** GPT-f, Thor, and COPRA have a theorem prover as the authority, but the generator is constrained to single-tactic moves within a human-written proof skeleton. The verifier covers everything the generator produces — but the generator can only produce a tiny part of the solution. The human provides the creative structure.
**Camp C: Unconstrained generator + partial verifier.** ReAct, Reflexion, and STaR give the LLM broad freedom but use a partial verifier that misses large classes of errors. The verifier's incompleteness means the neural component can always override it by producing output the verifier cannot analyze.
**Passepartout occupies a previously empty position: Camp D — unconstrained generator + complete verifier.** No previous published system gives the neural component complete creative freedom (synthesize the entire implementation) while subjecting its output to complete verification (the prover checks every path against the full spec).
* Why This Position Was Empty
The reasons are historical and economic, not technical:
**The LLM came first, the verifier second.** GPT-f (2021) and Thor (2022) were constrained to tactic-level because the LLMs available at the time could not reliably produce correct code at the module level. DreamCoder (2019) used neural program synthesis, not LLMs, and its generator was too weak to implement a full protocol. The hardware needed to run an LLM powerful enough to synthesize an entire TLS implementation, plus an ACL2 prover to verify it, did not exist in the same price range until roughly 2024.
**The verifier scales differently from the generator.** ACL2 can verify a 50,000-line TLS implementation, but doing so requires the spec to be written in a form the prover can consume — which is a human investment at least as large as writing the implementation. The old cost structure (Gabriel's "correctness is expensive") made the Passepartout loop uneconomical for any practical use case. Only the cost inversion described in [[id:c2789c0f-0955-43af-8a4a-f83ba87128fd][Better is Cheaper]] makes it viable.
**The field prioritized breadth over depth.** DSPy, ReAct, and similar systems optimize for broad applicability across many domains with partial guarantees — the "worse is better" of neurosymbolic design. Passepartout optimizes for deep correctness in a narrower domain. The field has not yet asked "what if we take the complete verification path and give the generator full freedom?" because the components to ask that question did not converge until now.
* What This Means for Research
Passepartout's loop is not patentable as a combination of existing parts — the individual components (LLM, ACL2, gates) are well-known. But the *unoccupied position in the design space* is a research contribution that no paper has described because the loop has only been technically feasible for approximately the last 12-18 months.
The open questions that define the research agenda:
1. **Counterexample bandwidth.** Can the prover produce counterexamples rich enough for the LLM to learn from in a single retry? Prover counterexamples are typically concrete (a specific input where the implementation deviates from spec). The LLM must generalize from one concrete failure to a correct implementation. This is harder for the LLM than what GPT-f faces (the prover tells it exactly which tactic failed and why).
2. **Spec completeness threshold.** How incomplete can a spec be and still produce correct implementations? If the spec is vague (in natural language with formal fragments), the prover cannot check everything, and the loop degrades to ReAct-level partial verification. The research question is the minimum spec density needed to enter the complete-verification regime.
3. **Gate interaction with verification.** What happens when a spec passes the prover but violates a gate (organizational policy)? The gate is a second symbolic layer with different authority — it checks policy, not correctness. Does the gate's failure feedback go to the agent (regenerate), the human (update policy), or both? This interaction has no existing literature because no system has two independent symbolic verification layers.
4. **Specification as bottleneck.** The field has spent decades studying how hard it is to write code. Passepartout replaces "writing code" with "writing specifications." Is specification-writing easier than implementation-writing at the same level of quality? Or does the difficulty just shift from a familiar bottleneck to an unfamiliar one?
These questions are unanswered because the architecture that makes them meaningful — unconstrained generator + complete verifier — did not exist in any published system. Passepartout is not just "another neurosymbolic loop." It occupies a cell in the design space that was previously empty. That is what makes it novel.
* References
- [[id:dddd52a7-adb8-470e-a459-614ade5f76af][Closing the Lisp Gap]] — the context in which this architecture becomes viable
- [[id:c2789c0f-0955-43af-8a4a-f83ba87128fd][Better is Cheaper]] — the cost inversion that makes this loop economical
- [[id:9af13fff-9725-542b-93b1-a555bc74ad72][Lisp Economics]] — why the old cost structure prevented this architecture from being built earlier