gbrain: sync converted org-mode brain files

This commit is contained in:
Hermes
2026-05-28 03:01:01 +00:00
parent 3efbf760e6
commit af132f7e88
26 changed files with 48165 additions and 1 deletions

View File

@@ -0,0 +1,101 @@
:PROPERTIES:
:CREATED: [2026-05-27 Wed]
:ID: d2722576-fc9b-4bd3-bc2f-f5692b561b4e
:END:
#+title: Who Is Closest to Passepartout?
#+filetags: :passepartout:academia:comparison:neurosymbolic:verification:
A survey of academic researchers whose work overlaps with Passepartout's architecture along specific dimensions. The conclusion: no academic group combines all four architectural properties that define Passepartout's design. The closest groups each hold one or two pieces; none combine all.
* The Four Architectural Properties
1. **LLM-level generator with full creative freedom** — the generator synthesizes entire implementations from specifications, not individual tactic steps or hole-fillings.
2. **Theorem-prover verification with complete functional correctness** — the verifier checks all execution paths against the full spec, not bounded verification via SMT solvers.
3. **Asymmetric authority** — the symbolic component (prover) is the final authority and cannot be overridden by the neural component.
4. **Counterexample-guided retry loop** — when the prover rejects an implementation, it returns a concrete counterexample that the generator uses to reformulate.
* The Academic Landscape
**LLM + Theorem Prover Loops**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Sean Welleck | CMU | ImProver 2 | Self-improving LMs generating proof steps verified by Lean | Generator fills tactic holes in existing proofs, not full implementations. Camp B. |
| Timon Gehr | ETH | COPRA, Thor | LLM interacts with theorem prover kernel | Same constraint: tactic-level. Neural component generates one move at a time. |
| Kaiyu Yang | Princeton | LINC | Neural network learns symbolic rules, prover checks consistency | Neural component is a *learner* discovering rules from data, not a generator synthesizing from spec. Different abstraction level. |
All three are Camp B in the loop taxonomy (constrained generator + complete verifier). None gives the LLM freedom to synthesize full implementations. Welleck's ImProver is the closest in spirit — the loop iterates, the prover is authoritative — but the scope of what the generator produces is orders of magnitude smaller than what Passepartout's design requires.
**Synthesis + Verification (non-LLM)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Armando Solar-Lezama | MIT | Sketch | Synthesis-aided verification: partial program → solver fills holes → assertions checked | Generator is constraint-based SAT/SMT, not an LLM. Verification is bounded (solver capacity). |
| Emina Torlak | UW | Rosette | Solver-aided languages for synthesis + verification | Same constraints as Sketch. Bounded, non-LLM. |
| Swarat Chaudhuri | UT Austin | Neurosymbolic Programming | Neural networks guide program synthesis, symbolic analysis verifies | Uses SMT for bounded verification, not theorem prover for complete. Neural-symbolic are symmetric collaborators, not asymmetric authority. |
Chaudhuri is the closest overall academic neighbor. His group explicitly works on combining neural and symbolic components, with symbolic verification of neural-generated candidates. But the verification is bounded (SMT), not complete (theorem prover), and the loop does not have Passepartout's asymmetric authority design.
**Lisp as Infrastructure for Verification**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Christian Schafmeister | Temple | Clasp | Common Lisp through LLVM; interactive Lisp for serious computation | Lisp infrastructure, not a neurosymbolic loop. No ACL2 integration. |
| Kaufmann, Moore | UT Austin / Retired | ACL2 | The theorem prover itself | Pure symbolic verification. No LLM loop. |
Schafmeister is aligned with Passepartout on the "why Lisp" question — interactive development, uniform representation, C++ interop for performance — but does not work on agentic verification loops.
**Autonomous Code Modification Loops**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Kevin Ellis | Cornell | DreamCoder | Neural program synthesis loop: generate → execute → learn | Verifier is interpreter (does it run?), not prover (is it correct?). Camp A. |
| Andrej Karpathy | Anthropic | autoresearch | LLM modifies code, runs experiments, keeps/discards based on metric | Verifier is val_bpb — a single empirical number. No specification, no formal guarantee. Camp C. |
Both prove the viability of the autonomous loop concept but use the weakest possible verifiers (execution and empirical metrics).
**Integrate-Symbolic-Into-Neural (Garcez)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Artur d'Avila Garcez | City, Univ. of London | NeSy frameworks, NSL | Pioneer of neural-symbolic computation since 1990s. Book: "Neural-Symbolic Cognitive Reasoning." Runs NeSy workshop series. | His approach *integrates* symbolic knowledge into neural networks (logic regularization, knowledge distillation). Symbolic rules are a training signal, not a runtime verifier. The neural component can override symbolic constraints through the loss landscape. No asymmetric authority, no theorem prover, no complete verification. His camp is "make neural networks behave more symbolically." Passepartout's camp is "make neural networks accountable to symbolic verification." Opposite architectural philosophy. |
Garcez's position in the design space is closest to Camp A (no independent verifier). The symbolic rules guide learning but do not veto outputs at runtime. His work is foundational for the field of neural-symbolic computation, but his *architectural philosophy* is the inverse of Passepartout's. He wants the symbolic inside the neural. Passepartout wants them separate with the symbolic holding authority.
**Theorist of the Hybrid Thesis (Marcus)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Gary Marcus | NYU Emeritus, Robust.AI | None (theorist/critic) | Longest-standing public advocate for hybrid AI. Since "The Algebraic Mind" (2001) and "Rebooting AI" (2019), he has argued deep learning alone cannot achieve systematicity, composition, or reasoning. He identified the *need* for the approach Passepartout implements. As of May 2026, he is publicly asking why LLM agent frameworks are not using LEAN as a theorem-prover verifier — the same engineering gap Passepartout occupies. | He does not propose a specific architecture or loop design. His background is cognitive science and developmental psychology, not formal verification. The "symbolic component" he advocates is abstract — structured knowledge representations, not ACL2 theorem proving. He has no answer to the cost/feasibility question (the "Better is Cheaper" argument is Passepartout's contribution, not Marcus's). He is a theorist of the problem, not an architect of the solution — though his May 2026 tweet shows he is now engaging with the engineering question directly. |
Marcus occupies a category that does not appear in the loop taxonomy (Camps A-D) because he does not define a loop. He identifies the *need* for hybrid AI with genuine symbolic authority. Passepartout is the engineering response to the thesis Marcus has been arguing since before most of the field would admit the limitations existed. His May 2026 tweet asking "they aren't using LEAN in one of those many tools?" is the theorist noticing the empty cell Passepartout was designed to fill.
* The Gap
| Property | Passepartout | Closest academic | Academic's limiter |
|----------|-------------|-----------------|-------------------|
| Generator freedom | Full synthesis from spec | ImProver (Welleck) | Fills tactic holes only |
| Verification completeness | Complete (theorem prover) | Sketch (Solar-Lezama) | Bounded (SMT) |
| Asymmetric authority | Neural cannot override prover | Neurosymbolic Prog (Chaudhuri) | Symmetric collaboration |
| Counterexample feedback | Structured from prover to LLM | ImProver (Welleck) | Pass/fail at tactic level |
| Two symbolic layers | Gates + prover independent | None | No second layer exists |
No academic group combines all four properties. The closest — Chaudhuri — has three of five (neural + symbolic + verification) but fails on verification completeness (SMT not ACL2), asymmetric authority (symmetric not asymmetric), and the two-layer gate design.
* What This Means
The gap is either:
1. **A genuinely empty cell in the design space.** The combination is novel, the components have not converged in one system before, and Passepartout's design is early.
2. **A sign that the combination is not as valuable as the components.** No major academic lab has invested in this specific loop because the cost of writing complete formal specifications exceeds the benefit of complete formal verification, given the alternative of bounded verification (SMT) with cheaper spec costs.
The way to distinguish (1) from (2) is to build the architecture and measure whether the spec-writing cost is amortized over enough synthesized implementations to justify it. Passepartout's answer is: yes, because specs are written once and implementations are generated for every deployment context. The academic literature has not tested this claim.
* References
- [[id:be9bccc7-5adf-4d0d-8ee4-8855892189bf][Neurosymbolic Loop Architectures]] — the taxonomy that positions these comparisons
- [[id:ee8f3b2a-4c7d-4e1b-9b0a-6d8f2e3c1a5b][Neurosymbolic AI Paper Library]] — papers referenced above are in the local library

View File

@@ -0,0 +1,138 @@
:PROPERTIES:
:CREATED: [2026-05-27 Wed]
:ID: dddd52a7-adb8-470e-a459-614ade5f76af
:END:
#+title: Closing the Lisp Gap
#+filetags: :passepartout:lisp:compiler:performance:ecosystem:asics:bootstrapping:
Lisp has a performance gap with C and Rust of roughly 10-20% on hot numerical code and a much larger ecosystem gap (package manager, LSP, cross-compilation, library count ~2k vs ~180k). This page diagnoses both gaps and argues that [[id:28c46769-c14b-42aa-ac7a-69d310157f8f][Passepartout]]'s architecture — fact store, gates, ACL2 prover, self-improving loop — changes the nature of the problem from community-coordination to knowledge-capture, making the gap closable in 1-2 years rather than 20-30 person-years.
See [[id:9af13fff-9725-542b-93b1-a555bc74ad72][why Lisp is economically viable now]] for the economic foundation. See [[id:13e6ae54-2d24-5aa0-b1cd-a7e8e749aa70][self-driving Lisp Machine]] for the hardware endpoint.
* The Performance Gap
What Lisp (SBCL) still needs to close the remaining gap with C/Rust:
- **Alias analysis on SBCL's IR.** Rust's borrow checker gives LLVM perfect aliasing at function scope. SBCL conservatively assumes any memory reference could alias any other. Fix: a static analysis pass proving non-aliasing for common patterns (different simple-array types, arrays vs scalars). ~1-2 person-years, biggest single investment.
- **LLVM backend (Clasp).** Clasp gives LLVM's full pipeline — auto-vectorization, LICM, inlining cost modeling — at the cost of fighting LLVM's static worldview with CLOS's dynamic semantics. Clasp's hybrid approach (fast-path LLVM, slow-path runtime) is the right design. ~5-10 person-years to reach SBCL's general performance with better hot-path optimization.
- **PGO (profile-guided optimization).** SBCL has profiling (sb-sprof) but no feedback loop — the compiler cannot reweight branches or layout hot/cold blocks. Fix: instrumentation pass that emits counter VOPs, a profile binary, and a recompilation step that annotates IR with branch weights. ~1 person-year.
- **Post-link optimization.** Facebook's BOLT profiles binary execution and rewrites code layout to reduce I-cache misses. SBCL's unusual code layout (trampolines, closure entry points) chokes BOLT. Hard research problem.
- **Vectorization VOPs.** No standard SIMD library for Lisp. Every ISA extension needs its own VOP set. C gets this via xmmintrin.h and auto-vectorization. ~2-3 person-years for a portable SIMD abstraction backed by platform VOPs.
* The Ecosystem Gap
| Feature | Rust | Common Lisp |
|---------|------|-------------|
| Package manager | Cargo (lockfile, binary cache, workspace) | Quicklisp (no lockfile, source-only) |
| LSP | rust-analyzer (excellent) | No universal LSP |
| Formatter | rustfmt (universal) | cl-form (not universal) |
| Linter | Clippy (>700 rules) | No equivalent |
| Doc system | rustdoc (integrated) | Conventions only |
| Cross-compilation | --target flag | Manual, source-based |
| Test framework | Built-in with benchmarks | Various (FiveAM, Prove) |
| Sanitizers | ASan, TSan, MSan, UBSan | None |
| Library count | ~180k | ~2k |
| WASM target | First-class | Modest |
| Mobile targets | First-class | None native |
The largest gaps are the package manager (the single highest-leverage investment — ~2-3 person-years + community coordination) and the library count (~100× fewer).
* Lisp ASICs: What They Change
A Lisp ASIC (Symbolics Ivory, CADR, or a modern RISC-V variant) **raises the floor** more than the ceiling:
**What it gives in hardware:**
- Tag checking on every memory access — types, GC bits, forwarding pointers in parallel with ALU
- Hardware CONS — single-cycle allocation from a dedicated region with automatic write barriers
- Hardware GC support — generational barriers at cache level
- Hardware generic function dispatch — type-code lookup in CAM instead of method cache
- Hardware stack groups and shallow binding — zero-cost special variable rebinding
**Does it eliminate the need for compiler optimization?** Both yes and no. The worst case becomes much faster — naive code and optimized code are closer together. But the compiler still matters for instruction scheduling, inlining decisions, dead code elimination, and loop optimizations. The ASIC shifts the optimizer's job from "avoid Lisp overhead" to "use special instructions effectively." The gap between C and Lisp on hot numerical loops narrows from ~20% to maybe 5%.
* Embedded and Small Systems: RISC-V Lisp Extension
A full Lisp Machine chip doesn't make sense for embedded, but a **RISC-V extension with 5-10 Lisp primitives** does:
- ~5 new instructions added to a RV32/RV64 core (CONS with write barrier, tag dispatch, read barrier, tag extract, GC check)
- Implementable as an FPGA soft-core or small coprocessor alongside a microcontroller
- ~50K additional gates on a RISC-V e300-class core
This makes Lisp viable where C is currently mandatory: real-time control, sensor nodes, IoT. The hardware CONS eliminates allocation cost; the hardware read barrier eliminates GC pause unpredictability by making the collector a concurrent background task instead of a stop-the-world event. The gap goes from 10-100× (current, boxed allocation + GC) to ~2-5×.
**Strategic play:** Define the extension, open-source an FPGA implementation, get it ratified as a standard RISC-V extension. Then any chip vendor can add it.
* Why Passepartout Changes the Equation
The standard model for closing the library gap: *more users → more libraries → more users (virtuous cycle).*
Passepartout's model: *agent synthesizes libraries on demand → fewer blockers → more users.*
**Knowledge-capture replaces community-coordination.** The rate-limiter shifts from "people who can write Lisp libraries" to "people who can explain their domain well." A 3-hour conversation with a domain expert captures the specifications, invariants, known techniques, and correctness criteria. The agent synthesizes the library, verifies it against stored properties via ACL2, and hot-reloads it.
**What changes:**
- Standard protocols (HTTP, SQL, JSON, crypto, file formats) — the agent synthesizes these from specs it has seen in training.
- Wrapper libraries (CFFI bindings, REST clients) — mechanical, the agent handles well.
- Domain-specific libraries — the hardest category *without* a knowledge architecture, and the most tractable *with* one. The gate becomes the bridge: human provides $10K/year-level insight, agent provides $200K/year-level engineering.
- What stays hard: libraries where the algorithm requires original research (not yet done by anyone), libraries requiring specialized hardware knowledge the human cannot articulate, and battle-tested implementations where verification of constant-time correctness is near-impossible.
**Passepartout's position to direct Lisp development:**
1. **Instrumentation advantage.** It runs in Lisp, can profile its own execution in Lisp terms, identify SBCL's compiler bottlenecks, and generate patches — all in a closed feedback loop that C compilers lack.
2. **Coordination advantage by adoption.** If Passepartout becomes the standard Lisp agent framework, its tooling choices become de facto standards: ocicl with lockfiles becomes the package manager, cl-lsp gets maintenance, sb-sprof fed into the optimizer becomes the PGO pipeline.
3. **Automated contribution pipeline.** Profile → identify hot path in SBCL → generate candidate VOP or optimization pass → run test suite (verified by ACL2) → submit patch. Cycle time from "this would be faster if SBCL did X" to "SBCL does X" drops from years to days.
4. **The prover multiplier.** ACL2 makes safe optimization an engineering discipline instead of an experimental art. The agent can generate thousands of variants, verify correctness, benchmark, and keep the Pareto-optimal set — coverage a human cannot match.
* The Same Compression Applies to Social Infrastructure
The Passepartout architecture compresses the *social* ecosystem of the internet the same way it compresses the *software* ecosystem — replacing multiplicative duplication with additive specification.
**The problem:** Twitter (~10M lines), Facebook (~60M lines), Reddit (~5M lines), Discord (~8M lines), Signal (~3M lines), Telegram (~5M lines), LinkedIn (~15M lines), WhatsApp (~3M lines). Each independently implements the same primitives: identity, identity verification, key recovery, encrypted messaging, group messaging, moderation, content addressing, access control, payment integration, rate limiting, spam detection, reporting, appeal, ban propagation, federation, data export, deletion. Each is a separate 3-60 million line codebase with the *same semantics* but different APIs, different bugs, different compliance postures, different threat models. None compose with any other.
**Under Passepartout:**
- One spec for identity: keys, recovery, delegation, attestation. ~500 lines.
- One spec for messaging: content-addressed, encrypted, ordered by a Merkle DAG. ~1,000 lines.
- One spec for groups: membership, permissions, moderation. ~1,500 lines.
- One spec for federation: cross-server state replication, conflict resolution. ~1,000 lines.
- Gates for policy: what content types are allowed, what moderation actions exist, how appeals work. Configurable per community, enforced by the prover.
Total spec surface: ~5,000 lines. Total synthesized code (client + server for any platform): ~50,000 lines of Lisp.
**5,000 lines of spec + 50,000 lines of synthesized implementation vs ~100 million lines of independent, incompatible, duplicative implementations** that each require their own security audits, compliance reviews, and maintenance pipelines.
**Composition replaces isolation.** In the current model, Signal's identity system cannot talk to WhatsApp's. Reddit's moderation cannot federate with Discord's. The protocols don't compose because they were designed independently with no shared semantic foundation. In the Passepartout model, every synthesized client and server speaks the same spec. A community can use a Telegram-like frontend and a Reddit-like frontend — both synthesized from the same messaging and content-addressing specs — and they share identity, encryption, and federation because those are the same underlying specifications. The frontend is a gate: a policy on how to render the same underlying protocol data.
**The multiplication disappears.** Under the old model, N platforms cost N × (full stack). Under Passepartout, N communities cost one stack + N × (~100 lines of gate policy). The marginal cost of adding a new community approaches zero. This is the same argument as the library gap: N libraries × (different maintainers, different bugs, different API surfaces) collapses to N specifications × (one verified synthesized implementation).
**The deeper claim:** the protocol isn't a separate feature of Passepartout. It is the same compression architecture applied to a different domain — coordination infrastructure instead of library infrastructure. Both are about replacing multiplicative maintenance burden with additive specification complexity, amplified by Lisp's density and the prover's correctness guarantees.
* Timeline Compression
With the self-improving loop, the marginal cost of each incremental improvement drops to near zero. The system improves continuously rather than in discrete releases. Every time compute is added to the pipeline, the system gets faster.
| Gap | Human-only | AI-assisted | Passepartout loop |
|-----|-----------|-------------|------------------|
| SBCL alias analysis | 18 mo | 10 mo | 3-4 mo |
| PGO pipeline | 12 mo | 6 mo | 2-3 mo |
| Cargo-equivalent | 24 mo | 8 mo | 4-6 mo |
| LSP server | 18 mo | 6 mo | 3-4 mo |
| RISC-V Lisp extension | 24 mo | 12 mo | 6-8 mo |
| Library coverage (100 APIs) | 3-5 yrs | 12 mo | 3-4 mo (synthesis) |
| Clasp maturity | 5-10 yrs | 3-5 yrs | 1-2 yrs |
The meta-level effect: once the pipeline exists (profile → generate → verify → test → deploy), the system improves continuously. The gap-closing problem goes from "we need enough talented volunteers" to "we need enough compute" — and compute grows on a known exponential, while talent doesn't.
* Relationship to Other Concepts
- [[id:9af13fff-9725-542b-93b1-a555bc74ad72][Lisp economics]] — the economic viability argument provides the why; this page provides the how.
- [[id:13e6ae54-2d24-5aa0-b1cd-a7e8e749aa70][Self-driving Lisp Machine]] — the hardware endpoint enabled by this software-level gap-closing.
- [[id:efc76898-03f7-57ba-923d-35d65da88bb7][Sufficiency flip]] — the threshold where symbolic reasoning becomes cheaper than LLM calls; applies to compiler optimization as a domain.
- [[id:84a537b4-4256-50c8-91f5-dd5b4538418f][Verification appliance]] — the $5K/year hardware that replaces $500K/year in compliance costs; RISC-V Lisp extension is the technical path to making this run everywhere.

View File

@@ -0,0 +1,126 @@
:PROPERTIES:
:CREATED: [2026-05-27 Wed]
:ID: be9bccc7-5adf-4d0d-8ee4-8855892189bf
:END:
#+title: Neurosymbolic Loop Architectures
#+filetags: :passepartout:neurosymbolic:research:verification:architectures:
Taxonomy of loop architectures that combine neural components (LLMs, neural program synthesis) with symbolic components (interpreters, theorem provers, constraint solvers). Each architecture is defined by its answer to a single question: *who verifies whom, and what gets verified?*
* The Six Architectures
**1. Neural-Guided Program Synthesis (DreamCoder et al.)**
Path: generate → execute → filter → train.
The neural network proposes candidate programs. The symbolic executor runs them. The neural network learns from which ones succeeded. The symbolic side is a ground-truth interpreter — a Lisp evaluator, a lambda calculus reducer — but it only answers "does this program execute without error?" It does not answer "is this program correct for any spec." The neural network learns heuristics for what tends to work, but there is no formal correctness guarantee.
Guarantee: empirical (the program ran on the test cases the synthesizer generated).
Bottleneck: search space — too many candidates, too little signal from just "ran without error."
Neural authority: equal to symbolic (both are parts of a loss function).
**2. Neural Theorem Proving (GPT-f, Thor, COPRA, Magnushammer)**
Path: generate tactic → check with prover kernel → backpropagate search guidance.
The neural network proposes proof steps (tactics). The theorem prover kernel (Lean, Isabelle, Metamath) checks them. The neural network learns which tactics work in which proof states. The symbolic side is the final arbiter of correctness — the kernel cannot be wrong.
The constraint: the neural network never generates full programs. It proposes the next single move in a proof that a human already set up. The loop terminates when the kernel accepts.
Guarantee: formal (step-level — the kernel checks each tactic, but the theorem to prove was written by a human).
Bottleneck: tactic prediction quality — the neural net must suggest the right move in a combinatorially large search space.
Neural authority: none over verification — the kernel can reject any tactic. But the neural net has no creative freedom; it fills narrow holes in an already-structured proof.
**3. Differentiable Program Synthesis (τ-MAML, neural Turing machines)**
Path: end-to-end differentiable computation graph.
The program is represented as a continuous computation graph. The symbolic structure is learned jointly with the parameters. There is no separation between neural and symbolic components — they are fused.
Guarantee: none (the "symbolic" structure is a regularizer, not a verifier).
Bottleneck: gradient signal — complex program structure is hard to learn through backpropagation alone.
Neural authority: total (there is no independent symbolic verifier).
**4. LLM + Partial Verifier (ReAct, Reflexion, STaR, self-consistency, Tree-of-Thought)**
Path: generate → check (partial) → retry or halt.
The LLM generates a response. A symbolic layer extracts structured claims. A verifier — often another LLM, sometimes a constraint solver — checks for consistency. If the verifier flags a problem, the LLM retries.
The verifier is partial. It cannot prove the response is correct. It can only detect certain classes of inconsistency (contradictions, type errors, out-of-range values). The user is asked to trust the unverified parts.
Guarantee: partial (some consistency checks pass; no claim of complete correctness).
Bottleneck: LLM cost per loop iteration, and the verifier misses anything it was not programmed to check.
Neural authority: can override the symbolic verifier by producing output the verifier is not designed to catch.
**5. Neural Optimization of Symbolic Programs (DSPy)**
Path: declarative program skeleton → neural optimizer searches prompt/tool-use space → loss function measures output quality → update optimizer.
The symbolic structure (the program skeleton with typed modules) is written by the human. A neural optimizer searches the space of prompts, few-shot examples, and module compositions to minimize a loss function. The goal is program-level optimization, not correctness.
Guarantee: empirical (the loss improved on the evaluation set).
Bottleneck: optimization budget (number of program variants the optimizer can try before the user runs out of patience or tokens).
Neural authority: symmetric with symbolic (both are searchable parameters).
**6. Passepartout: Verified Synthesis Loop**
Path: specification + gates → agent synthesizes implementation → ACL2 prover verifies against spec → gate engine checks policy → counterexample feedback to agent → retry or halt.
Contrast with every other architecture:
| Property | Passepartout | All others |
|----------|-------------|------------|
| Generator scope | Full implementations from spec | Tactics (GPT-f), candidate programs (DreamCoder), responses (ReAct) |
| Verification scope | Complete functional correctness | Step-level (GPT-f), execution-only (DreamCoder), partial consistency (ReAct) |
| Verifier authority | Asymmetric — cannot override | None (DreamCoder), symmetric (DSPy), LLM self-evaluates (Reflexion) |
| Feedback from verifier | Counterexamples with structure | Pass/fail (most), tactic-level error (GPT-f) |
| Gate layer | Independent policy verification | None |
Guarantee: formal (complete — the prover checks the implementation against the full spec for all inputs).
Bottleneck: spec quality (the spec must be complete enough for the prover to work with).
Neural authority: none over verification (the prover is the final authority and cannot be overridden), but the neural component has full creative freedom in how to satisfy the spec.
* The Key Design Dimension: Asymmetric Authority
Every existing loop falls into one of three camps:
**Camp A: No independent verifier.** DreamCoder, differentiable synthesis, DSPy, and pure LLM pipelines have no component that can say "this output is wrong" with authority. The system improves empirically; it never proves correctness. This is the largest camp by publication count.
**Camp B: Constrained generator + complete verifier.** GPT-f, Thor, and COPRA have a theorem prover as the authority, but the generator is constrained to single-tactic moves within a human-written proof skeleton. The verifier covers everything the generator produces — but the generator can only produce a tiny part of the solution. The human provides the creative structure.
**Camp C: Unconstrained generator + partial verifier.** ReAct, Reflexion, and STaR give the LLM broad freedom but use a partial verifier that misses large classes of errors. The verifier's incompleteness means the neural component can always override it by producing output the verifier cannot analyze.
**Passepartout occupies a previously empty position: Camp D — unconstrained generator + complete verifier.** No previous published system gives the neural component complete creative freedom (synthesize the entire implementation) while subjecting its output to complete verification (the prover checks every path against the full spec).
* Why This Position Was Empty
The reasons are historical and economic, not technical:
**The LLM came first, the verifier second.** GPT-f (2021) and Thor (2022) were constrained to tactic-level because the LLMs available at the time could not reliably produce correct code at the module level. DreamCoder (2019) used neural program synthesis, not LLMs, and its generator was too weak to implement a full protocol. The hardware needed to run an LLM powerful enough to synthesize an entire TLS implementation, plus an ACL2 prover to verify it, did not exist in the same price range until roughly 2024.
**The verifier scales differently from the generator.** ACL2 can verify a 50,000-line TLS implementation, but doing so requires the spec to be written in a form the prover can consume — which is a human investment at least as large as writing the implementation. The old cost structure (Gabriel's "correctness is expensive") made the Passepartout loop uneconomical for any practical use case. Only the cost inversion described in [[id:c2789c0f-0955-43af-8a4a-f83ba87128fd][Better is Cheaper]] makes it viable.
**The field prioritized breadth over depth.** DSPy, ReAct, and similar systems optimize for broad applicability across many domains with partial guarantees — the "worse is better" of neurosymbolic design. Passepartout optimizes for deep correctness in a narrower domain. The field has not yet asked "what if we take the complete verification path and give the generator full freedom?" because the components to ask that question did not converge until now.
* What This Means for Research
Passepartout's loop is not patentable as a combination of existing parts — the individual components (LLM, ACL2, gates) are well-known. But the *unoccupied position in the design space* is a research contribution that no paper has described because the loop has only been technically feasible for approximately the last 12-18 months.
The open questions that define the research agenda:
1. **Counterexample bandwidth.** Can the prover produce counterexamples rich enough for the LLM to learn from in a single retry? Prover counterexamples are typically concrete (a specific input where the implementation deviates from spec). The LLM must generalize from one concrete failure to a correct implementation. This is harder for the LLM than what GPT-f faces (the prover tells it exactly which tactic failed and why).
2. **Spec completeness threshold.** How incomplete can a spec be and still produce correct implementations? If the spec is vague (in natural language with formal fragments), the prover cannot check everything, and the loop degrades to ReAct-level partial verification. The research question is the minimum spec density needed to enter the complete-verification regime.
3. **Gate interaction with verification.** What happens when a spec passes the prover but violates a gate (organizational policy)? The gate is a second symbolic layer with different authority — it checks policy, not correctness. Does the gate's failure feedback go to the agent (regenerate), the human (update policy), or both? This interaction has no existing literature because no system has two independent symbolic verification layers.
4. **Specification as bottleneck.** The field has spent decades studying how hard it is to write code. Passepartout replaces "writing code" with "writing specifications." Is specification-writing easier than implementation-writing at the same level of quality? Or does the difficulty just shift from a familiar bottleneck to an unfamiliar one?
These questions are unanswered because the architecture that makes them meaningful — unconstrained generator + complete verifier — did not exist in any published system. Passepartout is not just "another neurosymbolic loop." It occupies a cell in the design space that was previously empty. That is what makes it novel.
* References
- [[id:dddd52a7-adb8-470e-a459-614ade5f76af][Closing the Lisp Gap]] — the context in which this architecture becomes viable
- [[id:c2789c0f-0955-43af-8a4a-f83ba87128fd][Better is Cheaper]] — the cost inversion that makes this loop economical
- [[id:9af13fff-9725-542b-93b1-a555bc74ad72][Lisp Economics]] — why the old cost structure prevented this architecture from being built earlier