Add missing _index.org files for 7 sections: stages, ai-agents-scoping, compliance, impact, social-protocol, verification, resources — rebuild clean after file reorganization

Hermes
2026-06-03 21:50:01 +00:00
parent ac99eed182
commit 9b795df14e
79 changed files with 156 additions and 77 deletions

@@ -0,0 +1,107 @@
:PROPERTIES:
:CREATED: [2026-05-27 Wed]
:ID: d2722576-fc9b-4bd3-bc2f-f5692b561b4e
:END:
#+title: Who Is Closest to Passepartout?
#+filetags: :passepartout:academia:comparison:neurosymbolic:verification:
A survey of academic researchers whose work overlaps with Passepartout's architecture along specific dimensions. The conclusion: no academic group combines all four architectural properties that define Passepartout's design; the closest groups each hold one or two pieces.
* The Four Architectural Properties
1. **LLM-level generator with full creative freedom** — the generator synthesizes entire implementations from specifications, not individual tactic steps or hole-fillings.
2. **Theorem-prover verification with complete functional correctness** — the verifier checks all execution paths against the full spec, not bounded verification via SMT solvers.
3. **Asymmetric authority** — the symbolic component (prover) is the final authority and cannot be overridden by the neural component.
4. **Counterexample-guided retry loop** — when the prover rejects an implementation, it returns a concrete counterexample that the generator uses to reformulate.
* The Academic Landscape
**LLM + Theorem Prover Loops**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Sean Welleck | CMU | ImProver 2 | Self-improving LMs generating proof steps verified by Lean | Generator fills tactic holes in existing proofs, not full implementations. Camp B. |
| Amitayush Thakur, Swarat Chaudhuri; Albert Q. Jiang | UT Austin; Cambridge | COPRA, Thor | LLM interacts with theorem prover kernel | Same constraint: tactic-level. Neural component generates one move at a time. |
| Kaiyu Yang | Princeton | MetaQNL | Neural network learns symbolic rules, prover checks consistency | Neural component is a *learner* discovering rules from data, not a generator synthesizing from spec. Different abstraction level. |
All three are Camp B in the loop taxonomy (constrained generator + complete verifier). None gives the LLM freedom to synthesize full implementations. Welleck's ImProver is the closest in spirit — the loop iterates, the prover is authoritative — but the scope of what the generator produces is orders of magnitude smaller than what Passepartout's design requires.
**Synthesis + Verification (non-LLM)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Armando Solar-Lezama | MIT | Sketch | Synthesis-aided verification: partial program → solver fills holes → assertions checked | Generator is constraint-based SAT/SMT, not an LLM. Verification is bounded (solver capacity). |
| Emina Torlak | UW | Rosette | Solver-aided languages for synthesis + verification | Same constraints as Sketch. Bounded, non-LLM. |
| Swarat Chaudhuri | UT Austin | Neurosymbolic Programming | Neural networks guide program synthesis, symbolic analysis verifies | Uses SMT for bounded verification, not theorem prover for complete. Neural-symbolic are symmetric collaborators, not asymmetric authority. |
Chaudhuri is the closest overall academic neighbor. His group explicitly works on combining neural and symbolic components, with symbolic verification of neural-generated candidates. But the verification is bounded (SMT), not complete (theorem prover), and the loop does not have Passepartout's asymmetric authority design.
**Lisp as Infrastructure for Verification**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Christian Schafmeister | Temple | Clasp | Common Lisp through LLVM; interactive Lisp for serious computation | Lisp infrastructure, not a neurosymbolic loop. No ACL2 integration. |
| Kaufmann, Moore | UT Austin / Retired | ACL2 | The theorem prover itself | Pure symbolic verification. No LLM loop. |
Schafmeister is aligned with Passepartout on the "why Lisp" question — interactive development, uniform representation, C++ interop for performance — but does not work on agentic verification loops.
**Autonomous Code Modification Loops**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Kevin Ellis | Cornell | DreamCoder | Neural program synthesis loop: generate → execute → learn | Verifier is interpreter (does it run?), not prover (is it correct?). Camp A. |
| Andrej Karpathy | Anthropic | autoresearch | LLM modifies code, runs experiments, keeps/discards based on metric | Verifier is val_bpb — a single empirical number. No specification, no formal guarantee. Camp C. |
Both prove the viability of the autonomous loop concept but use the weakest possible verifiers (execution and empirical metrics).
**The Bitter Lesson / Temporal Credit Assignment (Sutton)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Richard Sutton | Alberta / Keen Technologies | TD learning, eligibility traces, Alberta Plan | The fundamental problem in verification (*an action was checked, but the consequence plays out hours later; was the action correct?*) is the same problem TD learning solves in RL: assigning credit to actions based on delayed outcomes. Sutton's temporal credit assignment work is the theory you would need to extend Passepartout from per-action gates to trajectory-level verification. His Bitter Lesson (scale beats engineered knowledge at sufficient compute) is the most commonly cited argument against the symbolic verification approach Passepartout bets on. | The Bitter Lesson is not anti-knowledge: it says methods that improve with more computation eventually dominate. Passepartout's gate is a deliberately small engineered knowledge system that *won't* benefit from more compute (the ACL2 lemmas don't get more correct with more hardware). That's acceptable because the gate is a narrow bottleneck (permit/deny). The LLM layer inside the gate *does* benefit from scale. The architecture already respects the Bitter Lesson by placing the scalable piece where scale helps and the non-scalable piece where deductive certainty matters. Sutton's Alberta Plan (world model + reward + learning algorithm) parallels Passepartout's Stage 6 (world model + gate + verified fine-tuning), but Sutton's agents learn by pure reward while Passepartout's learn by reward constrained by verified policy. Sutton would likely argue that a learned safety policy at scale would outcompete the gate. Passepartout's bet is that access control, message authentication, and compliance should never be probabilistic, even at infinite scale. |
**Integrate-Symbolic-Into-Neural (Garcez)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Artur d'Avila Garcez | City, Univ. of London | NeSy frameworks, NSL | Pioneer of neural-symbolic computation since 1990s. Book: "Neural-Symbolic Cognitive Reasoning." Runs NeSy workshop series. | His approach *integrates* symbolic knowledge into neural networks (logic regularization, knowledge distillation). Symbolic rules are a training signal, not a runtime verifier. The neural component can override symbolic constraints through the loss landscape. No asymmetric authority, no theorem prover, no complete verification. His camp is "make neural networks behave more symbolically." Passepartout's camp is "make neural networks accountable to symbolic verification." Opposite architectural philosophy. |
Garcez's position in the design space is closest to Camp A (no independent verifier). The symbolic rules guide learning but do not veto outputs at runtime. His work is foundational for the field of neural-symbolic computation, but his *architectural philosophy* is the inverse of Passepartout's. He wants the symbolic inside the neural. Passepartout wants them separate with the symbolic holding authority.
**Theorist of the Hybrid Thesis (Marcus)**
| Researcher | Institution | System | Match | Divergence |
|------------|-------------|--------|-------|------------|
| Gary Marcus | NYU Emeritus, Robust.AI | None (theorist/critic) | Longest-standing public advocate for hybrid AI. Since "The Algebraic Mind" (2001) and "Rebooting AI" (2019), he has argued deep learning alone cannot achieve systematicity, composition, or reasoning. He identified the *need* for the approach Passepartout implements. As of May 2026, he is publicly asking why LLM agent frameworks are not using LEAN as a theorem-prover verifier — the same engineering gap Passepartout occupies. | He does not propose a specific architecture or loop design. His background is cognitive science and developmental psychology, not formal verification. The "symbolic component" he advocates is abstract — structured knowledge representations, not ACL2 theorem proving. He has no answer to the cost/feasibility question (the "Better is Cheaper" argument is Passepartout's contribution, not Marcus's). He is a theorist of the problem, not an architect of the solution — though his May 2026 tweet shows he is now engaging with the engineering question directly. |
Marcus occupies a category that does not appear in the loop taxonomy (Camps A-D) because he does not define a loop. He identifies the *need* for hybrid AI with genuine symbolic authority. Passepartout is the engineering response to the thesis Marcus has been arguing since before most of the field would admit the limitations existed. His May 2026 tweet asking "they aren't using LEAN in one of those many tools?" is the theorist noticing the empty cell Passepartout was designed to fill.
* The Gap
| Property | Passepartout | Closest academic | Academic's limiter |
|----------|-------------|-----------------|-------------------|
| Generator freedom | Full synthesis from spec | ImProver (Welleck) | Fills tactic holes only |
| Verification completeness | Complete (theorem prover) | Sketch (Solar-Lezama) | Bounded (SMT) |
| Asymmetric authority | Neural cannot override prover | Neurosymbolic Prog (Chaudhuri) | Symmetric collaboration |
| Counterexample feedback | Structured from prover to LLM | ImProver (Welleck) | Pass/fail at tactic level |
| Two symbolic layers | Gates + prover independent | None | No second layer exists |
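No academic group combines all four properties, let alone the fifth row of the table (a second, independent symbolic layer). The closest, Chaudhuri, has the three basic ingredients (neural generation, symbolic analysis, verification of generated candidates) but falls short on three rows of the table: verification completeness (SMT, not ACL2), asymmetric authority (symmetric collaboration instead), and the two-layer gate design.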
* What This Means
The gap is either:
1. **A genuinely empty cell in the design space.** The combination is novel, the components have not converged in one system before, and Passepartout's design is early.
2. **A sign that the combination is not as valuable as the components.** No major academic lab has invested in this specific loop because the cost of writing complete formal specifications exceeds the benefit of complete formal verification, given the alternative of bounded verification (SMT) with cheaper spec costs.
The way to distinguish (1) from (2) is to build the architecture and measure whether the spec-writing cost is amortized over enough synthesized implementations to justify it. Passepartout's answer is: yes, because specs are written once and implementations are generated for every deployment context. The academic literature has not tested this claim.
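The amortization test above reduces to a break-even count. A minimal sketch, assuming costs are expressed in one common unit (say, engineer-days) and that each deployment context needs exactly one implementation; the function name and all numbers are hypothetical, not measurements from Passepartout:

```python
import math


def break_even_deployments(spec_cost, manual_cost, generated_cost):
    """Smallest n with spec_cost + n * generated_cost <= n * manual_cost.

    spec_cost:      one-off cost of writing the complete formal spec
    manual_cost:    cost of hand-writing one implementation
    generated_cost: cost of generating and verifying one implementation
    """
    saving = manual_cost - generated_cost
    if saving <= 0:
        raise ValueError("generation must be cheaper per implementation to ever pay off")
    return math.ceil(spec_cost / saving)


# Hypothetical figures: a 100-day spec, 40 days per manual build, 10 per generated build.
n = break_even_deployments(spec_cost=100, manual_cost=40, generated_cost=10)
```

With these made-up inputs the spec pays for itself at the fourth deployment context; reading (2) of the gap corresponds to the case where the real break-even count exceeds the number of contexts anyone needs.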
* References
- [[id:be9bccc7-5adf-4d0d-8ee4-8855892189bf][Neurosymbolic Loop Architectures]] — the taxonomy that positions these comparisons
- [[id:ee8f3b2a-4c7d-4e1b-9b0a-6d8f2e3c1a5b][Neurosymbolic AI Paper Library]] — papers referenced above are in the local library

@@ -0,0 +1,76 @@
:PROPERTIES:
:CREATED: [2026-06-03 Wed]
:ID: 3129eae6-f9f2-40fe-a419-8c1af728c86d
:END:
#+title: Faster Theorem Proving — Engineering Approaches
#+filetags: :theorem-proving:engineering:performance:ACL2:HOL:search:verification:
* Architecture
Proof engineering has two phases fundamentally different in cost:
- **Proof checking** — verifying that a candidate proof is correct. Polynomial in proof size, typically microseconds. Never the bottleneck.
- **Proof search** — finding the proof. Ranges from polynomial (decidable fragments) to undecidable (general first-order logic). Always the bottleneck.
Every optimization targets search, not checking.
* Practical levers, ranked by impact
**1. Incremental verification.** Only re-prove what changed. Maintain a dependency graph of theorems. On code change, mark downstream theorems stale and re-prove only those. For Passepartout's self-modification loop — where most changes are small relative to the full codebase — this is the single biggest win. A theorem dependency graph is cheap to maintain (proved theorems record which axioms or earlier theorems they depend on) and reduces each verify cycle from "full proof search" to "diff search."
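A minimal sketch of the stale-marking step, assuming the dependency graph is kept as a plain mapping from each theorem to the definitions and theorems it depends on. The function name and the example theorem names are hypothetical; the note does not describe the prover's actual bookkeeping.

```python
from collections import deque


def stale_theorems(depends_on, changed):
    """Return every theorem transitively downstream of a changed item.

    depends_on: dict mapping theorem name -> set of names it depends on
    changed:    set of definitions or theorems touched by the code change
    """
    # Invert the graph once: item -> theorems that directly depend on it.
    dependents = {}
    for theorem, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(theorem)

    stale = set()
    queue = deque(changed)
    while queue:
        item = queue.popleft()
        for theorem in dependents.get(item, ()):
            if theorem not in stale:
                stale.add(theorem)
                queue.append(theorem)
    return stale


graph = {
    "parse-ok": {"parse"},
    "auth-ok": {"parse-ok", "auth"},
    "log-ok": {"log"},
}
to_reprove = stale_theorems(graph, {"parse"})
```

Only `parse-ok` and `auth-ok` are re-proved here; `log-ok` keeps its existing proof because nothing it depends on changed.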
**2. ATP oracle (Sledgehammer pattern).** First-order automated theorem provers (E, Vampire) and SMT solvers (Z3) solve the majority of verification conditions (type safety, memory safety, simple invariants) in milliseconds. The architecture: translate the goal to first-order logic, send it to the provers in parallel, and reconstruct the proof in the HOL kernel on success. Isabelle/HOL's Sledgehammer has proven this works at scale. For Passepartout, where most properties are first-order expressible, this alone is expected to cover roughly 70-80% of verify calls.
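The dispatch-and-reconstruct pattern can be sketched as a race between provers, with the kernel as the only component whose answer counts. This is an illustration under assumptions: the provers are plain Python callables standing in for external ATP processes, and `kernel_check` is a hypothetical stand-in for proof reconstruction in the kernel.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


def race_provers(goal, provers, kernel_check, timeout_s=5.0):
    """Run all provers on the goal in parallel.

    Returns (prover_name, proof) for the first proof the kernel accepts,
    or None if no prover produces an accepted proof in time.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(provers)))
    futures = {pool.submit(fn, goal): name for name, fn in provers.items()}
    try:
        for fut in as_completed(futures, timeout=timeout_s):
            try:
                proof = fut.result()
            except Exception:
                continue  # a crashed prover counts as a failed attempt
            # The oracle is untrusted: only a kernel-checked proof is returned.
            if proof is not None and kernel_check(goal, proof):
                return futures[fut], proof
    except FuturesTimeout:
        pass
    finally:
        # Do not wait for the losers; drop anything not yet started.
        pool.shutdown(wait=False, cancel_futures=True)
    return None


# Stub provers for illustration only.
stub_provers = {
    "stub_fail": lambda goal: None,
    "stub_ok": lambda goal: ["refl", goal],
}
winner = race_provers("x = x", stub_provers, lambda goal, proof: proof[0] == "refl")
```

The design choice worth noting is that a prover returning a wrong proof costs time but not soundness, because the result is discarded unless the kernel check passes.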
**3. Decision procedures for decidable fragments.** Each decidable fragment gets a dedicated solver:
- Linear integer arithmetic → a Presburger decision procedure or an SMT solver such as Z3
- Equality with uninterpreted functions → congruence closure
- Propositional logic → SAT solver or BDDs
- Bit vectors → SAT with bit-blasting
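Congruence closure runs in near-linear time. SAT and bit-blasted bit-vector problems are NP-complete in the worst case, and Presburger arithmetic is worse still, but dedicated solvers are fast on the instances that verification conditions typically produce. The kernel must either trust the solver's output (verified oracle via reflection) or reconstruct the proof in kernel primitives (LCF-style, slower but fully trusted). Reflection, where the kernel itself runs the decision procedure and certifies the result, eliminates the reconstruction overhead entirely.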
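As an illustration of one such fragment, here is a naive congruence closure for ground terms, assuming terms are nested tuples like `("f", "a")` with plain strings as constants. It uses a quadratic fixpoint loop rather than the near-linear algorithm a production solver would use, and the function names are hypothetical.

```python
def collect_subterms(term, acc):
    """Add term and all of its subterms to the set acc."""
    acc.add(term)
    if isinstance(term, tuple):
        for arg in term[1:]:
            collect_subterms(arg, acc)


def entails(equalities, query):
    """Decide whether the ground equalities imply query = (lhs, rhs)."""
    terms = set()
    for lhs, rhs in list(equalities) + [query]:
        collect_subterms(lhs, terms)
        collect_subterms(rhs, terms)

    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    for lhs, rhs in equalities:
        union(lhs, rhs)

    # Congruence rule: f(a1..an) = f(b1..bn) whenever every ai = bi.
    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:
        changed = False
        for i, s in enumerate(apps):
            for t in apps[i + 1:]:
                if (s[0] == t[0] and len(s) == len(t) and find(s) != find(t)
                        and all(find(x) == find(y) for x, y in zip(s[1:], t[1:]))):
                    union(s, t)
                    changed = True
    return find(query[0]) == find(query[1])


def f_pow(n):
    """Build f(f(...f(a)...)) with n applications of f."""
    term = "a"
    for _ in range(n):
        term = ("f", term)
    return term


# Textbook example: f^3(a) = a and f^5(a) = a together imply f(a) = a.
facts = [(f_pow(3), "a"), (f_pow(5), "a")]
result = entails(facts, (f_pow(1), "a"))
```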
**4. Term rewriting DB.** Every proved equality becomes a rewrite rule at no extra cost. The DB is indexed by the outermost function symbol — O(1) lookup by symbol, then pattern match on the term structure. Over a library of thousands of proved theorems, the rewrite engine gets faster without any algorithmic improvement because the rewrite DB covers more cases with each addition.
The rewrite engine is the hottest inner loop in any prover. Its performance depends on pattern matching speed, which is pointer-chasing through a tree — the worst case for modern CPU cache hierarchies. The optimization: store terms in contiguous arrays (a vector of CONS cells, as Lisp already does via its own heap) so the pattern matcher walks linear memory rather than chasing pointers. SBCL's GC compacts automatically, which helps. A dedicated term store in a typed array would be faster.
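A minimal sketch of a rewrite DB indexed by the outermost function symbol, assuming terms are nested tuples, constants are strings, and pattern variables are strings starting with `?`. The class and the Peano-addition rules are illustrative only, and `normalize` assumes the rule set terminates.

```python
def match(pattern, term, env):
    """One-way matching; returns an extended binding dict, or None on failure."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if pattern[0] != term[0] or len(pattern) != len(term):
            return None
        for p, t in zip(pattern[1:], term[1:]):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None


def substitute(term, env):
    """Replace bound variables in term with their bindings."""
    if isinstance(term, str):
        return env.get(term, term)
    return (term[0],) + tuple(substitute(a, env) for a in term[1:])


class RewriteDB:
    def __init__(self):
        self.by_head = {}  # head symbol -> list of (lhs, rhs) rules

    def add(self, lhs, rhs):
        # lhs must be a function application so it has a head symbol.
        self.by_head.setdefault(lhs[0], []).append((lhs, rhs))

    def rewrite_root(self, term):
        """Apply the first matching rule at the root, or return None."""
        if not isinstance(term, tuple):
            return None
        for lhs, rhs in self.by_head.get(term[0], ()):  # dict lookup by head
            env = match(lhs, term, {})
            if env is not None:
                return substitute(rhs, env)
        return None

    def normalize(self, term):
        """Innermost rewriting to normal form."""
        if isinstance(term, tuple):
            term = (term[0],) + tuple(self.normalize(a) for a in term[1:])
        rewritten = self.rewrite_root(term)
        return term if rewritten is None else self.normalize(rewritten)


db = RewriteDB()
db.add(("plus", "0", "?x"), "?x")
db.add(("plus", ("s", "?x"), "?y"), ("s", ("plus", "?x", "?y")))
two, one = ("s", ("s", "0")), ("s", "0")
three = db.normalize(("plus", two, one))
```

Rules for other head symbols are never touched while rewriting a `plus` term, which is the point of the head-symbol index.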
**5. LLM for search guidance, not correctness.** The LLM suggests the next lemma or tactic (cheap, heuristic, runs on GPU). The kernel verifies the result (expensive, reliable, runs on CPU). This separates the problem: the LLM compensates for the prover's blindness, the prover compensates for the LLM's unreliability. Google DeepMind's AlphaProof (for Lean) and the Thor project (for Isabelle) both work this way.
The LLM's suggestions improve over time as the proof library grows — embedding search over existing proofs finds the closest proved theorem and adapts its structure. This compounds because proved theorems are both the correctness foundation and the search guidance database.
**6. Parallel subgoal dispatch.** Independent subgoals are sent to separate cores. Each core runs a full prover instance with its own term store and a read-only view of the shared rewrite DB. No inter-core communication during search. Shared-memory CPU (AMD EPYC, Threadripper) is ideal here: the rewrite DB lives in shared memory, readable by all cores without copying, and its hot rules stay in L3 cache. A distributed-memory mesh (Tenstorrent P150) fights the term-rewriting hot loop because term trees are pointer-chasing and don't fit per-core SRAM. The proven split: GPU for LLM guidance, many-core shared-memory CPU for symbolic verification.
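The fan-out itself is small. A sketch assuming subgoals are hashable values and `prove_one` is a hypothetical stand-in for a full prover instance; Python threads are used only to keep the example self-contained, since CPU-bound Python work would need processes, whereas a Common Lisp prover could use native threads over shared memory.

```python
from concurrent.futures import ThreadPoolExecutor


def prove_all(subgoals, prove_one, max_workers=4):
    """Prove independent subgoals in parallel with no communication between them.

    Returns (all_proved, per_goal_results).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(prove_one, subgoals))  # order is preserved
    return all(results), dict(zip(subgoals, results))


# Stub prover for illustration: it "fails" on one named subgoal.
ok, detail = prove_all(["len-nonneg", "append-assoc", "hard-case"],
                       lambda goal: goal != "hard-case")
```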
**7. Proof by analogy / structure reuse.** Closely related theorems share proof structure. Given a new goal, find the nearest proved theorem (embedding similarity on the proof library), instantiate its proof template, try the adapted proof. The LLM is a natural engine for this — it can recognize structural patterns that a syntactic search would miss. The agent says "this new property looks like the one we proved for the social protocol's identity layer" and reuses that proof structure with adjusted parameters.
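The retrieval step can be sketched as a nearest-neighbour lookup, assuming each proved theorem is stored with an embedding vector and a proof template. The three-dimensional vectors, theorem names, and templates below are invented for illustration; a real system would get embeddings from a model, and adapting the template is left to the LLM.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_theorem(goal_vec, library):
    """library maps theorem name -> (embedding, proof_template)."""
    name = max(library, key=lambda n: cosine(goal_vec, library[n][0]))
    return name, library[name][1]


library = {
    "identity-unique": ([1.0, 0.0, 0.0], "induct on the id list"),
    "msg-auth": ([0.0, 1.0, 0.0], "unfold the MAC definition"),
}
name, template = nearest_theorem([0.9, 0.1, 0.0], library)
```

The retrieved template is only a starting point for search; the kernel still has to accept the adapted proof.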
**8. Cache-friendly data structures.** Pattern matching is pointer-chasing. Modern CPUs are optimized for linear access (prefetchers, wide cache lines). A discrimination tree or path index for rewrite rules, which walks the /rule store/ as a tree rather than the /term/ as a tree, brings the hot path into L1 cache. ACL2's rewrite system is not cache-optimized. A CL implementation using typed arrays for term storage and path-indexed rewrite lookup could plausibly see a 10-100× speedup on the inner loop (an unmeasured estimate), on the same hardware, with no algorithmic change.
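To make the rule-store walk concrete, here is a small discrimination tree, assuming the same tuple term representation with `?`-prefixed pattern variables and ground query terms. It shows only the indexing idea: the nested dicts are not cache-friendly themselves, and the returned rules are candidates that still need a full match (repeated variables are not checked).

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def flatten(pattern):
    """Preorder key sequence; every variable becomes the wildcard '*'."""
    if is_var(pattern):
        return ["*"]
    if isinstance(pattern, tuple):
        keys = [(pattern[0], len(pattern) - 1)]
        for arg in pattern[1:]:
            keys.extend(flatten(arg))
        return keys
    return [(pattern, 0)]


class DiscriminationTree:
    def __init__(self):
        self.root = {}

    def insert(self, pattern, rule):
        node = self.root
        for key in flatten(pattern):
            node = node.setdefault(key, {})
        node.setdefault("rules", []).append(rule)

    def candidates(self, term):
        """Rules whose left-hand side may match the ground term."""
        found = []
        self._walk(self.root, [term], found)
        return found

    def _walk(self, node, pending, found):
        if not pending:
            found.extend(node.get("rules", []))
            return
        first, rest = pending[0], pending[1:]
        if "*" in node:
            # A pattern variable absorbs the whole subterm.
            self._walk(node["*"], rest, found)
        if isinstance(first, tuple):
            key, args = (first[0], len(first) - 1), list(first[1:])
        else:
            key, args = (first, 0), []
        if key in node:
            self._walk(node[key], args + rest, found)


tree = DiscriminationTree()
tree.insert(("plus", "0", "?x"), "plus-zero")
tree.insert(("plus", ("s", "?x"), "?y"), "plus-succ")
tree.insert(("times", "?x", "0"), "times-zero")
hits = tree.candidates(("plus", ("s", "0"), "0"))
```

One walk over the query term filters all rules at once, instead of trying each rule's pattern against the term in turn.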
* Hardware split
The right hardware division for an LLM-guided prover:
| Component | Hardware | Workload |
|-----------|----------|----------|
| Search guidance | GPU (NVIDIA) | Neural inference — suggest next lemma, tactic, or proof structure |
| Term rewriting | CPU (many-core, shared memory) | Symbolic — pattern matching, substitution, kernel operations |
| Decision procedures | CPU (shared memory) or FPGA | SAT/SMT — propositional and arithmetic solving |
| Parallel dispatch | CPU cores | Embarrassingly parallel — fan out independent subgoals |
The key insight: the GPU is for the LLM that suggests proofs. The CPU is for the prover that verifies them. One is neural and benefits from streaming throughput; the other is symbolic and benefits from shared cache and low-latency pointer access. Collapsing both into the same distributed-memory architecture (Tenstorrent mesh) helps the neural side at the cost of the symbolic side.
* Compounding effect
An LLM-guided prover gets faster over time in a way a conventional prover does not:
1. Every proved theorem adds a rewrite rule → rewrite engine covers more cases
2. Every proved theorem adds an analogy candidate → LLM has more structural templates
3. Every proved theorem adds a ground truth → verified KB grows, reducing future search space
4. The prover's own correctness is used to optimize the prover (self-modification, verified by the prover)
This is not a claim about algorithmic improvement. It is a claim about /coverage compounding/ — the rewrite DB, analogy library, and verified rule set all grow monotonically, and each makes the next proof faster because there is more to build on. The first proof of a new domain is the hardest. The hundredth is nearly free.
* Relationship to CL Modernization
The CL Modernization project's Phase 0 (verified HOL kernel) is the smallest and most constrained instance of this problem — ~500-800 lines, well-specified, does not need most of these optimizations. Phase 4 (self-verifying CL stack) requires all of them, because the compiler verification bootstrapping problem is the hardest instance of proof search that exists. The approaches above are ordered by implementation feasibility within Passepartout's timeline: incremental verification and Sledgehammer integration (months), decision procedures and rewrite DB optimization (months to a year), LLM guidance (available now via existing LLMs, improves with domain-specific fine-tuning), cache-friendly data structures (available now, implementation effort), proof analogy (the hardest, because it requires a mature proof library to draw from).
References:
- [[id:971cd9e7-2cc5-4743-8042-2469dbe4078f][CL Modernization]] — the project that builds the prover
- [[id:84a537b4-4256-50c8-91f5-dd5b4538418f][Verification appliance]] — hardware for verification

@@ -0,0 +1,126 @@
:PROPERTIES:
:CREATED: [2026-05-27 Wed]
:ID: be9bccc7-5adf-4d0d-8ee4-8855892189bf
:END:
#+title: Neurosymbolic Loop Architectures
#+filetags: :passepartout:neurosymbolic:research:verification:architectures:
Taxonomy of loop architectures that combine neural components (LLMs, neural program synthesis) with symbolic components (interpreters, theorem provers, constraint solvers). Each architecture is defined by its answer to a single question: *who verifies whom, and what gets verified?*
* The Six Architectures
**1. Neural-Guided Program Synthesis (DreamCoder et al.)**
Path: generate → execute → filter → train.
The neural network proposes candidate programs. The symbolic executor runs them. The neural network learns from which ones succeeded. The symbolic side is a ground-truth interpreter — a Lisp evaluator, a lambda calculus reducer — but it only answers "does this program execute without error?" It does not answer "is this program correct for any spec." The neural network learns heuristics for what tends to work, but there is no formal correctness guarantee.
Guarantee: empirical (the program ran on the test cases the synthesizer generated).
Bottleneck: search space — too many candidates, too little signal from just "ran without error."
Neural authority: equal to symbolic (both are parts of a loss function).
**2. Neural Theorem Proving (GPT-f, Thor, COPRA, Magnushammer)**
Path: generate tactic → check with prover kernel → backpropagate search guidance.
The neural network proposes proof steps (tactics). The theorem prover kernel (Lean, Isabelle, Metamath) checks them. The neural network learns which tactics work in which proof states. The symbolic side is the final arbiter of correctness — the kernel cannot be wrong.
The constraint: the neural network never generates full programs. It proposes the next single move in a proof that a human already set up. The loop terminates when the kernel accepts.
Guarantee: formal (step-level — the kernel checks each tactic, but the theorem to prove was written by a human).
Bottleneck: tactic prediction quality — the neural net must suggest the right move in a combinatorially large search space.
Neural authority: none over verification — the kernel can reject any tactic. But the neural net has no creative freedom; it fills narrow holes in an already-structured proof.
**3. Differentiable Program Synthesis (τ-MAML, neural Turing machines)**
Path: end-to-end differentiable computation graph.
The program is represented as a continuous computation graph. The symbolic structure is learned jointly with the parameters. There is no separation between neural and symbolic components — they are fused.
Guarantee: none (the "symbolic" structure is a regularizer, not a verifier).
Bottleneck: gradient signal — complex program structure is hard to learn through backpropagation alone.
Neural authority: total (there is no independent symbolic verifier).
**4. LLM + Partial Verifier (ReAct, Reflexion, STaR, self-consistency, Tree-of-Thought)**
Path: generate → check (partial) → retry or halt.
The LLM generates a response. A symbolic layer extracts structured claims. A verifier — often another LLM, sometimes a constraint solver — checks for consistency. If the verifier flags a problem, the LLM retries.
The verifier is partial. It cannot prove the response is correct. It can only detect certain classes of inconsistency (contradictions, type errors, out-of-range values). The user is asked to trust the unverified parts.
Guarantee: partial (some consistency checks pass; no claim of complete correctness).
Bottleneck: LLM cost per loop iteration, and the verifier misses anything it was not programmed to check.
Neural authority: can override the symbolic verifier by producing output the verifier is not designed to catch.
**5. Neural Optimization of Symbolic Programs (DSPy)**
Path: declarative program skeleton → neural optimizer searches prompt/tool-use space → loss function measures output quality → update optimizer.
The symbolic structure (the program skeleton with typed modules) is written by the human. A neural optimizer searches the space of prompts, few-shot examples, and module compositions to minimize a loss function. The goal is program-level optimization, not correctness.
Guarantee: empirical (the loss improved on the evaluation set).
Bottleneck: optimization budget (number of program variants the optimizer can try before the user runs out of patience or tokens).
Neural authority: symmetric with symbolic (both are searchable parameters).
**6. Passepartout: Verified Synthesis Loop**
Path: specification + gates → agent synthesizes implementation → ACL2 prover verifies against spec → gate engine checks policy → counterexample feedback to agent → retry or halt.
Contrast with every other architecture:
| Property | Passepartout | All others |
|----------|-------------|------------|
| Generator scope | Full implementations from spec | Tactics (GPT-f), candidate programs (DreamCoder), responses (ReAct) |
| Verification scope | Complete functional correctness | Step-level (GPT-f), execution-only (DreamCoder), partial consistency (ReAct) |
| Verifier authority | Asymmetric: the neural component cannot override the prover | None (DreamCoder), symmetric (DSPy), LLM self-evaluates (Reflexion) |
| Feedback from verifier | Counterexamples with structure | Pass/fail (most), tactic-level error (GPT-f) |
| Gate layer | Independent policy verification | None |
Guarantee: formal (complete — the prover checks the implementation against the full spec for all inputs).
Bottleneck: spec quality (the spec must be complete enough for the prover to work with).
Neural authority: none over verification (the prover is the final authority and cannot be overridden), but the neural component has full creative freedom in how to satisfy the spec.
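The path above can be sketched as a control loop. This is an illustration under loud assumptions: `generate`, `prove`, and `gate` are hypothetical callables; the toy `prove` enumerates a small finite domain as a stand-in for an ACL2 proof over all inputs; and routing gate failures back to the agent is one possible choice, not a settled design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    ok: bool
    counterexample: Optional[dict] = None


def synthesis_loop(spec, generate, prove, gate, max_attempts=5):
    """Return (implementation, attempts) or (None, max_attempts) on halt."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        impl = generate(spec, feedback)
        verdict = prove(spec, impl)  # the prover's answer cannot be overridden
        if not verdict.ok:
            feedback = {"stage": "prover", "counterexample": verdict.counterexample}
            continue
        policy = gate(impl)  # second, independent symbolic layer
        if not policy.ok:
            feedback = {"stage": "gate", "counterexample": policy.counterexample}
            continue
        return impl, attempt
    return None, max_attempts


# Toy instantiation: the spec says the output must equal abs(x).
def spec(x, y):
    return y == abs(x)


def generate(spec, feedback):
    # Stand-in for the LLM: first try is wrong, the retry uses the feedback.
    if feedback is None:
        return lambda x: x
    return lambda x: -x if x < 0 else x


def prove(spec, impl):
    # Stand-in only: exhaustive check on a tiny domain, NOT a real proof.
    for x in range(-3, 4):
        if not spec(x, impl(x)):
            return Verdict(False, {"x": x, "got": impl(x)})
    return Verdict(True)


def gate(impl):
    return Verdict(True)  # stand-in policy check that permits everything


impl, attempts = synthesis_loop(spec, generate, prove, gate)
```

The generator never sees a way to mark its own output as accepted: acceptance happens only on the line that returns after both symbolic checks pass.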
* The Key Design Dimension: Asymmetric Authority
Every existing loop falls into one of three camps:
**Camp A: No independent verifier.** DreamCoder, differentiable synthesis, DSPy, and pure LLM pipelines have no component that can say "this output is wrong" with authority. The system improves empirically; it never proves correctness. This is the largest camp by publication count.
**Camp B: Constrained generator + complete verifier.** GPT-f, Thor, and COPRA have a theorem prover as the authority, but the generator is constrained to single-tactic moves within a human-written proof skeleton. The verifier covers everything the generator produces — but the generator can only produce a tiny part of the solution. The human provides the creative structure.
**Camp C: Unconstrained generator + partial verifier.** ReAct, Reflexion, and STaR give the LLM broad freedom but use a partial verifier that misses large classes of errors. The verifier's incompleteness means the neural component can always override it by producing output the verifier cannot analyze.
**Passepartout occupies a previously empty position: Camp D — unconstrained generator + complete verifier.** No previous published system gives the neural component complete creative freedom (synthesize the entire implementation) while subjecting its output to complete verification (the prover checks every path against the full spec).
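The four camps reduce to two coordinates. A small sketch of that classification, with the function name and string labels chosen here for illustration; the combination of a constrained generator with a partial verifier is not named in the taxonomy, so it returns `None`.

```python
def camp(generator_constrained, verifier):
    """Classify a loop by generator freedom and verifier strength.

    verifier is one of "none", "partial", "complete".
    """
    if verifier == "none":
        return "A"  # no independent verifier, whatever the generator does
    if verifier == "complete":
        return "B" if generator_constrained else "D"
    if verifier == "partial" and not generator_constrained:
        return "C"
    return None  # not covered by the taxonomy


examples = {
    "DreamCoder": camp(False, "none"),
    "GPT-f": camp(True, "complete"),
    "ReAct": camp(False, "partial"),
    "Passepartout": camp(False, "complete"),
}
```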
* Why This Position Was Empty
The reasons are historical and economic, not technical:
**The LLM came first, the verifier second.** GPT-f (2020) and Thor (2022) were constrained to tactic-level because the LLMs available at the time could not reliably produce correct code at the module level. DreamCoder (2020) used neural program synthesis, not LLMs, and its generator was too weak to implement a full protocol. The hardware needed to run an LLM powerful enough to synthesize an entire TLS implementation, plus an ACL2 prover to verify it, did not exist in the same price range until roughly 2024.
**The verifier scales differently from the generator.** ACL2 can verify a 50,000-line TLS implementation, but doing so requires the spec to be written in a form the prover can consume — which is a human investment at least as large as writing the implementation. The old cost structure (Gabriel's "correctness is expensive") made the Passepartout loop uneconomical for any practical use case. Only the cost inversion described in [[id:c2789c0f-0955-43af-8a4a-f83ba87128fd][Better is Cheaper]] makes it viable.
**The field prioritized breadth over depth.** DSPy, ReAct, and similar systems optimize for broad applicability across many domains with partial guarantees — the "worse is better" of neurosymbolic design. Passepartout optimizes for deep correctness in a narrower domain. The field has not yet asked "what if we take the complete verification path and give the generator full freedom?" because the components to ask that question did not converge until now.
* What This Means for Research
Passepartout's loop is not patentable as a combination of existing parts — the individual components (LLM, ACL2, gates) are well-known. But the *unoccupied position in the design space* is a research contribution that no paper has described because the loop has only been technically feasible for approximately the last 12-18 months.
The open questions that define the research agenda:
1. **Counterexample bandwidth.** Can the prover produce counterexamples rich enough for the LLM to learn from in a single retry? Prover counterexamples are typically concrete (a specific input where the implementation deviates from spec). The LLM must generalize from one concrete failure to a correct implementation. This is harder for the LLM than what GPT-f faces (the prover tells it exactly which tactic failed and why).
2. **Spec completeness threshold.** How incomplete can a spec be and still produce correct implementations? If the spec is vague (in natural language with formal fragments), the prover cannot check everything, and the loop degrades to ReAct-level partial verification. The research question is the minimum spec density needed to enter the complete-verification regime.
3. **Gate interaction with verification.** What happens when an implementation passes the prover but violates a gate (organizational policy)? The gate is a second symbolic layer with different authority: it checks policy, not correctness. Does the gate's failure feedback go to the agent (regenerate), the human (update policy), or both? This interaction has no existing literature because no system has two independent symbolic verification layers.
4. **Specification as bottleneck.** The field has spent decades studying how hard it is to write code. Passepartout replaces "writing code" with "writing specifications." Is specification-writing easier than implementation-writing at the same level of quality? Or does the difficulty just shift from a familiar bottleneck to an unfamiliar one?
These questions are unanswered because the architecture that makes them meaningful — unconstrained generator + complete verifier — did not exist in any published system. Passepartout is not just "another neurosymbolic loop." It occupies a cell in the design space that was previously empty. That is what makes it novel.
* References
- [[id:dddd52a7-adb8-470e-a459-614ade5f76af][Closing the Lisp Gap]] — the context in which this architecture becomes viable
- [[id:c2789c0f-0955-43af-8a4a-f83ba87128fd][Better is Cheaper]] — the cost inversion that makes this loop economical
- [[id:9af13fff-9725-542b-93b1-a555bc74ad72][Lisp Economics]] — why the old cost structure prevented this architecture from being built earlier

@@ -0,0 +1,10 @@
:PROPERTIES:
:CREATED: [2026-06-03 Wed]
:ID: 8cb760e2-37c6-4a78-af4d-f89f69d1678b
:END:
#+title: Stages
#+filetags: :passepartout:architecture:stages:roadmap:
The staged roadmap for Passepartout — from current conventional computing through the full self-improving Lisp machine vision.
{{< page-list >}}

@@ -0,0 +1,79 @@
:PROPERTIES:
:CREATED: [2026-05-31 Sun]
:ID: a1b2c3d4-e5f6-7a8b-9c0d-1e2f3a4b5c6d
:END:
#+title: Server Rack Build — Working Note
#+filetags: :infrastructure:rack:build:
#+STATUS: draft
* Overview
Building out a 10-20U open rack, server-grade components bought individually over months. This is the first racked node — triple duty as Passepartout host, Proxmox home server, and ZFS array. Node-1 (Protectli, i7, 6 NICs) stays as network edge.
10Gb networking is already in place and stable.
* Current topology
- **Node-1 (Protectli)**: Small form factor, i7, 6 NICs, no PCIe, no GPU, limited RAM. Network appliance / router.
- **Node-2 (racked)**: First rack server. Passepartout + Proxmox + ZFS + GPU for local Hermes inference.
* Chassis
- 3U or 4U rackmount
- Room for full-height GPU, hot-swap drive bays, sufficient airflow
- Open rack design, 10-20U growable
* Platform decision (TBD)
| Option | Pros | Cons |
|--------|------|------|
| Intel Xeon 6 (Granite Rapids) | Newest arch, 8-ch DDR5 on LGA 4710 parts (12-ch only on the 6900P series, LGA 7529), up to 136 PCIe 5.0 lanes on single-socket SKUs, AMX AI accelerators | LGA 4710 (new socket, new mobo cost), DDR5 only, expensive |
| AMD EPYC 7002 (Rome) | 128 PCIe 4.0 lanes, 8-ch DDR4, cheap on used market | Older gen, DDR4 (slower, but cheap), no AMX |
| AMD EPYC 9004/9005 (Genoa/Turin) | 128 PCIe 5.0 lanes per socket (160 only in dual-socket configs), 12-ch DDR5, current gen | More expensive than 7002 |
* GPU decision (TBD)
Local inference for Hermes. Candidates:
| Option | VRAM | Price | Notes |
|--------|------|-------|-------|
| Intel Arc Pro B70 | 32 GB GDDR6 | ~$949 MSRP | Battlemage workstation, air-cooled, 230W, PCIe 5.0 x16. Plug-and-play with standard toolchains. |
| Tenstorrent P150 (Blackhole) | 32 GB GDDR6 | ~$1,399 | RISC-V Tensix, open source stack, 300W. Software less mature, needs tt-forge compilation. 4x QSFP-DD for linking cards. |
| RTX 5090 | 32 GB GDDR7 | ~$2,000 | CUDA, best software ecosystem. Consumer card, may need blower mod for rack. |
| RTX 6000 Ada (used) | 48 GB GDDR6 | ~$4-5K used | More VRAM, enterprise. Higher price even used. |
Key consideration on P150: not CUDA, not a GPU in the conventional sense. Software maturity is the main cost, not the hardware price.
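A quick way to compare the candidates on hardware cost alone is dollars per GB of VRAM. The sketch below uses the figures from the table; the $4,500 value for the used RTX 6000 Ada is an assumed midpoint of the ~$4-5K range, and the helper name =usd_per_gb= is only illustrative. It ignores software maturity, power draw, and cooling, which the note above treats as the larger cost for the P150.

```python
# (VRAM in GB, approximate price in USD), taken from the table above.
candidates = {
    "Intel Arc Pro B70": (32, 949),
    "Tenstorrent P150": (32, 1399),
    "RTX 5090": (32, 2000),
    "RTX 6000 Ada (used)": (48, 4500),  # assumption: midpoint of ~$4-5K
}

def usd_per_gb(vram_gb: int, price_usd: int) -> float:
    """Hardware cost per GB of VRAM; ignores software and power costs."""
    return price_usd / vram_gb

# Cheapest VRAM first.
ranked = sorted(candidates, key=lambda name: usd_per_gb(*candidates[name]))

for name in ranked:
    vram, price = candidates[name]
    print(f"{name:22s} {vram:3d} GB  ${usd_per_gb(vram, price):6.2f}/GB")
```

On these numbers the three 32 GB cards rank in price order, and the used RTX 6000 Ada comes last per GB despite having the most VRAM.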
* Memory plan
Start with 2×64GB DDR5 ECC RDIMM, grow to 4×64GB → 8×64GB (full 512GB on 8-channel, or 12×64GB = 768GB to fill a 12-channel platform).
Tradeoff: running fewer DIMMs than full channel count reduces memory bandwidth proportionally. 2 DIMMs on 8-channel = 25% bandwidth. First to suffer: ZFS ARC performance, VM responsiveness. Compute (LLM inference) is fine since GPU has own VRAM.
Alternative: start with 4×64GB to get half bandwidth without crippling storage I/O, then grow to 8×64GB.
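The bandwidth tradeoff is simple arithmetic and can be sanity-checked with a short script. This is a rough sketch: it assumes one DIMM per channel, a 64-bit (8-byte) data bus per channel, and theoretical peak rates rather than measured throughput. The DDR4-3200 figure is used because it is the speed named in the strategic framing below; a DDR5 platform would plug in its own transfer rate. The function names are illustrative.

```python
def peak_bandwidth_gbs(populated_channels: int, mt_per_s: int,
                       bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s for the populated channels.

    Assumes one DIMM per channel and a 64-bit data bus per channel.
    """
    return populated_channels * mt_per_s * bus_bytes / 1000

def bandwidth_fraction(populated_channels: int, total_channels: int) -> float:
    """Share of the platform's peak bandwidth that is actually available."""
    return populated_channels / total_channels

# 8-channel DDR4-3200 platform with 2, 4, and 8 DIMMs installed.
for dimms in (2, 4, 8):
    gbs = peak_bandwidth_gbs(dimms, 3200)
    pct = bandwidth_fraction(dimms, 8) * 100
    print(f"{dimms} DIMMs: {gbs:6.1f} GB/s peak ({pct:.0f}% of full)")
```

With these assumptions, two DIMMs give 51.2 GB/s (25%), four give 102.4 GB/s (50%), and a full eight give 204.8 GB/s. On a 12-channel platform the same two DIMMs would cover only about 17% of peak.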
* Build order (over months)
1. Rack + chassis + PSU
2. Motherboard + CPU + RAM + boot drives (runs Proxmox + ZFS immediately)
3. HDDs for ZFS array (start with 2, grow)
4. GPU (last piece — when inference workload justifies it)
* Questions still open
- Intel Xeon 6 vs AMD EPYC (which gen)?
- DDR4 (EPYC 7002) vs DDR5 (everything else)?
- GPU: Intel Arc Pro B70 vs Tenstorrent P150 vs RTX 5090?
- Start with 2×64GB or 4×64GB on memory?
- Water cooling for CPU (Xeon 6 TDP may need it) or just air?
- Specific rack model / chassis model?
* Strategic framing
This node is a bootstrap between Stage 0 (current, conventional) and Stages 3-4 (Lisp machine, bare-metal, in-process LLM on dedicated silicon). DDR4's bandwidth ceiling won't matter because:
- Proxmox + ZFS + the Gate (Stage 2) don't stress 8-channel DDR4-3200
- GPU inference uses its own VRAM, not system memory
- By the time the Lisp machine arrives (different hardware entirely), this node graduates to NAS / Proxmox host duty
Part availability risk is acceptable — at 7+ years of life, the build has already paid for itself many times over, and a motherboard failure means re-platforming onto whatever is current, not trying to resurrect DDR4 infrastructure.

@@ -0,0 +1,10 @@
:PROPERTIES:
:CREATED: [2026-06-03 Tue]
:ID: 6883c4d2-b63b-4d2b-b224-2240ae748e7f
:END:
#+title: AI Agent Scoping
#+filetags: :passepartout:strategy:competitive:ai-agents:
Competitive analysis of AI coding agents and assistants — Aider, Claude Code, Codex CLI, Continue, Gemini CLI, Hermes Agent, OpenClaw, OpenCode, and Thoth.
{{< page-list >}}

@@ -1,7 +1,12 @@
#+title: Compliance
#+filetags: :compliance:index:
:PROPERTIES:
:CREATED: [2026-05-24 Sun]
:ID: 1c4c91ec-c465-44ab-bd91-4c3b45909ddb
:END:
:CREATED: [2026-05-23 Sat]
:ID: 7c4c5cca-1c63-4398-9b75-cf221e77dba0
:ID: 36e5b948-e07b-477f-9036-4dfe88254347
:ID: e4a7b3d2-1c9f-4b6e-8a2d-5f3c7e1b9a0c
:END:
#+title: Compliance
#+filetags: :passepartout:compliance:regulatory:
Compliance framework mapping across global regulatory regimes — GDPR, HIPAA, SOC 2, EU AI Act, and more. The main framework map, index, and cross-cutting analyses live here; detailed per-regime pages are in compliance-regimes/.
{{< page-list >}}

@@ -0,0 +1,7 @@
#+title: Compliance
#+filetags: :compliance:index:
:PROPERTIES:
:CREATED: [2026-05-24 Sun]
:ID: 1c4c91ec-c465-44ab-bd91-4c3b45909ddb
:END:

@@ -0,0 +1,10 @@
:PROPERTIES:
:CREATED: [2026-06-03 Tue]
:ID: 96e7a54e-d801-4b6e-bdc9-ea9dbd4fa51d
:END:
#+title: Impact Analysis
#+filetags: :passepartout:strategy:impact:adoption:
Impact assessments for each phase of Passepartout's development — what changes at each stage, for whom, and at what scale.
{{< page-list >}}

@@ -1,29 +0,0 @@
:PROPERTIES:
:CREATED: [2026-05-24 Sun]
:ID: 9af13fff-9725-542b-93b1-a555bc74ad72
:ID: 0b5a8a74-cfd6-542d-bc88-4eb3cd8626f9
:END:
#+title: Lisp Economics
#+filetags: :passepartout:economics:lisp:history:C:viability:cost:marginal:zero:
The 1980s trade-off was: C is cheap enough for the market. Correctness is a luxury the market cannot afford. The 2020s trade-off is: C is expensive for the market. Incorrectness has become the dominant cost of software. Lisp's verification infrastructure is now the cheaper option.
Four transformations flipped the economics:
1. **Memory is free.** 40MB runtime is noise on a $20 Raspberry Pi with 8GB RAM. In 1980, DRAM was ~$5,000/MB.
2. **Transistors are free.** Modern ARM Cortex-A72 has billions of transistors. GC and type dispatch cost nothing because the transistors are there whether used or not.
3. **Complexity saturates human verification.** Systems are tens of millions of lines. Testing is necessary but insufficient — zero-day vulnerabilities prove bugs survive all testing. Formal verification is the only known path.
4. **Cost of failure exceeds cost of verification.** A single breach costs millions. Regulation mandates provable compliance. Proving correctness is cheaper than not proving it.
The [[id:84a537b4-4256-50c8-91f5-dd5b4538418f][verification appliance]] (AGPL symbolic engine + RISC-V Lisp μcode on FPGA) costs $5,000/year and replaces $500,000/year in compliance audits, breach litigation, and regulatory fines. This cost structure — zero marginal cost per additional user — is what makes Lisp economically viable at scale. The [[id:13e6ae54-2d24-5aa0-b1cd-a7e8e749aa70][self-driving Lisp Machine]] is the hardware endpoint of this economic logic. For the biological analogy that explains why Lisp architecture is a natural outcome of complexity pressure, see [[id:2afd9a3c-e96a-54c7-ac77-a05a28065b4b][biology parallels]]. For the historical precedent, see the [[id:00ab3a4d-e3de-5605-a67d-12935bb36ab5][comparison with Symbolics Genera]]. The [[id:5f55bbe6-d243-5766-8ccf-5c5cc88a6542][impact on the AI industry]] is the market-side consequence.
* Cost Structure — Zero Marginal Cost
- **One-time cost:** [[id:45ea493b-94ad-5885-aa65-0c846e5c3c1d][gate-rule encoding]] for a domain (from hours for codified domains up to months for tacit domains)
- **Near-zero marginal cost:** ACL2 proof + Screamer consistency check + VivaceGraph lookup per interaction — all CPU-native, all in-image
- **No recurring LLM API costs** for the 80% symbolic reasoning layer
- **After [[id:efc76898-03f7-57ba-923d-35d65da88bb7][sufficiency flip]]:** pennies per day vs dollars per day for LLM-only
The cost curve inverts: generation is expensive, verification is cheap. This is the inversion [[id:28c46769-c14b-42aa-ac7a-69d310157f8f][Passepartout]] exploits.
Token demand shifts from "every interaction burns tokens" to "only unfamiliar interactions burn tokens." Steady-state per-user LLM consumption drops by an order of magnitude.

@@ -0,0 +1,10 @@
:PROPERTIES:
:CREATED: [2026-06-03 Tue]
:ID: 75bc14db-d241-4af8-a4e0-f14aff654d17
:END:
#+title: Social Protocol
#+filetags: :passepartout:social-protocol:network:identity:
The Passepartout Social Protocol — identity, contracts, governance, and exchange mechanisms for the personal intelligence network.
{{< page-list >}}

@@ -0,0 +1,10 @@
:PROPERTIES:
:CREATED: [2026-06-03 Tue]
:ID: 89909ac6-8b4f-4a60-bbeb-52992c8f5135
:END:
#+title: Verification
#+filetags: :passepartout:verification:prover:ACL2:HOL:sufficiency:
Verification infrastructure — the sufficiency flip, verification appliance, verification monopoly, and the verified skill marketplace.
{{< page-list >}}