12 KiB
Who Is Closest to Passepartout?
A survey of academic researchers whose work overlaps with Passepartout's architecture along specific dimensions. The conclusion: no academic group combines all four architectural properties that define Passepartout's design. The closest groups each hold one or two pieces; none combine all.
The Four Architectural Properties
- LLM-level generator with full creative freedom — the generator synthesizes entire implementations from specifications, not individual tactic steps or hole-fillings.
- Theorem-prover verification with complete functional correctness — the verifier checks all execution paths against the full spec, not bounded verification via SMT solvers.
- Asymmetric authority — the symbolic component (prover) is the final authority and cannot be overridden by the neural component.
- Counterexample-guided retry loop — when the prover rejects an implementation, it returns a concrete counterexample that the generator uses to reformulate.
The Academic Landscape
LLM + Theorem Prover Loops
| Researcher | Institution | System | Match | Divergence |
|---|---|---|---|---|
| Sean Welleck | CMU | ImProver 2 | Self-improving LMs generating proof steps verified by Lean | Generator fills tactic holes in existing proofs, not full implementations. Camp B. |
| Timon Gehr | ETH | COPRA, Thor | LLM interacts with theorem prover kernel | Same constraint: tactic-level. Neural component generates one move at a time. |
| Kaiyu Yang | Princeton | LINC | Neural network learns symbolic rules, prover checks consistency | Neural component is a learner discovering rules from data, not a generator synthesizing from spec. Different abstraction level. |
All three are Camp B in the loop taxonomy (constrained generator + complete verifier). None gives the LLM freedom to synthesize full implementations. Welleck's ImProver is the closest in spirit — the loop iterates, the prover is authoritative — but the scope of what the generator produces is orders of magnitude smaller than what Passepartout's design requires.
Synthesis + Verification (non-LLM)
| Researcher | Institution | System | Match | Divergence |
|---|---|---|---|---|
| Armando Solar-Lezama | MIT | Sketch | Synthesis-aided verification: partial program → solver fills holes → assertions checked | Generator is constraint-based SAT/SMT, not an LLM. Verification is bounded (solver capacity). |
| Emina Torlak | UW | Rosette | Solver-aided languages for synthesis + verification | Same constraints as Sketch. Bounded, non-LLM. |
| Swarat Chaudhuri | UT Austin | Neurosymbolic Programming | Neural networks guide program synthesis, symbolic analysis verifies | Uses SMT for bounded verification, not theorem prover for complete. Neural-symbolic are symmetric collaborators, not asymmetric authority. |
Chaudhuri is the closest overall academic neighbor. His group explicitly works on combining neural and symbolic components, with symbolic verification of neural-generated candidates. But the verification is bounded (SMT), not complete (theorem prover), and the loop does not have Passepartout's asymmetric authority design.
Lisp as Infrastructure for Verification
| Researcher | Institution | System | Match | Divergence |
|---|---|---|---|---|
| Christian Schafmeister | Temple | Clasp | Common Lisp through LLVM; interactive Lisp for serious computation | Lisp infrastructure, not a neurosymbolic loop. No ACL2 integration. |
| Kaufmann, Moore | UT Austin / Retired | ACL2 | The theorem prover itself | Pure symbolic verification. No LLM loop. |
Schafmeister is aligned with Passepartout on the "why Lisp" question — interactive development, uniform representation, C++ interop for performance — but does not work on agentic verification loops.
Autonomous Code Modification Loops
| Researcher | Institution | System | Match | Divergence |
|---|---|---|---|---|
| Kevin Ellis | Cornell | DreamCoder | Neural program synthesis loop: generate → execute → learn | Verifier is interpreter (does it run?), not prover (is it correct?). Camp A. |
| Andrej Karpathy | Anthropic | autoresearch | LLM modifies code, runs experiments, keeps/discards based on metric | Verifier is val_bpb — a single empirical number. No specification, no formal guarantee. Camp C. |
Both prove the viability of the autonomous loop concept but use the weakest possible verifiers (execution and empirical metrics).
The Bitter Lesson / Temporal Credit Assignment (Sutton)
| Researcher | Institution | System | Match | Divergence |
|---|---|---|---|---|
| Richard Sutton | Alberta / Keen Technologies | TD learning, eligibility traces, Alberta Plan | The fundamental problem in verification — an action was checked, but the consequence plays out hours later; was the action correct? — is the same problem TD learning solves in RL: assigning credit to actions based on delayed outcomes. Sutton's temporal credit assignment work is the theory you would need to extend Passepartout from per-action gates to trajectory-level verification. His Bitter Lesson (scale beats engineered knowledge at sufficient compute) is the most commonly cited argument against the symbolic verification approach Passepartout bets on. | The Bitter Lesson is not anti-knowledge — it says methods that improve with more computation eventually dominate. Passepartout's gate is a deliberately small engineered knowledge system that won't benefit from more compute (the ACL2 lemmas don't get more correct with more hardware). That's acceptable because the gate is a narrow bottleneck (permit/deny). The LLM layer inside the gate does benefit from scale. The architecture already respects the Bitter Lesson by placing the scalable piece where scale helps and the non-scalable piece where deductive certainty matters. Sutton's Alberta Plan (world model + reward + learning algorithm) parallels Passepartout's Stage 6 (world model + gate + verified fine-tuning), but Sutton's agents learn by pure reward while Passepartout's learn by reward constrained by verified policy. Sutton would likely argue that a learned safety policy at scale would outcompete the gate. Passepartout's bet is that access control, message authentication, and compliance should never be probabilistic, even at infinite scale. |
Integrate-Symbolic-Into-Neural (Garcez)
| Researcher | Institution | System | Match | Divergence |
|---|---|---|---|---|
| Artur d'Avila Garcez | City, Univ. of London | NeSy frameworks, NSL | Pioneer of neural-symbolic computation since 1990s. Book: "Neural-Symbolic Cognitive Reasoning." Runs NeSy workshop series. | His approach integrates symbolic knowledge into neural networks (logic regularization, knowledge distillation). Symbolic rules are a training signal, not a runtime verifier. The neural component can override symbolic constraints through the loss landscape. No asymmetric authority, no theorem prover, no complete verification. His camp is "make neural networks behave more symbolically." Passepartout's camp is "make neural networks accountable to symbolic verification." Opposite architectural philosophy. |
Garcez's position in the design space is closest to Camp A (no independent verifier). The symbolic rules guide learning but do not veto outputs at runtime. His work is foundational for the field of neural-symbolic computation, but his architectural philosophy is the inverse of Passepartout's. He wants the symbolic inside the neural. Passepartout wants them separate with the symbolic holding authority.
Theorist of the Hybrid Thesis (Marcus)
| Researcher | Institution | System | Match | Divergence |
|---|---|---|---|---|
| Gary Marcus | NYU Emeritus, Robust.AI | None (theorist/critic) | Longest-standing public advocate for hybrid AI. Since "The Algebraic Mind" (2001) and "Rebooting AI" (2019), he has argued deep learning alone cannot achieve systematicity, composition, or reasoning. He identified the need for the approach Passepartout implements. As of May 2026, he is publicly asking why LLM agent frameworks are not using LEAN as a theorem-prover verifier — the same engineering gap Passepartout occupies. | He does not propose a specific architecture or loop design. His background is cognitive science and developmental psychology, not formal verification. The "symbolic component" he advocates is abstract — structured knowledge representations, not ACL2 theorem proving. He has no answer to the cost/feasibility question (the "Better is Cheaper" argument is Passepartout's contribution, not Marcus's). He is a theorist of the problem, not an architect of the solution — though his May 2026 tweet shows he is now engaging with the engineering question directly. |
Marcus occupies a category that does not appear in the loop taxonomy (Camps A-D) because he does not define a loop. He identifies the need for hybrid AI with genuine symbolic authority. Passepartout is the engineering response to the thesis Marcus has been arguing since before most of the field would admit the limitations existed. His May 2026 tweet asking "they aren't using LEAN in one of those many tools?" is the theorist noticing the empty cell Passepartout was designed to fill.
The Gap
| Property | Passepartout | Closest academic | Academic's limiter |
|---|---|---|---|
| Generator freedom | Full synthesis from spec | ImProver (Welleck) | Fills tactic holes only |
| Verification completeness | Complete (theorem prover) | Sketch (Solar-Lezama) | Bounded (SMT) |
| Asymmetric authority | Neural cannot override prover | Neurosymbolic Prog (Chaudhuri) | Symmetric collaboration |
| Counterexample feedback | Structured from prover to LLM | ImProver (Welleck) | Pass/fail at tactic level |
| Two symbolic layers | Gates + prover independent | None | No second layer exists |
No academic group combines all four properties. The closest — Chaudhuri — has three of five (neural + symbolic + verification) but fails on verification completeness (SMT not ACL2), asymmetric authority (symmetric not asymmetric), and the two-layer gate design.
What This Means
The gap is either:
- A genuinely empty cell in the design space. The combination is novel, the components have not converged in one system before, and Passepartout's design is early.
- A sign that the combination is not as valuable as the components. No major academic lab has invested in this specific loop because the cost of writing complete formal specifications exceeds the benefit of complete formal verification, given the alternative of bounded verification (SMT) with cheaper spec costs.
The way to distinguish (1) from (2) is to build the architecture and measure whether the spec-writing cost is amortized over enough synthesized implementations to justify it. Passepartout's answer is: yes, because specs are written once and implementations are generated for every deployment context. The academic literature has not tested this claim.
References
- Neurosymbolic Loop Architectures — the taxonomy that positions these comparisons
- Neurosymbolic AI Paper Library — papers referenced above are in the local library