- LLM proposes code at every bootstrap stage (microcode, CIC kernel, macro layers, gate rules) — symbolic engine verifies before accepting - Weak model = more retries (5-15), strong model = fewer (1-3) Both produce 100% verified output because the symbolic engine catches all mistakes - The critical transition: not better LLMs, but the sufficiency flip applied to hardware. Once enough facts about runtime behavior accumulate, the system proposes microcode optimizations with zero LLM tokens. - Surprise result: a barely competent LLM is sufficient for the full bootstrapping chain. It's slower and costs more in API calls, but reaches the same destination.
941 lines
45 KiB
Org Mode
941 lines
45 KiB
Org Mode
#+TITLE: Passepartout — Patents, Moats, Economics, Design Implications
|
|
#+AUTHOR: Hermes agent distillation of 2026-05-21 discussion with Amr
|
|
#+FILETAGS: :passepartout:agent:economics:ip:licensing:
|
|
#+STARTUP: content
|
|
|
|
* Summary
|
|
|
|
Discussion about the economic and strategic implications of Passepartout's
|
|
architecture — a self-bootstrapping agent that combines deterministic safety
|
|
gates (0 LLM tokens per verification), Merkle-tree memory with provenance,
|
|
a symbolic fact store with sufficiency criterion, and ACL2-based macro layer
|
|
bootstrapping for provable reasoning.
|
|
|
|
The central claim: this architecture decouples intelligence from LLM API
|
|
consumption. The probabilistic engine (LLM) handles ~10% input/output
|
|
translation; the symbolic engine handles ~80% of reasoning at near-zero
|
|
marginal cost. The cost curve inverts: generation is expensive, verification
|
|
is cheap.
|
|
|
|
* Patentability
|
|
|
|
** Likely patentable
|
|
|
|
- **Probabilistic-deterministic split with deterministic gates between LLM
|
|
proposal and execution.** The LLM proposes, the gate stack decides. Each
|
|
gate is a pure Lisp function costing 0 LLM tokens. Every competitor uses
|
|
prompt-based guardrails. The specific 11-vector gate stack (secret
|
|
exposure, path protection, self-build boundary, shell safety, network
|
|
exfiltration, privacy tags, Lisp syntax, credential vault, tool permissions,
|
|
policy, protocol validation) is a specific novel implementation.
|
|
|
|
- **Foveal-peripheral context model with Org-tree structured retrieval.**
|
|
Depth ≤ 2 always; full render on foveal node; full render on semantic
|
|
similarity to foveal; full render on temporal relevance (modified today,
|
|
upcoming deadlines); everything else title-only. Targets 2,000-4,000 tokens.
|
|
No agent does this.
|
|
|
|
- **Merkle-tree memory with copy-on-write snapshots and operation-level
|
|
undo/redo.** Every memory-object is content-addressed. Snapshots are
|
|
deep-copies. Undo/redo at the individual operation level. Applied to an
|
|
agent's reasoning loop.
|
|
|
|
- **Gate-to-fact bootstrap with sufficiency criterion.** Mechanically
|
|
extracting facts from the gate stack's own data structures (protected paths,
|
|
shell blocked patterns, network whitelist) as the seed of an ontology. A
|
|
measurable sufficiency threshold that flips the system from LLM-proposes
|
|
to Screamer-deduces.
|
|
|
|
- **Macro-layer-as-skill bootstrapping architecture.** Encoding theorem-proving
|
|
capability as hot-reloadable skills where each layer is verified by the layer
|
|
below. The proof forest is a Merkle-versioned dependency tree.
|
|
|
|
** Likely not patentable (known techniques in expected applications)
|
|
|
|
- ACL2 itself (decades old)
|
|
- Screamer for consistency checking (constraint solving on a triple store is
|
|
an obvious application)
|
|
- Hot-reloadable skills (Lisp images have been hot-reloadable for 40 years)
|
|
- Org-mode as a data format
|
|
- Multi-layer signal authentication (known in network security)
|
|
|
|
** Counterargument from prior art
|
|
|
|
A patent examiner will argue that:
|
|
- "Thin harness, fat skills" is the standard OS microkernel architecture
|
|
applied to an AI agent
|
|
- Foveal-peripheral context is locality of reference (standard in OS design)
|
|
- Merkle-tree memory is content-addressed storage (standard in distributed
|
|
systems)
|
|
- Deterministic gate stack is capability-based security (going back to
|
|
KeyKOS in the 1980s)
|
|
|
|
The defense: these principles have never been *combined* in an AI agent, and
|
|
the combination produces emergent effects (cost curve inversion, sufficiency
|
|
flip, self-repairing bootstrapping chain) that no single principle produces
|
|
alone. Good patent claims would cover the specific combination, not the
|
|
individual components.
|
|
|
|
** Strongest single claim
|
|
|
|
An AI agent system comprising:
|
|
1. A probabilistic language model
|
|
2. A stack of deterministic safety gates operating at zero LLM-token cost
|
|
between the model's proposal and execution
|
|
3. A Merkle-versioned memory store from which gate outcomes are mechanically
|
|
extracted as facts
|
|
4. A symbolic reasoning engine seeded by those facts with a measurable
|
|
sufficiency criterion that determines when the probabilistic model can
|
|
be bypassed
|
|
|
|
Each element is known. The combination is novel and non-obvious.
|
|
|
|
* Licensing Strategy
|
|
|
|
** AGPLv3 for the public repository
|
|
|
|
AGPLv3 closes the ASP loophole (Section 13): anyone who modifies the
|
|
software and offers it over a network must release their modified source.
|
|
This protects against proprietary forks that extract value without
|
|
contributing back.
|
|
|
|
Crucially: AGPL is a *product requirement*, not a concession to openness.
|
|
The system's value proposition is provable correctness — every decision has
|
|
Merkle provenance, the proof forest is visible, the sufficiency meter is
|
|
readable. This claim is structurally incredible with closed source. An
|
|
enterprise buyer needs to inspect the gate stack, verify the Merkle
|
|
implementation, and confirm ACL2 integration is sound. AGPL makes this
|
|
possible without signing an NDA.
|
|
|
|
** AGPL only covers modifications to code, not:
|
|
|
|
- Gate rules specific to a domain (these are data, not code)
|
|
- The fact store (empirical data generated from usage)
|
|
- Ontology categories (design decisions stored as configuration)
|
|
- Proprietary skills loaded at runtime (AGPL boundary on plugin systems
|
|
is legally unsettled)
|
|
|
|
** Dual license model
|
|
|
|
- AGPLv3 for open source — builds ecosystem, trust, and community
|
|
- Commercial license for enterprises that cannot accept AGPL (blanket
|
|
policies against AGPL infection) — MySQL/SugarCRM/GraphQL model
|
|
|
|
* Moats
|
|
|
|
** Re-evaluated: time is not the primary moat
|
|
|
|
Initial assumption: the bootstrapping chain (gate outcomes → facts →
|
|
Screamer rules → ACL2 theorems → macro layers) takes months to build,
|
|
giving first-mover advantage.
|
|
|
|
Challenge: a Phase 4+ Passepartout fed on Wikipedia + Wikidata can build
|
|
a general ontology in two weeks. Entity resolution is batch work. Structural
|
|
consistency verification is minutes. The organic growth advantage collapses
|
|
for general knowledge.
|
|
|
|
** Actual moats (weaker than initially assumed)
|
|
|
|
1. **Domain-specific gate rules** — thin. A few hundred lines of Lisp data
|
|
encoding deployment-specific path patterns, shell safety rules, and
|
|
volume layouts. Write once, trivial to copy. Not a real moat.
|
|
|
|
2. **Empirical decision history** — every HITL decision is a Merkle fact.
|
|
"On date T, user approved action X under context Y." A fresh instance
|
|
has none of this. Makes *your* instance more valuable but doesn't
|
|
prevent competition — it's a switching cost, not a barrier to entry.
|
|
|
|
3. **Evaluation harness (regression suite)** — thousands of test cases
|
|
accumulated from every bug fix. Cannot be ingested from public data.
|
|
Built only by using the system, breaking it, fixing it, and adding a
|
|
test. Strongest residual moat, but even this can be partially
|
|
compressed through public benchmarks (SWE-bench, etc.).
|
|
|
|
4. **Infrastructure integration** — the specific Docker compose layouts,
|
|
Traefik router patterns, Authentik provider configurations, backup
|
|
policies encoded as gate rules over months of use. A competitor's
|
|
infrastructure is different; their generic Passepartout does not know
|
|
your topology.
|
|
|
|
** Strongest competitor strategy
|
|
|
|
Not copying your gate rules — offering the same architecture as a service
|
|
with their own pre-seeded general knowledge, a generic safety baseline,
|
|
and a consulting engagement to customize gate rules for each customer.
|
|
The AGPL prevents closing the architecture but does not prevent offering
|
|
it as a service with a customization layer.
|
|
|
|
** The defensible business is services, not product
|
|
|
|
The defensible entity is "the organization that best understands how to
|
|
adapt Passepartout to your domain" — not "the organization that owns
|
|
Passepartout." The Lisp Machine appliance (hardware + certification) and
|
|
evaluation harness certification service are the closest thing to product
|
|
defensibility.
|
|
|
|
* Economics and Monetization
|
|
|
|
** Cost structure
|
|
|
|
- One-time cost: gate-rule encoding for a domain (from hours for codified
|
|
domains — FAR, HIPAA, ISO standards — up to months for tacit domains)
|
|
- The LLM translates codified rules directly: ingest regulation → produce
|
|
gate rule plist → ACL2 verifies consistency → human reviews. This is
|
|
translation, not reasoning.
|
|
- For non-codified knowledge (craft expertise, organizational culture):
|
|
Phase 3 archivist loop over time
|
|
- Near-zero marginal cost: ACL2 proof + Screamer consistency check +
|
|
VivaceGraph lookup per interaction — all CPU-native, all in-image
|
|
- No recurring LLM API costs for the 80% symbolic reasoning layer
|
|
- After sufficiency flip: pennies per day vs dollars per day for LLM-only
|
|
|
|
** Revenue models by field
|
|
|
|
| Field | Why Passepartout | Revenue Model |
|
|
|-------+------------------+---------------|
|
|
| Industrial infrastructure (refineries, power grids, manufacturing) | Offline operation, provably safe, near-zero marginal cost, mandatory audit trail | Lisp Machine appliance + SCADA certification package |
|
|
| Healthcare administration (billing, claims, prior authorization) | Rule-heavy domain, privacy-mandated, audit-driven, high per-transaction cost today | Subscription for regulatory gate packages (CPT/ICD-10/HIPAA rules), updated when CMS publishes new rules |
|
|
| Software supply chain (CI/CD security, SBOM verification) | First-order structural verification — ACL2 is natural fit, CI/CD pipeline is already a sequence of gate-checkable steps | Evaluation harness as certification service — "run our 10,000-task suite and get a provable score" |
|
|
| Regulatory compliance (GDPR, SOC2, SOX, GxP) | Rule-completeness, active enforcement (not document-based), provable audit trail | Subscription for regulation-specific gate packages — GDPR package, SOC2 package, FedRAMP package, updated when regulations change |
|
|
| Defense and classified environments | Air-gapped operation, classification-level gate rules, Merkle provenance is court-admissible evidence | Government contract + hardened appliance with hardware root of trust |
|
|
|
|
** Critical insight: encoding cost drops to near-zero for codified domains **
|
|
|
|
Laws, regulations, standards, procedures, and technical specifications are
|
|
already written down in structured text. The LLM does not need to *reason*
|
|
about them — it needs to *translate* them into gate rules and ACL2 theorems.
|
|
|
|
Example: The US Federal Acquisition Regulation (FAR) is ~2,000 pages of
|
|
"thou shalt" and "thou shalt not" statements. A frontier LLM can ingest
|
|
the FAR and produce a plist of gate rules:
|
|
- (if contract > $250K AND not small-business-set-aside → :deny)
|
|
- (if sole-source AND no justification-documented → :deny, produce-justification)
|
|
|
|
ACL2 then verifies the rule set for internal consistency (Phase 6). Screamer
|
|
checks against existing compliance facts. The human reviews the bootstrap
|
|
output and approves or corrects individual rules.
|
|
|
|
The key distinction: the LLM is not *extracting knowledge from prose* in the
|
|
way Phase 3 archivist does (which is open-ended, noisy, requires grounding).
|
|
It is *translating a known rule system into a formal representation* — a
|
|
mechanical transformation of structured text into structured rules. The
|
|
result is not "the LLM's best guess at the rules" but "the rule set as
|
|
stated in the source document, mechanically transcribed."
|
|
|
|
For domains where the knowledge is codified as text, the gate-rule encoding
|
|
time drops from weeks to hours. The only bottleneck is human review of the
|
|
output — and the system can assist here by surfacing contradictions for
|
|
resolution rather than requiring a full line-by-line audit.
|
|
|
|
** What can actually be monetized (TLDR)
|
|
|
|
1. **Pre-loaded bootstrapping chains for specific verticals** — domain gate
|
|
rules, pre-seeded fact stores, mature proof forests. Saves the buyer
|
|
months of bootstrapping. Distributed as data packages under commercial
|
|
license, not AGPL.
|
|
|
|
2. **Evaluation harness as certification service** — "Bring your agent,
|
|
we'll run it through our suite and give a Merkle-verified score."
|
|
The regression suite grows with every deployment; a competitor's
|
|
regression suite starts empty.
|
|
|
|
3. **Hardened Lisp Machine appliance** — RISC-V soft-core with Lisp
|
|
microcode, pre-loaded mature Passepartout, certified for specific
|
|
verticals (IEC 62443 for industrial, HIPAA for healthcare). Value is
|
|
in integration and certification, not the AGPL software.
|
|
|
|
4. **Verified skill marketplace** — marketplace where skills are verified
|
|
(sandbox + ACL2 non-contradiction proof) before listing. Marketplace
|
|
takes a cut. Value is in the verification infrastructure, not the
|
|
skills themselves.
|
|
|
|
5. **Support and consulting** — the Red Hat model. AGPL code is free;
|
|
training, custom gate rules, ontology design, and emergency support
|
|
are paid.
|
|
|
|
* Design and Architectural Implications
|
|
|
|
** The self-improving system
|
|
|
|
Passepartout bootstraps two feedback loops:
|
|
|
|
- **Empirical loop:** gate outcomes → facts → Screamer-verified patterns →
|
|
sufficiency flip → auto-extraction. Knowledge grows without the LLM
|
|
touching most of it.
|
|
|
|
- **Logical loop:** ACL2 theorems → macro layers (generators, metafunctions,
|
|
induction DSL, abstract theories) → richer proof strategies → better
|
|
verification. Reasoning capacity grows without changing the prover binary.
|
|
|
|
These loops intersect at the fact store: proven theorems become facts, richer
|
|
facts generate better proof strategies, better strategies verify more facts.
|
|
The system upgrades itself.
|
|
|
|
** The 10-80-10 becomes approximately true
|
|
|
|
- 10%: LLM handles input translation (natural language → structured goal)
|
|
and output formatting (structured result → natural language)
|
|
- 80%: Symbolic engine handles reasoning — Screamer plans, ACL2 verifies,
|
|
VivaceGraph retrieves facts. Zero LLM tokens.
|
|
- The cost curve inverts: verification is cheaper than generation.
|
|
|
|
** Key implications
|
|
|
|
1. **Verification becomes cheaper than generation.** Once macro layers are
|
|
mature, proving a new rule non-contradictory costs near-zero. The LLM
|
|
proposes; the symbolic engine accepts or rejects.
|
|
|
|
2. **Trust scales with use.** Every interaction produces a structurally
|
|
verified outcome. Non-lossy fact base grows. Proof forest thickens. An
|
|
auditor can inspect the Merkle tree of gate outcomes and trace any
|
|
decision to its root theorem.
|
|
|
|
3. **Degradation is reversible.** Every proof layer is a hot-reloadable
|
|
skill. Every fact has provenance. A bad metafunction is unloaded;
|
|
theorems proven under it are flagged for re-verification; the fact
|
|
store retains the pre-upgrade ontology version.
|
|
|
|
4. **The system can diagnose its own logical frontier.** If ACL2 keeps
|
|
failing on a class of properties, and the failure mode is structural
|
|
(not solvable by more macros), the fact store accumulates a pattern:
|
|
"These N properties are first-order inexpressible." This signals the
|
|
human: the system needs a CIC prover (dependent types) for this domain.
|
|
The system cannot transcend its logic without external intervention —
|
|
but it can surface the boundary precisely.
|
|
|
|
** The Lisp Machine endpoint
|
|
|
|
If the system designs and builds itself on Lisp Machine hardware:
|
|
- The same system that proves theorems also optimizes the microcode
|
|
- No OS boundary, no driver layer — system and proof environment are one
|
|
- A RISC-V soft-core with Lisp microcode is manufacturable at older fab
|
|
nodes (28nm, 45nm) — sovereign intelligence without GPU supply chains
|
|
|
|
** Social implications
|
|
|
|
- **Concentration of reasoning.** The macro layers become opaque to anyone
|
|
who doesn't understand the bootstrapping history. The system understands
|
|
its own reasoning better than its users do.
|
|
|
|
- **Cost advantage widens inequality asymmetrically.** The first instance
|
|
to reach maturity requires significant gate-rule design (from hours for
|
|
codified domains to months for tacit ones). After that, replication is
|
|
cheap. Organizations that invest early have a permanent cost advantage
|
|
over those that wait for a turnkey product.
|
|
|
|
- **Sovereign artifact.** A self-building system on its own hardware does
|
|
not depend on cloud APIs, GPU supply chains, or proprietary model
|
|
weights. Its intelligence is generated, verified, and sustained locally.
|
|
Enables sovereign AI for nations without GPU access.
|
|
|
|
* Open Questions
|
|
|
|
1. Can CIC (dependent type theory) be implemented as a Passepartout skill,
|
|
verified for crash-freedom and rule fidelity by ACL2, and integrated
|
|
into the existing fact store API? The Gödelian boundary: ACL2 can
|
|
verify the kernel's implementation but not its soundness in any
|
|
absolute sense — but this matches current practice (Lean 4's ~500 line
|
|
C++ kernel is trusted, not proved).
|
|
|
|
2. Can the system generate novel proof strategies? A sufficiently rich
|
|
abstract theory layer + Screamer could propose: "Proofs in domain X
|
|
all use induction schema Y. Generalizing to Z would prove new
|
|
properties across A, B, C." The LLM translates to a metafunction;
|
|
ACL2 verifies it; the prover gains a new tactic invented by itself.
|
|
|
|
3. What is the social contract for a system that can truthfully say
|
|
"I know this is correct" — and "I know what I don't know"?
|
|
Most current AI systems can do neither.
|
|
|
|
* Impact on the AI and GPU Industry
|
|
|
|
If a symbolic-bootstrapping architecture becomes popular — especially now
|
|
that codified domains can be ingested at near-zero encoding cost — the
|
|
industry structure shifts fundamentally.
|
|
|
|
** Token demand compresses
|
|
|
|
The entire AI industry (OpenAI, Anthropic, Google — ~$50B API revenue) is
|
|
built on per-token pricing: metered cognition. A mature Passepartout
|
|
reduces token consumption to the unfamiliar 10% I/O boundary. Token demand
|
|
shifts from "every interaction burns tokens" to "only unfamiliar
|
|
interactions burn tokens." Steady-state per-user LLM consumption drops by
|
|
an order of magnitude.
|
|
|
|
** GPU inference demand plateaus in regulated industries
|
|
|
|
GPU inference is driven by two things: training and per-request inference.
|
|
Training demand is unaffected (frontier models still train on clusters).
|
|
Inference demand drops 80-90% in any sector where the rule book is
|
|
published — which covers most economically significant sectors (finance,
|
|
healthcare, industrial, government procurement, legal compliance).
|
|
|
|
Nvidia's growth narrative shifts from "every transaction goes through a
|
|
GPU" to "every training run needs a GPU, and the generative 20% needs
|
|
inference." A smaller inference TAM than current market pricing assumes.
|
|
|
|
** Hyperscaler competition shifts
|
|
|
|
The competitive thesis "AI is the next OS, and we own the compute layer"
|
|
weakens if the most valuable AI workloads run on a $500 RISC-V board on
|
|
your premises. The hyperscalers respond by:
|
|
- Offering Passepartout as a managed service (AGPL allows this)
|
|
- Differentiating on the frontier I/O API and world model API
|
|
- Competing on gate rule libraries for specific industries
|
|
|
|
The race shifts from "who has the most H100s" to "who has the best
|
|
domain-specific gate rules." Google's industry data advantage matters
|
|
more than Azure's raw compute.
|
|
|
|
** New hardware tier: verification appliances
|
|
|
|
A new category emerges: CPU-native verification appliances running a Lisp
|
|
microcode on RISC-V cores. Low volume (hundreds of thousands/year),
|
|
high margin ($5K-50K/unit), high switching costs. The Sun Microsystems
|
|
model, not the Intel model. Manufacturable at older fab nodes (28nm,
|
|
45nm) — no dependency on TSMC's leading edge.
|
|
|
|
** The key uncertainty and its resolution
|
|
|
|
Original question: how long does gate-rule encoding take?
|
|
|
|
Resolution: for codified domains, near-zero. The LLM translates published
|
|
regulations into formal rules in one pass — it is a mechanical transformation,
|
|
not open-ended reasoning. The bottleneck only exists for tacit, oral, unwritten
|
|
knowledge (craft expertise, organizational culture).
|
|
|
|
Consequence for the transition timeline: Phase 2 (sufficiency) happens
|
|
within months for any domain whose rule book is published. The disruption
|
|
accelerates from years to quarters.
|
|
|
|
* Broader Insights
|
|
|
|
** The historical fork: why C won economically in the 1980s
|
|
|
|
C won because the economics of 1980s hardware made Lisp's overhead
|
|
unaffordable:
|
|
|
|
- **Memory cost.** DRAM was ~$5,000/MB in 1980. Lisp's runtime (SBCL today
|
|
is ~40MB) was unthinkable. C's runtime fit in 64KB.
|
|
- **CPU speed.** 1-10MHz. Every instruction counted. Lisp's GC, type dispatch,
|
|
and dynamic allocation consumed cycles that C spent on actual work.
|
|
- **Software scale.** Programs were thousands of lines, not millions. A single
|
|
developer could hold the entire program in their head and verify correctness
|
|
by reading it. Testing was sufficient. Formal verification was unnecessary
|
|
overhead.
|
|
- **Market dynamics.** The PC market was expanding exponentially. Speed to
|
|
market, volume, and unit cost mattered more than correctness. A buggy $500
|
|
PC sold more units than a correct $50,000 Lisp Machine.
|
|
- **Hardware ecosystem.** RISC (reduced instruction set) was the revolution.
|
|
Simpler chips, higher clock speeds, cheaper fabrication. RISC CPUs are
|
|
optimized for C's execution model because C was the dominant systems
|
|
language when RISC was designed.
|
|
|
|
Lisp lost not because it was worse, but because the market optimized for
|
|
a different axis: raw throughput per dollar, not correctness per line.
|
|
|
|
** What changed to make Lisp viable now
|
|
|
|
Four transformations flipped the economics:
|
|
|
|
1. Memory is free. 40MB runtime is noise on a $20 Raspberry Pi with 8GB
|
|
RAM. The cost of the runtime is now zero at any relevant scale.
|
|
|
|
2. Transistors are free. A modern ARM Cortex-A72 has billions of
|
|
transistors. The GC, type dispatch, and dynamic dispatch that Lisp
|
|
needs are executed in dedicated silicon within the CPU — they cost
|
|
nothing because the transistors are there whether used or not.
|
|
|
|
3. Software complexity saturates human verification. Systems are now
|
|
tens of millions of lines. No single person can hold them in their
|
|
head. Testing is necessary but insufficient — zero-day vulnerabilities
|
|
prove that bugs survive all testing. Formal verification is no longer
|
|
overhead; it is the only known path to correctness at this scale.
|
|
|
|
4. The cost of failure is now higher than the cost of verification. A
|
|
single breach costs millions. A compliance failure shuts down a
|
|
factory. Regulation (GDPR, SOX, HIPAA, FedRAMP) mandates provable
|
|
compliance. The cost of proving correctness is now cheaper than the
|
|
cost of not proving it.
|
|
|
|
** The key insight
|
|
|
|
The 1980s trade-off was: C is cheap enough for the market. Correctness
|
|
is a luxury the market cannot afford.
|
|
|
|
The 2020s trade-off is: C is expensive for the market. Incorrectness
|
|
has become the dominant cost of software. Lisp's verification infrastructure
|
|
is now the cheaper option.
|
|
|
|
This is the inversion Passepartout exploits: the verification appliance
|
|
(AGPL symbolic engine + RISC-V Lisp μcode on FPGA) costs $5,000/year and
|
|
replaces $500,000/year in compliance audits, breach litigation, and
|
|
regulatory fines. The 1980s math said Lisp was too expensive at any price.
|
|
The 2020s math says Lisp is the only affordable option.
|
|
|
|
The remaining question is not whether the economics flipped — it's whether
|
|
anyone builds the bridge from today's AGPL software to tomorrow's
|
|
verification appliance. Passepartout is that bridge.
|
|
|
|
** Distribution and lifecycle management
|
|
|
|
Passepartout diverges in two independent dimensions across instances:
|
|
code (engine, macro layers, skills) and knowledge (gate rules, fact store,
|
|
ontology, proof forest, empirical decision history). This creates a
|
|
distribution problem no current AI system faces — most agents ship as
|
|
static binaries with no per-instance knowledge divergence.
|
|
|
|
*** Distribution tiers
|
|
|
|
Code only (AGPL repo — GitHub)
|
|
- The core engine, ACL2 macro layers as skills, generic gate rules.
|
|
- No domain knowledge, no pre-seeded fact store, no proof forest.
|
|
- User bootstraps from scratch — the system learns from their gate outcomes.
|
|
- Audience: hobbyists, researchers, early adopters willing to invest time.
|
|
|
|
Code + basic knowledge (commercial data package)
|
|
- Pre-seeded fact store for a domain (healthcare, finance, infrastructure).
|
|
- Curated gate rules, ontology categories, and initial ACL2 theorem set.
|
|
- The buyer gets sufficiency faster — weeks instead of months to reach
|
|
the Phase 4 flip in their domain.
|
|
- Audience: enterprises that want fast deployment without bootstrapping.
|
|
|
|
Code + knowledge + verification appliance (hardware product)
|
|
- RISC-V + Lisp μcode on FPGA or custom ASIC.
|
|
- Pre-loaded with full domain-specific knowledge package.
|
|
- Hardened, certified, no-cloud, tamper-proof HSM root of trust.
|
|
- Audience: regulated industries that need provable compliance out of
|
|
the box.
|
|
|
|
*** The upgrade problem
|
|
|
|
Once instances diverge in both code and knowledge, a naive `git pull`
|
|
breaks things:
|
|
|
|
1. A new macro layer expects a fact structure the old store doesn't have
|
|
→ re-verification fails, system degrades to pre-macro state
|
|
|
|
2. Two instances of the same version develop different ontologies
|
|
→ a shared gate rule package produces different outcomes on each
|
|
|
|
3. Knowledge learned in v1 must be verified against v2's code
|
|
→ old facts carry `:ontology-version`, ACL2 checks they're still
|
|
consistent under the new code's rules
|
|
|
|
4. A security patch changes the gate stack
|
|
→ all existing gate-outcome facts must be re-verified against the
|
|
new gate vectors
|
|
|
|
*** How Passepartout's architecture handles this
|
|
|
|
The architecture already has the primitives for safe upgrades:
|
|
|
|
- **Ontology versioning (Phase 5).** Every fact stores the ontology
|
|
version at assertion. On upgrade, facts with old versions are flagged
|
|
for re-verification. ACL2 checks consistency against new code. Facts
|
|
that survive are promoted to the new version. Facts that fail are
|
|
flagged for review.
|
|
|
|
- **Degradation, not crash.** If an upgrade breaks the fact store, the
|
|
system degrades to the pre-macro symbolic state (hash-table fallback
|
|
for VivaceGraph, text-scan fallback for Screamer). It still works —
|
|
it just can't prove as much. The TUI displays: "Symbolic index: legacy
|
|
mode. 2,134 facts pending re-verification under new ontology."
|
|
|
|
- **Reversible upgrades (Phase 0 undo).** Every upgrade produces a
|
|
Merkle snapshot before applying. If re-verification fails, the system
|
|
rolls back to the pre-upgrade state. The failed upgrade becomes a
|
|
memory-object with `:provenance :failed-upgrade` — empirical data
|
|
for the next upgrade attempt.
|
|
|
|
- **Delta distribution.** Upgrades are delivered as diffs against the
|
|
current ontology version, not full replacements. A code upgrade ships
|
|
with a migration script: "This version changes the gate rule structure
|
|
from (action pattern) to (action pattern context meta). Here is the
|
|
automated migration for your existing gate rules." The user's
|
|
empirical data is preserved across the migration.
|
|
|
|
*** The real challenge: divergence of empirical knowledge
|
|
|
|
The hardest upgrade is not code — it's the fact store. Two instances
|
|
that started from the same code but served different users for a year
|
|
have diverged ontologies. Instance A's gate rules encode "emergency
|
|
restart = docker compose restart" (hospital IT). Instance B's gate rules
|
|
encode "emergency restart = emergency_shutdown.py" (factory floor).
|
|
|
|
A code upgrade that changes how emergency restarts are processed cannot
|
|
be safely applied to both instances with the same migration. The
|
|
migration must be per-instance, tested against the instance's specific
|
|
facts.
|
|
|
|
The pattern that solves this: **the upgrade is itself verified by the
|
|
upgraded system before committing.** The distributor ships: "Here is
|
|
the new `emergency-restart` gate vector. Run it against your existing
|
|
gate rules. ACL2 will report which rules are compatible and which need
|
|
human review." The instance operator reviews only the incompatible
|
|
subset.
|
|
|
|
This is slower than `git pull` but avoids the catastrophe of a silent
|
|
incompatibility. An enterprise running Passepartout in a compliance
|
|
context would not accept an unverified upgrade anyway.
|
|
|
|
*** The business model for upgrades
|
|
|
|
- Code upgrades: free (AGPL). Core engine improvements benefit everyone.
|
|
- Migration scripts: subscription. The instance operator pays for the
|
|
verified migration path from their current ontology version to the
|
|
new one. The migration script is generated by running the new code
|
|
against the instance's fact store in a sandbox.
|
|
- Domain knowledge package upgrades: subscription. When HIPAA updates,
|
|
the healthcare gate rule package updates. The new rules are verified
|
|
against the instance's existing fact store before shipping.
|
|
- Verification appliance firmware: bundled with hardware. The Lisp
|
|
Machine's microcode is updated as the engine evolves. The update is
|
|
signed and verified against the hardware root of trust before
|
|
applying.
|
|
|
|
** Large refactoring in a neurosymbolic planner — semantic equivalence
|
|
|
|
*** The workflow
|
|
|
|
ACL2 proves semantic equivalence of programs written in its own
|
|
logic — which includes Passepartout's own source code. When the
|
|
system refactors its own skills, ACL2 can prove the new function
|
|
produces the same outputs for all inputs as the old one. This is
|
|
standard ACL2 practice (verifying compiler optimizations, sort
|
|
algorithm replacements).
|
|
|
|
For other languages (Python, Java, JavaScript), the path is:
|
|
1. Model the critical subset (API surface, contracts, data
|
|
transformations) in ACL2 as a logical specification
|
|
2. Prove the specification is preserved across the refactoring
|
|
3. The actual implementation stays in the target language —
|
|
ACL2 proves the structural contract, not the runtime behavior
|
|
|
|
The CIC prover upgrade (Lean-in-Lisp, planned as future work)
|
|
would extend this to dependent-type-level equivalence proofs
|
|
across language boundaries — verifying that a Rust API binding
|
|
correctly wraps a C library, or that a Python refactoring
|
|
preserves the type-level contract of the original.
|
|
|
|
** The self-driving Lisp Machine on FPGA or Tenstorrent
|
|
|
|
A Tenstorrent P150 (~72 RISC-V Tensix cores on a PCIe card) or
|
|
a mid-range FPGA (AMD Alveo, Intel Agilex) offers enough
|
|
hardware to run a full Passepartout image with Lisp microcode
|
|
acceleration. The host Linux system provides boot, I/O, and
|
|
thermal management; the accelerator card provides the Lisp
|
|
execution fabric.
|
|
|
|
*** What it can do today
|
|
|
|
- **Run the full symbolic engine.** ACL2, Screamer, VivaceGraph,
|
|
and the fact store are pure Lisp — they run on any Lisp backend.
|
|
The RISC-V cores on a Tenstorrent or the soft-core on an FPGA
|
|
provide enough compute for real-time gate verification and
|
|
constraint solving.
|
|
|
|
- **Hot-reload skills and macro layers.** The Lisp image loads
|
|
skills, tangles Org files, compiles ACL2 books, and registers
|
|
metafunctions — all without reboot. The FPGA fabric can be
|
|
reprogrammed with new microcode in milliseconds.
|
|
|
|
- **Manage its own knowledge base.** The fact store grows and
|
|
evolves. Gate rules are proposed by the LLM and verified by
|
|
ACL2. Ontology versions are tracked. The system knows what
|
|
it knows and what changed.
|
|
|
|
- **Roll back failed upgrades.** Merkle snapshots provide
|
|
instant undo for both software state and FPGA configuration.
|
|
|
|
*** What it needs to cross the threshold to self-driving
|
|
|
|
The system is not yet fully self-driving because three things
|
|
still require external intervention:
|
|
|
|
1. **The LLM dependency.** The 10% I/O translation (natural
|
|
language → structured goal, structured result → natural
|
|
language) requires an LLM. A small local model (Phi-4,
|
|
Qwen 2.5) on the host or card can serve this. The symbolic
|
|
engine handles everything else. Once sufficiency flips
|
|
(Phase 4), even the LLM is rarely needed.
|
|
|
|
2. **Hardware driver development.** The FPGA microcode (tagged
|
|
memory, hardware GC, Lisp dispatch in hardware) is currently
|
|
written by humans. The system could eventually propose new
|
|
microcode patterns from profiling data — "your GC accounts
|
|
for 12% of runtime; here is a hardware GC barrier that
|
|
reduces it to 3%" — but the synthesis and verification of
|
|
hardware descriptions (VHDL, Verilog) requires a separate
|
|
toolchain.
|
|
|
|
3. **The initial bootstrap.** The first FPGA load, the first
|
|
Linux boot, the first Lisp image — these are done by a
|
|
human or a pre-existing system. Once bootstrapped, the
|
|
system manages itself. The threshold is crossed when the
|
|
system can design, compile, and load its own FPGA microcode
|
|
from within the running image.
|
|
|
|
*** The threshold
|
|
|
|
The self-driving threshold is crossed when the system can
|
|
synthesize and load its own FPGA microcode or Tensix dispatch
|
|
programs from within the running Lisp image. At that point:
|
|
|
|
- The system profiles its own gate verification latency
|
|
- It proposes a new microcoded instruction for the hot path
|
|
- It compiles Verilog from ACL2-verified specifications
|
|
- It reprograms the FPGA fabric via PCIe DMA from within SBCL
|
|
- It benchmarks the new instruction against the old one
|
|
- If throughput improves, the new microcode becomes permanent
|
|
- If not, it rolls back and tries another approach
|
|
|
|
This is not science fiction — it is the natural extension of
|
|
an architecture that already hot-reloads its own code, tracks
|
|
its own performance telemetry, and verifies its own changes
|
|
before committing them. The hardware description language is
|
|
the last abstraction boundary.
|
|
|
|
*** What stops it from being full science fiction
|
|
|
|
| Barrier | Status | Path |
|
|
|---------|--------|------|
|
|
| LLM dependency | Phase 4 flip reduces it to near-zero | Already designed |
|
|
| Hardware microcode synthesis | Most speculative | Requires hardware DSL verified by ACL2, then compiled to FPGA bitstream |
|
|
| Initial bootstrap | One-time human action | After first load, system manages itself |
|
|
| Power and thermal | Handled by host Linux | Unchanged |
|
|
| PCIe DMA from SBCL | Feasible with sb-alien + libpcie | Needs driver, but well-understood |
|
|
|
|
The Tenstorrent approach is particularly interesting because
|
|
its Tensix cores are *already* RISC-V processors. The microcode
|
|
is not FPGA logic — it's a RISC-V program. The system can write
|
|
RISC-V assembly, compile it with the RISC-V toolchain, load it
|
|
onto the Tensix cores, and benchmark the result. This is
|
|
dramatically simpler than FPGA synthesis because it's software,
|
|
not hardware.
|
|
|
|
A Tenstorrent P150 running Passepartout would be: 72 RISC-V
|
|
cores running Lisp microcode, one core dedicated to the ACL2
|
|
prover, one to Screamer, the rest to gate verification and
|
|
fact store operations. The host Linux system handles I/O and
|
|
the LLM. The system designs its own core dispatch logic,
|
|
loads it onto idle cores, and verifies the result with ACL2
|
|
before committing.
|
|
|
|
** How the LLM's coding ability affects the bootstrapping timeline
|
|
|
|
The LLM writes code at every stage of the bootstrapping:
|
|
1. The Lisp Machine's microcode (RISC-V dispatch, GC barriers,
|
|
tagged memory operations)
|
|
2. The CIC prover kernel (if built as a skill)
|
|
3. ACL2 macro layers for new domains
|
|
4. Gate rules for previously uncodified domains
|
|
5. The initial self-optimization proposals
|
|
|
|
At each stage, the symbolic engine (ACL2, Screamer, gate stack)
|
|
verifies the LLM's output before accepting it. The LLM proposes;
|
|
the symbolic engine disposes.
|
|
|
|
This means the LLM's coding ability is a **speed multiplier, not
|
|
a gate**. A weak LLM (3B local model) produces correct code
|
|
after N retries where the symbolic engine catches the mistakes
|
|
and feeds them back. A strong LLM (Claude Sonnet, DeepSeek, GPT)
|
|
produces correct code after fewer retries. The cost difference
|
|
is in API calls and wall-clock time, not in the correctness of
|
|
the final output — the symbolic engine guarantees that.
|
|
|
|
| Scenario | LLM quality | Retries per unit | Wall-clock per unit | Correctness |
|
|
|----------|-------------|------------------|---------------------|-------------|
|
|
| Bootstrapping with local 3B model | Low | 5-15 | 10x slower | 100% (verified) |
|
|
| Bootstrapping with frontier API | High | 1-3 | 1x | 100% (verified) |
|
|
| Bootstrapping after sufficiency flip | None (symbolic only) | 0 | Instant | 100% (verified) |
|
|
|
|
The critical transition is between row 2 and row 3: once the
|
|
symbolic engine has accumulated enough non-lossy facts about
|
|
the Lisp Machine's hardware behavior (latency profiles, GC
|
|
patterns, instruction timings), it can propose microcode
|
|
optimizations without any LLM involvement. ACL2 proves the
|
|
optimization preserves correctness; Screamer checks it against
|
|
known hardware constraints; the gate stack verifies it won't
|
|
damage the running system. Zero tokens.
|
|
|
|
This is the sufficiency flip applied to hardware. The timeline
|
|
to reach it depends on how many facts the system can gather
|
|
about its own runtime behavior, not on how good the LLM is.
|
|
|
|
** The surprising result
|
|
|
|
An LLM that is just barely competent at coding (enough to
|
|
generate syntactically valid RISC-V or Lisp that passes the
|
|
symbolic engine's checks after a few retries) is sufficient
|
|
for the entire bootstrapping chain. It takes longer — more
|
|
retries, more wall-clock — but it reaches the same endpoint:
|
|
a system that designs its own microcode without any LLM.
|
|
|
|
The LLM's coding ability determines how many API dollars and
|
|
calendar months the bootstrap requires. It does not determine
|
|
whether the bootstrap succeeds. A patient operator with a
|
|
3B local model and a Tenstorrent card reaches the same
|
|
destination as an operator with bottomless API credits — the
|
|
second arrives faster, but both arrive.
|
|
|
|
Large refactoring projects (extract module, rename API, split monolith)
|
|
are the hardest test for any AI agent. Current approaches (Claude Code,
|
|
Copilot) handle them probabilistically — every step costs tokens, and
|
|
there is no formal guarantee the final system is consistent.
|
|
|
|
Passepartout's Phase 7 planner (10-80-10) transforms this: the symbolic
|
|
engine handles planning, ordering, and structural verification; the LLM
|
|
handles only the code transformation itself.
|
|
|
|
*** The workflow
|
|
|
|
1. **Codebase ingestion.** The scanner walks the entire codebase, builds
|
|
an abstract syntax graph (not just flat files — imports, type
|
|
dependencies, function call chains, test coverage). Each module
|
|
becomes a fact in the fact store with `:provenance :codebase-scan`.
|
|
|
|
2. **Goal translation (LLM, 10%).** "Extract authentication into its own
|
|
service" becomes a structured goal plist:
|
|
(:goal :extract-module
|
|
:source :auth
|
|
:target-service auth-service
|
|
:files (:affected (:app/auth/* :app/middleware/auth.clj :tests/*))
|
|
:constraints (:no-breaking-api (:public-api :unchanged))
|
|
:verification (:all-tests-pass :api-contract-preserved))
|
|
|
|
3. **Constraint satisfaction (Screamer, 80%).** Screamer expresses the
|
|
refactoring as a constraint satisfaction problem:
|
|
- Variables: file modifications ordered by dependency (auth middleware
|
|
must be updated after auth module is extracted but before its callers)
|
|
- Constraints: no circular dependencies, no step that creates broken
|
|
intermediate state, no test file modified before its source
|
|
- Objective: minimize total steps while respecting all constraints
|
|
|
|
Screamer returns a viable plan or reports unsolvability with the
|
|
conflicting constraints — for example, "auth middleware and auth
|
|
module have a circular type dependency that must be resolved before
|
|
extraction."
|
|
|
|
4. **Plan verification (ACL2, part of 80%).** ACL2 proves:
|
|
- No dependency cycles in the plan (A must run before B, B before C,
|
|
C before A → rejected)
|
|
- No deadlocks (two modules waiting on each other)
|
|
- Every planned write is within the refactoring scope (no stray
|
|
modifications to unrelated files)
|
|
- The gate stack will not reject any planned command (no blocked
|
|
patterns in the refactoring scripts)
|
|
|
|
5. **Incremental execution.** The planner executes each step:
|
|
a. Take Merkle snapshot of the current state
|
|
b. LLM proposes the code transformation for this step
|
|
c. Gate stack checks the proposal (no forbidden file writes,
|
|
no shell commands outside the allowed refactoring scope)
|
|
d. Transformation is applied
|
|
e. Tests run. If they pass → commit the snapshot, update the
|
|
fact store with the new codebase graph. If they fail → roll
|
|
back to the previous snapshot, flag the LLM's proposal as
|
|
`:provenance :failed-transformation`, and attempt a corrected
|
|
proposal.
|
|
|
|
6. **Final verification.** After all steps complete, ACL2 re-verifies
|
|
the entire codebase fact store — all dependencies, all public API
|
|
surfaces, all constraints. The result is a Merkle chain from
|
|
"before" to "after" with every intermediate state verified.
|
|
|
|
*** What the symbolic engine cannot do
|
|
|
|
The fundamental limit: **first-order logic cannot prove semantic
|
|
equivalence of arbitrary programs.** ACL2 can verify that the
|
|
refactoring plan has no structural flaws, that the dependency graph
|
|
is acyclic, that every step is within scope, and that tests pass.
|
|
It cannot prove "the extracted auth service behaves identically to
|
|
the inline auth module from the caller's perspective" for a
|
|
general-purpose language.
|
|
|
|
This gap is filled by:
|
|
- **Tests as empirical verification.** The planner runs the full test
|
|
suite after each step. A passing test suite is not a proof of
|
|
correctness, but combined with ACL2's structural verification, it
|
|
is strong empirical evidence.
|
|
- **API contract checking.** For refactoring that preserves public APIs,
|
|
ACL2 can verify that the type signatures, argument counts, and
|
|
return types of the extracted module's public interface match
|
|
exactly — a structural equivalence that does not require semantic
|
|
reasoning.
|
|
- **Human review of semantic concerns.** The planner flags steps that
|
|
involve semantic choices (e.g., "the extracted auth service now
|
|
handles token refresh differently"). These steps are presented to
|
|
the developer for review with full provenance: before state, after
|
|
state, and the diff. The developer's approval or rejection becomes
|
|
a Merkle fact with `:provenance :human-reviewed`.
|
|
|
|
*** What makes this better than Claude Code
|
|
|
|
| Dimension | Claude Code | Passepartout Planner |
|
|
|-----------|-------------|---------------------|
|
|
| Planning | Prompt-based, implicit | Screamer CSP, explicit, verified |
|
|
| Step ordering | Greedy, every step costs tokens | Dependency-ordered, zero-token constraint check |
|
|
| Rollback | Limited (git reset, no per-step) | Merkle snapshot per step, instant rollback |
|
|
| Scope control | Prompt-based ("only touch auth files") | ACL2-verified write scope, cannot escape |
|
|
| Cost | $0.50-$2 per refactoring session | Near-zero (CPU cycles for planning, LLM for code only) |
|
|
| Final proof | None — you trust that tests caught everything | Merkle chain from before→after, ACL2-verified |
|
|
|
|
A software ecosystem changing hardware economics has never happened before.
|
|
Passepartout's most realistic path: verification appliances for regulated
|
|
industries — RISC-V cores with Lisp microcode on FPGA, sold as hardened
|
|
devices for healthcare compliance, defense, and industrial control.
|
|
|
|
Not a general-purpose Lisp Machine replacing laptops. A specialized device
|
|
where correctness is worth paying for. If such appliances sell in the
|
|
hundreds of thousands, the economics of a custom Lisp ASIC start to make
|
|
sense. The reversal is not Lisp returning as a general platform, but Lisp
|
|
winning a vertical important enough to justify its own silicon.
|
|
|
|
The path: Passepartout software (AGPL) → creates demand for verified
|
|
infrastructure → verification appliance (FPGA, RISC-V + Lisp μcode) →
|
|
high-volume niche → custom ASIC economics viable → Lisp native hardware
|
|
exists for the first time since the Symbolics era.
|
|
|
|
** Lisp vs C for embedded systems
|
|
|
|
- Lisp can match C for low-level work through compile-to-C paths (ECL,
|
|
PreScheme) or tiny Lisps (uLisp, FemtoLisp, BitLisp for RISC-V)
|
|
- The GC is the hard wall for hard real-time; mitigated by pre-allocation,
|
|
no-alloc hot paths, or real-time GC
|
|
- Most practical path: "Lisp as macro language for C" — generate C from
|
|
Lisp macros, ship the compiled binary. This is how NASA's Deep Space 1
|
|
worked: Lisp planning on Earth generated commands for C flight software.
|
|
- The Lisp Machine on commodity FPGA (RISC-V softcore + Lisp μcode on
|
|
Artix-7 / iCE40) is the ambitious path — Lisp down to the metal for $50.
|
|
|
|
** Microbiology works like Lisp, not C
|
|
|
|
Striking parallels:
|
|
|
|
1. Homoiconicity — DNA is code and data in the same molecule; no separate
|
|
source and binary
|
|
2. Hot-reloadable image — alternative splicing, epigenetic marks,
|
|
post-translational modifications change the running program without
|
|
restart
|
|
3. Automatic memory management — proteasomes degrade misfolded proteins,
|
|
autophagy recycles organelles; the cell never calls free()
|
|
4. Interpreted dynamic language — DNA → RNA → ribosome (interpreter) →
|
|
protein; no static compilation step
|
|
5. Self-modifying source — CRISPR, transposons, DNA repair modify the
|
|
genome at runtime; eval on the genome
|
|
6. Duck typing — protein folding depends on chemical environment, not
|
|
type declarations; interfaces are shape-matching, not compiler-checked
|
|
7. Concurrent real-time GC — apoptosis breaks down cell components for
|
|
recycling by neighboring cells; the collector is external to the object
|
|
|
|
Biology chose the Lisp model because it is more robust, adaptable, and
|
|
evolvable. Evolution paid for the overhead (GC, interpretation, dynamic
|
|
dispatch) with parallelism and redundancy. It optimized for survival in
|
|
an unpredictable environment, not peak single-thread throughput.
|
|
|
|
Biology is the proof that the Lisp model can be efficient at planetary
|
|
scale, running on hardware that self-assembles from food. The ceiling
|
|
Passepartout aims at is still far below the system that wrote itself
|
|
in DNA.
|