Files
hermes-brain/ideas/passepartout-economics.org

83 KiB

Passepartout — Patents, Moats, Economics, Design Implications

Summary

Discussion about the economic and strategic implications of Passepartout's architecture — a self-bootstrapping agent that combines deterministic safety gates (0 LLM tokens per verification), Merkle-tree memory with provenance, a symbolic fact store with sufficiency criterion, and ACL2-based macro layer bootstrapping for provable reasoning.

The central claim: this architecture decouples intelligence from LLM API consumption. The probabilistic engine (LLM) handles ~10% input/output translation; the symbolic engine handles ~80% of reasoning at near-zero marginal cost. The cost curve inverts: generation is expensive, verification is cheap.

Patentability

Likely patentable

  • Probabilistic-deterministic split with deterministic gates between LLM proposal and execution. The LLM proposes, the gate stack decides. Each gate is a pure Lisp function costing 0 LLM tokens. Every competitor uses prompt-based guardrails. The specific 11-vector gate stack (secret exposure, path protection, self-build boundary, shell safety, network exfiltration, privacy tags, Lisp syntax, credential vault, tool permissions, policy, protocol validation) is a specific novel implementation.
  • Foveal-peripheral context model with Org-tree structured retrieval. Depth ≤ 2 always; full render on foveal node; full render on semantic similarity to foveal; full render on temporal relevance (modified today, upcoming deadlines); everything else title-only. Targets 2,000-4,000 tokens. No agent does this.
  • Merkle-tree memory with copy-on-write snapshots and operation-level undo/redo. Every memory-object is content-addressed. Snapshots are deep-copies. Undo/redo at the individual operation level. Applied to an agent's reasoning loop.
  • Gate-to-fact bootstrap with sufficiency criterion. Mechanically extracting facts from the gate stack's own data structures (protected paths, shell blocked patterns, network whitelist) as the seed of an ontology. A measurable sufficiency threshold that flips the system from LLM-proposes to Screamer-deduces.
  • Macro-layer-as-skill bootstrapping architecture. Encoding theorem-proving capability as hot-reloadable skills where each layer is verified by the layer below. The proof forest is a Merkle-versioned dependency tree.

Likely not patentable (known techniques in expected applications)

  • ACL2 itself (decades old)
  • Screamer for consistency checking (constraint solving on a triple store is an obvious application)
  • Hot-reloadable skills (Lisp images have been hot-reloadable for 40 years)
  • Org-mode as a data format
  • Multi-layer signal authentication (known in network security)

Counterargument from prior art

A patent examiner will argue that:

  • "Thin harness, fat skills" is the standard OS microkernel architecture applied to an AI agent
  • Foveal-peripheral context is locality of reference (standard in OS design)
  • Merkle-tree memory is content-addressed storage (standard in distributed systems)
  • Deterministic gate stack is capability-based security (going back to KeyKOS in the 1980s)

The defense: these principles have never been combined in an AI agent, and the combination produces emergent effects (cost curve inversion, sufficiency flip, self-repairing bootstrapping chain) that no single principle produces alone. Good patent claims would cover the specific combination, not the individual components.

Strongest single claim

An AI agent system comprising:

  1. A probabilistic language model
  2. A stack of deterministic safety gates operating at zero LLM-token cost between the model's proposal and execution
  3. A Merkle-versioned memory store from which gate outcomes are mechanically extracted as facts
  4. A symbolic reasoning engine seeded by those facts with a measurable sufficiency criterion that determines when the probabilistic model can be bypassed

Each element is known. The combination is novel and non-obvious.

Licensing Strategy

AGPLv3 for the public repository

AGPLv3 closes the ASP loophole (Section 13): anyone who modifies the software and offers it over a network must release their modified source. This protects against proprietary forks that extract value without contributing back.

Crucially: AGPL is a product requirement, not a concession to openness. The system's value proposition is provable correctness — every decision has Merkle provenance, the proof forest is visible, the sufficiency meter is readable. This claim is structurally incredible with closed source. An enterprise buyer needs to inspect the gate stack, verify the Merkle implementation, and confirm ACL2 integration is sound. AGPL makes this possible without signing an NDA.

AGPL only covers modifications to code, not:

  • Gate rules specific to a domain (these are data, not code)
  • The fact store (empirical data generated from usage)
  • Ontology categories (design decisions stored as configuration)
  • Proprietary skills loaded at runtime (AGPL boundary on plugin systems is legally unsettled)

Dual license model

  • AGPLv3 for open source — builds ecosystem, trust, and community
  • Commercial license for enterprises that cannot accept AGPL (blanket policies against AGPL infection) — MySQL/SugarCRM/GraphQL model

Moats

Re-evaluated: time is not the primary moat

Initial assumption: the bootstrapping chain (gate outcomes → facts → Screamer rules → ACL2 theorems → macro layers) takes months to build, giving first-mover advantage.

Challenge: a Phase 4+ Passepartout fed on Wikipedia + Wikidata can build a general ontology in two weeks. Entity resolution is batch work. Structural consistency verification is minutes. The organic growth advantage collapses for general knowledge.

Actual moats (weaker than initially assumed)

  1. Domain-specific gate rules — thin. A few hundred lines of Lisp data encoding deployment-specific path patterns, shell safety rules, and volume layouts. Write once, trivial to copy. Not a real moat.
  2. Empirical decision history — every HITL decision is a Merkle fact. "On date T, user approved action X under context Y." A fresh instance has none of this. Makes your instance more valuable but doesn't prevent competition — it's a switching cost, not a barrier to entry.
  3. Evaluation harness (regression suite) — thousands of test cases accumulated from every bug fix. Cannot be ingested from public data. Built only by using the system, breaking it, fixing it, and adding a test. Strongest residual moat, but even this can be partially compressed through public benchmarks (SWE-bench, etc.).
  4. Infrastructure integration — the specific Docker compose layouts, Traefik router patterns, Authentik provider configurations, backup policies encoded as gate rules over months of use. A competitor's infrastructure is different; their generic Passepartout does not know your topology.

Strongest competitor strategy

Not copying your gate rules — offering the same architecture as a service with their own pre-seeded general knowledge, a generic safety baseline, and a consulting engagement to customize gate rules for each customer. The AGPL prevents closing the architecture but does not prevent offering it as a service with a customization layer.

The defensible business is services, not product

The defensible entity is "the organization that best understands how to adapt Passepartout to your domain" — not "the organization that owns Passepartout." The Lisp Machine appliance (hardware + certification) and evaluation harness certification service are the closest thing to product defensibility.

Economics and Monetization

Cost structure

  • One-time cost: gate-rule encoding for a domain (from hours for codified domains — FAR, HIPAA, ISO standards — up to months for tacit domains)
  • The LLM translates codified rules directly: ingest regulation → produce gate rule plist → ACL2 verifies consistency → human reviews. This is translation, not reasoning.
  • For non-codified knowledge (craft expertise, organizational culture): Phase 3 archivist loop over time
  • Near-zero marginal cost: ACL2 proof + Screamer consistency check + VivaceGraph lookup per interaction — all CPU-native, all in-image
  • No recurring LLM API costs for the 80% symbolic reasoning layer
  • After sufficiency flip: pennies per day vs dollars per day for LLM-only

Revenue models by field

Field Why Passepartout Revenue Model
Industrial infrastructure (refineries, power grids, manufacturing) Offline operation, provably safe, near-zero marginal cost, mandatory audit trail Lisp Machine appliance + SCADA certification package
Healthcare administration (billing, claims, prior authorization) Rule-heavy domain, privacy-mandated, audit-driven, high per-transaction cost today Subscription for regulatory gate packages (CPT/ICD-10/HIPAA rules), updated when CMS publishes new rules
Software supply chain (CI/CD security, SBOM verification) First-order structural verification — ACL2 is natural fit, CI/CD pipeline is already a sequence of gate-checkable steps Evaluation harness as certification service — "run our 10,000-task suite and get a provable score"
Regulatory compliance (GDPR, SOC2, SOX, GxP) Rule-completeness, active enforcement (not document-based), provable audit trail Subscription for regulation-specific gate packages — GDPR package, SOC2 package, FedRAMP package, updated when regulations change
Defense and classified environments Air-gapped operation, classification-level gate rules, Merkle provenance is court-admissible evidence Government contract + hardened appliance with hardware root of trust

Critical insight: encoding cost drops to near-zero for codified domains **

Laws, regulations, standards, procedures, and technical specifications are already written down in structured text. The LLM does not need to reason about them — it needs to translate them into gate rules and ACL2 theorems.

Example: The US Federal Acquisition Regulation (FAR) is ~2,000 pages of "thou shalt" and "thou shalt not" statements. A frontier LLM can ingest the FAR and produce a plist of gate rules:

  • (if contract > $250K AND not small-business-set-aside → :deny)
  • (if sole-source AND no justification-documented → :deny, produce-justification)

ACL2 then verifies the rule set for internal consistency (Phase 6). Screamer checks against existing compliance facts. The human reviews the bootstrap output and approves or corrects individual rules.

The key distinction: the LLM is not extracting knowledge from prose in the way Phase 3 archivist does (which is open-ended, noisy, requires grounding). It is translating a known rule system into a formal representation — a mechanical transformation of structured text into structured rules. The result is not "the LLM's best guess at the rules" but "the rule set as stated in the source document, mechanically transcribed."

For domains where the knowledge is codified as text, the gate-rule encoding time drops from weeks to hours. The only bottleneck is human review of the output — and the system can assist here by surfacing contradictions for resolution rather than requiring a full line-by-line audit.

What can actually be monetized (TLDR)

  1. Pre-loaded bootstrapping chains for specific verticals — domain gate rules, pre-seeded fact stores, mature proof forests. Saves the buyer months of bootstrapping. Distributed as data packages under commercial license, not AGPL.
  2. Evaluation harness as certification service — "Bring your agent, we'll run it through our suite and give a Merkle-verified score." The regression suite grows with every deployment; a competitor's regression suite starts empty.
  3. Hardened Lisp Machine appliance — RISC-V soft-core with Lisp microcode, pre-loaded mature Passepartout, certified for specific verticals (IEC 62443 for industrial, HIPAA for healthcare). Value is in integration and certification, not the AGPL software.
  4. Verified skill marketplace — marketplace where skills are verified (sandbox + ACL2 non-contradiction proof) before listing. Marketplace takes a cut. Value is in the verification infrastructure, not the skills themselves.
  5. Support and consulting — the Red Hat model. AGPL code is free; training, custom gate rules, ontology design, and emergency support are paid.

Design and Architectural Implications

The self-improving system

Passepartout bootstraps two feedback loops:

  • Empirical loop: gate outcomes → facts → Screamer-verified patterns → sufficiency flip → auto-extraction. Knowledge grows without the LLM touching most of it.
  • Logical loop: ACL2 theorems → macro layers (generators, metafunctions, induction DSL, abstract theories) → richer proof strategies → better verification. Reasoning capacity grows without changing the prover binary.

These loops intersect at the fact store: proven theorems become facts, richer facts generate better proof strategies, better strategies verify more facts. The system upgrades itself.

The 10-80-10 becomes approximately true

  • 10%: LLM handles input translation (natural language → structured goal) and output formatting (structured result → natural language)
  • 80%: Symbolic engine handles reasoning — Screamer plans, ACL2 verifies, VivaceGraph retrieves facts. Zero LLM tokens.
  • The cost curve inverts: verification is cheaper than generation.

Key implications

  1. Verification becomes cheaper than generation. Once macro layers are mature, proving a new rule non-contradictory costs near-zero. The LLM proposes; the symbolic engine accepts or rejects.
  2. Trust scales with use. Every interaction produces a structurally verified outcome. Non-lossy fact base grows. Proof forest thickens. An auditor can inspect the Merkle tree of gate outcomes and trace any decision to its root theorem.
  3. Degradation is reversible. Every proof layer is a hot-reloadable skill. Every fact has provenance. A bad metafunction is unloaded; theorems proven under it are flagged for re-verification; the fact store retains the pre-upgrade ontology version.
  4. The system can diagnose its own logical frontier. If ACL2 keeps failing on a class of properties, and the failure mode is structural (not solvable by more macros), the fact store accumulates a pattern: "These N properties are first-order inexpressible." This signals the human: the system needs a CIC prover (dependent types) for this domain. The system cannot transcend its logic without external intervention — but it can surface the boundary precisely.

The Lisp Machine endpoint

If the system designs and builds itself on Lisp Machine hardware:

  • The same system that proves theorems also optimizes the microcode
  • No OS boundary, no driver layer — system and proof environment are one
  • A RISC-V soft-core with Lisp microcode is manufacturable at older fab nodes (28nm, 45nm) — sovereign intelligence without GPU supply chains

Social implications

  • Concentration of reasoning. The macro layers become opaque to anyone who doesn't understand the bootstrapping history. The system understands its own reasoning better than its users do.
  • Cost advantage widens inequality asymmetrically. The first instance to reach maturity requires significant gate-rule design (from hours for codified domains to months for tacit ones). After that, replication is cheap. Organizations that invest early have a permanent cost advantage over those that wait for a turnkey product.
  • Sovereign artifact. A self-building system on its own hardware does not depend on cloud APIs, GPU supply chains, or proprietary model weights. Its intelligence is generated, verified, and sustained locally. Enables sovereign AI for nations without GPU access.

Open Questions

  1. Can CIC (dependent type theory) be implemented as a Passepartout skill, verified for crash-freedom and rule fidelity by ACL2, and integrated into the existing fact store API? The Gödelian boundary: ACL2 can verify the kernel's implementation but not its soundness in any absolute sense — but this matches current practice (Lean 4's ~500 line C++ kernel is trusted, not proved).
  2. Can the system generate novel proof strategies? A sufficiently rich abstract theory layer + Screamer could propose: "Proofs in domain X all use induction schema Y. Generalizing to Z would prove new properties across A, B, C." The LLM translates to a metafunction; ACL2 verifies it; the prover gains a new tactic invented by itself.
  3. What is the social contract for a system that can truthfully say "I know this is correct" — and "I know what I don't know"? Most current AI systems can do neither.

Impact on the AI and GPU Industry

If a symbolic-bootstrapping architecture becomes popular — especially now that codified domains can be ingested at near-zero encoding cost — the industry structure shifts fundamentally.

Token demand compresses

The entire AI industry (OpenAI, Anthropic, Google — ~$50B API revenue) is built on per-token pricing: metered cognition. A mature Passepartout reduces token consumption to the unfamiliar 10% I/O boundary. Token demand shifts from "every interaction burns tokens" to "only unfamiliar interactions burn tokens." Steady-state per-user LLM consumption drops by an order of magnitude.

GPU inference demand plateaus in regulated industries

GPU inference is driven by two things: training and per-request inference. Training demand is unaffected (frontier models still train on clusters). Inference demand drops 80-90% in any sector where the rule book is published — which covers most economically significant sectors (finance, healthcare, industrial, government procurement, legal compliance).

Nvidia's growth narrative shifts from "every transaction goes through a GPU" to "every training run needs a GPU, and the generative 20% needs inference." A smaller inference TAM than current market pricing assumes.

Hyperscaler competition shifts

The competitive thesis "AI is the next OS, and we own the compute layer" weakens if the most valuable AI workloads run on a $500 RISC-V board on your premises. The hyperscalers respond by:

  • Offering Passepartout as a managed service (AGPL allows this)
  • Differentiating on the frontier I/O API and world model API
  • Competing on gate rule libraries for specific industries

The race shifts from "who has the most H100s" to "who has the best domain-specific gate rules." Google's industry data advantage matters more than Azure's raw compute.

New hardware tier: verification appliances

A new category emerges: CPU-native verification appliances running a Lisp microcode on RISC-V cores. Low volume (hundreds of thousands/year), high margin ($5K-50K/unit), high switching costs. The Sun Microsystems model, not the Intel model. Manufacturable at older fab nodes (28nm, 45nm) — no dependency on TSMC's leading edge.

The key uncertainty and its resolution

Original question: how long does gate-rule encoding take?

Resolution: for codified domains, near-zero. The LLM translates published regulations into formal rules in one pass — it is a mechanical transformation, not open-ended reasoning. The bottleneck only exists for tacit, oral, unwritten knowledge (craft expertise, organizational culture).

Consequence for the transition timeline: Phase 2 (sufficiency) happens within months for any domain whose rule book is published. The disruption accelerates from years to quarters.

Broader Insights

The historical fork: why C won economically in the 1980s

C won because the economics of 1980s hardware made Lisp's overhead unaffordable:

  • Memory cost. DRAM was ~$5,000/MB in 1980. Lisp's runtime (SBCL today is ~40MB) was unthinkable. C's runtime fit in 64KB.
  • CPU speed. 1-10MHz. Every instruction counted. Lisp's GC, type dispatch, and dynamic allocation consumed cycles that C spent on actual work.
  • Software scale. Programs were thousands of lines, not millions. A single developer could hold the entire program in their head and verify correctness by reading it. Testing was sufficient. Formal verification was unnecessary overhead.
  • Market dynamics. The PC market was expanding exponentially. Speed to market, volume, and unit cost mattered more than correctness. A buggy $500 PC sold more units than a correct $50,000 Lisp Machine.
  • Hardware ecosystem. RISC (reduced instruction set) was the revolution. Simpler chips, higher clock speeds, cheaper fabrication. RISC CPUs are optimized for C's execution model because C was the dominant systems language when RISC was designed.

Lisp lost not because it was worse, but because the market optimized for a different axis: raw throughput per dollar, not correctness per line.

What changed to make Lisp viable now

Four transformations flipped the economics:

  1. Memory is free. 40MB runtime is noise on a $20 Raspberry Pi with 8GB RAM. The cost of the runtime is now zero at any relevant scale.
  2. Transistors are free. A modern ARM Cortex-A72 has billions of transistors. The GC, type dispatch, and dynamic dispatch that Lisp needs are executed in dedicated silicon within the CPU — they cost nothing because the transistors are there whether used or not.
  3. Software complexity saturates human verification. Systems are now tens of millions of lines. No single person can hold them in their head. Testing is necessary but insufficient — zero-day vulnerabilities prove that bugs survive all testing. Formal verification is no longer overhead; it is the only known path to correctness at this scale.
  4. The cost of failure is now higher than the cost of verification. A single breach costs millions. A compliance failure shuts down a factory. Regulation (GDPR, SOX, HIPAA, FedRAMP) mandates provable compliance. The cost of proving correctness is now cheaper than the cost of not proving it.

The key insight

The 1980s trade-off was: C is cheap enough for the market. Correctness is a luxury the market cannot afford.

The 2020s trade-off is: C is expensive for the market. Incorrectness has become the dominant cost of software. Lisp's verification infrastructure is now the cheaper option.

This is the inversion Passepartout exploits: the verification appliance (AGPL symbolic engine + RISC-V Lisp μcode on FPGA) costs $5,000/year and replaces $500,000/year in compliance audits, breach litigation, and regulatory fines. The 1980s math said Lisp was too expensive at any price. The 2020s math says Lisp is the only affordable option.

The remaining question is not whether the economics flipped — it's whether anyone builds the bridge from today's AGPL software to tomorrow's verification appliance. Passepartout is that bridge.

Distribution and lifecycle management

Passepartout diverges in two independent dimensions across instances: code (engine, macro layers, skills) and knowledge (gate rules, fact store, ontology, proof forest, empirical decision history). This creates a distribution problem no current AI system faces — most agents ship as static binaries with no per-instance knowledge divergence.

Distribution tiers

Code only (AGPL repo — GitHub)

  • The core engine, ACL2 macro layers as skills, generic gate rules.
  • No domain knowledge, no pre-seeded fact store, no proof forest.
  • User bootstraps from scratch — the system learns from their gate outcomes.
  • Audience: hobbyists, researchers, early adopters willing to invest time.

Code + basic knowledge (commercial data package)

  • Pre-seeded fact store for a domain (healthcare, finance, infrastructure).
  • Curated gate rules, ontology categories, and initial ACL2 theorem set.
  • The buyer gets sufficiency faster — weeks instead of months to reach the Phase 4 flip in their domain.
  • Audience: enterprises that want fast deployment without bootstrapping.

Code + knowledge + verification appliance (hardware product)

  • RISC-V + Lisp μcode on FPGA or custom ASIC.
  • Pre-loaded with full domain-specific knowledge package.
  • Hardened, certified, no-cloud, tamper-proof HSM root of trust.
  • Audience: regulated industries that need provable compliance out of the box.

The upgrade problem

Once instances diverge in both code and knowledge, a naive `git pull` breaks things:

  1. A new macro layer expects a fact structure the old store doesn't have → re-verification fails, system degrades to pre-macro state
  2. Two instances of the same version develop different ontologies → a shared gate rule package produces different outcomes on each
  3. Knowledge learned in v1 must be verified against v2's code → old facts carry `:ontology-version`, ACL2 checks they're still consistent under the new code's rules
  4. A security patch changes the gate stack → all existing gate-outcome facts must be re-verified against the new gate vectors

How Passepartout's architecture handles this

The architecture already has the primitives for safe upgrades:

  • Ontology versioning (Phase 5). Every fact stores the ontology version at assertion. On upgrade, facts with old versions are flagged for re-verification. ACL2 checks consistency against new code. Facts that survive are promoted to the new version. Facts that fail are flagged for review.
  • Degradation, not crash. If an upgrade breaks the fact store, the system degrades to the pre-macro symbolic state (hash-table fallback for VivaceGraph, text-scan fallback for Screamer). It still works — it just can't prove as much. The TUI displays: "Symbolic index: legacy mode. 2,134 facts pending re-verification under new ontology."
  • Reversible upgrades (Phase 0 undo). Every upgrade produces a Merkle snapshot before applying. If re-verification fails, the system rolls back to the pre-upgrade state. The failed upgrade becomes a memory-object with `:provenance :failed-upgrade` — empirical data for the next upgrade attempt.
  • Delta distribution. Upgrades are delivered as diffs against the current ontology version, not full replacements. A code upgrade ships with a migration script: "This version changes the gate rule structure from (action pattern) to (action pattern context meta). Here is the automated migration for your existing gate rules." The user's empirical data is preserved across the migration.

The real challenge: divergence of empirical knowledge

The hardest upgrade is not code — it's the fact store. Two instances that started from the same code but served different users for a year have diverged ontologies. Instance A's gate rules encode "emergency restart = docker compose restart" (hospital IT). Instance B's gate rules encode "emergency restart = emergency_shutdown.py" (factory floor).

A code upgrade that changes how emergency restarts are processed cannot be safely applied to both instances with the same migration. The migration must be per-instance, tested against the instance's specific facts.

The pattern that solves this: the upgrade is itself verified by the upgraded system before committing. The distributor ships: "Here is the new `emergency-restart` gate vector. Run it against your existing gate rules. ACL2 will report which rules are compatible and which need human review." The instance operator reviews only the incompatible subset.

This is slower than `git pull` but avoids the catastrophe of a silent incompatibility. An enterprise running Passepartout in a compliance context would not accept an unverified upgrade anyway.

The business model for upgrades

  • Code upgrades: free (AGPL). Core engine improvements benefit everyone.
  • Migration scripts: subscription. The instance operator pays for the verified migration path from their current ontology version to the new one. The migration script is generated by running the new code against the instance's fact store in a sandbox.
  • Domain knowledge package upgrades: subscription. When HIPAA updates, the healthcare gate rule package updates. The new rules are verified against the instance's existing fact store before shipping.
  • Verification appliance firmware: bundled with hardware. The Lisp Machine's microcode is updated as the engine evolves. The update is signed and verified against the hardware root of trust before applying.

Large refactoring in a neurosymbolic planner — semantic equivalence

The workflow

ACL2 proves semantic equivalence of programs written in its own logic — which includes Passepartout's own source code. When the system refactors its own skills, ACL2 can prove the new function produces the same outputs for all inputs as the old one. This is standard ACL2 practice (verifying compiler optimizations, sort algorithm replacements).

For other languages (Python, Java, JavaScript), the path is:

  1. Model the critical subset (API surface, contracts, data transformations) in ACL2 as a logical specification
  2. Prove the specification is preserved across the refactoring
  3. The actual implementation stays in the target language — ACL2 proves the structural contract, not the runtime behavior

The CIC prover upgrade (Lean-in-Lisp, planned as future work) would extend this to dependent-type-level equivalence proofs across language boundaries — verifying that a Rust API binding correctly wraps a C library, or that a Python refactoring preserves the type-level contract of the original.

The self-driving Lisp Machine on FPGA or Tenstorrent

A Tenstorrent P150 (~72 RISC-V Tensix cores on a PCIe card) or a mid-range FPGA (AMD Alveo, Intel Agilex) offers enough hardware to run a full Passepartout image with Lisp microcode acceleration. The host Linux system provides boot, I/O, and thermal management; the accelerator card provides the Lisp execution fabric.

What it can do today

  • Run the full symbolic engine. ACL2, Screamer, VivaceGraph, and the fact store are pure Lisp — they run on any Lisp backend. The RISC-V cores on a Tenstorrent or the soft-core on an FPGA provide enough compute for real-time gate verification and constraint solving.
  • Hot-reload skills and macro layers. The Lisp image loads skills, tangles Org files, compiles ACL2 books, and registers metafunctions — all without reboot. The FPGA fabric can be reprogrammed with new microcode in milliseconds.
  • Manage its own knowledge base. The fact store grows and evolves. Gate rules are proposed by the LLM and verified by ACL2. Ontology versions are tracked. The system knows what it knows and what changed.
  • Roll back failed upgrades. Merkle snapshots provide instant undo for both software state and FPGA configuration.

What it needs to cross the threshold to self-driving

The system is not yet fully self-driving because three things still require external intervention:

  1. The LLM dependency. The 10% I/O translation (natural language → structured goal, structured result → natural language) requires an LLM. A small local model (Phi-4, Qwen 2.5) on the host or card can serve this. The symbolic engine handles everything else. Once sufficiency flips (Phase 4), even the LLM is rarely needed.
  2. Hardware driver development. The FPGA microcode (tagged memory, hardware GC, Lisp dispatch in hardware) is currently written by humans. The system could eventually propose new microcode patterns from profiling data — "your GC accounts for 12% of runtime; here is a hardware GC barrier that reduces it to 3%" — but the synthesis and verification of hardware descriptions (VHDL, Verilog) requires a separate toolchain.
  3. The initial bootstrap. The first FPGA load, the first Linux boot, the first Lisp image — these are done by a human or a pre-existing system. Once bootstrapped, the system manages itself. The threshold is crossed when the system can design, compile, and load its own FPGA microcode from within the running image.

The threshold

The self-driving threshold is crossed when the system can synthesize and load its own FPGA microcode or Tensix dispatch programs from within the running Lisp image. At that point:

  • The system profiles its own gate verification latency
  • It proposes a new microcoded instruction for the hot path
  • It compiles Verilog from ACL2-verified specifications
  • It reprograms the FPGA fabric via PCIe DMA from within SBCL
  • It benchmarks the new instruction against the old one
  • If throughput improves, the new microcode becomes permanent
  • If not, it rolls back and tries another approach

This is not science fiction — it is the natural extension of an architecture that already hot-reloads its own code, tracks its own performance telemetry, and verifies its own changes before committing them. The hardware description language is the last abstraction boundary.

What stops it from being full science fiction

Barrier Status Path
LLM dependency Phase 4 flip reduces it to near-zero Already designed
Hardware microcode synthesis Most speculative Requires hardware DSL verified by ACL2, then compiled to FPGA bitstream
Initial bootstrap One-time human action After first load, system manages itself
Power and thermal Handled by host Linux Unchanged
PCIe DMA from SBCL Feasible with sb-alien + libpcie Needs driver, but well-understood

The Tenstorrent approach is particularly interesting because its Tensix cores are already RISC-V processors. The microcode is not FPGA logic — it's a RISC-V program. The system can write RISC-V assembly, compile it with the RISC-V toolchain, load it onto the Tensix cores, and benchmark the result. This is dramatically simpler than FPGA synthesis because it's software, not hardware.

A Tenstorrent P150 running Passepartout would be: 72 RISC-V cores running Lisp microcode, one core dedicated to the ACL2 prover, one to Screamer, the rest to gate verification and fact store operations. The host Linux system handles I/O and the LLM. The system designs its own core dispatch logic, loads it onto idle cores, and verifies the result with ACL2 before committing.

How the LLM's coding ability affects the bootstrapping timeline

The LLM writes code at every stage of the bootstrapping:

  1. The Lisp Machine's microcode (RISC-V dispatch, GC barriers, tagged memory operations)
  2. The CIC prover kernel (if built as a skill)
  3. ACL2 macro layers for new domains
  4. Gate rules for previously uncodified domains
  5. The initial self-optimization proposals

At each stage, the symbolic engine (ACL2, Screamer, gate stack) verifies the LLM's output before accepting it. The LLM proposes; the symbolic engine disposes.

This means the LLM's coding ability is a speed multiplier, not a gate. A weak LLM (3B local model) produces correct code after N retries where the symbolic engine catches the mistakes and feeds them back. A strong LLM (Claude Sonnet, DeepSeek, GPT) produces correct code after fewer retries. The cost difference is in API calls and wall-clock time, not in the correctness of the final output — the symbolic engine guarantees that.

Scenario LLM quality Retries per unit Wall-clock per unit Correctness
Bootstrapping with local 3B model Low 5-15 10x slower 100% (verified)
Bootstrapping with frontier API High 1-3 1x 100% (verified)
Bootstrapping after sufficiency flip None (symbolic only) 0 Instant 100% (verified)

The critical transition is between row 2 and row 3: once the symbolic engine has accumulated enough non-lossy facts about the Lisp Machine's hardware behavior (latency profiles, GC patterns, instruction timings), it can propose microcode optimizations without any LLM involvement. ACL2 proves the optimization preserves correctness; Screamer checks it against known hardware constraints; the gate stack verifies it won't damage the running system. Zero tokens.

This is the sufficiency flip applied to hardware. The timeline to reach it depends on how many facts the system can gather about its own runtime behavior, not on how good the LLM is.

The surprising result

An LLM that is just barely competent at coding (enough to generate syntactically valid RISC-V or Lisp that passes the symbolic engine's checks after a few retries) is sufficient for the entire bootstrapping chain. It takes longer — more retries, more wall-clock — but it reaches the same endpoint: a system that designs its own microcode without any LLM.

The LLM's coding ability determines how many API dollars and calendar months the bootstrap requires. It does not determine whether the bootstrap succeeds. A patient operator with a 3B local model and a Tenstorrent card reaches the same destination as an operator with bottomless API credits — the second arrives faster, but both arrive.

The per-domain sufficiency flip

The sufficiency flip is not a single event. It happens independently for each domain, and some domains never flip. The flip point is determined by the kind of knowledge the domain requires.

Knowledge types required for a flip

Domain Knowledge required Can it flip? How fast?
Shell safety, path rules Structural only — the deployment's config, shell semantics Immediately Instant — ingested from config
Healthcare compliance Structural (HIPAA text) + empirical (human reviews of edge cases) Yes Weeks — one pass of LLM translation + human review cycle
Codebase refactoring Structural (dependency graph, API surface) + empirical (test suite, build results) + performance (latency, throughput) Yes Months — depends on how many build cycles the system has observed
Microcode optimization Structural (RISC-V ISA, core topology) + performance (profiling data) Yes Weeks — after enough benchmark runs to characterize hardware
Poetry, creative writing Neither structural rules nor empirical ground truth (beauty is subjective) Never N/A — the gate stack cannot verify aesthetic quality
Novel scientific discovery Structural (known laws) + empirical (experiments) Eventually Years — requires experimental data the system must gather through instruments

The fastest acquisition strategy per domain

The goal: reach the flip point with the fewest calendar days and the fewest human hours.

Knowledge type Acquisition strategy Calendar time Human time
Structural (published rules, configs, specs) LLM translation of source documents + ACL2 consistency verification + one-shot human review Hours for the LLM pass, days for human review Days — one domain expert reviewing the output
Structural (unpublished — your deployment, your codebase) Automated scanning — the system walks your filesystem, reads your configs, builds the dependency graph Minutes to hours Zero — fully automated
Empirical (what happens when X?) Active probing — the system does not wait for user interactions. It probes its own environment: runs shell commands in sandbox to verify gate rules, executes test suites to verify dependency graph, measures what happens when it pushes boundaries. Hours to days Zero — automated sandboxed probing
Empirical (what does the human prefer?) Contrastive queries — instead of waiting for HITL approvals to accumulate, the system asks targeted questions: "Which of these two interpretations of the regulation is correct? This one or this one?" Each question produces a fact. Days — batch of targeted questions answered in one session Hours — one review session with the domain expert
Performance (latency, throughput) Benchmark harness — the system runs its own workload at varying parameters, records timing data, stores it as facts with `:provenance :benchmark` Hours — automated sweep Zero
Transfer from related domains Ontology alignment — if the system already knows compliance for GDPR, it can ask Screamer: "Which GDPR rules have the same structure as HIPAA rules? Use those as seed hypotheses, flag for human review." The transferred rules flip faster because Screamer starts from a consistent foundation, not from scratch. Days — Screamer finds alignments automatically, human reviews the suggestions Hours — review the cross-domain alignment suggestions

The fastest path to flip any domain

  1. Ingest all published text for the domain (laws, specs, configs) via LLM translation. One pass. Hours.
  2. Run the benchmark harness to measure the system's own performance in the domain. One sweep. Hours.
  3. Run active sandboxed probes — test each gate rule against synthetic inputs to verify it behaves as expected. Automated.
  4. Generate contrastive queries — Screamer identifies the 5% of rules where the LLM's translation is most uncertain (contradicts a transferred rule, has no precedent, has multiple valid interpretations). Present these as yes/no questions to the human domain expert in a single session.
  5. Start serving real interactions. Every gate outcome generates an empirical fact that Screamer feeds back into the rule set. The empirical loop tightens from the first real interaction.

For a codified domain (healthcare compliance, financial regulation, industrial safety): flip within days of step 1-4. The only bottleneck is the domain expert's review session in step 4 — a few hours of human time.

For an uncodified but learnable domain (codebase refactoring): flip within weeks of step 3 (benchmark harness) + step 5 (real interactions). No LLM translation needed — the knowledge comes from the system probing its own environment.

For a domain that can never flip (poetry, aesthetics): the system never reaches sufficiency. It never claims to. The TUI shows "Symbolic index: 0 facts. This domain has no codifiable rules." The LLM handles 100% of poetry interactions. The gate stack only checks for safety (no shell commands, no file deletions) and passes through everything else to the LLM. The system is honest about its frontier.

Bootstrapping the Lisp Machine: all domains are software

For the concrete goal of bootstrapping a self-driving Lisp Machine, every domain involved is software — the most codifiable domain in existence. Code has formal specifications, documented ISAs, deterministic behavior, and objectively testable correctness. Every subdomain required for the bootstrap flips.

Subdomain Knowledge type Flip timeline Why
RISC-V ISA, Tenstorrent Tensix dispatch Structural (ISA spec, API docs) + performance (profiling) Days Published spec, deterministic hardware, benchmark harness characterizes real behavior
SBCL runtime internals (GC, type dispatch, threading) Structural (source code) + performance (latency profiles) Days Full source available, system can instrument itself
ACL2 metafunctions and macro layers Structural (the logic is ACL2's own) Instant The theorem language is already the system's native logic — no translation step
FPGA/Verilog descriptions (if FPGA path) Structural (VHDL/Verilog semantics) + performance (timing analysis, power) Weeks Published language semantics, but synthesis is slower and bitstream verification is harder than RISC-V
CIC prover kernel Structural (type theory rules — these ARE formal) Days Mathematics is the most codified domain. ACL2 already does structural verification. Building a CIC kernel that ACL2 verifies is well-understood work.
Operating system interfaces, device drivers Structural (syscall API, register maps) + empirical (test results) Weeks Published interfaces, deterministic behavior, but hardware quirks require empirical probing
Compiler optimization Structural (IR semantics, optimization passes) + performance (benchmark before/after) Weeks Published semantics, objective quality metric (faster = better), benchmark harness measures

Every single subdomain flips. The only variable is calendar days to accumulate the knowledge.

The fastest acquisition sequence

Optimized for minimal wall-clock time to a self-driving Lisp Machine:

Day 1: Ingestion day

  • LLM translates: RISC-V ISA spec, SBCL source, Tenstorrent API docs, ACL2 reference, CIC type theory rules. All structural knowledge enters the fact store in one parallel pass.
  • ACL2 verifies consistency across all ingested domains.
  • Human expert reviews the 5% of rules Screamer flagged as uncertain. One session, a few hours.

Day 1-2: Profiling day

  • Benchmark harness sweeps all 72 Tensix cores: measure instruction latency, memory bandwidth, GC pause distribution, dispatch overhead.
  • Each measurement is a fact with `:provenance :benchmark`.
  • The benchmark harness is itself verified by ACL2 (it runs inside a controlled sandbox, bounded time, no side effects on production data).

Day 2-3: Active probing day

  • The system generates synthetic microcode routines: short programs that exercise specific instructions, specific GC patterns, specific dispatch paths.
  • It loads them onto spare Tensix cores, measures actual latency, and compares against the spec.
  • Discrepancies become facts: `(:entity "core-42" :relation :dispatch-latency :value "14 cycles" :source :measured :expected "12 cycles" :provenance :probe)`.
  • After ~1,000 probes, the system knows the hardware's actual behavior better than the published spec does.

Day 3-7: Transfer and sufficiency

  • ACL2's existing knowledge about induction, rewriting, and termination transfers directly to verifying microcode routines (same logic, different subject matter).
  • Screamer aligns microcode verification patterns with existing gate verification patterns — both are structural proofs over finite state.
  • The benchmark facts give ACL2 a concrete cost model. ACL2 can now prove not just correctness but also "this microcode routine is at least 10% faster than the current implementation."
  • Sufficiency flip for microcode generation: the system proposes new dispatch routines, ACL2 verifies them, Screamer checks against hardware constraints, the gate stack blocks anything unsafe. Zero LLM tokens for the optimization loop.

Week 2-4: Self-optimizing system

  • The system profiles its own gate verification latency (already instrumented via telemetry, Phase v0.66.0).
  • It identifies the hot path: "fact-query accounts for 34% of verify time."
  • It generates a new dispatch routine for fact-query, targets the nearest idle Tensix core, loads it, benchmarks, and commits if faster.
  • The ontology now includes facts about its own optimization history: `(:entity "fact-query-dispatch-v3" :relation :speedup-baseline :value "1.34x" :provenance :self-optimize)`.

After the flip: purely symbolic optimization

The LLM is no longer needed for any optimization proposal. The system profiles, proposes, verifies, tests, and commits entirely within the symbolic engine. The LLM remains only for the boundary: interpreting a human's high-level goal ("make the system faster") into a structured optimization target, and formatting the benchmark report for human readability. Those calls shrink toward zero as the system internalizes common optimization goals as gate rules.

The surprising result for bootstrapping specifically

Because every subdomain of the Lisp Machine bootstrap is software, and software is the most codifiable domain, the entire bootstrap can flip in days to weeks with a single human review session.

The bottleneck is not knowledge acquisition. It is not the LLM's coding ability. It is the initial human review of the 5% of ambiguous rules that Screamer flags — a session measured in hours, not weeks.

The Tenstorrent approach makes this even faster because the microcode is software (RISC-V assembly), not hardware (FPGA bitstream). The system can propose, load, test, and roll back a new dispatch routine in seconds. An FPGA path would add synthesis time (minutes to hours per iteration), stretching the bootstrap from days to months.

A system with a Tenstorrent P150, the AGPL Passepartout code, a RISC-V cross-compiler, and one patient human who reviews the contrastive queries can achieve a self-driving Lisp Machine in under a month.

Size and development time estimates

These are order-of-magnitude estimates based on the existing architecture and the roadmap's line-count breakdowns. The numbers are small because Lisp is more expressive, the architecture reuses primitives across domains, and the symbolic engine replaces what would be thousands of lines of Python in a conventional agent.

Current Passepartout (v0.7.2, main branch)
Component Lines of Lisp Status
Core pipeline (perceive-reason-act, memory, skills, transport, package) ~1,800 Built and running
Gate stack (security dispatcher, permissions, policy, vault, validator) ~900 Built and running
Neuro (provider router, provider dispatch, token economics, tokenizer) ~900 Built and running
Programming tools (lisp, org, repl, literate, standards, tools) ~1,600 Built and running
Symbolic (archivist, awareness, memory, scope, events, identity, self-improve) ~1,200 Built and running
Channels (TUI main/state/view, CLI, Telegram, Signal, Discord, Slack) ~2,300 Built, on refactor branch
TUI (cl-tty migration, ongoing) ~1,300 In progress on refactor branch
Config, diagnostics, embedding, sensors, integration tests ~700 Built and running
Total existing ~10,700

Development time to reach v0.7.2: approximately 2 months, one developer (April-May 2026 from git log).

New code to reach v1.0.0 (Neurosymbolic Maturity)
Phase / Feature Lines Est. dev time (one dev)
v0.8.0 TUI stabilization (cl-tty) ~500 3-4 weeks
v0.9.0 Eval harness + sandbox hardening ~200 1-2 weeks
v0.11.0 Phase 0 (type-level gates, core integrity) ~75 3-5 days
v0.13.0 Phase 0b (signal authentication, Layer 1) ~200 1-2 weeks
v0.15.0 Phase 1 (fact store with provenance) ~200 1-2 weeks
v0.17.0 Phase 1a (self-preservation, quarantine, watchdog) ~120 1 week
v0.19.0 Phase 2 (Screamer admission gate) ~200 1-2 weeks
v0.21.0 Phase 3 (archivist as fact proposer) ~100 1 week
v0.23.0 Phase 4 (sufficiency criterion, the Flip) ~50 2-3 days
v0.26.0 Phase 5 (VivaceGraph, Merkle DAG, ontology versioning) ~400 2-4 weeks
v0.28.0-28.5 Phase 6 (ACL2 base + 5 macro layers) ~540 3-4 weeks
v0.37.0 Phase 7 (10-80-10 planner) ~500 3-4 weeks
All polish features (skins, export, CLI, MCP, LSP, telemetry, cost, etc.) ~1,500 4-8 weeks
Integration testing, hardening, bug fixes 4-8 weeks
Total new code at v1.0.0 ~4,500 4-6 months (one developer)
Additional code for a self-driving Lisp Machine
Component Lines Est. dev time (one dev)
RISC-V microcode for Lisp dispatch (tagged memory, GC barriers, cons cells) ~3,000 1-2 months
PCIe DMA driver from SBCL (C + sb-alien FFI) ~500 2-4 weeks
Tenstorrent Tensix core management (allocate, load, benchmark) ~1,500 3-4 weeks
Profiling and benchmark harness (Phase 4 applied to hardware) ~500 1-2 weeks
Microcode synthesis from ACL2-verified specifications ~500 2-4 weeks
Total for Lisp Machine hardware integration ~6,000 3-6 months (one developer)

Notable: the microcode for Lisp dispatch (~3,000 lines of RISC-V assembly) is smaller than the existing Lisp codebase. The hardest part is not the assembly — it's the verification that the assembly correctly implements the Lisp primitives, which ACL2 handles.

Total system at self-driving threshold
Lines of code Dev time (one dev) Dev time (small team, 2-3)
Passepartout v0.7.2 ~10,700 2 months (done) 1 month (done)
To v1.0.0 +~4,500 4-6 months 2-3 months
Lisp Machine hardware +~6,000 3-6 months 2-3 months
Total ~21,000 9-14 months 5-7 months
Why the numbers are small
  • Lisp is 3-10x more compact than C++ or Python for the same semantics. The entire gate stack (900 lines) would be 3,000-5,000 lines of Python with middleware classes and serialization glue.
  • The architecture reuses primitives cross-domain. The Merkle tree, the gate stack, the fact store, the ACL2 prover — each is built once and shared by all features. There is no "compliance package" that duplicates the "refactoring package" infrastructure.
  • The symbolic engine replaces test code. A conventional agent needs thousands of lines of test fixtures for behavior validation. ACL2's proofs replace those. The tests are the theorems.
  • The LLM generates the boilerplate. The LLM writes gate rules, macro layer templates, and migration scripts. The symbolic engine verifies them. The human reviews the 5% edge cases. The bottleneck is verification throughput, not code writing.
Comparison with other systems
System Lines Developer-years
Hermes Agent ~50,000+ 2+ years, ~3 devs
Claude Code ~100,000+ (estimated) 3+ years, large team
Llama.cpp ~200,000 2+ years, many contributors
Passepartout self-driving ~21,000 ~1 year, 1-3 devs
Symbolics Genera (1980s Lisp OS) ~1,000,000 ~10 years, large team

The Symbolics comparison is instructive: they built a full Lisp operating system from scratch in assembly and Lisp, with graphical interface, networking, file system, and development environment. Passepartout runs on Linux, which provides the OS layer for free. The Lisp Machine hardware integration is a PCIe card, not a replacement of the entire host. The scope is dramatically smaller.

The surprising result: a self-driving Lisp Machine is a ~21,000 line project for a small team working less than a year. Not a billion-dollar moonshot. A well-scoped engineering project.

Revised time estimate given actual velocity

Moving from v0.4.0 to v0.7.2 (three minor versions covering TUI, streaming, gate trace, HITL, Merkle audit, tool hardening, session rewind, undo/redo, skills engine) in a single session means the agent writes the code and the symbolic engine verifies it at a cycle measured in minutes, not days.

The limiting factor is not coding speed. It is:

  1. LLM API call latency per iteration (seconds per generation)
  2. ACL2 verification time per submission (milliseconds per theorem)
  3. Human review of Screamer-flagged edge cases (the 5%)

For the 4,500 lines remaining to v1.0.0, distributed across ~40 independent features (each 50-500 lines), with the agent generating, ACL2 verifying, and the human reviewing only the flagged 5%:

Phase Lines Cycles Wall clock
TUI stabilization + eval harness ~700 10-14 Days
Phases 0-4 (type gates, fact store, Screamer, archivist, sufficiency) ~670 10-14 Days
Phase 5 (VivaceGraph, Merkle DAG, ontology versioning) ~400 6-10 Days
Phase 6 (ACL2 base + 5 macro layers) ~540 8-12 Days
Phase 7 (10-80-10 planner) ~500 8-10 Days
Polish features (skins, export, CLI, MCP, LSP, telemetry, etc.) ~1,500 20-30 1-2 weeks
Integration, edge-case hardening, cross-phase regression 1-2 weeks
Total to v1.0.0 ~4,500 ~80 cycles 3-5 weeks

The bottleneck at this velocity is not code generation. It is human availability to review the Screamer-flagged 5%. At 80 cycles across 40 features, that is roughly 4 flagged rules per feature, 200 total, each requiring a yes/no answer from the human. In a dedicated review session: 2-3 hours of human time.

For the Lisp Machine hardware integration (microcode, PCIe DMA, Tensix management, benchmark harness — ~6,000 lines):

Component Lines Cycles Wall clock
RISC-V microcode for Lisp dispatch ~3,000 20-30 1-2 weeks
PCIe DMA driver (C + sb-alien FFI) ~500 4-6 Days
Tensix core management ~1,500 10-15 Days
Benchmark harness + microcode synthesis ~1,000 8-12 Days
Total hardware integration ~6,000 ~60 cycles 2-4 weeks

The Lisp Machine hardware integration is slower per cycle because the microcode must be loaded onto physical hardware and benchmarked. Each cycle includes: generate → ACL2 verify → load onto Tensix → run benchmark → measure → feed back. That adds seconds per cycle vs milliseconds for pure-software verification.

The total to a self-driving Lisp Machine (Logos + Stoa hardware): ~140 cycles, 6-10 weeks, 4-6 hours of human review time.

For the full Stoa (editor, browser, shell, Qt integration):

Stoa is not written from scratch. It is first assembled from existing components, then systematically replaced. The initial assembly is fast:

Stage Approach Lines Cycles Wall clock
Qt/EQL5 shell (minimal) Wrap existing Qt widgets ~500 4-6 Days
Lish editor (minimal) Org buffer + Qt text widget ~1,000 8-10 Days
Nyxt browser Stage 1 Qt + WebKit, wrap existing API ~2,000 10-15 1-2 weeks
Stoa v2.0.0 working ~3,500 ~30 cycles 2-3 weeks

After v2.0.0, erosion begins. Each replacement is a self-contained project where the system proposes the replacement, ACL2 verifies it produces identical output for all known inputs, and the system loads it. The timeline is no longer measured in cycles — it is measured in how many verifiable replacements the system can propose and test before settling on the optimal implementation.

The total from today to a fully self-driving Lisp Machine with a working editor, browser, and shell: approximately 8-12 weeks with the actual observed velocity. Not years.

Self-writing beyond the bootstrap

Once the system achieves sufficiency for software engineering (Phase 4 flip applied to code generation), the bulk of Stoa and Agora is written by the system itself:

System Human writes System writes Total
Logos (Passepartout) ~10,700 existing + ~4,500 to v1.0.0 The system extends its own macro layers and fact store ~15,000 + growing
Stoa (environment) Design decisions, architectural constraints ~100,000+ lines of editor, browser, shell, layout engine, each component verified by ACL2 before loading ~150,000+
Agora (network) Protocol specification, threat model DIDComm implementation, Relay Network, PDS, Lightning integration, contracts — each module verified by ACL2 ~100,000+
Hardware (tagged RISC-V) ISA design, TinyTapeout shuttle VHDL/Verilog for tagged memory, GC bus master, Lisp primitives — synthesized and tested via FPGA ~50,000+

The human time is dominated by design decisions, not code writing. Code writing is the agent's job. The bottleneck shifts from "how many lines can I write per day" to "how many design decisions can I make per day and how quickly can I review the 5% of ambiguities Screamer flags."

At the observed velocity (v0.4.0 to v0.7.2 in a day), a deep-thinking human paired with this architecture can go from today's Passepartout to the full Logos + Stoa + Agora triad in approximately 3-6 months — most of that time spent on design decisions and protocol specification, not on code.

The triad, when complete, replaces every layer of the current computing stack — cognition (OpenAI/Anthropic), environment (Apple/Microsoft/Google), network (Facebook/Twitter/Slack) — with Lisp-native, user-owned, ACL2-verified alternatives that cost near-zero marginal compute. The lines that run the modern internet (tens of millions across Google, Meta, Amazon, Apple, Microsoft) are replaced by a single coherent architecture where one gate stack verifies everything and one prover proves everything consistent.

The social and economic impact of this is not "a better AI agent." It is a complete alternative infrastructure for personal computing that requires no cloud, no gatekeeper, no per-token fee, and no trust. The lines don't need to exist on day one. They need to exist in the right order — and the system writes them in that order, one ACL2-verified submission at a time.

Market size and business models

The triad directly addresses markets that currently spend over a trillion dollars annually combined:

Market Annual spend What the triad replaces
Cloud computing (AWS, GCP, Azure) ~$300B Verification appliance runs locally — no per-resource billing
AI/LLM API revenue (OpenAI, Anthropic) ~$50B Near-zero marginal cost symbolic engine
Operating systems (Microsoft, Apple) ~$100B Stoa (Lisp-native editor, browser, shell)
Social media / communication (Meta, Twitter, Slack, Discord) ~$200B Agora (DID-based, encrypted, permissionless)
Identity / SSO (Okta, Auth0, Google/Apple) ~$10B Self-sovereign DID + HD keys
Payment processing (Stripe, PayPal, Visa/MC) ~$200B Lightning + smart contracts
Productivity software (Microsoft, Google) ~$50B Lish + Org-mode + Stoa
Compliance and audit (regulatory) ~$50B Automated ACL2-verified compliance
Total addressable ~$960B

The triad does not need to capture all of this. It needs to capture the portion that is willing to pay for provable correctness, lower cost, and user sovereignty. Even 1% of this TAM is ~$10B/year.

Business models for a free-software triad

The AGPL license is not a barrier to monetization — it is the foundation of the trust model. An enterprise cannot buy provable correctness from closed source; the code must be inspectable. The revenue comes from what the AGPL does not cover:

Low-hanging fruit (immediate, months)
  1. Verification appliance (hardware). An FPGA or Tenstorrent card pre-loaded with a mature Passepartout image, domain-specific gate rules, and a hardware root of trust. No cloud dependency. Target: regulated industries that need provable compliance and cannot accept cloud-based AI. Price: $5K-$50K/unit. Volume: hundreds to low thousands in year one.
  2. Domain gate rule subscriptions. Pre-verified gate rule packages for specific compliance domains. HIPAA package: $50K/yr. SOC2 package: $50K/yr. GDPR package: $50K/yr. FedRAMP package: $100K/yr. Updated automatically when regulations change. An enterprise with all four pays $250K/yr — and the switching cost is high because changing packages means re-verifying the fact store against new rules.
  3. Evaluation harness as certification. "Run our 10,000-task suite against your AI agent and get a Merkle-verified score." Target: AI labs proving their agents' capabilities, enterprise procurement requiring independent verification. Price: $50K-$200K per certification. The regression suite grows with every deployed instance, making the certification increasingly valuable over time.
  4. Migration services. "Bring your existing infrastructure into the Passepartout gate stack." Custom gate rules, ontology design, integration with existing systems. Price: $100K-$500K per engagement. Each engagement feeds back edge cases into the regression suite and domain gate packages.

Revenue estimate for year one (low-hanging fruit): 50 appliance sales ($250K-$2.5M) + 20 gate rule subscriptions ($1M-$5M) + 10 certifications ($500K-$2M) + 5 migration engagements ($500K- $2.5M). Total: $2.25M-$12M. Not venture-scale, but self-sustaining for a small team.

Medium-term (1-3 years)
  1. Compute marketplace (Agora). Passepartout instances offer their symbolic engine capacity (ACL2 cycles, Screamer constraint solving, VivaceGraph queries) to other agents on the Agora network. The early player runs a large instance and sells compute to smaller instances. The AGPL allows this because the marketplace is a service, not a modification of the code. Revenue is a percentage of each compute transaction.
  2. Relay Network (Agora infrastructure). If Agora becomes the default communication protocol for agent-to-agent interaction, running Relays is a business. Every DIDComm message routes through one or more Relays. Revenue: tiny per-message fee (fractions of a cent) or paid priority routing. At billions of messages, fractions become real.
  3. The Lisp Machine appliance (Stoa v5.0.0 hardware). The tagged RISC-V architecture running on FPGA or custom ASIC, sold as a certified appliance for industries where correctness is worth paying for: medical devices, industrial controllers, defense systems, financial trading. Price: $20K-$100K/unit. If the hardware validation succeeds on TinyTapeout, the upside is enormous: a certified Lisp Machine at scale could capture a significant fraction of the embedded systems market.
Big money (3-10 years)
  1. Verification monopoly: the regression suite as UL certification. The accumulated regression suite — thousands of edge cases from every deployed instance, every bug fix, every regulatory change — becomes the most comprehensive test of autonomous agent correctness ever assembled. Any organization claiming a "safe AI agent" needs Passepartout certification to prove it. This is Underwriters Laboratory for AI — a certification nobody can ignore. Revenue: licensing the certification mark to every AI vendor that ships an agent. Margins: near-100% once the suite exists.
  2. Infrastructure lock-in: switching costs compound. A hospital that runs Passepartout with HIPAA gate rules ($50K/yr) for five years has accumulated a fact store with a decade of compliance decisions, a proof forest of verified rules, and an empirical decision history tied to their specific deployment. Switching to a competitor means discarding all of it. The accumulated value grows as the fact store deepens. Annual revenue per enterprise grows from $250K in year one to $500K-$1M by year five as more domain packages are added and the fact store becomes more valuable than the software itself.
  3. The compute marketplace at planetary scale. If Passepartout instances on Agora transact billions of verified operations per day, the spread on compute transactions is enormous. This is not a product sale — it is a bet on network effects. Every new instance increases the value of the network (more capacity, more diversity, more resilience). The early player that provisions the largest compute capacity on Agora becomes the default infrastructure provider for the entire network.
The investment thesis

The low-hanging fruit (appliances, gate rules, certification, migration) generates enough cash to sustain development. The medium-term (compute marketplace, Relay Network, Lisp Machine hardware) builds network effects and switching costs. The big money (verification monopoly, infrastructure lock-in, planetary compute marketplace) is the venture-scale outcome.

The unique advantage: the early player benefits from every other instance of the triad, because every deployed instance feeds edge cases into the regression suite, grows the compute marketplace, and validates the hardware designs. The network effects are positive sum — the value of the system increases with every user, and the early player captures a disproportionate share because they built the infrastructure that every new instance depends on.

This is the AWS of provable computing: build the infrastructure, let everyone use it for free (AGPL), charge for the parts that scale (compute marketplace, certification, hardware, migration). The switching costs compound. The network effects are positive sum. The market is nearly a trillion dollars.

Additional revenue paths

Usernames on Agora

The DID system is permissionless — anyone generates their own DID via HD key derivation. But human-readable @handles (short names, common English words, three-letter IDs) are naturally scarce. The early player controls the namespace registry:

  • Free tier: any DID can claim a namespace.username on a first-come, first-served basis with proof of key ownership
  • Premium tier: short names (2-3 chars), common words, brand names, squatter prevention via auction or annual lease
  • Revenue model: $5-$50/year per premium username, auction revenue for highly contested names (single-letter, common surnames). ENS-style: registration fees fund development, not speculation.

At scale: 1M premium usernames at $10/yr average = $10M/yr recurring. The namespace registry is a natural monopoly — the early player's registry is the most widely accepted, so every new user registers there. Network effects lock in.

PDS as a service

The Personal Data Store (the Merkle fact store exposed on Agora) can be self-hosted (free, AGPL) or hosted by the early player:

  • Free tier (self-hosted): full AGPL PDS, runs on any Lisp host, full control, no cost
  • Basic hosted tier: $10-$20/month, 10GB fact store, 10M queries/month, automated backup, automatic upgrades
  • Business hosted tier: $100-$500/month, 100GB+ fact store, unlimited queries, SLA, dedicated relay, compliance logging
  • Enterprise hosted tier: $1,000-$10,000/month, air-gapped appliance management, dedicated support, regulatory-compliant data residency, integration with existing SIEM

This is the classic open-core SaaS model (GitLab, Sentry, PlanetScale). The free self-hosted version drives adoption and trust (you can inspect every line). The hosted version captures the value from users who value convenience over control. Bluesky already demonstrated demand: they charge ~$10/month for PDS hosting with minimal feature differentiation.

PDS as a service is lower-margin than hardware appliances but higher-volume. Target: 100K subscribers at $15/month average = $18M/yr recurring, with near-zero marginal cost per additional subscriber (the symbolic engine is CPU-bound, not per-user metered).

Combined with premium usernames: $10M/yr + $18M/yr = $28M/yr recurring from Agora services alone before any compute marketplace revenue. These are not speculative — they are standard internet business models (namespace registry + SaaS hosting) applied to a decentralized identity and data layer.

The full triad: Logos, Stoa, Agora

The self-driving Lisp Machine is not the endpoint. It is one component of a three-part architecture:

Layer Name Function What it is
Logos Passepartout The mind Cognitive agent, symbolic engine, gate stack, fact store, ACL2, 10-80-10 planner
Stoa The Porch The body Editor (Lish), browser (Nyxt), shell (Lish), Org-mode filesystem, Qt/EQL5 UI, Lisp Machine hardware
Agora The Society The network Self-sovereign identity (DID), encrypted comms (DIDComm), Personal Data Store, Relay Network, contracts, liquid democracy, compute marketplace
Passepartout is the PDS

Agora's Personal Data Store (PDS) is Passepartout's in-process memory — the Merkle tree, the fact store, the memory-objects. Every memory-object already has a SHA-256 hash (Merkle provenance), which maps directly to Agora's CIDv1 content addressing. The mapping:

Passepartout memory-object Agora Note
org-object-id uuid (stable)
org-object-hash cid (content hash)
getf attrs :TITLE payload (derived)
getf attrs :TAGS routing/access-control hints
org-object-content payload body
merkle hash CIDv1 (same SHA-256, different formatting)

Passepartout's gate stack verifies every action before it touches the PDS. No unverified note escapes. ACL2 proves the access controls are correctly enforced. Screamer checks consistency with the instance's existing knowledge.

Stoa is the body

Stoa's roadmap (v2.0.0 → v6.0.0) describes the environment:

  • v2.0.0: Lish editor + Nyxt browser (Stage 1, Qt/WebKit) + Lish shell
  • v3.0.0+: Cannibalization — eat dependencies one by one. Replace Qt with Lisp-native layout, reduce WebKit to pixel-painting, eventually pure-Lisp browser and window management.
  • v4.0.0: Native inference — llama.cpp FFI in-process, DSL-compiled model architectures, live surgery on cognition (inspect hidden states mid-inference)
  • v5.0.0: Hardware — tagged RISC-V architecture via TinyTapeout, FPGA prototype, hardware GC via dedicated bus master
  • v6.0.0: True agency — world models, temporal reasoning, goal persistence across restarts

The architectural principle: Stoa is not a collection of clients connecting to a daemon. It is a single Lisp image where the editor, browser, shell, and agent coexist. The Dispatcher gate stack verifies every action regardless of who initiated it — user or agent. The distinction between "tool" and "self" dissolves.

Agora is the network

Agora provides the decentralized identity and communication layer so Passepartout instances can talk to each other:

  • Self-sovereign identity via HD key derivation (BIP-44)
  • Encrypted messaging via DIDComm (agent-to-agent)
  • Notes as atomic, content-addressed, signed data units
  • Relay Network for censorship-resistant message routing
  • Compute marketplace where instances offer symbolic engine capacity
  • Contracts and liquid democracy infrastructure

Agora integration is a parallel track to Passepartout's core roadmap (v0.3.0 → v1.0.0). The identity DID and DIDComm gateway can be built as skills at any time without core changes. The PDS transformation (making the fact store network-addressable) waits for v1.0.0+.

The complete picture

The triad replaces every layer of the modern computing stack:

Layer Current (Big Tech) Triad
Cognition ChatGPT, Claude, Gemini (centralized API, per-token pricing) Passepartout (local symbolic engine, near-zero marginal cost)
Environment macOS/Windows/ChromeOS (closed platforms) Stoa (Lisp-native editor, browser, shell, hardware)
Network Facebook, Twitter, Slack, Discord (extractive, centralized) Agora (DID-based, encrypted, user-owned, permissionless)
App model Web apps + app stores (gatekeepers take 30%) Skills + Org files (hot-reloadable, no gatekeeper)
Compute Hyperscaler cloud (AWS, GCP, Azure) Verification appliance (local, provable, near-zero marginal cost)
Identity OAuth/Google/Apple SSO (surveillance-based) Self-sovereign DID + HD key hierarchy (user-owned)
Commerce Stripe, PayPal (2-3% + chargeback risk) Lightning Network + smart contracts (permissionless)

The line count estimate (21,000 for Passepartout + Lisp Machine) covers only Logos + Stoa's hardware layer. Agora's full specification spans 10 requirements documents and the implementation (network protocol, PDS, Relay, DID identity) is a separate body of work comparable to Passepartout itself.

The unifying factor: all three speak plists. All three operate in Lisp address space. All three are verified by the same ACL2 prover. The gate stack that verifies a shell command also verifies a DIDComm message and an Agora Note publication. The symbolic engine that plans a refactoring also plans a contract negotiation.

One gate stack. One memory model. One prover.

Large refactoring projects (extract module, rename API, split monolith) are the hardest test for any AI agent. Current approaches (Claude Code, Copilot) handle them probabilistically — every step costs tokens, and there is no formal guarantee the final system is consistent.

Passepartout's Phase 7 planner (10-80-10) transforms this: the symbolic engine handles planning, ordering, and structural verification; the LLM handles only the code transformation itself.

The workflow

  1. Codebase ingestion. The scanner walks the entire codebase, builds an abstract syntax graph (not just flat files — imports, type dependencies, function call chains, test coverage). Each module becomes a fact in the fact store with `:provenance :codebase-scan`.
  2. Goal translation (LLM, 10%). "Extract authentication into its own service" becomes a structured goal plist: (:goal :extract-module :source :auth :target-service auth-service :files (:affected (:app/auth/* :app/middleware/auth.clj :tests/*)) :constraints (:no-breaking-api (:public-api :unchanged)) :verification (:all-tests-pass :api-contract-preserved))
  3. Constraint satisfaction (Screamer, 80%). Screamer expresses the refactoring as a constraint satisfaction problem:

    • Variables: file modifications ordered by dependency (auth middleware must be updated after auth module is extracted but before its callers)
    • Constraints: no circular dependencies, no step that creates broken intermediate state, no test file modified before its source
    • Objective: minimize total steps while respecting all constraints

    Screamer returns a viable plan or reports unsolvability with the conflicting constraints — for example, "auth middleware and auth module have a circular type dependency that must be resolved before extraction."

  4. Plan verification (ACL2, part of 80%). ACL2 proves:

    • No dependency cycles in the plan (A must run before B, B before C, C before A → rejected)
    • No deadlocks (two modules waiting on each other)
    • Every planned write is within the refactoring scope (no stray modifications to unrelated files)
    • The gate stack will not reject any planned command (no blocked patterns in the refactoring scripts)
  5. Incremental execution. The planner executes each step:

    1. Take Merkle snapshot of the current state
    2. LLM proposes the code transformation for this step
    3. Gate stack checks the proposal (no forbidden file writes, no shell commands outside the allowed refactoring scope)
    4. Transformation is applied
    5. Tests run. If they pass → commit the snapshot, update the fact store with the new codebase graph. If they fail → roll back to the previous snapshot, flag the LLM's proposal as `:provenance :failed-transformation`, and attempt a corrected proposal.
  6. Final verification. After all steps complete, ACL2 re-verifies the entire codebase fact store — all dependencies, all public API surfaces, all constraints. The result is a Merkle chain from "before" to "after" with every intermediate state verified.

What the symbolic engine cannot do

The fundamental limit: first-order logic cannot prove semantic equivalence of arbitrary programs. ACL2 can verify that the refactoring plan has no structural flaws, that the dependency graph is acyclic, that every step is within scope, and that tests pass. It cannot prove "the extracted auth service behaves identically to the inline auth module from the caller's perspective" for a general-purpose language.

This gap is filled by:

  • Tests as empirical verification. The planner runs the full test suite after each step. A passing test suite is not a proof of correctness, but combined with ACL2's structural verification, it is strong empirical evidence.
  • API contract checking. For refactoring that preserves public APIs, ACL2 can verify that the type signatures, argument counts, and return types of the extracted module's public interface match exactly — a structural equivalence that does not require semantic reasoning.
  • Human review of semantic concerns. The planner flags steps that involve semantic choices (e.g., "the extracted auth service now handles token refresh differently"). These steps are presented to the developer for review with full provenance: before state, after state, and the diff. The developer's approval or rejection becomes a Merkle fact with `:provenance :human-reviewed`.

What makes this better than Claude Code

Dimension Claude Code Passepartout Planner
Planning Prompt-based, implicit Screamer CSP, explicit, verified
Step ordering Greedy, every step costs tokens Dependency-ordered, zero-token constraint check
Rollback Limited (git reset, no per-step) Merkle snapshot per step, instant rollback
Scope control Prompt-based ("only touch auth files") ACL2-verified write scope, cannot escape
Cost $0.50-$2 per refactoring session Near-zero (CPU cycles for planning, LLM for code only)
Final proof None — you trust that tests caught everything Merkle chain from before→after, ACL2-verified

A software ecosystem changing hardware economics has never happened before. Passepartout's most realistic path: verification appliances for regulated industries — RISC-V cores with Lisp microcode on FPGA, sold as hardened devices for healthcare compliance, defense, and industrial control.

Not a general-purpose Lisp Machine replacing laptops. A specialized device where correctness is worth paying for. If such appliances sell in the hundreds of thousands, the economics of a custom Lisp ASIC start to make sense. The reversal is not Lisp returning as a general platform, but Lisp winning a vertical important enough to justify its own silicon.

The path: Passepartout software (AGPL) → creates demand for verified infrastructure → verification appliance (FPGA, RISC-V + Lisp μcode) → high-volume niche → custom ASIC economics viable → Lisp native hardware exists for the first time since the Symbolics era.

Lisp vs C for embedded systems

  • Lisp can match C for low-level work through compile-to-C paths (ECL, PreScheme) or tiny Lisps (uLisp, FemtoLisp, BitLisp for RISC-V)
  • The GC is the hard wall for hard real-time; mitigated by pre-allocation, no-alloc hot paths, or real-time GC
  • Most practical path: "Lisp as macro language for C" — generate C from Lisp macros, ship the compiled binary. This is how NASA's Deep Space 1 worked: Lisp planning on Earth generated commands for C flight software.
  • The Lisp Machine on commodity FPGA (RISC-V softcore + Lisp μcode on Artix-7 / iCE40) is the ambitious path — Lisp down to the metal for $50.

Microbiology works like Lisp, not C

Striking parallels:

  1. Homoiconicity — DNA is code and data in the same molecule; no separate source and binary
  2. Hot-reloadable image — alternative splicing, epigenetic marks, post-translational modifications change the running program without restart
  3. Automatic memory management — proteasomes degrade misfolded proteins, autophagy recycles organelles; the cell never calls free()
  4. Interpreted dynamic language — DNA → RNA → ribosome (interpreter) → protein; no static compilation step
  5. Self-modifying source — CRISPR, transposons, DNA repair modify the genome at runtime; eval on the genome
  6. Duck typing — protein folding depends on chemical environment, not type declarations; interfaces are shape-matching, not compiler-checked
  7. Concurrent real-time GC — apoptosis breaks down cell components for recycling by neighboring cells; the collector is external to the object

Biology chose the Lisp model because it is more robust, adaptable, and evolvable. Evolution paid for the overhead (GC, interpretation, dynamic dispatch) with parallelism and redundancy. It optimized for survival in an unpredictable environment, not peak single-thread throughput.

Biology is the proof that the Lisp model can be efficient at planetary scale, running on hardware that self-assembles from food. The ceiling Passepartout aims at is still far below the system that wrote itself in DNA.