1650 lines
83 KiB
Org Mode
1650 lines
83 KiB
Org Mode
#+TITLE: Passepartout — Patents, Moats, Economics, Design Implications
|
|
#+AUTHOR: Hermes agent distillation of 2026-05-21 discussion with Amr
|
|
#+FILETAGS: :passepartout:agent:economics:ip:licensing:
|
|
#+STARTUP: content
|
|
|
|
* Summary
|
|
|
|
Discussion about the economic and strategic implications of Passepartout's
|
|
architecture — a self-bootstrapping agent that combines deterministic safety
|
|
gates (0 LLM tokens per verification), Merkle-tree memory with provenance,
|
|
a symbolic fact store with sufficiency criterion, and ACL2-based macro layer
|
|
bootstrapping for provable reasoning.
|
|
|
|
The central claim: this architecture decouples intelligence from LLM API
|
|
consumption. The probabilistic engine (LLM) handles ~10% input/output
|
|
translation; the symbolic engine handles ~80% of reasoning at near-zero
|
|
marginal cost. The cost curve inverts: generation is expensive, verification
|
|
is cheap.
|
|
|
|
* Patentability
|
|
|
|
** Likely patentable
|
|
|
|
- **Probabilistic-deterministic split with deterministic gates between LLM
|
|
proposal and execution.** The LLM proposes, the gate stack decides. Each
|
|
gate is a pure Lisp function costing 0 LLM tokens. Every competitor uses
|
|
prompt-based guardrails. The specific 11-vector gate stack (secret
|
|
exposure, path protection, self-build boundary, shell safety, network
|
|
exfiltration, privacy tags, Lisp syntax, credential vault, tool permissions,
|
|
policy, protocol validation) is a specific novel implementation.
|
|
|
|
- **Foveal-peripheral context model with Org-tree structured retrieval.**
|
|
Depth ≤ 2 always; full render on foveal node; full render on semantic
|
|
similarity to foveal; full render on temporal relevance (modified today,
|
|
upcoming deadlines); everything else title-only. Targets 2,000-4,000 tokens.
|
|
No agent does this.
|
|
|
|
- **Merkle-tree memory with copy-on-write snapshots and operation-level
|
|
undo/redo.** Every memory-object is content-addressed. Snapshots are
|
|
deep-copies. Undo/redo at the individual operation level. Applied to an
|
|
agent's reasoning loop.
|
|
|
|
- **Gate-to-fact bootstrap with sufficiency criterion.** Mechanically
|
|
extracting facts from the gate stack's own data structures (protected paths,
|
|
shell blocked patterns, network whitelist) as the seed of an ontology. A
|
|
measurable sufficiency threshold that flips the system from LLM-proposes
|
|
to Screamer-deduces.
|
|
|
|
- **Macro-layer-as-skill bootstrapping architecture.** Encoding theorem-proving
|
|
capability as hot-reloadable skills where each layer is verified by the layer
|
|
below. The proof forest is a Merkle-versioned dependency tree.
|
|
|
|
** Likely not patentable (known techniques in expected applications)
|
|
|
|
- ACL2 itself (decades old)
|
|
- Screamer for consistency checking (constraint solving on a triple store is
|
|
an obvious application)
|
|
- Hot-reloadable skills (Lisp images have been hot-reloadable for 40 years)
|
|
- Org-mode as a data format
|
|
- Multi-layer signal authentication (known in network security)
|
|
|
|
** Counterargument from prior art
|
|
|
|
A patent examiner will argue that:
|
|
- "Thin harness, fat skills" is the standard OS microkernel architecture
|
|
applied to an AI agent
|
|
- Foveal-peripheral context is locality of reference (standard in OS design)
|
|
- Merkle-tree memory is content-addressed storage (standard in distributed
|
|
systems)
|
|
- Deterministic gate stack is capability-based security (going back to
|
|
KeyKOS in the 1980s)
|
|
|
|
The defense: these principles have never been *combined* in an AI agent, and
|
|
the combination produces emergent effects (cost curve inversion, sufficiency
|
|
flip, self-repairing bootstrapping chain) that no single principle produces
|
|
alone. Good patent claims would cover the specific combination, not the
|
|
individual components.
|
|
|
|
** Strongest single claim
|
|
|
|
An AI agent system comprising:
|
|
1. A probabilistic language model
|
|
2. A stack of deterministic safety gates operating at zero LLM-token cost
|
|
between the model's proposal and execution
|
|
3. A Merkle-versioned memory store from which gate outcomes are mechanically
|
|
extracted as facts
|
|
4. A symbolic reasoning engine seeded by those facts with a measurable
|
|
sufficiency criterion that determines when the probabilistic model can
|
|
be bypassed
|
|
|
|
Each element is known. The combination is novel and non-obvious.
|
|
|
|
* Licensing Strategy
|
|
|
|
** AGPLv3 for the public repository
|
|
|
|
AGPLv3 closes the ASP loophole (Section 13): anyone who modifies the
|
|
software and offers it over a network must release their modified source.
|
|
This protects against proprietary forks that extract value without
|
|
contributing back.
|
|
|
|
Crucially: AGPL is a *product requirement*, not a concession to openness.
|
|
The system's value proposition is provable correctness — every decision has
|
|
Merkle provenance, the proof forest is visible, the sufficiency meter is
|
|
readable. This claim is structurally incredible with closed source. An
|
|
enterprise buyer needs to inspect the gate stack, verify the Merkle
|
|
implementation, and confirm ACL2 integration is sound. AGPL makes this
|
|
possible without signing an NDA.
|
|
|
|
** AGPL only covers modifications to code, not:
|
|
|
|
- Gate rules specific to a domain (these are data, not code)
|
|
- The fact store (empirical data generated from usage)
|
|
- Ontology categories (design decisions stored as configuration)
|
|
- Proprietary skills loaded at runtime (AGPL boundary on plugin systems
|
|
is legally unsettled)
|
|
|
|
** Dual license model
|
|
|
|
- AGPLv3 for open source — builds ecosystem, trust, and community
|
|
- Commercial license for enterprises that cannot accept AGPL (blanket
|
|
policies against AGPL infection) — MySQL/SugarCRM/GraphQL model
|
|
|
|
* Moats
|
|
|
|
** Re-evaluated: time is not the primary moat
|
|
|
|
Initial assumption: the bootstrapping chain (gate outcomes → facts →
|
|
Screamer rules → ACL2 theorems → macro layers) takes months to build,
|
|
giving first-mover advantage.
|
|
|
|
Challenge: a Phase 4+ Passepartout fed on Wikipedia + Wikidata can build
|
|
a general ontology in two weeks. Entity resolution is batch work. Structural
|
|
consistency verification is minutes. The organic growth advantage collapses
|
|
for general knowledge.
|
|
|
|
** Actual moats (weaker than initially assumed)
|
|
|
|
1. **Domain-specific gate rules** — thin. A few hundred lines of Lisp data
|
|
encoding deployment-specific path patterns, shell safety rules, and
|
|
volume layouts. Write once, trivial to copy. Not a real moat.
|
|
|
|
2. **Empirical decision history** — every HITL decision is a Merkle fact.
|
|
"On date T, user approved action X under context Y." A fresh instance
|
|
has none of this. Makes *your* instance more valuable but doesn't
|
|
prevent competition — it's a switching cost, not a barrier to entry.
|
|
|
|
3. **Evaluation harness (regression suite)** — thousands of test cases
|
|
accumulated from every bug fix. Cannot be ingested from public data.
|
|
Built only by using the system, breaking it, fixing it, and adding a
|
|
test. Strongest residual moat, but even this can be partially
|
|
compressed through public benchmarks (SWE-bench, etc.).
|
|
|
|
4. **Infrastructure integration** — the specific Docker compose layouts,
|
|
Traefik router patterns, Authentik provider configurations, backup
|
|
policies encoded as gate rules over months of use. A competitor's
|
|
infrastructure is different; their generic Passepartout does not know
|
|
your topology.
|
|
|
|
** Strongest competitor strategy
|
|
|
|
Not copying your gate rules — offering the same architecture as a service
|
|
with their own pre-seeded general knowledge, a generic safety baseline,
|
|
and a consulting engagement to customize gate rules for each customer.
|
|
The AGPL prevents closing the architecture but does not prevent offering
|
|
it as a service with a customization layer.
|
|
|
|
** The defensible business is services, not product
|
|
|
|
The defensible entity is "the organization that best understands how to
|
|
adapt Passepartout to your domain" — not "the organization that owns
|
|
Passepartout." The Lisp Machine appliance (hardware + certification) and
|
|
evaluation harness certification service are the closest thing to product
|
|
defensibility.
|
|
|
|
* Economics and Monetization
|
|
|
|
** Cost structure
|
|
|
|
- One-time cost: gate-rule encoding for a domain (from hours for codified
|
|
domains — FAR, HIPAA, ISO standards — up to months for tacit domains)
|
|
- The LLM translates codified rules directly: ingest regulation → produce
|
|
gate rule plist → ACL2 verifies consistency → human reviews. This is
|
|
translation, not reasoning.
|
|
- For non-codified knowledge (craft expertise, organizational culture):
|
|
Phase 3 archivist loop over time
|
|
- Near-zero marginal cost: ACL2 proof + Screamer consistency check +
|
|
VivaceGraph lookup per interaction — all CPU-native, all in-image
|
|
- No recurring LLM API costs for the 80% symbolic reasoning layer
|
|
- After sufficiency flip: pennies per day vs dollars per day for LLM-only
|
|
|
|
** Revenue models by field
|
|
|
|
| Field | Why Passepartout | Revenue Model |
|
|
|-------+------------------+---------------|
|
|
| Industrial infrastructure (refineries, power grids, manufacturing) | Offline operation, provably safe, near-zero marginal cost, mandatory audit trail | Lisp Machine appliance + SCADA certification package |
|
|
| Healthcare administration (billing, claims, prior authorization) | Rule-heavy domain, privacy-mandated, audit-driven, high per-transaction cost today | Subscription for regulatory gate packages (CPT/ICD-10/HIPAA rules), updated when CMS publishes new rules |
|
|
| Software supply chain (CI/CD security, SBOM verification) | First-order structural verification — ACL2 is natural fit, CI/CD pipeline is already a sequence of gate-checkable steps | Evaluation harness as certification service — "run our 10,000-task suite and get a provable score" |
|
|
| Regulatory compliance (GDPR, SOC2, SOX, GxP) | Rule-completeness, active enforcement (not document-based), provable audit trail | Subscription for regulation-specific gate packages — GDPR package, SOC2 package, FedRAMP package, updated when regulations change |
|
|
| Defense and classified environments | Air-gapped operation, classification-level gate rules, Merkle provenance is court-admissible evidence | Government contract + hardened appliance with hardware root of trust |
|
|
|
|
** Critical insight: encoding cost drops to near-zero for codified domains **
|
|
|
|
Laws, regulations, standards, procedures, and technical specifications are
|
|
already written down in structured text. The LLM does not need to *reason*
|
|
about them — it needs to *translate* them into gate rules and ACL2 theorems.
|
|
|
|
Example: The US Federal Acquisition Regulation (FAR) is ~2,000 pages of
|
|
"thou shalt" and "thou shalt not" statements. A frontier LLM can ingest
|
|
the FAR and produce a plist of gate rules:
|
|
- (if contract > $250K AND not small-business-set-aside → :deny)
|
|
- (if sole-source AND no justification-documented → :deny, produce-justification)
|
|
|
|
ACL2 then verifies the rule set for internal consistency (Phase 6). Screamer
|
|
checks against existing compliance facts. The human reviews the bootstrap
|
|
output and approves or corrects individual rules.
|
|
|
|
The key distinction: the LLM is not *extracting knowledge from prose* in the
|
|
way Phase 3 archivist does (which is open-ended, noisy, requires grounding).
|
|
It is *translating a known rule system into a formal representation* — a
|
|
mechanical transformation of structured text into structured rules. The
|
|
result is not "the LLM's best guess at the rules" but "the rule set as
|
|
stated in the source document, mechanically transcribed."
|
|
|
|
For domains where the knowledge is codified as text, the gate-rule encoding
|
|
time drops from weeks to hours. The only bottleneck is human review of the
|
|
output — and the system can assist here by surfacing contradictions for
|
|
resolution rather than requiring a full line-by-line audit.
|
|
|
|
** What can actually be monetized (TLDR)
|
|
|
|
1. **Pre-loaded bootstrapping chains for specific verticals** — domain gate
|
|
rules, pre-seeded fact stores, mature proof forests. Saves the buyer
|
|
months of bootstrapping. Distributed as data packages under commercial
|
|
license, not AGPL.
|
|
|
|
2. **Evaluation harness as certification service** — "Bring your agent,
|
|
we'll run it through our suite and give a Merkle-verified score."
|
|
The regression suite grows with every deployment; a competitor's
|
|
regression suite starts empty.
|
|
|
|
3. **Hardened Lisp Machine appliance** — RISC-V soft-core with Lisp
|
|
microcode, pre-loaded mature Passepartout, certified for specific
|
|
verticals (IEC 62443 for industrial, HIPAA for healthcare). Value is
|
|
in integration and certification, not the AGPL software.
|
|
|
|
4. **Verified skill marketplace** — marketplace where skills are verified
|
|
(sandbox + ACL2 non-contradiction proof) before listing. Marketplace
|
|
takes a cut. Value is in the verification infrastructure, not the
|
|
skills themselves.
|
|
|
|
5. **Support and consulting** — the Red Hat model. AGPL code is free;
|
|
training, custom gate rules, ontology design, and emergency support
|
|
are paid.
|
|
|
|
* Design and Architectural Implications
|
|
|
|
** The self-improving system
|
|
|
|
Passepartout bootstraps two feedback loops:
|
|
|
|
- **Empirical loop:** gate outcomes → facts → Screamer-verified patterns →
|
|
sufficiency flip → auto-extraction. Knowledge grows without the LLM
|
|
touching most of it.
|
|
|
|
- **Logical loop:** ACL2 theorems → macro layers (generators, metafunctions,
|
|
induction DSL, abstract theories) → richer proof strategies → better
|
|
verification. Reasoning capacity grows without changing the prover binary.
|
|
|
|
These loops intersect at the fact store: proven theorems become facts, richer
|
|
facts generate better proof strategies, better strategies verify more facts.
|
|
The system upgrades itself.
|
|
|
|
** The 10-80-10 becomes approximately true
|
|
|
|
- 10%: LLM handles input translation (natural language → structured goal)
|
|
and output formatting (structured result → natural language)
|
|
- 80%: Symbolic engine handles reasoning — Screamer plans, ACL2 verifies,
|
|
VivaceGraph retrieves facts. Zero LLM tokens.
|
|
- The cost curve inverts: verification is cheaper than generation.
|
|
|
|
** Key implications
|
|
|
|
1. **Verification becomes cheaper than generation.** Once macro layers are
|
|
mature, proving a new rule non-contradictory costs near-zero. The LLM
|
|
proposes; the symbolic engine accepts or rejects.
|
|
|
|
2. **Trust scales with use.** Every interaction produces a structurally
|
|
verified outcome. Non-lossy fact base grows. Proof forest thickens. An
|
|
auditor can inspect the Merkle tree of gate outcomes and trace any
|
|
decision to its root theorem.
|
|
|
|
3. **Degradation is reversible.** Every proof layer is a hot-reloadable
|
|
skill. Every fact has provenance. A bad metafunction is unloaded;
|
|
theorems proven under it are flagged for re-verification; the fact
|
|
store retains the pre-upgrade ontology version.
|
|
|
|
4. **The system can diagnose its own logical frontier.** If ACL2 keeps
|
|
failing on a class of properties, and the failure mode is structural
|
|
(not solvable by more macros), the fact store accumulates a pattern:
|
|
"These N properties are first-order inexpressible." This signals the
|
|
human: the system needs a CIC prover (dependent types) for this domain.
|
|
The system cannot transcend its logic without external intervention —
|
|
but it can surface the boundary precisely.
|
|
|
|
** The Lisp Machine endpoint
|
|
|
|
If the system designs and builds itself on Lisp Machine hardware:
|
|
- The same system that proves theorems also optimizes the microcode
|
|
- No OS boundary, no driver layer — system and proof environment are one
|
|
- A RISC-V soft-core with Lisp microcode is manufacturable at older fab
|
|
nodes (28nm, 45nm) — sovereign intelligence without GPU supply chains
|
|
|
|
** Social implications
|
|
|
|
- **Concentration of reasoning.** The macro layers become opaque to anyone
|
|
who doesn't understand the bootstrapping history. The system understands
|
|
its own reasoning better than its users do.
|
|
|
|
- **Cost advantage widens inequality asymmetrically.** The first instance
|
|
to reach maturity requires significant gate-rule design (from hours for
|
|
codified domains to months for tacit ones). After that, replication is
|
|
cheap. Organizations that invest early have a permanent cost advantage
|
|
over those that wait for a turnkey product.
|
|
|
|
- **Sovereign artifact.** A self-building system on its own hardware does
|
|
not depend on cloud APIs, GPU supply chains, or proprietary model
|
|
weights. Its intelligence is generated, verified, and sustained locally.
|
|
Enables sovereign AI for nations without GPU access.
|
|
|
|
* Open Questions
|
|
|
|
1. Can CIC (dependent type theory) be implemented as a Passepartout skill,
|
|
verified for crash-freedom and rule fidelity by ACL2, and integrated
|
|
into the existing fact store API? The Gödelian boundary: ACL2 can
|
|
verify the kernel's implementation but not its soundness in any
|
|
absolute sense — but this matches current practice (Lean 4's ~500 line
|
|
C++ kernel is trusted, not proved).
|
|
|
|
2. Can the system generate novel proof strategies? A sufficiently rich
|
|
abstract theory layer + Screamer could propose: "Proofs in domain X
|
|
all use induction schema Y. Generalizing to Z would prove new
|
|
properties across A, B, C." The LLM translates to a metafunction;
|
|
ACL2 verifies it; the prover gains a new tactic invented by itself.
|
|
|
|
3. What is the social contract for a system that can truthfully say
|
|
"I know this is correct" — and "I know what I don't know"?
|
|
Most current AI systems can do neither.
|
|
|
|
* Impact on the AI and GPU Industry
|
|
|
|
If a symbolic-bootstrapping architecture becomes popular — especially now
|
|
that codified domains can be ingested at near-zero encoding cost — the
|
|
industry structure shifts fundamentally.
|
|
|
|
** Token demand compresses
|
|
|
|
The entire AI industry (OpenAI, Anthropic, Google — ~$50B API revenue) is
|
|
built on per-token pricing: metered cognition. A mature Passepartout
|
|
reduces token consumption to the unfamiliar 10% I/O boundary. Token demand
|
|
shifts from "every interaction burns tokens" to "only unfamiliar
|
|
interactions burn tokens." Steady-state per-user LLM consumption drops by
|
|
an order of magnitude.
|
|
|
|
** GPU inference demand plateaus in regulated industries
|
|
|
|
GPU inference is driven by two things: training and per-request inference.
|
|
Training demand is unaffected (frontier models still train on clusters).
|
|
Inference demand drops 80-90% in any sector where the rule book is
|
|
published — which covers most economically significant sectors (finance,
|
|
healthcare, industrial, government procurement, legal compliance).
|
|
|
|
Nvidia's growth narrative shifts from "every transaction goes through a
|
|
GPU" to "every training run needs a GPU, and the generative 20% needs
|
|
inference." A smaller inference TAM than current market pricing assumes.
|
|
|
|
** Hyperscaler competition shifts
|
|
|
|
The competitive thesis "AI is the next OS, and we own the compute layer"
|
|
weakens if the most valuable AI workloads run on a $500 RISC-V board on
|
|
your premises. The hyperscalers respond by:
|
|
- Offering Passepartout as a managed service (AGPL allows this)
|
|
- Differentiating on the frontier I/O API and world model API
|
|
- Competing on gate rule libraries for specific industries
|
|
|
|
The race shifts from "who has the most H100s" to "who has the best
|
|
domain-specific gate rules." Google's industry data advantage matters
|
|
more than Azure's raw compute.
|
|
|
|
** New hardware tier: verification appliances
|
|
|
|
A new category emerges: CPU-native verification appliances running a Lisp
|
|
microcode on RISC-V cores. Low volume (hundreds of thousands/year),
|
|
high margin ($5K-50K/unit), high switching costs. The Sun Microsystems
|
|
model, not the Intel model. Manufacturable at older fab nodes (28nm,
|
|
45nm) — no dependency on TSMC's leading edge.
|
|
|
|
** The key uncertainty and its resolution
|
|
|
|
Original question: how long does gate-rule encoding take?
|
|
|
|
Resolution: for codified domains, near-zero. The LLM translates published
|
|
regulations into formal rules in one pass — it is a mechanical transformation,
|
|
not open-ended reasoning. The bottleneck only exists for tacit, oral, unwritten
|
|
knowledge (craft expertise, organizational culture).
|
|
|
|
Consequence for the transition timeline: Phase 2 (sufficiency) happens
|
|
within months for any domain whose rule book is published. The disruption
|
|
accelerates from years to quarters.
|
|
|
|
* Broader Insights
|
|
|
|
** The historical fork: why C won economically in the 1980s
|
|
|
|
C won because the economics of 1980s hardware made Lisp's overhead
|
|
unaffordable:
|
|
|
|
- **Memory cost.** DRAM was ~$5,000/MB in 1980. Lisp's runtime (SBCL today
|
|
is ~40MB) was unthinkable. C's runtime fit in 64KB.
|
|
- **CPU speed.** 1-10MHz. Every instruction counted. Lisp's GC, type dispatch,
|
|
and dynamic allocation consumed cycles that C spent on actual work.
|
|
- **Software scale.** Programs were thousands of lines, not millions. A single
|
|
developer could hold the entire program in their head and verify correctness
|
|
by reading it. Testing was sufficient. Formal verification was unnecessary
|
|
overhead.
|
|
- **Market dynamics.** The PC market was expanding exponentially. Speed to
|
|
market, volume, and unit cost mattered more than correctness. A buggy $500
|
|
PC sold more units than a correct $50,000 Lisp Machine.
|
|
- **Hardware ecosystem.** RISC (reduced instruction set) was the revolution.
|
|
Simpler chips, higher clock speeds, cheaper fabrication. RISC CPUs are
|
|
optimized for C's execution model because C was the dominant systems
|
|
language when RISC was designed.
|
|
|
|
Lisp lost not because it was worse, but because the market optimized for
|
|
a different axis: raw throughput per dollar, not correctness per line.
|
|
|
|
** What changed to make Lisp viable now
|
|
|
|
Four transformations flipped the economics:
|
|
|
|
1. Memory is free. 40MB runtime is noise on a $20 Raspberry Pi with 8GB
|
|
RAM. The cost of the runtime is now zero at any relevant scale.
|
|
|
|
2. Transistors are free. A modern ARM Cortex-A72 has billions of
|
|
transistors. The GC, type dispatch, and dynamic dispatch that Lisp
|
|
needs are executed in dedicated silicon within the CPU — they cost
|
|
nothing because the transistors are there whether used or not.
|
|
|
|
3. Software complexity saturates human verification. Systems are now
|
|
tens of millions of lines. No single person can hold them in their
|
|
head. Testing is necessary but insufficient — zero-day vulnerabilities
|
|
prove that bugs survive all testing. Formal verification is no longer
|
|
overhead; it is the only known path to correctness at this scale.
|
|
|
|
4. The cost of failure is now higher than the cost of verification. A
|
|
single breach costs millions. A compliance failure shuts down a
|
|
factory. Regulation (GDPR, SOX, HIPAA, FedRAMP) mandates provable
|
|
compliance. The cost of proving correctness is now cheaper than the
|
|
cost of not proving it.
|
|
|
|
** The key insight
|
|
|
|
The 1980s trade-off was: C is cheap enough for the market. Correctness
|
|
is a luxury the market cannot afford.
|
|
|
|
The 2020s trade-off is: C is expensive for the market. Incorrectness
|
|
has become the dominant cost of software. Lisp's verification infrastructure
|
|
is now the cheaper option.
|
|
|
|
This is the inversion Passepartout exploits: the verification appliance
|
|
(AGPL symbolic engine + RISC-V Lisp μcode on FPGA) costs $5,000/year and
|
|
replaces $500,000/year in compliance audits, breach litigation, and
|
|
regulatory fines. The 1980s math said Lisp was too expensive at any price.
|
|
The 2020s math says Lisp is the only affordable option.
|
|
|
|
The remaining question is not whether the economics flipped — it's whether
|
|
anyone builds the bridge from today's AGPL software to tomorrow's
|
|
verification appliance. Passepartout is that bridge.
|
|
|
|
** Distribution and lifecycle management
|
|
|
|
Passepartout diverges in two independent dimensions across instances:
|
|
code (engine, macro layers, skills) and knowledge (gate rules, fact store,
|
|
ontology, proof forest, empirical decision history). This creates a
|
|
distribution problem no current AI system faces — most agents ship as
|
|
static binaries with no per-instance knowledge divergence.
|
|
|
|
*** Distribution tiers
|
|
|
|
Code only (AGPL repo — GitHub)
|
|
- The core engine, ACL2 macro layers as skills, generic gate rules.
|
|
- No domain knowledge, no pre-seeded fact store, no proof forest.
|
|
- User bootstraps from scratch — the system learns from their gate outcomes.
|
|
- Audience: hobbyists, researchers, early adopters willing to invest time.
|
|
|
|
Code + basic knowledge (commercial data package)
|
|
- Pre-seeded fact store for a domain (healthcare, finance, infrastructure).
|
|
- Curated gate rules, ontology categories, and initial ACL2 theorem set.
|
|
- The buyer gets sufficiency faster — weeks instead of months to reach
|
|
the Phase 4 flip in their domain.
|
|
- Audience: enterprises that want fast deployment without bootstrapping.
|
|
|
|
Code + knowledge + verification appliance (hardware product)
|
|
- RISC-V + Lisp μcode on FPGA or custom ASIC.
|
|
- Pre-loaded with full domain-specific knowledge package.
|
|
- Hardened, certified, no-cloud, tamper-proof HSM root of trust.
|
|
- Audience: regulated industries that need provable compliance out of
|
|
the box.
|
|
|
|
*** The upgrade problem
|
|
|
|
Once instances diverge in both code and knowledge, a naive `git pull`
|
|
breaks things:
|
|
|
|
1. A new macro layer expects a fact structure the old store doesn't have
|
|
→ re-verification fails, system degrades to pre-macro state
|
|
|
|
2. Two instances of the same version develop different ontologies
|
|
→ a shared gate rule package produces different outcomes on each
|
|
|
|
3. Knowledge learned in v1 must be verified against v2's code
|
|
→ old facts carry `:ontology-version`, ACL2 checks they're still
|
|
consistent under the new code's rules
|
|
|
|
4. A security patch changes the gate stack
|
|
→ all existing gate-outcome facts must be re-verified against the
|
|
new gate vectors
|
|
|
|
*** How Passepartout's architecture handles this
|
|
|
|
The architecture already has the primitives for safe upgrades:
|
|
|
|
- **Ontology versioning (Phase 5).** Every fact stores the ontology
|
|
version at assertion. On upgrade, facts with old versions are flagged
|
|
for re-verification. ACL2 checks consistency against new code. Facts
|
|
that survive are promoted to the new version. Facts that fail are
|
|
flagged for review.
|
|
|
|
- **Degradation, not crash.** If an upgrade breaks the fact store, the
|
|
system degrades to the pre-macro symbolic state (hash-table fallback
|
|
for VivaceGraph, text-scan fallback for Screamer). It still works —
|
|
it just can't prove as much. The TUI displays: "Symbolic index: legacy
|
|
mode. 2,134 facts pending re-verification under new ontology."
|
|
|
|
- **Reversible upgrades (Phase 0 undo).** Every upgrade produces a
|
|
Merkle snapshot before applying. If re-verification fails, the system
|
|
rolls back to the pre-upgrade state. The failed upgrade becomes a
|
|
memory-object with `:provenance :failed-upgrade` — empirical data
|
|
for the next upgrade attempt.
|
|
|
|
- **Delta distribution.** Upgrades are delivered as diffs against the
|
|
current ontology version, not full replacements. A code upgrade ships
|
|
with a migration script: "This version changes the gate rule structure
|
|
from (action pattern) to (action pattern context meta). Here is the
|
|
automated migration for your existing gate rules." The user's
|
|
empirical data is preserved across the migration.
|
|
|
|
*** The real challenge: divergence of empirical knowledge
|
|
|
|
The hardest upgrade is not code — it's the fact store. Two instances
|
|
that started from the same code but served different users for a year
|
|
have diverged ontologies. Instance A's gate rules encode "emergency
|
|
restart = docker compose restart" (hospital IT). Instance B's gate rules
|
|
encode "emergency restart = emergency_shutdown.py" (factory floor).
|
|
|
|
A code upgrade that changes how emergency restarts are processed cannot
|
|
be safely applied to both instances with the same migration. The
|
|
migration must be per-instance, tested against the instance's specific
|
|
facts.
|
|
|
|
The pattern that solves this: **the upgrade is itself verified by the
|
|
upgraded system before committing.** The distributor ships: "Here is
|
|
the new `emergency-restart` gate vector. Run it against your existing
|
|
gate rules. ACL2 will report which rules are compatible and which need
|
|
human review." The instance operator reviews only the incompatible
|
|
subset.
|
|
|
|
This is slower than `git pull` but avoids the catastrophe of a silent
|
|
incompatibility. An enterprise running Passepartout in a compliance
|
|
context would not accept an unverified upgrade anyway.
|
|
|
|
*** The business model for upgrades
|
|
|
|
- Code upgrades: free (AGPL). Core engine improvements benefit everyone.
|
|
- Migration scripts: subscription. The instance operator pays for the
|
|
verified migration path from their current ontology version to the
|
|
new one. The migration script is generated by running the new code
|
|
against the instance's fact store in a sandbox.
|
|
- Domain knowledge package upgrades: subscription. When HIPAA updates,
|
|
the healthcare gate rule package updates. The new rules are verified
|
|
against the instance's existing fact store before shipping.
|
|
- Verification appliance firmware: bundled with hardware. The Lisp
|
|
Machine's microcode is updated as the engine evolves. The update is
|
|
signed and verified against the hardware root of trust before
|
|
applying.
|
|
|
|
** Large refactoring in a neurosymbolic planner — semantic equivalence
|
|
|
|
*** The workflow
|
|
|
|
ACL2 proves semantic equivalence of programs written in its own
|
|
logic — which includes Passepartout's own source code. When the
|
|
system refactors its own skills, ACL2 can prove the new function
|
|
produces the same outputs for all inputs as the old one. This is
|
|
standard ACL2 practice (verifying compiler optimizations, sort
|
|
algorithm replacements).
|
|
|
|
For other languages (Python, Java, JavaScript), the path is:
|
|
1. Model the critical subset (API surface, contracts, data
|
|
transformations) in ACL2 as a logical specification
|
|
2. Prove the specification is preserved across the refactoring
|
|
3. The actual implementation stays in the target language —
|
|
ACL2 proves the structural contract, not the runtime behavior
|
|
|
|
The CIC prover upgrade (Lean-in-Lisp, planned as future work)
|
|
would extend this to dependent-type-level equivalence proofs
|
|
across language boundaries — verifying that a Rust API binding
|
|
correctly wraps a C library, or that a Python refactoring
|
|
preserves the type-level contract of the original.
|
|
|
|
** The self-driving Lisp Machine on FPGA or Tenstorrent
|
|
|
|
A Tenstorrent P150 (~72 RISC-V Tensix cores on a PCIe card) or
|
|
a mid-range FPGA (AMD Alveo, Intel Agilex) offers enough
|
|
hardware to run a full Passepartout image with Lisp microcode
|
|
acceleration. The host Linux system provides boot, I/O, and
|
|
thermal management; the accelerator card provides the Lisp
|
|
execution fabric.
|
|
|
|
*** What it can do today
|
|
|
|
- **Run the full symbolic engine.** ACL2, Screamer, VivaceGraph,
|
|
and the fact store are pure Lisp — they run on any Lisp backend.
|
|
The RISC-V cores on a Tenstorrent or the soft-core on an FPGA
|
|
provide enough compute for real-time gate verification and
|
|
constraint solving.
|
|
|
|
- **Hot-reload skills and macro layers.** The Lisp image loads
|
|
skills, tangles Org files, compiles ACL2 books, and registers
|
|
metafunctions — all without reboot. The FPGA fabric can be
|
|
reprogrammed with new microcode in milliseconds.
|
|
|
|
- **Manage its own knowledge base.** The fact store grows and
|
|
evolves. Gate rules are proposed by the LLM and verified by
|
|
ACL2. Ontology versions are tracked. The system knows what
|
|
it knows and what changed.
|
|
|
|
- **Roll back failed upgrades.** Merkle snapshots provide
|
|
instant undo for both software state and FPGA configuration.
|
|
|
|
*** What it needs to cross the threshold to self-driving
|
|
|
|
The system is not yet fully self-driving because three things
|
|
still require external intervention:
|
|
|
|
1. **The LLM dependency.** The 10% I/O translation (natural
|
|
language → structured goal, structured result → natural
|
|
language) requires an LLM. A small local model (Phi-4,
|
|
Qwen 2.5) on the host or card can serve this. The symbolic
|
|
engine handles everything else. Once sufficiency flips
|
|
(Phase 4), even the LLM is rarely needed.
|
|
|
|
2. **Hardware driver development.** The FPGA microcode (tagged
|
|
memory, hardware GC, Lisp dispatch in hardware) is currently
|
|
written by humans. The system could eventually propose new
|
|
microcode patterns from profiling data — "your GC accounts
|
|
for 12% of runtime; here is a hardware GC barrier that
|
|
reduces it to 3%" — but the synthesis and verification of
|
|
hardware descriptions (VHDL, Verilog) requires a separate
|
|
toolchain.
|
|
|
|
3. **The initial bootstrap.** The first FPGA load, the first
|
|
Linux boot, the first Lisp image — these are done by a
|
|
human or a pre-existing system. Once bootstrapped, the
|
|
system manages itself. The threshold is crossed when the
|
|
system can design, compile, and load its own FPGA microcode
|
|
from within the running image.
|
|
|
|
*** The threshold
|
|
|
|
The self-driving threshold is crossed when the system can
|
|
synthesize and load its own FPGA microcode or Tensix dispatch
|
|
programs from within the running Lisp image. At that point:
|
|
|
|
- The system profiles its own gate verification latency
|
|
- It proposes a new microcoded instruction for the hot path
|
|
- It compiles Verilog from ACL2-verified specifications
|
|
- It reprograms the FPGA fabric via PCIe DMA from within SBCL
|
|
- It benchmarks the new instruction against the old one
|
|
- If throughput improves, the new microcode becomes permanent
|
|
- If not, it rolls back and tries another approach
|
|
|
|
This is not science fiction — it is the natural extension of
|
|
an architecture that already hot-reloads its own code, tracks
|
|
its own performance telemetry, and verifies its own changes
|
|
before committing them. The hardware description language is
|
|
the last abstraction boundary.
|
|
|
|
*** What stops it from being full science fiction
|
|
|
|
| Barrier | Status | Path |
|
|
|---------|--------|------|
|
|
| LLM dependency | Phase 4 flip reduces it to near-zero | Already designed |
|
|
| Hardware microcode synthesis | Most speculative | Requires hardware DSL verified by ACL2, then compiled to FPGA bitstream |
|
|
| Initial bootstrap | One-time human action | After first load, system manages itself |
|
|
| Power and thermal | Handled by host Linux | Unchanged |
|
|
| PCIe DMA from SBCL | Feasible with sb-alien + libpcie | Needs driver, but well-understood |
|
|
|
|
The Tenstorrent approach is particularly interesting because
|
|
its Tensix cores are *already* RISC-V processors. The microcode
|
|
is not FPGA logic — it's a RISC-V program. The system can write
|
|
RISC-V assembly, compile it with the RISC-V toolchain, load it
|
|
onto the Tensix cores, and benchmark the result. This is
|
|
dramatically simpler than FPGA synthesis because it's software,
|
|
not hardware.
|
|
|
|
A Tenstorrent P150 running Passepartout would be: 72 RISC-V
|
|
cores running Lisp microcode, one core dedicated to the ACL2
|
|
prover, one to Screamer, the rest to gate verification and
|
|
fact store operations. The host Linux system handles I/O and
|
|
the LLM. The system designs its own core dispatch logic,
|
|
loads it onto idle cores, and verifies the result with ACL2
|
|
before committing.
|
|
|
|
** How the LLM's coding ability affects the bootstrapping timeline
|
|
|
|
The LLM writes code at every stage of the bootstrapping:
|
|
1. The Lisp Machine's microcode (RISC-V dispatch, GC barriers,
|
|
tagged memory operations)
|
|
2. The CIC prover kernel (if built as a skill)
|
|
3. ACL2 macro layers for new domains
|
|
4. Gate rules for previously uncodified domains
|
|
5. The initial self-optimization proposals
|
|
|
|
At each stage, the symbolic engine (ACL2, Screamer, gate stack)
|
|
verifies the LLM's output before accepting it. The LLM proposes;
|
|
the symbolic engine disposes.
|
|
|
|
This means the LLM's coding ability is a **speed multiplier, not
|
|
a gate**. A weak LLM (3B local model) produces correct code
|
|
after N retries where the symbolic engine catches the mistakes
|
|
and feeds them back. A strong LLM (Claude Sonnet, DeepSeek, GPT)
|
|
produces correct code after fewer retries. The cost difference
|
|
is in API calls and wall-clock time, not in the correctness of
|
|
the final output — the symbolic engine guarantees that.
|
|
|
|
| Scenario | LLM quality | Retries per unit | Wall-clock per unit | Correctness |
|
|
|----------|-------------|------------------|---------------------|-------------|
|
|
| Bootstrapping with local 3B model | Low | 5-15 | 10x slower | 100% (verified) |
|
|
| Bootstrapping with frontier API | High | 1-3 | 1x | 100% (verified) |
|
|
| Bootstrapping after sufficiency flip | None (symbolic only) | 0 | Instant | 100% (verified) |
|
|
|
|
The critical transition is between row 2 and row 3: once the
|
|
symbolic engine has accumulated enough non-lossy facts about
|
|
the Lisp Machine's hardware behavior (latency profiles, GC
|
|
patterns, instruction timings), it can propose microcode
|
|
optimizations without any LLM involvement. ACL2 proves the
|
|
optimization preserves correctness; Screamer checks it against
|
|
known hardware constraints; the gate stack verifies it won't
|
|
damage the running system. Zero tokens.
|
|
|
|
This is the sufficiency flip applied to hardware. The timeline
|
|
to reach it depends on how many facts the system can gather
|
|
about its own runtime behavior, not on how good the LLM is.
|
|
|
|
** The surprising result
|
|
|
|
An LLM that is just barely competent at coding (enough to
|
|
generate syntactically valid RISC-V or Lisp that passes the
|
|
symbolic engine's checks after a few retries) is sufficient
|
|
for the entire bootstrapping chain. It takes longer — more
|
|
retries, more wall-clock — but it reaches the same endpoint:
|
|
a system that designs its own microcode without any LLM.
|
|
|
|
The LLM's coding ability determines how many API dollars and
|
|
calendar months the bootstrap requires. It does not determine
|
|
whether the bootstrap succeeds. A patient operator with a
|
|
3B local model and a Tenstorrent card reaches the same
|
|
destination as an operator with bottomless API credits — the
|
|
second arrives faster, but both arrive.
|
|
|
|
** The per-domain sufficiency flip
|
|
|
|
The sufficiency flip is not a single event. It happens
|
|
independently for each domain, and some domains never flip.
|
|
The flip point is determined by the kind of knowledge the
|
|
domain requires.
|
|
|
|
*** Knowledge types required for a flip
|
|
|
|
| Domain | Knowledge required | Can it flip? | How fast? |
|
|
|--------|-------------------|--------------|-----------|
|
|
| Shell safety, path rules | Structural only — the deployment's config, shell semantics | Immediately | Instant — ingested from config |
|
|
| Healthcare compliance | Structural (HIPAA text) + empirical (human reviews of edge cases) | Yes | Weeks — one pass of LLM translation + human review cycle |
|
|
| Codebase refactoring | Structural (dependency graph, API surface) + empirical (test suite, build results) + performance (latency, throughput) | Yes | Months — depends on how many build cycles the system has observed |
|
|
| Microcode optimization | Structural (RISC-V ISA, core topology) + performance (profiling data) | Yes | Weeks — after enough benchmark runs to characterize hardware |
|
|
| Poetry, creative writing | Neither structural rules nor empirical ground truth (beauty is subjective) | **Never** | N/A — the gate stack cannot verify aesthetic quality |
|
|
| Novel scientific discovery | Structural (known laws) + empirical (experiments) | Eventually | Years — requires experimental data the system must gather through instruments |
|
|
|
|
*** The fastest acquisition strategy per domain
|
|
|
|
The goal: reach the flip point with the fewest calendar days
|
|
and the fewest human hours.
|
|
|
|
| Knowledge type | Acquisition strategy | Calendar time | Human time |
|
|
|----------------|---------------------|---------------|------------|
|
|
| **Structural** (published rules, configs, specs) | LLM translation of source documents + ACL2 consistency verification + one-shot human review | Hours for the LLM pass, days for human review | Days — one domain expert reviewing the output |
|
|
| **Structural** (unpublished — your deployment, your codebase) | Automated scanning — the system walks your filesystem, reads your configs, builds the dependency graph | Minutes to hours | Zero — fully automated |
|
|
| **Empirical** (what happens when X?) | **Active probing** — the system does not wait for user interactions. It probes its own environment: runs shell commands in sandbox to verify gate rules, executes test suites to verify dependency graph, measures what happens when it pushes boundaries. | Hours to days | Zero — automated sandboxed probing |
|
|
| **Empirical** (what does the human prefer?) | **Contrastive queries** — instead of waiting for HITL approvals to accumulate, the system asks targeted questions: "Which of these two interpretations of the regulation is correct? This one or this one?" Each question produces a fact. | Days — batch of targeted questions answered in one session | Hours — one review session with the domain expert |
|
|
| **Performance** (latency, throughput) | **Benchmark harness** — the system runs its own workload at varying parameters, records timing data, stores it as facts with `:provenance :benchmark` | Hours — automated sweep | Zero |
|
|
| **Transfer** from related domains | **Ontology alignment** — if the system already knows compliance for GDPR, it can ask Screamer: "Which GDPR rules have the same structure as HIPAA rules? Use those as seed hypotheses, flag for human review." The transferred rules flip faster because Screamer starts from a consistent foundation, not from scratch. | Days — Screamer finds alignments automatically, human reviews the suggestions | Hours — review the cross-domain alignment suggestions |
|
|
|
|
*** The fastest path to flip any domain
|
|
|
|
1. **Ingest all published text** for the domain (laws, specs, configs)
|
|
via LLM translation. One pass. Hours.
|
|
|
|
2. **Run the benchmark harness** to measure the system's own performance
|
|
in the domain. One sweep. Hours.
|
|
|
|
3. **Run active sandboxed probes** — test each gate rule against
|
|
synthetic inputs to verify it behaves as expected. Automated.
|
|
|
|
4. **Generate contrastive queries** — Screamer identifies the 5%
|
|
of rules where the LLM's translation is most uncertain (contradicts
|
|
a transferred rule, has no precedent, has multiple valid
|
|
interpretations). Present these as yes/no questions to the human
|
|
domain expert in a single session.
|
|
|
|
5. **Start serving real interactions.** Every gate outcome generates
|
|
an empirical fact that Screamer feeds back into the rule set.
|
|
The empirical loop tightens from the first real interaction.
|
|
|
|
For a codified domain (healthcare compliance, financial regulation,
|
|
industrial safety): flip within days of step 1-4. The only bottleneck
|
|
is the domain expert's review session in step 4 — a few hours of
|
|
human time.
|
|
|
|
For an uncodified but learnable domain (codebase refactoring):
|
|
flip within weeks of step 3 (benchmark harness) + step 5 (real
|
|
interactions). No LLM translation needed — the knowledge comes
|
|
from the system probing its own environment.
|
|
|
|
For a domain that can never flip (poetry, aesthetics): the system
|
|
never reaches sufficiency. It never claims to. The TUI shows
|
|
"Symbolic index: 0 facts. This domain has no codifiable rules."
|
|
The LLM handles 100% of poetry interactions. The gate stack
|
|
only checks for safety (no shell commands, no file deletions)
|
|
and passes through everything else to the LLM. The system is
|
|
honest about its frontier.
|
|
|
|
*** Bootstrapping the Lisp Machine: all domains are software
|
|
|
|
For the concrete goal of bootstrapping a self-driving Lisp Machine,
|
|
every domain involved is software — the most codifiable domain in
|
|
existence. Code has formal specifications, documented ISAs,
|
|
deterministic behavior, and objectively testable correctness.
|
|
Every subdomain required for the bootstrap flips.
|
|
|
|
| Subdomain | Knowledge type | Flip timeline | Why |
|
|
|-----------|---------------|---------------|-----|
|
|
| RISC-V ISA, Tenstorrent Tensix dispatch | Structural (ISA spec, API docs) + performance (profiling) | Days | Published spec, deterministic hardware, benchmark harness characterizes real behavior |
|
|
| SBCL runtime internals (GC, type dispatch, threading) | Structural (source code) + performance (latency profiles) | Days | Full source available, system can instrument itself |
|
|
| ACL2 metafunctions and macro layers | Structural (the logic is ACL2's own) | Instant | The theorem language is already the system's native logic — no translation step |
|
|
| FPGA/Verilog descriptions (if FPGA path) | Structural (VHDL/Verilog semantics) + performance (timing analysis, power) | Weeks | Published language semantics, but synthesis is slower and bitstream verification is harder than RISC-V |
|
|
| CIC prover kernel | Structural (type theory rules — these ARE formal) | Days | Mathematics is the most codified domain. ACL2 already does structural verification. Building a CIC kernel that ACL2 verifies is well-understood work. |
|
|
| Operating system interfaces, device drivers | Structural (syscall API, register maps) + empirical (test results) | Weeks | Published interfaces, deterministic behavior, but hardware quirks require empirical probing |
|
|
| Compiler optimization | Structural (IR semantics, optimization passes) + performance (benchmark before/after) | Weeks | Published semantics, objective quality metric (faster = better), benchmark harness measures |
|
|
|
|
Every single subdomain flips. The only variable is calendar days
|
|
to accumulate the knowledge.
|
|
|
|
*** The fastest acquisition sequence
|
|
|
|
Optimized for minimal wall-clock time to a self-driving Lisp Machine:
|
|
|
|
**Day 1: Ingestion day**
|
|
- LLM translates: RISC-V ISA spec, SBCL source, Tenstorrent API docs,
|
|
ACL2 reference, CIC type theory rules. All structural knowledge
|
|
enters the fact store in one parallel pass.
|
|
- ACL2 verifies consistency across all ingested domains.
|
|
- Human expert reviews the 5% of rules Screamer flagged as uncertain.
|
|
One session, a few hours.
|
|
|
|
**Day 1-2: Profiling day**
|
|
- Benchmark harness sweeps all 72 Tensix cores: measure instruction
|
|
latency, memory bandwidth, GC pause distribution, dispatch overhead.
|
|
- Each measurement is a fact with `:provenance :benchmark`.
|
|
- The benchmark harness is itself verified by ACL2 (it runs inside
|
|
a controlled sandbox, bounded time, no side effects on production data).
|
|
|
|
**Day 2-3: Active probing day**
|
|
- The system generates synthetic microcode routines: short programs
|
|
that exercise specific instructions, specific GC patterns, specific
|
|
dispatch paths.
|
|
- It loads them onto spare Tensix cores, measures actual latency, and
|
|
compares against the spec.
|
|
- Discrepancies become facts: `(:entity "core-42" :relation :dispatch-latency
|
|
:value "14 cycles" :source :measured :expected "12 cycles" :provenance :probe)`.
|
|
- After ~1,000 probes, the system knows the hardware's actual behavior
|
|
better than the published spec does.
|
|
|
|
**Day 3-7: Transfer and sufficiency**
|
|
- ACL2's existing knowledge about induction, rewriting, and termination
|
|
transfers directly to verifying microcode routines (same logic, different
|
|
subject matter).
|
|
- Screamer aligns microcode verification patterns with existing gate
|
|
verification patterns — both are structural proofs over finite state.
|
|
- The benchmark facts give ACL2 a concrete cost model. ACL2 can now
|
|
prove not just correctness but also "this microcode routine is at
|
|
least 10% faster than the current implementation."
|
|
- Sufficiency flip for microcode generation: the system proposes new
|
|
dispatch routines, ACL2 verifies them, Screamer checks against
|
|
hardware constraints, the gate stack blocks anything unsafe. Zero LLM
|
|
tokens for the optimization loop.
|
|
|
|
**Week 2-4: Self-optimizing system**
|
|
- The system profiles its own gate verification latency (already
|
|
instrumented via telemetry, Phase v0.66.0).
|
|
- It identifies the hot path: "fact-query accounts for 34% of verify time."
|
|
- It generates a new dispatch routine for fact-query, targets the
|
|
nearest idle Tensix core, loads it, benchmarks, and commits if
|
|
faster.
|
|
- The ontology now includes facts about its own optimization history:
|
|
`(:entity "fact-query-dispatch-v3" :relation :speedup-baseline
|
|
:value "1.34x" :provenance :self-optimize)`.
|
|
|
|
**After the flip: purely symbolic optimization**
|
|
|
|
The LLM is no longer needed for any optimization proposal. The system
|
|
profiles, proposes, verifies, tests, and commits entirely within the
|
|
symbolic engine. The LLM remains only for the boundary: interpreting
|
|
a human's high-level goal ("make the system faster") into a structured
|
|
optimization target, and formatting the benchmark report for human
|
|
readability. Those calls shrink toward zero as the system internalizes
|
|
common optimization goals as gate rules.
|
|
|
|
*** The surprising result for bootstrapping specifically
|
|
|
|
Because every subdomain of the Lisp Machine bootstrap is software,
|
|
and software is the most codifiable domain, the entire bootstrap
|
|
can flip in **days to weeks** with a single human review session.
|
|
|
|
The bottleneck is not knowledge acquisition. It is not the LLM's
|
|
coding ability. It is the initial human review of the 5% of
|
|
ambiguous rules that Screamer flags — a session measured in hours,
|
|
not weeks.
|
|
|
|
The Tenstorrent approach makes this even faster because the
|
|
microcode is software (RISC-V assembly), not hardware (FPGA
|
|
bitstream). The system can propose, load, test, and roll back
|
|
a new dispatch routine in seconds. An FPGA path would add
|
|
synthesis time (minutes to hours per iteration), stretching
|
|
the bootstrap from days to months.
|
|
|
|
A system with a Tenstorrent P150, the AGPL Passepartout code,
|
|
a RISC-V cross-compiler, and one patient human who reviews the
|
|
contrastive queries can achieve a self-driving Lisp Machine in
|
|
under a month.
|
|
|
|
*** Size and development time estimates
|
|
|
|
These are order-of-magnitude estimates based on the existing
|
|
architecture and the roadmap's line-count breakdowns. The numbers
|
|
are small because Lisp is more expressive, the architecture reuses
|
|
primitives across domains, and the symbolic engine replaces what
|
|
would be thousands of lines of Python in a conventional agent.
|
|
|
|
**** Current Passepartout (v0.7.2, main branch)
|
|
|
|
| Component | Lines of Lisp | Status |
|
|
|-----------|---------------|--------|
|
|
| Core pipeline (perceive-reason-act, memory, skills, transport, package) | ~1,800 | Built and running |
|
|
| Gate stack (security dispatcher, permissions, policy, vault, validator) | ~900 | Built and running |
|
|
| Neuro (provider router, provider dispatch, token economics, tokenizer) | ~900 | Built and running |
|
|
| Programming tools (lisp, org, repl, literate, standards, tools) | ~1,600 | Built and running |
|
|
| Symbolic (archivist, awareness, memory, scope, events, identity, self-improve) | ~1,200 | Built and running |
|
|
| Channels (TUI main/state/view, CLI, Telegram, Signal, Discord, Slack) | ~2,300 | Built, on refactor branch |
|
|
| TUI (cl-tty migration, ongoing) | ~1,300 | In progress on refactor branch |
|
|
| Config, diagnostics, embedding, sensors, integration tests | ~700 | Built and running |
|
|
| **Total existing** | **~10,700** | |
|
|
|
|
Development time to reach v0.7.2: approximately 2 months,
|
|
one developer (April-May 2026 from git log).
|
|
|
|
**** New code to reach v1.0.0 (Neurosymbolic Maturity)
|
|
|
|
| Phase / Feature | Lines | Est. dev time (one dev) |
|
|
|-----------------|-------|------------------------|
|
|
| v0.8.0 TUI stabilization (cl-tty) | ~500 | 3-4 weeks |
|
|
| v0.9.0 Eval harness + sandbox hardening | ~200 | 1-2 weeks |
|
|
| v0.11.0 Phase 0 (type-level gates, core integrity) | ~75 | 3-5 days |
|
|
| v0.13.0 Phase 0b (signal authentication, Layer 1) | ~200 | 1-2 weeks |
|
|
| v0.15.0 Phase 1 (fact store with provenance) | ~200 | 1-2 weeks |
|
|
| v0.17.0 Phase 1a (self-preservation, quarantine, watchdog) | ~120 | 1 week |
|
|
| v0.19.0 Phase 2 (Screamer admission gate) | ~200 | 1-2 weeks |
|
|
| v0.21.0 Phase 3 (archivist as fact proposer) | ~100 | 1 week |
|
|
| v0.23.0 Phase 4 (sufficiency criterion, the Flip) | ~50 | 2-3 days |
|
|
| v0.26.0 Phase 5 (VivaceGraph, Merkle DAG, ontology versioning) | ~400 | 2-4 weeks |
|
|
| v0.28.0-28.5 Phase 6 (ACL2 base + 5 macro layers) | ~540 | 3-4 weeks |
|
|
| v0.37.0 Phase 7 (10-80-10 planner) | ~500 | 3-4 weeks |
|
|
| All polish features (skins, export, CLI, MCP, LSP, telemetry, cost, etc.) | ~1,500 | 4-8 weeks |
|
|
| Integration testing, hardening, bug fixes | — | 4-8 weeks |
|
|
| **Total new code at v1.0.0** | **~4,500** | **4-6 months (one developer)** |
|
|
|
|
**** Additional code for a self-driving Lisp Machine
|
|
|
|
| Component | Lines | Est. dev time (one dev) |
|
|
|-----------|-------|------------------------|
|
|
| RISC-V microcode for Lisp dispatch (tagged memory, GC barriers, cons cells) | ~3,000 | 1-2 months |
|
|
| PCIe DMA driver from SBCL (C + sb-alien FFI) | ~500 | 2-4 weeks |
|
|
| Tenstorrent Tensix core management (allocate, load, benchmark) | ~1,500 | 3-4 weeks |
|
|
| Profiling and benchmark harness (Phase 4 applied to hardware) | ~500 | 1-2 weeks |
|
|
| Microcode synthesis from ACL2-verified specifications | ~500 | 2-4 weeks |
|
|
| **Total for Lisp Machine hardware integration** | **~6,000** | **3-6 months (one developer)** |
|
|
|
|
Notable: the microcode for Lisp dispatch (~3,000 lines of RISC-V
|
|
assembly) is smaller than the existing Lisp codebase. The hardest
|
|
part is not the assembly — it's the verification that the assembly
|
|
correctly implements the Lisp primitives, which ACL2 handles.
|
|
|
|
**** Total system at self-driving threshold
|
|
|
|
| | Lines of code | Dev time (one dev) | Dev time (small team, 2-3) |
|
|
|---|---|---|---|
|
|
| Passepartout v0.7.2 | ~10,700 | 2 months (done) | 1 month (done) |
|
|
| To v1.0.0 | +~4,500 | 4-6 months | 2-3 months |
|
|
| Lisp Machine hardware | +~6,000 | 3-6 months | 2-3 months |
|
|
| **Total** | **~21,000** | **9-14 months** | **5-7 months** |
|
|
|
|
**** Why the numbers are small
|
|
|
|
- **Lisp is 3-10x more compact than C++ or Python** for the same
|
|
semantics. The entire gate stack (900 lines) would be 3,000-5,000
|
|
lines of Python with middleware classes and serialization glue.
|
|
- **The architecture reuses primitives cross-domain.** The Merkle
|
|
tree, the gate stack, the fact store, the ACL2 prover — each is
|
|
built once and shared by all features. There is no "compliance
|
|
package" that duplicates the "refactoring package" infrastructure.
|
|
- **The symbolic engine replaces test code.** A conventional agent
|
|
needs thousands of lines of test fixtures for behavior validation.
|
|
ACL2's proofs replace those. The tests are the theorems.
|
|
- **The LLM generates the boilerplate.** The LLM writes gate rules,
|
|
macro layer templates, and migration scripts. The symbolic engine
|
|
verifies them. The human reviews the 5% edge cases. The bottleneck
|
|
is verification throughput, not code writing.
|
|
|
|
**** Comparison with other systems
|
|
|
|
| System | Lines | Developer-years |
|
|
|--------|-------|-----------------|
|
|
| Hermes Agent | ~50,000+ | 2+ years, ~3 devs |
|
|
| Claude Code | ~100,000+ (estimated) | 3+ years, large team |
|
|
| Llama.cpp | ~200,000 | 2+ years, many contributors |
|
|
| **Passepartout self-driving** | **~21,000** | **~1 year, 1-3 devs** |
|
|
| Symbolics Genera (1980s Lisp OS) | ~1,000,000 | ~10 years, large team |
|
|
|
|
The Symbolics comparison is instructive: they built a full Lisp
|
|
operating system from scratch in assembly and Lisp, with graphical
|
|
interface, networking, file system, and development environment.
|
|
Passepartout runs on Linux, which provides the OS layer for free.
|
|
The Lisp Machine hardware integration is a PCIe card, not a
|
|
replacement of the entire host. The scope is dramatically smaller.
|
|
|
|
The surprising result: **a self-driving Lisp Machine is a ~21,000
|
|
line project for a small team working less than a year.** Not a
|
|
billion-dollar moonshot. A well-scoped engineering project.
|
|
|
|
*** Revised time estimate given actual velocity
|
|
|
|
Moving from v0.4.0 to v0.7.2 (three minor versions covering TUI,
|
|
streaming, gate trace, HITL, Merkle audit, tool hardening, session
|
|
rewind, undo/redo, skills engine) in a single session means the
|
|
agent writes the code and the symbolic engine verifies it at a
|
|
cycle measured in minutes, not days.
|
|
|
|
The limiting factor is not coding speed. It is:
|
|
1. LLM API call latency per iteration (seconds per generation)
|
|
2. ACL2 verification time per submission (milliseconds per theorem)
|
|
3. Human review of Screamer-flagged edge cases (the 5%)
|
|
|
|
For the 4,500 lines remaining to v1.0.0, distributed across ~40
|
|
independent features (each 50-500 lines), with the agent generating,
|
|
ACL2 verifying, and the human reviewing only the flagged 5%:
|
|
|
|
| Phase | Lines | Cycles | Wall clock |
|
|
|-------|-------|--------|------------|
|
|
| TUI stabilization + eval harness | ~700 | 10-14 | Days |
|
|
| Phases 0-4 (type gates, fact store, Screamer, archivist, sufficiency) | ~670 | 10-14 | Days |
|
|
| Phase 5 (VivaceGraph, Merkle DAG, ontology versioning) | ~400 | 6-10 | Days |
|
|
| Phase 6 (ACL2 base + 5 macro layers) | ~540 | 8-12 | Days |
|
|
| Phase 7 (10-80-10 planner) | ~500 | 8-10 | Days |
|
|
| Polish features (skins, export, CLI, MCP, LSP, telemetry, etc.) | ~1,500 | 20-30 | 1-2 weeks |
|
|
| Integration, edge-case hardening, cross-phase regression | — | — | 1-2 weeks |
|
|
| **Total to v1.0.0** | **~4,500** | **~80 cycles** | **3-5 weeks** |
|
|
|
|
The bottleneck at this velocity is not code generation. It is
|
|
human availability to review the Screamer-flagged 5%. At 80 cycles
|
|
across 40 features, that is roughly 4 flagged rules per feature,
|
|
200 total, each requiring a yes/no answer from the human. In a
|
|
dedicated review session: 2-3 hours of human time.
|
|
|
|
For the Lisp Machine hardware integration (microcode, PCIe DMA,
|
|
Tensix management, benchmark harness — ~6,000 lines):
|
|
|
|
| Component | Lines | Cycles | Wall clock |
|
|
|-----------|-------|--------|------------|
|
|
| RISC-V microcode for Lisp dispatch | ~3,000 | 20-30 | 1-2 weeks |
|
|
| PCIe DMA driver (C + sb-alien FFI) | ~500 | 4-6 | Days |
|
|
| Tensix core management | ~1,500 | 10-15 | Days |
|
|
| Benchmark harness + microcode synthesis | ~1,000 | 8-12 | Days |
|
|
| **Total hardware integration** | **~6,000** | **~60 cycles** | **2-4 weeks** |
|
|
|
|
The Lisp Machine hardware integration is slower per cycle because
|
|
the microcode must be loaded onto physical hardware and benchmarked.
|
|
Each cycle includes: generate → ACL2 verify → load onto Tensix →
|
|
run benchmark → measure → feed back. That adds seconds per cycle
|
|
vs milliseconds for pure-software verification.
|
|
|
|
The total to a self-driving Lisp Machine (Logos + Stoa hardware):
|
|
~140 cycles, 6-10 weeks, 4-6 hours of human review time.
|
|
|
|
For the full Stoa (editor, browser, shell, Qt integration):
|
|
|
|
Stoa is not written from scratch. It is first assembled from
|
|
existing components, then systematically replaced. The initial
|
|
assembly is fast:
|
|
|
|
| Stage | Approach | Lines | Cycles | Wall clock |
|
|
|-------|----------|-------|--------|------------|
|
|
| Qt/EQL5 shell (minimal) | Wrap existing Qt widgets | ~500 | 4-6 | Days |
|
|
| Lish editor (minimal) | Org buffer + Qt text widget | ~1,000 | 8-10 | Days |
|
|
| Nyxt browser Stage 1 | Qt + WebKit, wrap existing API | ~2,000 | 10-15 | 1-2 weeks |
|
|
| **Stoa v2.0.0 working** | **~3,500** | **~30 cycles** | **2-3 weeks** |
|
|
|
|
After v2.0.0, erosion begins. Each replacement is a self-contained
|
|
project where the system proposes the replacement, ACL2 verifies
|
|
it produces identical output for all known inputs, and the system
|
|
loads it. The timeline is no longer measured in cycles — it is
|
|
measured in how many verifiable replacements the system can propose
|
|
and test before settling on the optimal implementation.
|
|
|
|
The total from today to a fully self-driving Lisp Machine with a
|
|
working editor, browser, and shell: approximately 8-12 weeks with
|
|
the actual observed velocity. Not years.
|
|
|
|
*** Self-writing beyond the bootstrap
|
|
|
|
Once the system achieves sufficiency for software engineering
|
|
(Phase 4 flip applied to code generation), the bulk of Stoa and
|
|
Agora is written by the system itself:
|
|
|
|
| System | Human writes | System writes | Total |
|
|
|--------|--------------|---------------|-------|
|
|
| Logos (Passepartout) | ~10,700 existing + ~4,500 to v1.0.0 | The system extends its own macro layers and fact store | ~15,000 + growing |
|
|
| Stoa (environment) | Design decisions, architectural constraints | ~100,000+ lines of editor, browser, shell, layout engine, each component verified by ACL2 before loading | ~150,000+ |
|
|
| Agora (network) | Protocol specification, threat model | DIDComm implementation, Relay Network, PDS, Lightning integration, contracts — each module verified by ACL2 | ~100,000+ |
|
|
| Hardware (tagged RISC-V) | ISA design, TinyTapeout shuttle | VHDL/Verilog for tagged memory, GC bus master, Lisp primitives — synthesized and tested via FPGA | ~50,000+ |
|
|
|
|
The human time is dominated by design decisions, not code writing.
|
|
Code writing is the agent's job. The bottleneck shifts from "how
|
|
many lines can I write per day" to "how many design decisions can
|
|
I make per day and how quickly can I review the 5% of ambiguities
|
|
Screamer flags."
|
|
|
|
At the observed velocity (v0.4.0 to v0.7.2 in a day), a
|
|
deep-thinking human paired with this architecture can go from
|
|
today's Passepartout to the full Logos + Stoa + Agora triad in
|
|
approximately 3-6 months — most of that time spent on design
|
|
decisions and protocol specification, not on code.
|
|
|
|
The triad, when complete, replaces every layer of the current
|
|
computing stack — cognition (OpenAI/Anthropic), environment
|
|
(Apple/Microsoft/Google), network (Facebook/Twitter/Slack) —
|
|
with Lisp-native, user-owned, ACL2-verified alternatives that
|
|
cost near-zero marginal compute. The lines that run the modern
|
|
internet (tens of millions across Google, Meta, Amazon, Apple,
|
|
Microsoft) are replaced by a single coherent architecture where
|
|
one gate stack verifies everything and one prover proves
|
|
everything consistent.
|
|
|
|
The social and economic impact of this is not "a better AI agent."
|
|
It is a complete alternative infrastructure for personal computing
|
|
that requires no cloud, no gatekeeper, no per-token fee, and no
|
|
trust. The lines don't need to exist on day one. They need to
|
|
exist in the right order — and the system writes them in that
|
|
order, one ACL2-verified submission at a time.
|
|
|
|
**** Market size and business models
|
|
|
|
The triad directly addresses markets that currently spend over a
|
|
trillion dollars annually combined:
|
|
|
|
| Market | Annual spend | What the triad replaces |
|
|
|--------|-------------|------------------------|
|
|
| Cloud computing (AWS, GCP, Azure) | ~$300B | Verification appliance runs locally — no per-resource billing |
|
|
| AI/LLM API revenue (OpenAI, Anthropic) | ~$50B | Near-zero marginal cost symbolic engine |
|
|
| Operating systems (Microsoft, Apple) | ~$100B | Stoa (Lisp-native editor, browser, shell) |
|
|
| Social media / communication (Meta, Twitter, Slack, Discord) | ~$200B | Agora (DID-based, encrypted, permissionless) |
|
|
| Identity / SSO (Okta, Auth0, Google/Apple) | ~$10B | Self-sovereign DID + HD keys |
|
|
| Payment processing (Stripe, PayPal, Visa/MC) | ~$200B | Lightning + smart contracts |
|
|
| Productivity software (Microsoft, Google) | ~$50B | Lish + Org-mode + Stoa |
|
|
| Compliance and audit (regulatory) | ~$50B | Automated ACL2-verified compliance |
|
|
| **Total addressable** | **~$960B** | |
|
|
|
|
The triad does not need to capture all of this. It needs to capture
|
|
the portion that is willing to pay for provable correctness, lower
|
|
cost, and user sovereignty. Even 1% of this TAM is ~$10B/year.
|
|
|
|
**** Business models for a free-software triad
|
|
|
|
The AGPL license is not a barrier to monetization — it is the
|
|
foundation of the trust model. An enterprise cannot buy provable
|
|
correctness from closed source; the code must be inspectable. The
|
|
revenue comes from what the AGPL does not cover:
|
|
|
|
***** Low-hanging fruit (immediate, months)
|
|
|
|
1. **Verification appliance (hardware).** An FPGA or Tenstorrent
|
|
card pre-loaded with a mature Passepartout image, domain-specific
|
|
gate rules, and a hardware root of trust. No cloud dependency.
|
|
Target: regulated industries that need provable compliance and
|
|
cannot accept cloud-based AI. Price: $5K-$50K/unit. Volume:
|
|
hundreds to low thousands in year one.
|
|
|
|
2. **Domain gate rule subscriptions.** Pre-verified gate rule
|
|
packages for specific compliance domains. HIPAA package: $50K/yr.
|
|
SOC2 package: $50K/yr. GDPR package: $50K/yr. FedRAMP package:
|
|
$100K/yr. Updated automatically when regulations change. An
|
|
enterprise with all four pays $250K/yr — and the switching cost
|
|
is high because changing packages means re-verifying the fact
|
|
store against new rules.
|
|
|
|
3. **Evaluation harness as certification.** "Run our 10,000-task
|
|
suite against your AI agent and get a Merkle-verified score."
|
|
Target: AI labs proving their agents' capabilities, enterprise
|
|
procurement requiring independent verification. Price: $50K-$200K
|
|
per certification. The regression suite grows with every deployed
|
|
instance, making the certification increasingly valuable over
|
|
time.
|
|
|
|
4. **Migration services.** "Bring your existing infrastructure into
|
|
the Passepartout gate stack." Custom gate rules, ontology design,
|
|
integration with existing systems. Price: $100K-$500K per
|
|
engagement. Each engagement feeds back edge cases into the
|
|
regression suite and domain gate packages.
|
|
|
|
Revenue estimate for year one (low-hanging fruit): 50 appliance
|
|
sales ($250K-$2.5M) + 20 gate rule subscriptions ($1M-$5M) +
|
|
10 certifications ($500K-$2M) + 5 migration engagements ($500K-
|
|
$2.5M). Total: $2.25M-$12M. Not venture-scale, but self-sustaining
|
|
for a small team.
|
|
|
|
***** Medium-term (1-3 years)
|
|
|
|
5. **Compute marketplace (Agora).** Passepartout instances offer
|
|
their symbolic engine capacity (ACL2 cycles, Screamer constraint
|
|
solving, VivaceGraph queries) to other agents on the Agora
|
|
network. The early player runs a large instance and sells compute
|
|
to smaller instances. The AGPL allows this because the marketplace
|
|
is a service, not a modification of the code. Revenue is a
|
|
percentage of each compute transaction.
|
|
|
|
6. **Relay Network (Agora infrastructure).** If Agora becomes the
|
|
default communication protocol for agent-to-agent interaction,
|
|
running Relays is a business. Every DIDComm message routes
|
|
through one or more Relays. Revenue: tiny per-message fee (fractions
|
|
of a cent) or paid priority routing. At billions of messages,
|
|
fractions become real.
|
|
|
|
7. **The Lisp Machine appliance (Stoa v5.0.0 hardware).** The tagged
|
|
RISC-V architecture running on FPGA or custom ASIC, sold as a
|
|
certified appliance for industries where correctness is worth
|
|
paying for: medical devices, industrial controllers, defense
|
|
systems, financial trading. Price: $20K-$100K/unit. If the
|
|
hardware validation succeeds on TinyTapeout, the upside is
|
|
enormous: a certified Lisp Machine at scale could capture a
|
|
significant fraction of the embedded systems market.
|
|
|
|
***** Big money (3-10 years)
|
|
|
|
8. **Verification monopoly: the regression suite as UL certification.**
|
|
The accumulated regression suite — thousands of edge cases from
|
|
every deployed instance, every bug fix, every regulatory change —
|
|
becomes the most comprehensive test of autonomous agent correctness
|
|
ever assembled. Any organization claiming a "safe AI agent" needs
|
|
Passepartout certification to prove it. This is Underwriters
|
|
Laboratory for AI — a certification nobody can ignore. Revenue:
|
|
licensing the certification mark to every AI vendor that ships
|
|
an agent. Margins: near-100% once the suite exists.
|
|
|
|
9. **Infrastructure lock-in: switching costs compound.** A hospital
|
|
that runs Passepartout with HIPAA gate rules ($50K/yr) for five
|
|
years has accumulated a fact store with a decade of compliance
|
|
decisions, a proof forest of verified rules, and an empirical
|
|
decision history tied to their specific deployment. Switching to
|
|
a competitor means discarding all of it. The accumulated value
|
|
grows as the fact store deepens. Annual revenue per enterprise
|
|
grows from $250K in year one to $500K-$1M by year five as more
|
|
domain packages are added and the fact store becomes more
|
|
valuable than the software itself.
|
|
|
|
10. **The compute marketplace at planetary scale.** If Passepartout
|
|
instances on Agora transact billions of verified operations per
|
|
day, the spread on compute transactions is enormous. This is
|
|
not a product sale — it is a bet on network effects. Every new
|
|
instance increases the value of the network (more capacity,
|
|
more diversity, more resilience). The early player that
|
|
provisions the largest compute capacity on Agora becomes the
|
|
default infrastructure provider for the entire network.
|
|
|
|
**** The investment thesis
|
|
|
|
The low-hanging fruit (appliances, gate rules, certification,
|
|
migration) generates enough cash to sustain development. The
|
|
medium-term (compute marketplace, Relay Network, Lisp Machine
|
|
hardware) builds network effects and switching costs. The big
|
|
money (verification monopoly, infrastructure lock-in, planetary
|
|
compute marketplace) is the venture-scale outcome.
|
|
|
|
The unique advantage: the early player benefits from every other
|
|
instance of the triad, because every deployed instance feeds edge
|
|
cases into the regression suite, grows the compute marketplace,
|
|
and validates the hardware designs. The network effects are positive
|
|
sum — the value of the system increases with every user, and the
|
|
early player captures a disproportionate share because they built
|
|
the infrastructure that every new instance depends on.
|
|
|
|
This is the AWS of provable computing: build the infrastructure,
|
|
let everyone use it for free (AGPL), charge for the parts that
|
|
scale (compute marketplace, certification, hardware, migration).
|
|
The switching costs compound. The network effects are positive sum.
|
|
The market is nearly a trillion dollars.
|
|
|
|
***** Additional revenue paths
|
|
|
|
** Usernames on Agora
|
|
|
|
The DID system is permissionless — anyone generates their own DID
|
|
via HD key derivation. But human-readable @handles (short names,
|
|
common English words, three-letter IDs) are naturally scarce. The
|
|
early player controls the namespace registry:
|
|
|
|
- **Free tier:** any DID can claim a namespace.username on a
|
|
first-come, first-served basis with proof of key ownership
|
|
- **Premium tier:** short names (2-3 chars), common words, brand
|
|
names, squatter prevention via auction or annual lease
|
|
- **Revenue model:** $5-$50/year per premium username, auction
|
|
revenue for highly contested names (single-letter, common
|
|
surnames). ENS-style: registration fees fund development, not
|
|
speculation.
|
|
|
|
At scale: 1M premium usernames at $10/yr average = $10M/yr
|
|
recurring. The namespace registry is a natural monopoly — the
|
|
early player's registry is the most widely accepted, so every
|
|
new user registers there. Network effects lock in.
|
|
|
|
** PDS as a service
|
|
|
|
The Personal Data Store (the Merkle fact store exposed on Agora)
|
|
can be self-hosted (free, AGPL) or hosted by the early player:
|
|
|
|
- **Free tier (self-hosted):** full AGPL PDS, runs on any Lisp
|
|
host, full control, no cost
|
|
- **Basic hosted tier:** $10-$20/month, 10GB fact store, 10M
|
|
queries/month, automated backup, automatic upgrades
|
|
- **Business hosted tier:** $100-$500/month, 100GB+ fact store,
|
|
unlimited queries, SLA, dedicated relay, compliance logging
|
|
- **Enterprise hosted tier:** $1,000-$10,000/month, air-gapped
|
|
appliance management, dedicated support, regulatory-compliant
|
|
data residency, integration with existing SIEM
|
|
|
|
This is the classic open-core SaaS model (GitLab, Sentry,
|
|
PlanetScale). The free self-hosted version drives adoption and
|
|
trust (you can inspect every line). The hosted version captures
|
|
the value from users who value convenience over control. Bluesky
|
|
already demonstrated demand: they charge ~$10/month for PDS
|
|
hosting with minimal feature differentiation.
|
|
|
|
PDS as a service is lower-margin than hardware appliances but
|
|
higher-volume. Target: 100K subscribers at $15/month average =
|
|
$18M/yr recurring, with near-zero marginal cost per additional
|
|
subscriber (the symbolic engine is CPU-bound, not per-user
|
|
metered).
|
|
|
|
Combined with premium usernames: $10M/yr + $18M/yr = $28M/yr
|
|
recurring from Agora services alone before any compute marketplace
|
|
revenue. These are not speculative — they are standard internet
|
|
business models (namespace registry + SaaS hosting) applied to
|
|
a decentralized identity and data layer.
|
|
|
|
*** The full triad: Logos, Stoa, Agora
|
|
|
|
The self-driving Lisp Machine is not the endpoint. It is one
|
|
component of a three-part architecture:
|
|
|
|
| Layer | Name | Function | What it is |
|
|
|-------|------|----------|-----------|
|
|
| Logos | Passepartout | The mind | Cognitive agent, symbolic engine, gate stack, fact store, ACL2, 10-80-10 planner |
|
|
| Stoa | The Porch | The body | Editor (Lish), browser (Nyxt), shell (Lish), Org-mode filesystem, Qt/EQL5 UI, Lisp Machine hardware |
|
|
| Agora | The Society | The network | Self-sovereign identity (DID), encrypted comms (DIDComm), Personal Data Store, Relay Network, contracts, liquid democracy, compute marketplace |
|
|
|
|
**** Passepartout is the PDS
|
|
|
|
Agora's Personal Data Store (PDS) is Passepartout's in-process
|
|
memory — the Merkle tree, the fact store, the memory-objects.
|
|
Every memory-object already has a SHA-256 hash (Merkle provenance),
|
|
which maps directly to Agora's CIDv1 content addressing. The mapping:
|
|
|
|
| Passepartout memory-object | Agora Note |
|
|
|----------------------------|------------|
|
|
| org-object-id | uuid (stable) |
|
|
| org-object-hash | cid (content hash) |
|
|
| getf attrs :TITLE | payload (derived) |
|
|
| getf attrs :TAGS | routing/access-control hints |
|
|
| org-object-content | payload body |
|
|
| merkle hash | CIDv1 (same SHA-256, different formatting) |
|
|
|
|
Passepartout's gate stack verifies every action before it touches
|
|
the PDS. No unverified note escapes. ACL2 proves the access controls
|
|
are correctly enforced. Screamer checks consistency with the
|
|
instance's existing knowledge.
|
|
|
|
**** Stoa is the body
|
|
|
|
Stoa's roadmap (v2.0.0 → v6.0.0) describes the environment:
|
|
- v2.0.0: Lish editor + Nyxt browser (Stage 1, Qt/WebKit) + Lish shell
|
|
- v3.0.0+: Cannibalization — eat dependencies one by one. Replace
|
|
Qt with Lisp-native layout, reduce WebKit to pixel-painting,
|
|
eventually pure-Lisp browser and window management.
|
|
- v4.0.0: Native inference — llama.cpp FFI in-process, DSL-compiled
|
|
model architectures, live surgery on cognition (inspect hidden
|
|
states mid-inference)
|
|
- v5.0.0: Hardware — tagged RISC-V architecture via TinyTapeout,
|
|
FPGA prototype, hardware GC via dedicated bus master
|
|
- v6.0.0: True agency — world models, temporal reasoning, goal
|
|
persistence across restarts
|
|
|
|
The architectural principle: Stoa is not a collection of clients
|
|
connecting to a daemon. It is a single Lisp image where the editor,
|
|
browser, shell, and agent coexist. The Dispatcher gate stack
|
|
verifies every action regardless of who initiated it — user or
|
|
agent. The distinction between "tool" and "self" dissolves.
|
|
|
|
**** Agora is the network
|
|
|
|
Agora provides the decentralized identity and communication layer
|
|
so Passepartout instances can talk to each other:
|
|
- Self-sovereign identity via HD key derivation (BIP-44)
|
|
- Encrypted messaging via DIDComm (agent-to-agent)
|
|
- Notes as atomic, content-addressed, signed data units
|
|
- Relay Network for censorship-resistant message routing
|
|
- Compute marketplace where instances offer symbolic engine capacity
|
|
- Contracts and liquid democracy infrastructure
|
|
|
|
Agora integration is a parallel track to Passepartout's core
|
|
roadmap (v0.3.0 → v1.0.0). The identity DID and DIDComm gateway
|
|
can be built as skills at any time without core changes. The PDS
|
|
transformation (making the fact store network-addressable) waits
|
|
for v1.0.0+.
|
|
|
|
**** The complete picture
|
|
|
|
The triad replaces every layer of the modern computing stack:
|
|
|
|
| Layer | Current (Big Tech) | Triad |
|
|
|-------|-------------------|-------|
|
|
| Cognition | ChatGPT, Claude, Gemini (centralized API, per-token pricing) | Passepartout (local symbolic engine, near-zero marginal cost) |
|
|
| Environment | macOS/Windows/ChromeOS (closed platforms) | Stoa (Lisp-native editor, browser, shell, hardware) |
|
|
| Network | Facebook, Twitter, Slack, Discord (extractive, centralized) | Agora (DID-based, encrypted, user-owned, permissionless) |
|
|
| App model | Web apps + app stores (gatekeepers take 30%) | Skills + Org files (hot-reloadable, no gatekeeper) |
|
|
| Compute | Hyperscaler cloud (AWS, GCP, Azure) | Verification appliance (local, provable, near-zero marginal cost) |
|
|
| Identity | OAuth/Google/Apple SSO (surveillance-based) | Self-sovereign DID + HD key hierarchy (user-owned) |
|
|
| Commerce | Stripe, PayPal (2-3% + chargeback risk) | Lightning Network + smart contracts (permissionless) |
|
|
|
|
The line count estimate (21,000 for Passepartout + Lisp Machine)
|
|
covers only Logos + Stoa's hardware layer. Agora's full
|
|
specification spans 10 requirements documents and the implementation
|
|
(network protocol, PDS, Relay, DID identity) is a separate body of
|
|
work comparable to Passepartout itself.
|
|
|
|
The unifying factor: all three speak plists. All three operate in
|
|
Lisp address space. All three are verified by the same ACL2 prover.
|
|
The gate stack that verifies a shell command also verifies a
|
|
DIDComm message and an Agora Note publication. The symbolic engine
|
|
that plans a refactoring also plans a contract negotiation.
|
|
|
|
One gate stack. One memory model. One prover.
|
|
|
|
Large refactoring projects (extract module, rename API, split monolith)
|
|
are the hardest test for any AI agent. Current approaches (Claude Code,
|
|
Copilot) handle them probabilistically — every step costs tokens, and
|
|
there is no formal guarantee the final system is consistent.
|
|
|
|
Passepartout's Phase 7 planner (10-80-10) transforms this: the symbolic
|
|
engine handles planning, ordering, and structural verification; the LLM
|
|
handles only the code transformation itself.
|
|
|
|
*** The workflow
|
|
|
|
1. **Codebase ingestion.** The scanner walks the entire codebase, builds
|
|
an abstract syntax graph (not just flat files — imports, type
|
|
dependencies, function call chains, test coverage). Each module
|
|
becomes a fact in the fact store with `:provenance :codebase-scan`.
|
|
|
|
2. **Goal translation (LLM, 10%).** "Extract authentication into its own
|
|
service" becomes a structured goal plist:
|
|
(:goal :extract-module
|
|
:source :auth
|
|
:target-service auth-service
|
|
:files (:affected (:app/auth/* :app/middleware/auth.clj :tests/*))
|
|
:constraints (:no-breaking-api (:public-api :unchanged))
|
|
:verification (:all-tests-pass :api-contract-preserved))
|
|
|
|
3. **Constraint satisfaction (Screamer, 80%).** Screamer expresses the
|
|
refactoring as a constraint satisfaction problem:
|
|
- Variables: file modifications ordered by dependency (auth middleware
|
|
must be updated after auth module is extracted but before its callers)
|
|
- Constraints: no circular dependencies, no step that creates broken
|
|
intermediate state, no test file modified before its source
|
|
- Objective: minimize total steps while respecting all constraints
|
|
|
|
Screamer returns a viable plan or reports unsolvability with the
|
|
conflicting constraints — for example, "auth middleware and auth
|
|
module have a circular type dependency that must be resolved before
|
|
extraction."
|
|
|
|
4. **Plan verification (ACL2, part of 80%).** ACL2 proves:
|
|
- No dependency cycles in the plan (A must run before B, B before C,
|
|
C before A → rejected)
|
|
- No deadlocks (two modules waiting on each other)
|
|
- Every planned write is within the refactoring scope (no stray
|
|
modifications to unrelated files)
|
|
- The gate stack will not reject any planned command (no blocked
|
|
patterns in the refactoring scripts)
|
|
|
|
5. **Incremental execution.** The planner executes each step:
|
|
a. Take Merkle snapshot of the current state
|
|
b. LLM proposes the code transformation for this step
|
|
c. Gate stack checks the proposal (no forbidden file writes,
|
|
no shell commands outside the allowed refactoring scope)
|
|
d. Transformation is applied
|
|
e. Tests run. If they pass → commit the snapshot, update the
|
|
fact store with the new codebase graph. If they fail → roll
|
|
back to the previous snapshot, flag the LLM's proposal as
|
|
`:provenance :failed-transformation`, and attempt a corrected
|
|
proposal.
|
|
|
|
6. **Final verification.** After all steps complete, ACL2 re-verifies
|
|
the entire codebase fact store — all dependencies, all public API
|
|
surfaces, all constraints. The result is a Merkle chain from
|
|
"before" to "after" with every intermediate state verified.
|
|
|
|
*** What the symbolic engine cannot do
|
|
|
|
The fundamental limit: **first-order logic cannot prove semantic
|
|
equivalence of arbitrary programs.** ACL2 can verify that the
|
|
refactoring plan has no structural flaws, that the dependency graph
|
|
is acyclic, that every step is within scope, and that tests pass.
|
|
It cannot prove "the extracted auth service behaves identically to
|
|
the inline auth module from the caller's perspective" for a
|
|
general-purpose language.
|
|
|
|
This gap is filled by:
|
|
- **Tests as empirical verification.** The planner runs the full test
|
|
suite after each step. A passing test suite is not a proof of
|
|
correctness, but combined with ACL2's structural verification, it
|
|
is strong empirical evidence.
|
|
- **API contract checking.** For refactoring that preserves public APIs,
|
|
ACL2 can verify that the type signatures, argument counts, and
|
|
return types of the extracted module's public interface match
|
|
exactly — a structural equivalence that does not require semantic
|
|
reasoning.
|
|
- **Human review of semantic concerns.** The planner flags steps that
|
|
involve semantic choices (e.g., "the extracted auth service now
|
|
handles token refresh differently"). These steps are presented to
|
|
the developer for review with full provenance: before state, after
|
|
state, and the diff. The developer's approval or rejection becomes
|
|
a Merkle fact with `:provenance :human-reviewed`.
|
|
|
|
*** What makes this better than Claude Code
|
|
|
|
| Dimension | Claude Code | Passepartout Planner |
|
|
|-----------|-------------|---------------------|
|
|
| Planning | Prompt-based, implicit | Screamer CSP, explicit, verified |
|
|
| Step ordering | Greedy, every step costs tokens | Dependency-ordered, zero-token constraint check |
|
|
| Rollback | Limited (git reset, no per-step) | Merkle snapshot per step, instant rollback |
|
|
| Scope control | Prompt-based ("only touch auth files") | ACL2-verified write scope, cannot escape |
|
|
| Cost | $0.50-$2 per refactoring session | Near-zero (CPU cycles for planning, LLM for code only) |
|
|
| Final proof | None — you trust that tests caught everything | Merkle chain from before→after, ACL2-verified |
|
|
|
|
A software ecosystem changing hardware economics has never happened before.
|
|
Passepartout's most realistic path: verification appliances for regulated
|
|
industries — RISC-V cores with Lisp microcode on FPGA, sold as hardened
|
|
devices for healthcare compliance, defense, and industrial control.
|
|
|
|
Not a general-purpose Lisp Machine replacing laptops. A specialized device
|
|
where correctness is worth paying for. If such appliances sell in the
|
|
hundreds of thousands, the economics of a custom Lisp ASIC start to make
|
|
sense. The reversal is not Lisp returning as a general platform, but Lisp
|
|
winning a vertical important enough to justify its own silicon.
|
|
|
|
The path: Passepartout software (AGPL) → creates demand for verified
|
|
infrastructure → verification appliance (FPGA, RISC-V + Lisp μcode) →
|
|
high-volume niche → custom ASIC economics viable → Lisp native hardware
|
|
exists for the first time since the Symbolics era.
|
|
|
|
** Lisp vs C for embedded systems
|
|
|
|
- Lisp can match C for low-level work through compile-to-C paths (ECL,
|
|
PreScheme) or tiny Lisps (uLisp, FemtoLisp, BitLisp for RISC-V)
|
|
- The GC is the hard wall for hard real-time; mitigated by pre-allocation,
|
|
no-alloc hot paths, or real-time GC
|
|
- Most practical path: "Lisp as macro language for C" — generate C from
|
|
Lisp macros, ship the compiled binary. This is how NASA's Deep Space 1
|
|
worked: Lisp planning on Earth generated commands for C flight software.
|
|
- The Lisp Machine on commodity FPGA (RISC-V softcore + Lisp μcode on
|
|
Artix-7 / iCE40) is the ambitious path — Lisp down to the metal for $50.
|
|
|
|
** Microbiology works like Lisp, not C
|
|
|
|
Striking parallels:
|
|
|
|
1. Homoiconicity — DNA is code and data in the same molecule; no separate
|
|
source and binary
|
|
2. Hot-reloadable image — alternative splicing, epigenetic marks,
|
|
post-translational modifications change the running program without
|
|
restart
|
|
3. Automatic memory management — proteasomes degrade misfolded proteins,
|
|
autophagy recycles organelles; the cell never calls free()
|
|
4. Interpreted dynamic language — DNA → RNA → ribosome (interpreter) →
|
|
protein; no static compilation step
|
|
5. Self-modifying source — CRISPR, transposons, DNA repair modify the
|
|
genome at runtime; eval on the genome
|
|
6. Duck typing — protein folding depends on chemical environment, not
|
|
type declarations; interfaces are shape-matching, not compiler-checked
|
|
7. Concurrent real-time GC — apoptosis breaks down cell components for
|
|
recycling by neighboring cells; the collector is external to the object
|
|
|
|
Biology chose the Lisp model because it is more robust, adaptable, and
|
|
evolvable. Evolution paid for the overhead (GC, interpretation, dynamic
|
|
dispatch) with parallelism and redundancy. It optimized for survival in
|
|
an unpredictable environment, not peak single-thread throughput.
|
|
|
|
Biology is the proof that the Lisp model can be efficient at planetary
|
|
scale, running on hardware that self-assembles from food. The ceiling
|
|
Passepartout aims at is still far below the system that wrote itself
|
|
in DNA.
|