Compare commits 0821b2b10a...0ffad4c315 (2 commits: 0ffad4c315, 3b07a232cf)

ideas/passepartout-economics.org (new file, 409 lines)

#+TITLE: Passepartout — Patents, Moats, Economics, Design Implications
#+AUTHOR: Hermes agent distillation of 2026-05-21 discussion with Amr
#+FILETAGS: :passepartout:agent:economics:ip:licensing:
#+STARTUP: content

* Summary

Discussion of the economic and strategic implications of Passepartout's
architecture: a self-bootstrapping agent that combines deterministic safety
gates (0 LLM tokens per verification), Merkle-tree memory with provenance,
a symbolic fact store with a sufficiency criterion, and an ACL2-based macro
layer bootstrapped for provable reasoning.

The central claim: this architecture decouples intelligence from LLM API
consumption. The probabilistic engine (LLM) handles input and output
translation (~10% each); the symbolic engine handles the ~80% of reasoning in
between at near-zero marginal cost. The cost curve inverts: generation is
expensive, verification is cheap.

* Patentability

** Likely patentable

- **Probabilistic-deterministic split with deterministic gates between LLM
  proposal and execution.** The LLM proposes; the gate stack decides. Each
  gate is a pure Lisp function costing 0 LLM tokens. Every competitor uses
  prompt-based guardrails. The specific 11-vector gate stack (secret
  exposure, path protection, self-build boundary, shell safety, network
  exfiltration, privacy tags, Lisp syntax, credential vault, tool permissions,
  policy, protocol validation) is itself a novel implementation.

- **Foveal-peripheral context model with Org-tree structured retrieval.**
  Depth ≤ 2 always; full render on the foveal node; full render on nodes
  semantically similar to the foveal node; full render on temporal relevance
  (modified today, upcoming deadlines); everything else title-only. Targets
  2,000-4,000 tokens of context. No agent does this.

- **Merkle-tree memory with copy-on-write snapshots and operation-level
  undo/redo.** Every memory object is content-addressed. Snapshots behave as
  deep copies (shared structure is copied only on write). Undo/redo works at
  the level of individual operations. Applied to an agent's reasoning loop.

- **Gate-to-fact bootstrap with sufficiency criterion.** Mechanically
  extracting facts from the gate stack's own data structures (protected paths,
  shell blocked patterns, network whitelist) as the seed of an ontology. A
  measurable sufficiency threshold that flips the system from LLM-proposes
  to Screamer-deduces.

- **Macro-layer-as-skill bootstrapping architecture.** Encoding theorem-proving
  capability as hot-reloadable skills, where each layer is verified by the
  layer below. The proof forest is a Merkle-versioned dependency tree.

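
A minimal Python sketch of the foveal-peripheral rendering rule above, for illustration only. The `Node` type, matching semantic similarity against a precomputed set of titles, and the seven-day deadline horizon are all assumptions that stand in for details the note does not specify. Token budgeting against the 2,000-4,000 target is omitted.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for an Org heading and its body text."""
    title: str
    body: str = ""
    modified: Optional[dt.date] = None
    deadline: Optional[dt.date] = None
    children: List["Node"] = field(default_factory=list)


def is_temporal(node: Node, today: dt.date, horizon_days: int = 7) -> bool:
    """Assumed temporal-relevance test: modified today, or a deadline inside the horizon."""
    if node.modified == today:
        return True
    if node.deadline is not None:
        return today <= node.deadline <= today + dt.timedelta(days=horizon_days)
    return False


def render(root: Node, foveal: Node, similar: Set[str], today: dt.date,
           max_depth: int = 2) -> List[str]:
    """Render headings to depth <= max_depth; bodies only for foveal, similar, or temporal nodes."""
    lines: List[str] = []

    def walk(node: Node, depth: int) -> None:
        if depth > max_depth:
            return  # the "depth <= 2 always" cut-off
        full = node is foveal or node.title in similar or is_temporal(node, today)
        lines.append("*" * (depth + 1) + " " + node.title)  # every node gets its title
        if full and node.body:
            lines.append(node.body)  # full render only for relevant nodes
        for child in node.children:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


today = dt.date(2026, 5, 21)
gates = Node("Gate stack", "Eleven pure-Lisp gates.")
memory = Node("Merkle memory", "Content-addressed.", modified=today)
licensing = Node("Licensing", "AGPLv3.", children=[
    Node("Dual license", "Commercial option.", children=[Node("Too deep", "hidden")]),
])
root = Node("Passepartout", "Root notes.", children=[gates, memory, licensing])

out = render(root, foveal=gates, similar=set(), today=today)
print("\n".join(out))
```
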
** Likely not patentable (known techniques in expected applications)

- ACL2 itself (decades old)
- Screamer for consistency checking (constraint solving on a triple store is
  an obvious application)
- Hot-reloadable skills (Lisp images have been hot-reloadable for 40 years)
- Org-mode as a data format
- Multi-layer signal authentication (known in network security)

** Counterargument from prior art

A patent examiner will argue that:
- "Thin harness, fat skills" is the standard OS microkernel architecture
  applied to an AI agent
- Foveal-peripheral context is locality of reference (standard in OS design)
- Merkle-tree memory is content-addressed storage (standard in distributed
  systems)
- Deterministic gate stack is capability-based security (going back to
  KeyKOS in the 1980s)

The defense: these principles have never been *combined* in an AI agent, and
the combination produces emergent effects (cost curve inversion, sufficiency
flip, self-repairing bootstrapping chain) that no single principle produces
alone. Good patent claims would cover the specific combination, not the
individual components.

** Strongest single claim

An AI agent system comprising:
1. A probabilistic language model
2. A stack of deterministic safety gates operating at zero LLM-token cost
   between the model's proposal and execution
3. A Merkle-versioned memory store from which gate outcomes are mechanically
   extracted as facts
4. A symbolic reasoning engine seeded by those facts with a measurable
   sufficiency criterion that determines when the probabilistic model can
   be bypassed

Each element is known. The combination is novel and non-obvious.

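
Element 2 can be sketched in Python as a list of pure predicates run over each proposed action. Every gate is deterministic, no gate calls a model, and the action executes only if no gate objects. The `Action` shape, the three gates and their example patterns are hypothetical stand-ins for three of the eleven vectors. They are not Passepartout's Lisp gate stack.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    """A proposed action as the model might emit it (hypothetical shape)."""
    tool: str
    command: str = ""
    path: str = ""


Gate = Callable[[Action], Optional[str]]  # None means pass; a string is the denial reason

PROTECTED_PREFIXES = ("/etc/", "/docker/appdata/traefik/")  # example data only
SECRET_PATTERN = re.compile(r"(API_KEY|TOKEN|PASSWORD)=\S+")  # example data only
BLOCKED_SHELL = (re.compile(r"\brm\s+-rf\s+/(\s|$)"), re.compile(r"curl[^|]*\|\s*sh"))


def secret_exposure(a: Action) -> Optional[str]:
    return "command contains a credential" if SECRET_PATTERN.search(a.command) else None


def path_protection(a: Action) -> Optional[str]:
    return f"protected path: {a.path}" if a.path.startswith(PROTECTED_PREFIXES) else None


def shell_safety(a: Action) -> Optional[str]:
    blocked = a.tool == "shell" and any(p.search(a.command) for p in BLOCKED_SHELL)
    return "blocked shell pattern" if blocked else None


GATES: List[Gate] = [secret_exposure, path_protection, shell_safety]


def decide(action: Action, gates: List[Gate] = GATES) -> Tuple[bool, List[str]]:
    """Run every gate; execute only if none objects. Reasons go to the audit trail."""
    reasons = [r for r in (g(action) for g in gates) if r is not None]
    return (not reasons, reasons)


for proposal in [Action("shell", "ls -la /tmp"), Action("shell", "rm -rf /")]:
    print(proposal, decide(proposal))
```

The sketch collects every denial reason instead of stopping at the first one, so the audit trail records the full set of gate outcomes for each proposal.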
* Licensing Strategy

** AGPLv3 for the public repository

AGPLv3 closes the ASP loophole (Section 13): anyone who modifies the
software and lets users interact with it over a network must offer those
users the modified source. This protects against proprietary forks that
extract value without contributing back.

Crucially: AGPL is a *product requirement*, not a concession to openness.
The system's value proposition is provable correctness: every decision has
Merkle provenance, the proof forest is visible, the sufficiency meter is
readable. That claim is not credible with closed source. An enterprise buyer
needs to inspect the gate stack, verify the Merkle implementation, and
confirm that the ACL2 integration is sound. AGPL makes this possible without
signing an NDA.

** AGPL only covers modifications to code, not:

- Gate rules specific to a domain (these are data, not code)
- The fact store (empirical data generated from usage)
- Ontology categories (design decisions stored as configuration)
- Proprietary skills loaded at runtime (AGPL boundary on plugin systems
  is legally unsettled)

** Dual license model

- AGPLv3 for open source — builds ecosystem, trust, and community
- Commercial license for enterprises that cannot accept AGPL (blanket
  policies against AGPL infection) — the MySQL/SugarCRM model

* Moats

** Re-evaluated: time is not the primary moat

Initial assumption: the bootstrapping chain (gate outcomes → facts →
Screamer rules → ACL2 theorems → macro layers) takes months to build,
giving first-mover advantage.

Challenge: a Phase 4+ Passepartout fed on Wikipedia + Wikidata can build
a general ontology in two weeks. Entity resolution is batch work. Structural
consistency verification takes minutes. The organic growth advantage collapses
for general knowledge.

** Actual moats (weaker than initially assumed)

1. **Domain-specific gate rules** — thin. A few hundred lines of Lisp data
   encoding deployment-specific path patterns, shell safety rules, and
   volume layouts. Write once, trivial to copy. Not a real moat.

2. **Empirical decision history** — every HITL decision is a Merkle fact.
   "On date T, user approved action X under context Y." A fresh instance
   has none of this. Makes *your* instance more valuable but doesn't
   prevent competition — it's a switching cost, not a barrier to entry.

3. **Evaluation harness (regression suite)** — thousands of test cases
   accumulated from every bug fix. Cannot be ingested from public data.
   Built only by using the system, breaking it, fixing it, and adding a
   test. Strongest residual moat, but even this can be partially
   compressed through public benchmarks (SWE-bench, etc.).

4. **Infrastructure integration** — the specific Docker compose layouts,
   Traefik router patterns, Authentik provider configurations, backup
   policies encoded as gate rules over months of use. A competitor's
   infrastructure is different; their generic Passepartout does not know
   your topology.

** Strongest competitor strategy

Not copying your gate rules — offering the same architecture as a service
with their own pre-seeded general knowledge, a generic safety baseline,
and a consulting engagement to customize gate rules for each customer.
The AGPL prevents closing the architecture but does not prevent offering
it as a service with a customization layer.

** The defensible business is services, not product

The defensible entity is "the organization that best understands how to
adapt Passepartout to your domain" — not "the organization that owns
Passepartout." The Lisp Machine appliance (hardware + certification) and
evaluation harness certification service are the closest thing to product
defensibility.

* Economics and Monetization

** Cost structure

- One-time cost: gate-rule encoding for a domain (from hours for codified
  domains — FAR, HIPAA, ISO standards — up to months for tacit domains)
- The LLM translates codified rules directly: ingest regulation → produce
  gate rule plist → ACL2 verifies consistency → human reviews. This is
  translation, not reasoning.
- For non-codified knowledge (craft expertise, organizational culture):
  Phase 3 archivist loop over time
- Near-zero marginal cost: ACL2 proof + Screamer consistency check +
  VivaceGraph lookup per interaction — all CPU-native, all in-image
- No recurring LLM API costs for the 80% symbolic reasoning layer
- After sufficiency flip: pennies per day vs dollars per day for LLM-only

** Revenue models by field

| Field | Why Passepartout | Revenue Model |
|-------+------------------+---------------|
| Industrial infrastructure (refineries, power grids, manufacturing) | Offline operation, provably safe, near-zero marginal cost, mandatory audit trail | Lisp Machine appliance + SCADA certification package |
| Healthcare administration (billing, claims, prior authorization) | Rule-heavy domain, privacy-mandated, audit-driven, high per-transaction cost today | Subscription for regulatory gate packages (CPT/ICD-10/HIPAA rules), updated when CMS publishes new rules |
| Software supply chain (CI/CD security, SBOM verification) | First-order structural verification — ACL2 is a natural fit, the CI/CD pipeline is already a sequence of gate-checkable steps | Evaluation harness as certification service — "run our 10,000-task suite and get a provable score" |
| Regulatory compliance (GDPR, SOC2, SOX, GxP) | Rule-completeness, active enforcement (not document-based), provable audit trail | Subscription for regulation-specific gate packages — GDPR package, SOC2 package, FedRAMP package, updated when regulations change |
| Defense and classified environments | Air-gapped operation, classification-level gate rules, Merkle provenance is court-admissible evidence | Government contract + hardened appliance with hardware root of trust |

** Critical insight: encoding cost drops to near-zero for codified domains

Laws, regulations, standards, procedures, and technical specifications are
already written down in structured text. The LLM does not need to *reason*
about them — it needs to *translate* them into gate rules and ACL2 theorems.

Example: the US Federal Acquisition Regulation (FAR) is ~2,000 pages of
"thou shalt" and "thou shalt not" statements. A frontier LLM can ingest
the FAR and produce a plist of gate rules of this shape:
- (if contract > $250K AND not small-business-set-aside → :deny)
- (if sole-source AND no justification-documented → :deny, produce-justification)

ACL2 then verifies the rule set for internal consistency (Phase 6). Screamer
checks it against existing compliance facts. The human reviews the bootstrap
output and approves or corrects individual rules.

The key distinction: the LLM is not *extracting knowledge from prose* the
way the Phase 3 archivist does (which is open-ended, noisy, and requires
grounding). It is *translating a known rule system into a formal
representation*: a mechanical transformation of structured text into
structured rules. The result is not "the LLM's best guess at the rules" but
"the rule set as stated in the source document, mechanically transcribed."

For domains where the knowledge is codified as text, gate-rule encoding
time drops from weeks to hours. The only bottleneck is human review of the
output, and the system can assist here by surfacing contradictions for
resolution rather than requiring a full line-by-line audit.

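
The pipeline above can be sketched in Python under some simplifying assumptions. Translated rules are plain data (dicts standing in for the plist), each condition is a boolean fact, and a gate denies if any firing rule denies. The rule IDs and fact names are hypothetical. Brute-force enumeration over the facts stands in for the ACL2/Screamer consistency check and only scales to a handful of attributes. The third rule is a deliberately bad translation, so the check has a contradiction to surface for human review.

```python
from itertools import product

RULES = [
    {"id": "over-threshold-needs-set-aside",
     "when": {"over_threshold": True, "small_business_set_aside": False},
     "verdict": "deny"},
    {"id": "sole-source-needs-justification",
     "when": {"sole_source": True, "justification_documented": False},
     "verdict": "deny"},
    {"id": "sole-source-over-threshold-ok",  # deliberately conflicting translation
     "when": {"sole_source": True, "over_threshold": True},
     "verdict": "allow"},
]


def fires(rule, facts):
    """A rule fires when every condition matches; missing facts never match."""
    return all(facts.get(k) == v for k, v in rule["when"].items())


def evaluate(rules, facts):
    """Deny-overrides: deny if any firing rule denies, else allow."""
    verdicts = {r["verdict"] for r in rules if fires(r, facts)}
    return "deny" if "deny" in verdicts else "allow"


def contradictions(rules):
    """Try every assignment of the attributes the rules mention and report
    pairs of rules that fire together with different verdicts."""
    attrs = sorted({k for r in rules for k in r["when"]})
    found = set()
    for values in product([False, True], repeat=len(attrs)):
        facts = dict(zip(attrs, values))
        firing = [r for r in rules if fires(r, facts)]
        for i, a in enumerate(firing):
            for b in firing[i + 1:]:
                if a["verdict"] != b["verdict"]:
                    found.add(tuple(sorted((a["id"], b["id"]))))
    return sorted(found)


print(evaluate(RULES, {"over_threshold": True, "small_business_set_aside": False}))
for pair in contradictions(RULES):
    print("needs human review:", pair)
```

Deny-overrides in `evaluate` is only one possible resolution policy. Surfacing the conflicting pairs lets the reviewer fix the translation itself, so the gate does not silently resolve the conflict.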
** What can actually be monetized (TLDR)

1. **Pre-loaded bootstrapping chains for specific verticals** — domain gate
   rules, pre-seeded fact stores, mature proof forests. Saves the buyer
   months of bootstrapping. Distributed as data packages under a commercial
   license, not AGPL.

2. **Evaluation harness as certification service** — "Bring your agent,
   we'll run it through our suite and give a Merkle-verified score."
   The regression suite grows with every deployment; a competitor's
   regression suite starts empty.

3. **Hardened Lisp Machine appliance** — RISC-V soft-core with Lisp
   microcode, pre-loaded mature Passepartout, certified for specific
   verticals (IEC 62443 for industrial, HIPAA compliance for healthcare).
   Value is in integration and certification, not the AGPL software.

4. **Verified skill marketplace** — marketplace where skills are verified
   (sandbox + ACL2 non-contradiction proof) before listing. Marketplace
   takes a cut. Value is in the verification infrastructure, not the
   skills themselves.

5. **Support and consulting** — the Red Hat model. AGPL code is free;
   training, custom gate rules, ontology design, and emergency support
   are paid.

* Design and Architectural Implications

** The self-improving system

Passepartout bootstraps two feedback loops:

- **Empirical loop:** gate outcomes → facts → Screamer-verified patterns →
  sufficiency flip → auto-extraction. Knowledge grows without the LLM
  touching most of it.

- **Logical loop:** ACL2 theorems → macro layers (generators, metafunctions,
  induction DSL, abstract theories) → richer proof strategies → better
  verification. Reasoning capacity grows without changing the prover binary.

These loops intersect at the fact store: proven theorems become facts, richer
facts generate better proof strategies, better strategies verify more facts.
The system upgrades itself.

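
The note does not define the sufficiency criterion, so this Python sketch picks one possible reading as a stated assumption. Sufficiency is taken to be the share of the last N queries that the fact store answered on its own. Routing flips from the LLM to the symbolic path once that share reaches a threshold over a full window. `SufficiencyMeter`, `route`, the window size and the threshold are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict


class SufficiencyMeter:
    """Hypothetical meter: share of recent queries the fact store resolved
    without the LLM, over a fixed-size sliding window."""

    def __init__(self, window: int = 100, threshold: float = 0.9) -> None:
        self.hits: Deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, answered_symbolically: bool) -> None:
        self.hits.append(answered_symbolically)

    def ratio(self) -> float:
        return sum(self.hits) / len(self.hits) if self.hits else 0.0

    def flipped(self) -> bool:
        """Require a full window so a few early hits cannot trigger the flip."""
        return len(self.hits) == self.hits.maxlen and self.ratio() >= self.threshold


def route(query: str, facts: Dict[str, str], meter: SufficiencyMeter) -> str:
    """Before the flip the LLM proposes; after it, known answers skip the LLM."""
    answer = facts.get(query)
    meter.record(answer is not None)
    if answer is not None and meter.flipped():
        return f"symbolic: {answer}"
    return "llm"  # placeholder for the probabilistic path


meter = SufficiencyMeter(window=4, threshold=0.75)
store = {"is /etc protected?": "yes"}
paths = [route(q, store, meter) for q in
         ["is /etc protected?", "new question", "is /etc protected?", "is /etc protected?"]]
print(paths)
```

Because the window slides, the flip is reversible: a run of unfamiliar queries drops the ratio and routes traffic back to the LLM.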
** The 10-80-10 becomes approximately true

- 10%: LLM handles input translation (natural language → structured goal)
- 80%: Symbolic engine handles reasoning — Screamer plans, ACL2 verifies,
  VivaceGraph retrieves facts. Zero LLM tokens.
- 10%: LLM handles output formatting (structured result → natural language)

The cost curve inverts: verification is cheaper than generation.

** Key implications

1. **Verification becomes cheaper than generation.** Once macro layers are
   mature, proving a new rule non-contradictory costs near zero. The LLM
   proposes; the symbolic engine accepts or rejects.

2. **Trust scales with use.** Every interaction produces a structurally
   verified outcome. The non-lossy fact base grows. The proof forest
   thickens. An auditor can inspect the Merkle tree of gate outcomes and
   trace any decision to its root theorem.

3. **Degradation is reversible.** Every proof layer is a hot-reloadable
   skill. Every fact has provenance. A bad metafunction is unloaded;
   theorems proven under it are flagged for re-verification; the fact
   store retains the pre-upgrade ontology version.

4. **The system can diagnose its own logical frontier.** If ACL2 keeps
   failing on a class of properties, and the failure mode is structural
   (not solvable by more macros), the fact store accumulates a pattern:
   "These N properties are first-order inexpressible." This signals to the
   human that the system needs a CIC prover (dependent types) for this
   domain. The system cannot transcend its logic without external
   intervention — but it can surface the boundary precisely.

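
The auditor's trace in point 2 rests on a standard Merkle construction, sketched minimally here in Python. The sketch assumes SHA-256, gate outcomes serialized as strings, and an unpaired node carried up unchanged, which is one of several common conventions. None of this is taken from Passepartout's actual format, and `root` requires at least one record. An inclusion proof lets an auditor check one outcome against the published root without reading the rest of the log.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(record: str) -> bytes:
    return h(b"\x00" + record.encode())  # 0x00 prefix: leaf hash


def node(left: bytes, right: bytes) -> bytes:
    return h(b"\x01" + left + right)  # 0x01 prefix: inner-node hash


def levels(records: List[str]) -> List[List[bytes]]:
    """All tree levels, leaves first; an unpaired last node is carried up as-is."""
    level = [leaf(r) for r in records]
    out = [level]
    while len(level) > 1:
        nxt = [node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        out.append(level)
    return out


def root(records: List[str]) -> bytes:
    """Root hash of the log; requires at least one record."""
    return levels(records)[-1][0]


def prove(records: List[str], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag says the sibling sits on the left."""
    proof = []
    for level in levels(records)[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(record: str, proof: List[Tuple[bytes, bool]], expected_root: bytes) -> bool:
    acc = leaf(record)
    for sibling, on_left in proof:
        acc = node(sibling, acc) if on_left else node(acc, sibling)
    return acc == expected_root


log = ["allow shell ls", "deny path /etc/passwd", "allow write notes.org",
       "deny shell rm -rf /", "allow net api.openrouter.ai"]
r = root(log)
print(r.hex(), verify(log[1], prove(log, 1), r))
```

Changing any logged outcome changes the root, so a proof that checks against a root recorded earlier shows the outcome has not been rewritten since.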
** The Lisp Machine endpoint

If the system designs and builds itself on Lisp Machine hardware:
- The same system that proves theorems also optimizes the microcode
- No OS boundary, no driver layer — system and proof environment are one
- A RISC-V soft-core with Lisp microcode is manufacturable at older fab
  nodes (28nm, 45nm) — sovereign intelligence without GPU supply chains

** Social implications

- **Concentration of reasoning.** The macro layers become opaque to anyone
  who doesn't understand the bootstrapping history. The system understands
  its own reasoning better than its users do.

- **Cost advantage widens inequality asymmetrically.** The first instance
  to reach maturity requires significant gate-rule design (from hours for
  codified domains to months for tacit ones). After that, replication is
  cheap. Organizations that invest early have a permanent cost advantage
  over those that wait for a turnkey product.

- **Sovereign artifact.** A self-building system on its own hardware does
  not depend on cloud APIs, GPU supply chains, or proprietary model
  weights. Its intelligence is generated, verified, and sustained locally.
  Enables sovereign AI for nations without GPU access.

* Open Questions

1. Can CIC (dependent type theory) be implemented as a Passepartout skill,
   verified for crash-freedom and rule fidelity by ACL2, and integrated
   into the existing fact store API? The Gödelian boundary: ACL2 can
   verify the kernel's implementation but not its soundness in any
   absolute sense — but this matches current practice (Lean 4's small
   C++ kernel is trusted, not proved).

2. Can the system generate novel proof strategies? A sufficiently rich
   abstract theory layer + Screamer could propose: "Proofs in domain X
   all use induction schema Y. Generalizing to Z would prove new
   properties across A, B, C." The LLM translates this into a
   metafunction; ACL2 verifies it; the prover gains a new tactic it
   invented itself.

3. What is the social contract for a system that can truthfully say
   "I know this is correct" — and "I know what I don't know"?
   Most current AI systems can do neither.

* Impact on the AI and GPU Industry

If a symbolic-bootstrapping architecture becomes popular — especially now
that codified domains can be ingested at near-zero encoding cost — the
industry structure shifts fundamentally.

** Token demand compresses

The entire AI industry (OpenAI, Anthropic, Google — ~$50B API revenue) is
built on per-token pricing: metered cognition. A mature Passepartout
reduces token consumption to the I/O boundary (the two 10% slices of
10-80-10). Token demand shifts from "every interaction burns tokens" to
"only unfamiliar interactions burn tokens." Steady-state per-user LLM
consumption drops by an order of magnitude.

** GPU inference demand plateaus in regulated industries

GPU demand is driven by two things: training and per-request inference.
Training demand is unaffected (frontier models still train on clusters).
Inference demand drops 80-90% in any sector where the rule book is
published — which covers most economically significant sectors (finance,
healthcare, industrial, government procurement, legal compliance).

Nvidia's growth narrative shifts from "every transaction goes through a
GPU" to "every training run needs a GPU, and the generative 20% needs
inference." That implies a smaller inference TAM than current market
pricing assumes.

** Hyperscaler competition shifts

The competitive thesis "AI is the next OS, and we own the compute layer"
weakens if the most valuable AI workloads run on a $500 RISC-V board on
your premises. The hyperscalers respond by:
- Offering Passepartout as a managed service (AGPL allows this)
- Differentiating on the frontier I/O API and world model API
- Competing on gate rule libraries for specific industries

The race shifts from "who has the most H100s" to "who has the best
domain-specific gate rules." Google's industry data advantage matters
more than Azure's raw compute.

** New hardware tier: verification appliances

A new category emerges: CPU-native verification appliances running Lisp
microcode on RISC-V cores. Low volume (hundreds of thousands/year),
high margin ($5K-50K/unit), high switching costs. The Sun Microsystems
model, not the Intel model. Manufacturable at older fab nodes (28nm,
45nm) — no dependency on TSMC's leading edge.

** The key uncertainty and its resolution

Original question: how long does gate-rule encoding take?

Resolution: for codified domains, near-zero (hours of LLM translation plus
human review). The LLM translates published regulations into formal rules
in one pass — it is a mechanical transformation, not open-ended reasoning.
The bottleneck only exists for tacit, oral, unwritten knowledge (craft
expertise, organizational culture).

Consequence for the transition timeline: Phase 2 (sufficiency) happens
within months for any domain whose rule book is published. The disruption
accelerates from years to quarters.

methodology/CLOUDFLARE-SETUP.md (new file, 214 lines)

# Cloudflare Infrastructure Setup

## Architecture Overview

```
Browser ──► Cloudflare (orange cloud) ──► Tunnel ──► cloudflared ──► Traefik:8081
                                                                           │
                                                                  (tunnel entrypoint)
                                                                           │
                                                                ┌──────────┴──────────┐
                                                            secureweb              tunnel
                                                        (direct internet)       (via tunnel)
                                                              :443                  :8081
                                                           LetsEncrypt           plain HTTP
```

All external traffic goes through Cloudflare Tunnel. Traefik serves as the internal reverse proxy, routing by Host header. Traefik's LetsEncrypt certs exist for LAN/direct access but aren't used externally (Cloudflare's edge terminates TLS for visitors).

## Active Stack

| Component | Location | Notes |
|---|---|---|
| Tunnel name | `home` | Created at Cloudflare Zero Trust → Networks → Tunnels |
| Tunnel ID | `c29295c5-946a-4ddf-bdfe-7eafcd74faa3` | Visible in dashboard |
| Token | `.env` `TUNNEL_TOKEN=...` | Used by cloudflared to authenticate |
| cloudflared | production-1 Docker container | Runs with `--restart unless-stopped` |
| Traefik | production-1 Docker container | Ports 80, 443, 8080, 8082 published; cloudflared reaches the `tunnel` entrypoint (:8081) over the `networking` Docker network |
| Traefik entrypoints | `web` (:80→redirect), `secureweb` (:443, TLS), `tunnel` (:8081), `metrics` (:8082) | |
| Traefik config | `/docker/appdata/traefik/traefik.yaml` | |
| Compose file | `/docker/compose/docker-compose.yaml` | cloudflared service defined but runs standalone due to interpolation issues |
| Traefik labels | Per-service in docker-compose.yaml | Pattern: `entrypoints=tunnel`, ``Host(`subdomain.gharbeia.net`)`` |

## 1. Adding a Second Domain

Example: adding `example.com` alongside `gharbeia.net`.

### Step 1: Cloudflare DNS

1. Go to Cloudflare Dashboard → Add site → enter `example.com`
2. Cloudflare scans existing DNS records — verify and continue
3. Change nameservers at your registrar to Cloudflare's
4. Wait for the nameserver change to propagate (the site shows as Active in the dashboard)

### Step 2: Cloudflare Tunnel — add hostname

1. **Zero Trust** → **Networks** → **Connectors** → **Cloudflare Tunnels**
2. Select tunnel **home** → **Edit**
3. Click **Add a public hostname**
4. Set:
   - **Subdomain:** `*`
   - **Domain:** `example.com`
   - **Service Type:** HTTP
   - **URL:** `traefik:8081`
5. Save

Cloudflare DNS will automatically create a wildcard CNAME for `*.example.com` pointing to `c29295c5-946a-4ddf-bdfe-7eafcd74faa3.cfargotunnel.com`.

### Step 3: Traefik — update cert resolver

Traefik's existing LetsEncrypt wildcard cert (issued through the Cloudflare DNS challenge) only covers `*.gharbeia.net`. For `*.example.com`, either:

**Option A: Add to Traefik's ACME config**

In `/docker/appdata/traefik/traefik.yaml`, in `certificatesResolvers.letsencrypt.acme`:
```yaml
dnsChallenge:
  provider: cloudflare
  resolvers:
    - 1.1.1.1:53
    - 1.0.0.1:53
  propagation:
    delayBeforeChecks: 120s # increase for wildcards
```

Then in `entryPoints.secureweb.http.tls.domains`:
```yaml
domains:
  - main: gharbeia.net
    sans:
      - "*.gharbeia.net"
  - main: example.com
    sans:
      - "*.example.com"
```

**Important:** Before the ACME DNS challenge can succeed for wildcard domains proxied via Cloudflare, you MUST create a DNS-only placeholder record for `_acme-challenge` to break the wildcard:

Go to Cloudflare DNS → Add record:
- **Type:** A
- **Name:** `_acme-challenge`
- **Content:** `127.0.0.1`
- **Proxy:** DNS-only (gray cloud)

Without this, the proxied wildcard CNAME intercepts `_acme-challenge.*` queries and LetsEncrypt can't verify the TXT challenge records.

**Option B: Skip HTTPS cert for now**

If the second domain doesn't need Traefik's own cert (i.e., Cloudflare edge certs are sufficient), no Traefik changes are needed. The tunnel handles everything.

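
The `_acme-challenge` placeholder in Option A works because of how DNS wildcards are matched. A wildcard only answers for names that have no records of their own (RFC 4592). The toy resolver below models just that rule in Python. The zone contents are illustrative, the resolver only checks the wildcard at the immediate parent, and it ignores record types, delegation and Cloudflare proxying. Before the placeholder exists, `_acme-challenge.example.com` resolves through the wildcard CNAME to the tunnel. Once the placeholder A record exists, the wildcard no longer applies to that name.

```python
# Toy zone: one proxied wildcard CNAME pointing at the tunnel (illustrative).
ZONE = {
    "*.example.com": {"CNAME": "c29295c5-946a-4ddf-bdfe-7eafcd74faa3.cfargotunnel.com"},
}


def lookup(zone: dict, name: str) -> dict:
    """Records answering a query for `name` (simplified RFC 4592 behaviour)."""
    if name in zone:
        return zone[name]  # an existing name is answered by its own records
    parent = name.split(".", 1)[1] if "." in name else ""
    return zone.get("*." + parent, {})  # otherwise the wildcard synthesizes an answer


print(lookup(ZONE, "_acme-challenge.example.com"))     # answered by the wildcard CNAME

patched = {**ZONE, "_acme-challenge.example.com": {"A": "127.0.0.1"}}
print(lookup(patched, "_acme-challenge.example.com"))  # the placeholder wins
print(lookup(patched, "git.example.com"))              # other names still use the wildcard
```
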
### Step 4: Add Traefik labels

For each service on the new domain, add Traefik labels in docker-compose.yaml:
```yaml
labels:
  - traefik.enable=true
  - traefik.http.routers.servicename.rule=Host(`subdomain.example.com`)
  - traefik.http.routers.servicename.entrypoints=tunnel
  - traefik.http.services.servicename.loadbalancer.server.port=3000
```

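
The label block above follows a fixed pattern, so it can be generated instead of typed by hand. This Python sketch uses a hypothetical helper, `traefik_labels` (not part of Traefik or Docker), that emits the same four labels. The lowercase router-name check is a conservative assumption, not a Traefik rule. `as_run_flags` single-quotes each label so the backticks in `Host(...)` survive the shell. That matters when a container is recreated with `docker run -l`, because inside double quotes the shell would treat the backticks as command substitution.

```python
import re
from typing import List

ROUTER_NAME = re.compile(r"^[a-z0-9-]+$")  # conservative naming rule (assumption)


def traefik_labels(name: str, host: str, port: int, entrypoint: str = "tunnel") -> List[str]:
    """Labels following this setup's pattern: Host rule on the tunnel entrypoint."""
    if not ROUTER_NAME.match(name):
        raise ValueError(f"unexpected router name: {name!r}")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return [
        "traefik.enable=true",
        f"traefik.http.routers.{name}.rule=Host(`{host}`)",
        f"traefik.http.routers.{name}.entrypoints={entrypoint}",
        f"traefik.http.services.{name}.loadbalancer.server.port={port}",
    ]


def as_run_flags(labels: List[str]) -> str:
    """Render labels as `docker run -l` flags, single-quoted so backticks reach Docker intact."""
    return " ".join(f"-l '{label}'" for label in labels)


labels = traefik_labels("servicename", "subdomain.example.com", 3000)
print("\n".join(f"  - {label}" for label in labels))  # compose `labels:` entries
print(as_run_flags(labels))
```
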
Then recreate the container so it picks up the new labels. Labels are fixed when a container is created, so a plain `docker restart` is not enough. Use compose or `docker run`, depending on whether `.env` interpolation works.

---

## 2. Security Panic — Changing All Tokens and Settings

### Step 1: Cloudflare API Token

1. Go to Cloudflare Dashboard → **My Profile** → **API Tokens**
2. **Delete** the old token
3. **Create Token** → **Edit zone DNS** template:
   - Permissions: Zone → DNS → Edit
   - Zone Resources: Include → Specific zone → `gharbeia.net` (plus any other zone Traefik issues certs for, such as `example.com` from section 1)
   - TTL: No expiration (or set a reasonable one)
4. Copy the new token

### Step 2: Update .env on production-1

```bash
ssh production-1
# On production-1: replace the token in place
sed -i "s|^CLOUDFLARE_DNS_API_TOKEN=.*|CLOUDFLARE_DNS_API_TOKEN=<new-token>|" /docker/compose/.env
```

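
The `sed` line does nothing if the key is missing, and a new value containing `|` or `&` would be mangled by the substitution. The Python alternative below assumes that `.env` holds plain `KEY=value` lines. `set_env_var` (a hypothetical helper) replaces or appends the key and leaves every other line untouched. It keeps the file's permission bits (not its ownership) and swaps the result in with `os.replace`.

```python
import os
import stat
import tempfile


def set_env_var(path: str, key: str, value: str) -> bool:
    """Set KEY=value in a simple dotenv file; return True if KEY already existed.

    Assumes plain KEY=value lines (no `export` prefix, no multi-line values).
    """
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines(keepends=True)

    found = False
    for i, line in enumerate(lines):
        if line.startswith(key + "="):
            ending = "\n" if line.endswith("\n") else ""
            lines[i] = f"{key}={value}{ending}"
            found = True
    if not found:
        if lines and not lines[-1].endswith("\n"):
            lines[-1] += "\n"
        lines.append(f"{key}={value}\n")

    # Write a sibling temp file, copy the permission bits, then swap it in
    # atomically so a crash never leaves a half-written .env behind.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), prefix=".env.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.writelines(lines)
        os.chmod(tmp, mode)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return found
```

On production-1 the call would be `set_env_var("/docker/compose/.env", "CLOUDFLARE_DNS_API_TOKEN", new_token)`.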
### Step 3: Restart Traefik

Traefik runs standalone (not under compose). Restart it with the new token:

```bash
docker rm -f traefik
docker run -d --name traefik --restart unless-stopped --network networking \
  -p 80:80 -p 443:443 -p 8080:8080 -p 8082:8082 \
  -e CF_DNS_API_TOKEN="<new-token>" \
  -e TZ="America/New_York" \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /docker/appdata/logs/traefik:/var/log \
  -v /docker/appdata/traefik:/etc/traefik \
  -v /docker/appdata/traefik/letsencrypt:/letsencrypt \
  traefik:latest
```

Traefik will automatically renew its LetsEncrypt cert with the new token.

### Step 4: Cloudflare Tunnel Token

If you also rotate the tunnel token:

1. Go to **Zero Trust** → **Networks** → **Tunnels** → **home**
2. Click the three dots → **Recreate token**
3. Copy the new token

On production-1:
```bash
docker rm -f cloudflared
docker run -d --name cloudflared --restart unless-stopped --network networking \
  -v /docker/appdata/cloudflared:/home/nonroot/.cloudflared \
  cloudflare/cloudflared:latest tunnel --no-autoupdate run --token "<new-token>"
```

Also update the `.env` file:
```
TUNNEL_TOKEN=<new-token>
```

### Step 5: Verify everything works

```bash
# Check cloudflared is connected (cloudflared logs to stderr, so merge it into the pipe)
docker logs cloudflared --tail 5 2>&1 | grep "Registered tunnel connection"

# Check Traefik is running and has routes
docker logs traefik --tail 10 2>&1 | grep "Register"

# Test end-to-end
curl -sI https://git.gharbeia.net/ | head -5
# Should return HTTP/2 200
```

### Step 6: Rotate other secrets (if needed)

Other credentials in `.env` that may need rotation:
- `AUTHENTIK_SECRET_KEY` — generate new: `openssl rand -base64 60 | tr -d '\n'`
- `POSTGRESQL_PASSWORD` — generate new, and change it in the running database too (Postgres container images apply this variable only when initializing an empty data directory)
- `EMAIL_PASSWORD` — Fastmail app password
- `GITEA_TOKEN` — Gitea settings → Applications
- `OPENROUTER_API_KEY` — OpenRouter dashboard

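
Where `openssl` is not available, Python's standard `secrets` module gives equivalent values. `openssl rand -base64 60 | tr -d '\n'` produces 60 random bytes in base64: 80 characters with no padding, which the first helper reproduces. The second helper uses URL-safe characters so the value needs no quoting in `.env`. Both helper names and the 32-byte password length are assumptions.

```python
import base64
import secrets


def authentik_secret_key(nbytes: int = 60) -> str:
    """Same shape as `openssl rand -base64 60 | tr -d '\\n'`: 80 base64 characters."""
    return base64.b64encode(secrets.token_bytes(nbytes)).decode("ascii")


def db_password(nbytes: int = 32) -> str:
    """URL-safe random password: no '/', '+', or '=' to quote in .env."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    print(f"AUTHENTIK_SECRET_KEY={authentik_secret_key()}")
    print(f"POSTGRESQL_PASSWORD={db_password()}")
```
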
## Troubleshooting

### Cert renewal fails with "Invalid format for Authorization header"
The Cloudflare API token in `CF_DNS_API_TOKEN` is wrong or expired. Generate a new one and restart Traefik.

### Cert renewal stuck on "Waiting for DNS record propagation"
Likely the proxied wildcard CNAME is intercepting `_acme-challenge` subdomain. Add a DNS-only A record for `_acme-challenge` → `127.0.0.1` in Cloudflare DNS to break the wildcard.

### Tunnel shows 502 Bad Gateway
- Check cloudflared logs: `docker logs cloudflared --tail 10`
- If `connection refused` → Traefik isn't listening on the expected port
- If `tls: unrecognized name` → SNI mismatch (revert to HTTP mode in tunnel config)

### Adding a new service
1. Add Traefik labels to the service in docker-compose.yaml
2. If the container was started without labels, recreate it with `docker run` including `-l` flags (compose may fail due to .env interpolation issues)
3. The tunnel wildcard already catches `*.gharbeia.net` — no tunnel changes needed