From b5d59c3360d649fb4fee95fbb5b9a343b1d41a98 Mon Sep 17 00:00:00 2001
From: Hermes <hermes@hermes.local>
Date: Thu, 21 May 2026 18:52:19 +0000
Subject: [PATCH] llm-ability: coding skill is a speed multiplier, not a gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- LLM proposes code at every bootstrap stage (microcode, CIC kernel,
  macro layers, gate rules) — symbolic engine verifies before accepting
- Weak model = more retries (5-15), strong model = fewer (1-3).
  Both produce 100% verified output because the symbolic engine
  rejects anything that fails verification
- The critical transition: not better LLMs, but the sufficiency flip
  applied to hardware. Once enough facts about runtime behavior
  accumulate, the system proposes microcode optimizations with zero
  LLM tokens.
- Surprise result: a barely competent LLM is sufficient for the full
  bootstrapping chain. It's slower and costs more in API calls, but
  reaches the same destination.
---
 ideas/passepartout-economics.org | 57 ++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/ideas/passepartout-economics.org b/ideas/passepartout-economics.org
index 607c350..0c8cec8 100644
--- a/ideas/passepartout-economics.org
+++ b/ideas/passepartout-economics.org
@@ -722,6 +722,63 @@ the LLM. The system designs its own core dispatch logic,
 loads it onto idle cores, and verifies the result with ACL2
 before committing.
 
+** How the LLM's coding ability affects the bootstrapping timeline
+
+The LLM writes code at every stage of the bootstrapping:
+1. The Lisp Machine's microcode (RISC-V dispatch, GC barriers,
+   tagged memory operations)
+2. The CIC prover kernel (if built as a skill)
+3. ACL2 macro layers for new domains
+4. Gate rules for previously uncodified domains
+5. The initial self-optimization proposals
+
+At each stage, the symbolic engine (ACL2, Screamer, gate stack)
+verifies the LLM's output before accepting it. The LLM proposes;
+the symbolic engine disposes.
+
+This means the LLM's coding ability is a **speed multiplier, not
+a gate**. A weak LLM (3B local model) reaches verified code after
+N retries: the symbolic engine rejects each faulty attempt and
+feeds the failure back. A strong LLM (Claude Sonnet, DeepSeek,
+GPT) needs fewer retries. The difference shows up in API calls
+and wall-clock time, not in the correctness of the accepted
+output, which the symbolic engine guarantees.
+
+| Scenario | LLM quality | Retries per unit | Wall-clock per unit | Correctness |
+|----------|-------------|------------------|---------------------|-------------|
+| Bootstrapping with local 3B model | Low | 5-15 | 10x slower | 100% (verified) |
+| Bootstrapping with frontier API | High | 1-3 | 1x | 100% (verified) |
+| Bootstrapping after sufficiency flip | None (symbolic only) | 0 | Instant | 100% (verified) |
+
+The critical transition is between row 2 and row 3: once the
+symbolic engine has accumulated enough non-lossy facts about
+the Lisp Machine's hardware behavior (latency profiles, GC
+patterns, instruction timings), it can propose microcode
+optimizations without any LLM involvement. ACL2 proves the
+optimization preserves correctness; Screamer checks it against
+known hardware constraints; the gate stack verifies it won't
+damage the running system. Zero tokens.
+
+This is the sufficiency flip applied to hardware. The timeline
+to reach it depends on how many facts the system can gather
+about its own runtime behavior, not on how good the LLM is.
+
+** The surprising result
+
+An LLM that is just barely competent at coding (enough to
+generate syntactically valid RISC-V or Lisp that passes the
+symbolic engine's checks after repeated retries) is sufficient
+for the entire bootstrapping chain. It takes longer (more
+retries, more wall-clock time), but it reaches the same
+endpoint: a system that designs its own microcode without any LLM.
+
+The LLM's coding ability determines how many API dollars and
+calendar months the bootstrap requires. It does not determine
+whether the bootstrap succeeds. A patient operator with a
+3B local model and a Tenstorrent card reaches the same
+destination as an operator with bottomless API credits — the
+second arrives faster, but both arrive.
+
 Large refactoring projects (extract module, rename API, split monolith)
 are the hardest test for any AI agent. Current approaches (Claude Code,
 Copilot) handle them probabilistically — every step costs tokens, and