From b5d59c3360d649fb4fee95fbb5b9a343b1d41a98 Mon Sep 17 00:00:00 2001 From: Hermes Date: Thu, 21 May 2026 18:52:19 +0000 Subject: [PATCH] llm-ability: coding skill is a speed multiplier, not a gate MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - LLM proposes code at every bootstrap stage (microcode, CIC kernel, macro layers, gate rules) — symbolic engine verifies before accepting - Weak model = more retries (5-15), strong model = fewer (1-3) Both produce 100% verified output because the symbolic engine catches all mistakes - The critical transition: not better LLMs, but the sufficiency flip applied to hardware. Once enough facts about runtime behavior accumulate, the system proposes microcode optimizations with zero LLM tokens. - Surprise result: a barely competent LLM is sufficient for the full bootstrapping chain. It's slower and costs more in API calls, but reaches the same destination. --- ideas/passepartout-economics.org | 57 ++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) diff --git a/ideas/passepartout-economics.org b/ideas/passepartout-economics.org index 607c350..0c8cec8 100644 --- a/ideas/passepartout-economics.org +++ b/ideas/passepartout-economics.org @@ -722,6 +722,63 @@ the LLM. The system designs its own core dispatch logic, loads it onto idle cores, and verifies the result with ACL2 before committing. +** How the LLM's coding ability affects the bootstrapping timeline + +The LLM writes code at every stage of the bootstrapping: +1. The Lisp Machine's microcode (RISC-V dispatch, GC barriers, + tagged memory operations) +2. The CIC prover kernel (if built as a skill) +3. ACL2 macro layers for new domains +4. Gate rules for previously uncodified domains +5. The initial self-optimization proposals + +At each stage, the symbolic engine (ACL2, Screamer, gate stack) +verifies the LLM's output before accepting it. The LLM proposes; +the symbolic engine disposes. + +This means the LLM's coding ability is a **speed multiplier, not +a gate**. A weak LLM (3B local model) produces correct code +after N retries where the symbolic engine catches the mistakes +and feeds them back. A strong LLM (Claude Sonnet, DeepSeek, GPT) +produces correct code after fewer retries. The cost difference +is in API calls and wall-clock time, not in the correctness of +the final output — the symbolic engine guarantees that. + +| Scenario | LLM quality | Retries per unit | Wall-clock per unit | Correctness | +|----------|-------------|------------------|---------------------|-------------| +| Bootstrapping with local 3B model | Low | 5-15 | 10x slower | 100% (verified) | +| Bootstrapping with frontier API | High | 1-3 | 1x | 100% (verified) | +| Bootstrapping after sufficiency flip | None (symbolic only) | 0 | Instant | 100% (verified) | + +The critical transition is between row 2 and row 3: once the +symbolic engine has accumulated enough non-lossy facts about +the Lisp Machine's hardware behavior (latency profiles, GC +patterns, instruction timings), it can propose microcode +optimizations without any LLM involvement. ACL2 proves the +optimization preserves correctness; Screamer checks it against +known hardware constraints; the gate stack verifies it won't +damage the running system. Zero tokens. + +This is the sufficiency flip applied to hardware. The timeline +to reach it depends on how many facts the system can gather +about its own runtime behavior, not on how good the LLM is. + +** The surprising result + +An LLM that is just barely competent at coding (enough to +generate syntactically valid RISC-V or Lisp that passes the +symbolic engine's checks after a few retries) is sufficient +for the entire bootstrapping chain. It takes longer — more +retries, more wall-clock — but it reaches the same endpoint: +a system that designs its own microcode without any LLM. + +The LLM's coding ability determines how many API dollars and +calendar months the bootstrap requires. It does not determine +whether the bootstrap succeeds. A patient operator with a +3B local model and a Tenstorrent card reaches the same +destination as an operator with bottomless API credits — the +second arrives faster, but both arrive. + Large refactoring projects (extract module, rename API, split monolith) are the hardest test for any AI agent. Current approaches (Claude Code, Copilot) handle them probabilistically — every step costs tokens, and