llm-ability: coding skill is a speed multiplier, not a gate

- LLM proposes code at every bootstrap stage (microcode, CIC kernel, macro layers, gate rules); the symbolic engine verifies before accepting
- Weak model = more retries (5-15), strong model = fewer (1-3). Both produce 100% verified output because the symbolic engine catches all mistakes
- The critical transition: not better LLMs, but the sufficiency flip applied to hardware. Once enough facts about runtime behavior accumulate, the system proposes microcode optimizations with zero LLM tokens.
- Surprise result: a barely competent LLM is sufficient for the full bootstrapping chain. It's slower and costs more in API calls, but reaches the same destination.
2026-05-21 18:52:19 +00:00
parent f9085a4690
commit b5d59c3360
1 changed files with 57 additions and 0 deletions
--- a/ideas/passepartout-economics.org
+++ b/ideas/passepartout-economics.org
@@ -722,6 +722,63 @@ the LLM. The system designs its own core dispatch logic,
 loads it onto idle cores, and verifies the result with ACL2
 before committing.
 ** How the LLM's coding ability affects the bootstrapping timeline
 The LLM writes code at every stage of the bootstrapping:
 1. The Lisp Machine's microcode (RISC-V dispatch, GC barriers, 
   tagged memory operations)
 2. The CIC prover kernel (if built as a skill)
 3. ACL2 macro layers for new domains
 4. Gate rules for previously uncodified domains
 5. The initial self-optimization proposals
 At each stage, the symbolic engine (ACL2, Screamer, gate stack)
 verifies the LLM's output before accepting it. The LLM proposes;
 the symbolic engine disposes.
 This means the LLM's coding ability is a *speed multiplier, not
 a gate*. A weak LLM (3B local model) produces correct code
 after N retries, with the symbolic engine catching each mistake
 and feeding it back. A strong LLM (Claude Sonnet, DeepSeek, GPT)
 produces correct code after fewer retries. The cost difference
 is in API calls and wall-clock time, not in the correctness of
 the final output, which the symbolic engine guarantees.
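A minimal Python sketch of the propose-and-verify loop described above. Every name in it (`Verdict`, `bootstrap_unit`, `toy_verify`, `make_proposer`) is hypothetical, and the toy verifier only stands in for ACL2, Screamer, and the gate stack by exhaustively checking a tiny finite spec; it does not show how those tools are actually invoked.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""  # the failed check, fed back to the proposer


def bootstrap_unit(
    propose: Callable[[str, list[str]], str],
    verify: Callable[[str], Verdict],
    spec: str,
    max_attempts: int = 50,
) -> tuple[Optional[str], int]:
    """Return (accepted_code, attempts_used); code is None if the budget runs out."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(spec, feedback)   # the LLM proposes
        verdict = verify(candidate)           # the symbolic engine disposes
        if verdict.ok:
            return candidate, attempt
        feedback.append(verdict.feedback)     # mistakes go back into the next prompt
    return None, max_attempts


def toy_verify(candidate: str) -> Verdict:
    # Hypothetical stand-in verifier: the "spec" is f(x) == 2 * x on a small range.
    for x in range(-3, 4):
        if eval(candidate, {"x": x}) != 2 * x:
            return Verdict(False, f"{candidate!r} fails at x={x}")
    return Verdict(True)


def make_proposer(candidates: list[str]) -> Callable[[str, list[str]], str]:
    # Hypothetical stand-in for an LLM: it moves to its next guess after each rejection.
    return lambda spec, feedback: candidates[len(feedback)]


weak = make_proposer(["x * x", "x + 2", "x + x"])  # needs three attempts
strong = make_proposer(["x + x"])                  # right on the first attempt
```

Here `bootstrap_unit(weak, toy_verify, "double x")` returns `("x + x", 3)` and the same call with `strong` returns `("x + x", 1)`. The accepted code is identical in both cases; only the attempt count differs, because nothing is accepted unless `verify` passes it.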
 | Scenario | LLM quality | Retries per unit | Relative wall-clock per unit | Correctness |
 |----------|-------------|------------------|------------------------------|-------------|
 | Bootstrapping with local 3B model | Low | 5-15 | 10x | 100% (verified) |
 | Bootstrapping with frontier API | High | 1-3 | 1x (baseline) | 100% (verified) |
 | Bootstrapping after sufficiency flip | None (symbolic only) | 0 | Instant | 100% (verified) |
 The critical transition is between row 2 and row 3: once the
 symbolic engine has accumulated enough non-lossy facts about
 the Lisp Machine's hardware behavior (latency profiles, GC
 patterns, instruction timings), it can propose microcode
 optimizations without any LLM involvement. ACL2 proves the
 optimization preserves correctness; Screamer checks it against
 known hardware constraints; the gate stack verifies it won't
 damage the running system. Zero tokens.
 This is the sufficiency flip applied to hardware. The timeline
 to reach it depends on how many facts the system can gather
 about its own runtime behavior, not on how good the LLM is.
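One way to picture the flip is as a dispatch decision driven by the fact store instead of by the LLM. The sketch below is an assumption throughout: the fact categories are lifted from the examples in the text, while `FactStore`, `pick_proposer`, and the `min_samples` threshold are hypothetical. The source does not say how "enough facts" is measured.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical fact categories, named after the examples given in the text.
REQUIRED_FACTS = frozenset({"latency_profile", "gc_pattern", "instruction_timing"})


@dataclass
class FactStore:
    facts: dict[str, list[float]] = field(default_factory=dict)

    def record(self, kind: str, value: float) -> None:
        """Store one runtime observation under its category."""
        self.facts.setdefault(kind, []).append(value)

    def sufficient(self, min_samples: int = 3) -> bool:
        # Assumed criterion: every required category has at least min_samples observations.
        return all(len(self.facts.get(kind, [])) >= min_samples for kind in REQUIRED_FACTS)


def pick_proposer(store: FactStore) -> str:
    """Use the symbolic proposer (zero tokens) once the store is sufficient, else the LLM."""
    return "symbolic" if store.sufficient() else "llm"
```

Under this reading, the point at which `pick_proposer` switches from `"llm"` to `"symbolic"` depends only on how quickly observations are recorded, which matches the claim that the timeline is set by fact gathering and not by LLM quality.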
 ** The surprising result
 An LLM that is just barely competent at coding (enough to
 generate syntactically valid RISC-V or Lisp that passes the
 symbolic engine's checks after a few retries) is sufficient
 for the entire bootstrapping chain. It takes longer — more
 retries, more wall-clock — but it reaches the same endpoint:
 a system that designs its own microcode without any LLM.
 The LLM's coding ability determines how many API dollars and
 calendar months the bootstrap requires. It does not determine
 whether the bootstrap succeeds. A patient operator with a
 3B local model and a Tenstorrent card reaches the same
 destination as an operator with bottomless API credits — the
 second arrives faster, but both arrive.
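The cost claim can be made concrete with a rough model. The sketch below assumes each attempt passes verification independently with probability `p_pass`, so the expected number of attempts per unit is `1 / p_pass`. That independence assumption is a simplification (feedback from the verifier should raise the pass rate on later attempts), and the function names and all numbers in the usage note are illustrative, not measurements.

```python
def expected_attempts(p_pass: float) -> float:
    """Mean attempts until the first verified success, for independent attempts."""
    if not 0.0 < p_pass <= 1.0:
        raise ValueError("p_pass must be in (0, 1]")
    return 1.0 / p_pass


def bootstrap_cost(
    units: int,
    p_pass: float,
    seconds_per_attempt: float,
    dollars_per_attempt: float,
) -> tuple[float, float]:
    """Return (expected_seconds, expected_dollars) for the whole bootstrap."""
    attempts = units * expected_attempts(p_pass)
    return attempts * seconds_per_attempt, attempts * dollars_per_attempt
```

With an assumed pass rate of 0.1, `expected_attempts` gives 10 attempts per unit; with 0.5 it gives 2. Both figures are finite for any nonzero pass rate, which is the sense in which a weaker model changes the bill and the calendar but not the outcome.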
 Large refactoring projects (extract module, rename API, split monolith)
 are the hardest test for any AI agent. Current approaches (Claude Code,
 Copilot) handle them probabilistically — every step costs tokens, and