llm-ability: coding skill is a speed multiplier, not a gate

- LLM proposes code at every bootstrap stage (microcode, CIC kernel,
  macro layers, gate rules); the symbolic engine verifies it before accepting it
- Weak model = more retries (5-15), strong model = fewer (1-3).
  Both produce 100% verified output because the symbolic engine catches
  all mistakes.
- The critical transition: not better LLMs, but the sufficiency flip
  applied to hardware. Once enough facts about runtime behavior
  accumulate, the system proposes microcode optimizations with zero
  LLM tokens.
- Surprise result: a barely competent LLM is sufficient for the full
  bootstrapping chain. It's slower and costs more in API calls, but
  reaches the same destination.
Hermes
2026-05-21 18:52:19 +00:00
parent f9085a4690
commit b5d59c3360

@@ -722,6 +722,63 @@ the LLM. The system designs its own core dispatch logic,
loads it onto idle cores, and verifies the result with ACL2
before committing.
** How the LLM's coding ability affects the bootstrapping timeline
The LLM writes code at every stage of the bootstrapping:
1. The Lisp Machine's microcode (RISC-V dispatch, GC barriers,
   tagged memory operations)
2. The CIC prover kernel (if built as a skill)
3. ACL2 macro layers for new domains
4. Gate rules for previously uncodified domains
5. The initial self-optimization proposals
At each stage, the symbolic engine (ACL2, Screamer, gate stack)
verifies the LLM's output before accepting it. The LLM proposes;
the symbolic engine disposes.

This means the LLM's coding ability is a *speed multiplier, not
a gate*. A weak LLM (3B local model) produces correct code
after N retries, with the symbolic engine catching each mistake
and feeding it back to the LLM. A strong LLM (Claude Sonnet,
DeepSeek, GPT) produces correct code after fewer retries. The
cost difference is in API calls and wall-clock time, not in the
correctness of the final output; the symbolic engine guarantees that.
| Scenario | LLM quality | Retries per unit | Wall-clock per unit | Correctness |
|----------|-------------|------------------|---------------------|-------------|
| Bootstrapping with local 3B model | Low | 5-15 | 10x slower | 100% (verified) |
| Bootstrapping with frontier API | High | 1-3 | 1x | 100% (verified) |
| Bootstrapping after sufficiency flip | None (symbolic only) | 0 | Instant | 100% (verified) |
The critical transition is between row 2 and row 3: once the
symbolic engine has accumulated enough non-lossy facts about
the Lisp Machine's hardware behavior (latency profiles, GC
patterns, instruction timings), it can propose microcode
optimizations without any LLM involvement. ACL2 proves the
optimization preserves correctness; Screamer checks it against
known hardware constraints; the gate stack verifies it won't
damage the running system. Zero tokens.
This is the sufficiency flip applied to hardware. The timeline
to reach it depends on how many facts the system can gather
about its own runtime behavior, not on how good the LLM is.
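The routing decision behind the flip can be sketched under heavy assumptions: a hypothetical ~FactStore~ counts runtime observations per fact kind, the flip happens once every required kind has at least ~min_samples~ observations, and the three gates are toy predicates standing in for the ACL2 proof, the Screamer constraint check, and the gate stack. None of these names come from the project, and the fact kinds, thresholds and sample values are illustrative only.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Proposal = Dict[str, object]
# A gate returns None when the proposal passes, or a rejection reason.
Gate = Tuple[str, Callable[[Proposal], Optional[str]]]


class FactStore:
    """Hypothetical store of runtime observations, grouped by fact kind."""

    def __init__(self, required_kinds: List[str], min_samples: int) -> None:
        self.required_kinds = list(required_kinds)
        self.min_samples = min_samples
        self.samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, kind: str, value: float) -> None:
        self.samples[kind].append(value)

    def sufficient(self) -> bool:
        # The "flip": every required kind has enough observations.
        return all(len(self.samples[k]) >= self.min_samples for k in self.required_kinds)


def next_proposal(store, symbolic_propose, llm_propose) -> Tuple[Proposal, str]:
    """Use the symbolic engine once facts suffice; otherwise fall back to the LLM."""
    if store.sufficient():
        return symbolic_propose(store), "symbolic"
    return llm_propose(), "llm"


def run_gates(proposal: Proposal, gates: List[Gate]) -> Optional[str]:
    """Apply gates in order; the first rejection wins, None means commit."""
    for name, check in gates:
        reason = check(proposal)
        if reason is not None:
            return f"{name}: {reason}"
    return None


# Toy stand-ins for ACL2, Screamer and the gate stack (hypothetical rules).
GATES: List[Gate] = [
    ("acl2-proof", lambda p: None if p.get("proved") else "no correctness proof"),
    ("screamer-constraints", lambda p: None if p["est_cycles"] <= 4 else "violates timing bound"),
    ("gate-stack", lambda p: None if not p.get("touches_live_heap") else "unsafe for running system"),
]

llm_calls: List[str] = []


def llm_propose() -> Proposal:
    llm_calls.append("call")  # each call here would cost tokens
    return {"op": "llm-suggestion", "est_cycles": 6, "proved": False}


def symbolic_propose(store: FactStore) -> Proposal:
    # Toy rule: target the fastest observed instruction timing.
    return {"op": "reorder-dispatch", "est_cycles": min(store.samples["insn_timing"]),
            "proved": True, "touches_live_heap": False}


store = FactStore(["dispatch_latency", "gc_pause", "insn_timing"], min_samples=3)
observed = {"dispatch_latency": [5, 4, 6], "gc_pause": [120, 95], "insn_timing": [3, 2, 4]}
for kind, values in observed.items():
    for v in values:
        store.record(kind, v)

before, src_before = next_proposal(store, symbolic_propose, llm_propose)  # gc_pause has 2 samples
store.record("gc_pause", 110)
after, src_after = next_proposal(store, symbolic_propose, llm_propose)
print(src_before, run_gates(before, GATES))  # llm acl2-proof: no correctness proof
print(src_after, run_gates(after, GATES))    # symbolic None
```

The sketch only makes the structural point: once the store is sufficient, ~llm_propose~ is no longer called, while every proposal still has to clear all three gates before it is committed.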
** The surprising result
An LLM that is just barely competent at coding (enough to
generate syntactically valid RISC-V or Lisp that passes the
symbolic engine's checks after a few retries) is sufficient
for the entire bootstrapping chain. It takes longer (more
retries, more wall-clock time), but it reaches the same endpoint:
a system that designs its own microcode without any LLM.

The LLM's coding ability determines how many API dollars (or
hours of local compute) and calendar months the bootstrap
requires. It does not determine whether the bootstrap succeeds.
A patient operator with a 3B local model and a Tenstorrent card
reaches the same destination as an operator with bottomless API
credits; the second arrives faster, but both arrive.
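A back-of-the-envelope model makes the "both arrive" claim concrete. Assume, purely for illustration, that each attempt passes the symbolic checks independently with a fixed probability p (this ignores the verifier's feedback, which should help later attempts). Attempts per unit then follow a geometric distribution: the expected number is 1/p, and the chance of being done within n attempts is 1 - (1 - p)^n, which tends to 1 for any p > 0. The helper names and the p values below are hypothetical, not measurements.

```python
import math


def expected_attempts(p: float) -> float:
    """Mean of a geometric distribution: attempts until the first pass."""
    if not 0 < p <= 1:
        raise ValueError("per-attempt pass probability must be in (0, 1]")
    return 1 / p


def p_done_within(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts passes."""
    return 1 - (1 - p) ** n


def expected_cost(units: int, p: float, cost_per_attempt: float) -> float:
    """Expected total cost when every unit is retried until it verifies."""
    return units * expected_attempts(p) * cost_per_attempt


# Hypothetical pass rates, chosen only for illustration.
for label, p in [("weak local model", 0.1), ("frontier model", 0.5)]:
    print(f"{label}: {expected_attempts(p):.1f} attempts/unit, "
          f"P(done within 50) = {p_done_within(p, 50):.4f}")
```

In this toy model p moves the expected attempt count, and with it cost and calendar time, but not the limit of ~p_done_within~. The model does assume p > 0 for every unit; a unit the LLM can never get right would not be rescued by retries.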
Large refactoring projects (extract module, rename API, split monolith)
are the hardest test for any AI agent. Current approaches (Claude Code,
Copilot) handle them probabilistically — every step costs tokens, and