llm-ability: coding skill is a speed multiplier, not a gate
- LLM proposes code at every bootstrap stage (microcode, CIC kernel, macro layers, gate rules) — symbolic engine verifies before accepting - Weak model = more retries (5-15), strong model = fewer (1-3) Both produce 100% verified output because the symbolic engine catches all mistakes - The critical transition: not better LLMs, but the sufficiency flip applied to hardware. Once enough facts about runtime behavior accumulate, the system proposes microcode optimizations with zero LLM tokens. - Surprise result: a barely competent LLM is sufficient for the full bootstrapping chain. It's slower and costs more in API calls, but reaches the same destination.
This commit is contained in:
@@ -722,6 +722,63 @@ the LLM. The system designs its own core dispatch logic,
|
||||
loads it onto idle cores, and verifies the result with ACL2
|
||||
before committing.
|
||||
|
||||
** How the LLM's coding ability affects the bootstrapping timeline
|
||||
|
||||
The LLM writes code at every stage of the bootstrapping:
|
||||
1. The Lisp Machine's microcode (RISC-V dispatch, GC barriers,
|
||||
tagged memory operations)
|
||||
2. The CIC prover kernel (if built as a skill)
|
||||
3. ACL2 macro layers for new domains
|
||||
4. Gate rules for previously uncodified domains
|
||||
5. The initial self-optimization proposals
|
||||
|
||||
At each stage, the symbolic engine (ACL2, Screamer, gate stack)
|
||||
verifies the LLM's output before accepting it. The LLM proposes;
|
||||
the symbolic engine disposes.
|
||||
|
||||
This means the LLM's coding ability is a **speed multiplier, not
|
||||
a gate**. A weak LLM (3B local model) produces correct code
|
||||
after N retries where the symbolic engine catches the mistakes
|
||||
and feeds them back. A strong LLM (Claude Sonnet, DeepSeek, GPT)
|
||||
produces correct code after fewer retries. The cost difference
|
||||
is in API calls and wall-clock time, not in the correctness of
|
||||
the final output — the symbolic engine guarantees that.
|
||||
|
||||
| Scenario | LLM quality | Retries per unit | Wall-clock per unit | Correctness |
|
||||
|----------|-------------|------------------|---------------------|-------------|
|
||||
| Bootstrapping with local 3B model | Low | 5-15 | 10x slower | 100% (verified) |
|
||||
| Bootstrapping with frontier API | High | 1-3 | 1x | 100% (verified) |
|
||||
| Bootstrapping after sufficiency flip | None (symbolic only) | 0 | Instant | 100% (verified) |
|
||||
|
||||
The critical transition is between row 2 and row 3: once the
|
||||
symbolic engine has accumulated enough non-lossy facts about
|
||||
the Lisp Machine's hardware behavior (latency profiles, GC
|
||||
patterns, instruction timings), it can propose microcode
|
||||
optimizations without any LLM involvement. ACL2 proves the
|
||||
optimization preserves correctness; Screamer checks it against
|
||||
known hardware constraints; the gate stack verifies it won't
|
||||
damage the running system. Zero tokens.
|
||||
|
||||
This is the sufficiency flip applied to hardware. The timeline
|
||||
to reach it depends on how many facts the system can gather
|
||||
about its own runtime behavior, not on how good the LLM is.
|
||||
|
||||
** The surprising result
|
||||
|
||||
An LLM that is just barely competent at coding (enough to
|
||||
generate syntactically valid RISC-V or Lisp that passes the
|
||||
symbolic engine's checks after a few retries) is sufficient
|
||||
for the entire bootstrapping chain. It takes longer — more
|
||||
retries, more wall-clock — but it reaches the same endpoint:
|
||||
a system that designs its own microcode without any LLM.
|
||||
|
||||
The LLM's coding ability determines how many API dollars and
|
||||
calendar months the bootstrap requires. It does not determine
|
||||
whether the bootstrap succeeds. A patient operator with a
|
||||
3B local model and a Tenstorrent card reaches the same
|
||||
destination as an operator with bottomless API credits — the
|
||||
second arrives faster, but both arrive.
|
||||
|
||||
Large refactoring projects (extract module, rename API, split monolith)
|
||||
are the hardest test for any AI agent. Current approaches (Claude Code,
|
||||
Copilot) handle them probabilistically — every step costs tokens, and
|
||||
|
||||
Reference in New Issue
Block a user