llm-ability: coding skill is a speed multiplier, not a gate

- LLM proposes code at every bootstrap stage (microcode, CIC kernel,
  macro layers, gate rules); the symbolic engine verifies before accepting
- Weak model = more retries (5-15), strong model = fewer (1-3). Both
  produce 100% verified output because the symbolic engine catches
  all mistakes
- The critical transition: not better LLMs, but the sufficiency flip
  applied to hardware. Once enough facts about runtime behavior
  accumulate, the system proposes microcode optimizations with zero
  LLM tokens.
- Surprise result: a barely competent LLM is sufficient for the full
  bootstrapping chain. It's slower and costs more in API calls, but
  reaches the same destination.
@@ -722,6 +722,63 @@ the LLM. The system designs its own core dispatch logic,
loads it onto idle cores, and verifies the result with ACL2
before committing.

** How the LLM's coding ability affects the bootstrapping timeline

The LLM writes code at every stage of the bootstrapping:

1. The Lisp Machine's microcode (RISC-V dispatch, GC barriers,
   tagged memory operations)
2. The CIC prover kernel (if built as a skill)
3. ACL2 macro layers for new domains
4. Gate rules for previously uncodified domains
5. The initial self-optimization proposals

At each stage, the symbolic engine (ACL2, Screamer, gate stack)
verifies the LLM's output before accepting it. The LLM proposes;
the symbolic engine disposes.

This means the LLM's coding ability is a **speed multiplier, not
a gate**. A weak LLM (3B local model) produces correct code
after N retries where the symbolic engine catches the mistakes
and feeds them back. A strong LLM (Claude Sonnet, DeepSeek, GPT)
produces correct code after fewer retries. The cost difference
is in API calls and wall-clock time, not in the correctness of
the final output, which the symbolic engine guarantees.
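
The propose/verify loop described above can be sketched as follows.
This is a minimal illustration, not the project's actual code:
`Verdict`, `bootstrap_unit`, `make_proposer`, `toy_verify`, and the
`max_attempts` budget are hypothetical names, and the toy verifier
only tests a few points as a stand-in for the real ACL2, Screamer,
and gate-stack checks.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""


def bootstrap_unit(
    propose: Callable[[str, list[str]], str],
    verify: Callable[[str], Verdict],
    spec: str,
    max_attempts: int = 20,  # assumed retry budget; not specified in the source
) -> tuple[Optional[str], int]:
    """Return (accepted_code, attempts_used); code is None if the budget runs out."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(spec, feedback)   # LLM proposes
        verdict = verify(candidate)           # symbolic engine disposes
        if verdict.ok:
            return candidate, attempt
        feedback.append(verdict.feedback)     # mistakes are fed back
    return None, max_attempts


def toy_verify(candidate: str) -> Verdict:
    """Toy stand-in for the verifier: accept only expressions equal to 2*x."""
    try:
        ok = all(
            eval(candidate, {"__builtins__": {}}, {"x": x}) == 2 * x
            for x in range(-3, 4)
        )
    except Exception as exc:  # syntax errors and runtime errors are rejections
        return Verdict(False, f"error: {exc}")
    return Verdict(ok, "" if ok else "wrong result on a test point")


def make_proposer(candidates: list[str]) -> Callable[[str, list[str]], str]:
    """Toy proposer: each rejection moves it to its next scripted candidate."""
    def propose(spec: str, feedback: list[str]) -> str:
        return candidates[min(len(feedback), len(candidates) - 1)]
    return propose


# Scripted attempts standing in for a weak and a strong model.
WEAK = ["x +", "x * x", "x + 2", "x - x", "x ** 2", "2 * x"]
STRONG = ["x + 2", "x + x"]

if __name__ == "__main__":
    print(bootstrap_unit(make_proposer(WEAK), toy_verify, "double x"))
    print(bootstrap_unit(make_proposer(STRONG), toy_verify, "double x"))
```

With these toy inputs the weak proposer needs 6 attempts and the
strong one needs 2, but both return code that passed the same
verifier. Only the attempt count differs.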

| Scenario | LLM quality | Retries per unit | Wall-clock per unit | Correctness |
|----------|-------------|------------------|---------------------|-------------|
| Bootstrapping with local 3B model | Low | 5-15 | 10x slower | 100% (verified) |
| Bootstrapping with frontier API | High | 1-3 | 1x | 100% (verified) |
| Bootstrapping after sufficiency flip | None (symbolic only) | 0 | Instant | 100% (verified) |

The critical transition is between row 2 and row 3: once the
symbolic engine has accumulated enough non-lossy facts about
the Lisp Machine's hardware behavior (latency profiles, GC
patterns, instruction timings), it can propose microcode
optimizations without any LLM involvement. ACL2 proves the
optimization preserves correctness; Screamer checks it against
known hardware constraints; the gate stack verifies it won't
damage the running system. Zero tokens.

This is the sufficiency flip applied to hardware. The timeline
to reach it depends on how many facts the system can gather
about its own runtime behavior, not on how good the LLM is.
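
One way to picture the flip, as a rough sketch only: the proposer
changes, the verification pipeline does not. Everything here is
hypothetical, including the `Facts` shape (instruction name to
measured cycles), the single rewrite rule in `symbolic_propose`, and
the function names. The source does not say how sufficiency is
measured, so this sketch treats it as "the facts alone are enough to
derive a proposal".

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional

Facts = dict[str, int]  # hypothetical: instruction name -> measured cycles
Check = Callable[[str], bool]


def symbolic_propose(facts: Facts) -> Optional[str]:
    """Derive a proposal from measured facts alone; None means not enough facts yet."""
    if "mul" in facts and "shl" in facts and facts["shl"] < facts["mul"]:
        return "rewrite (mul x 2) as (shl x 1)"
    return None


def propose(facts: Facts, llm_propose: Callable[[Facts], str]) -> tuple[str, str]:
    """Prefer the zero-token symbolic path; fall back to the LLM before the flip."""
    candidate = symbolic_propose(facts)
    if candidate is not None:
        return candidate, "symbolic"
    return llm_propose(facts), "llm"


def commit_if_verified(candidate: str, checks: Iterable[Check]) -> bool:
    """Commit only if every check passes, regardless of who proposed the candidate."""
    return all(check(candidate) for check in checks)
```

Before enough timings are recorded, `propose({"mul": 3}, llm)` falls
back to the LLM; once both timings are known,
`propose({"mul": 3, "shl": 1}, llm)` returns the symbolic proposal
without calling it. In both cases the candidate still has to pass
the same list of checks.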

** The surprising result

An LLM that is just barely competent at coding (enough to
generate syntactically valid RISC-V or Lisp that passes the
symbolic engine's checks after a few retries) is sufficient
for the entire bootstrapping chain. It takes longer (more
retries, more wall-clock time), but it reaches the same endpoint:
a system that designs its own microcode without any LLM.

The LLM's coding ability determines how many API dollars and
calendar months the bootstrap requires. It does not determine
whether the bootstrap succeeds. A patient operator with a
3B local model and a Tenstorrent card reaches the same
destination as an operator with bottomless API credits. The
second arrives faster, but both arrive.
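
A simple model makes the "both arrive" claim concrete. Assume, as an
idealization not stated in the source, that each attempt at a unit
is accepted by the symbolic engine independently with probability
$p > 0$, and let $K$ be the number of attempts until acceptance.
Then:

\[
P(K > k) = (1 - p)^k \xrightarrow{\,k \to \infty\,} 0,
\qquad
\mathbb{E}[K] = \frac{1}{p}.
\]

Under this model a proposer with $p = 0.1$ needs 10 attempts on
average and one with $p = 0.5$ needs 2 (illustrative values, not
measurements), but any $p > 0$ eventually succeeds. "Barely
competent" then means $p > 0$: model quality changes the expected
attempt count, not whether the unit is eventually accepted. Feedback
from rejected attempts would presumably raise $p$ over time, which
this model ignores.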

Large refactoring projects (extract module, rename API, split monolith)
are the hardest test for any AI agent. Current approaches (Claude Code,
Copilot) handle them probabilistically — every step costs tokens, and