Add note: faster theorem proving — engineering approaches (incremental, ATP oracle, decision procedures, LLM guidance, hardware split)
This commit is contained in:
@@ -135,10 +135,9 @@
|
||||
"be9bccc7-5adf-4d0d-8ee4-8855892189bf": "ideas/neurosymbolic-loop-architectures.org",
|
||||
"3a4b5c6d-7e8f-9a0b-1c2d-3e4f5a6b7c8d": "ideas/architectural-integration-three-pronged.org",
|
||||
"2cdca4b0-6b41-44b4-acb0-af21d0e27b00": "ideas/orders-of-magnitude-time.org",
|
||||
"a6a04686-56d3-4cb0-9c3d-4722f8f586f3": "ideas/cl-modernization-corrected-phases.org",
|
||||
"9c0d1e2f-3a4b-5c6d-7e8f-9a0b1c2d3e4f": "ideas/knowledge-tree-middle.org",
|
||||
"dddd52a7-adb8-470e-a459-614ade5f76af": "ideas/closing-the-lisp-gap.org",
|
||||
"a2811c83-b315-47fd-8ab8-25627c389d1a": "ideas/cl-modernization-timeline.org",
|
||||
"3129eae6-f9f2-40fe-a419-8c1af728c86d": "ideas/faster-theorem-proving.org",
|
||||
"f4e5d6c7-b8a9-0c1d-2e3f-4a5b6c7d8e9f": "ideas/schafmeister-clasp-nanotechnology.org",
|
||||
"2afd9a3c-e96a-54c7-ac77-a05a28065b4b": "ideas/biology-parallels.org",
|
||||
"85f963a7-a10f-45cc-ace6-6edfeefee762": "ideas/lisp-provers-and-rust-comparison.org",
|
||||
@@ -147,7 +146,6 @@
|
||||
"0d1e2f3a-4b5c-6d7e-8f9a-0b1c2d3e4f5a": "ideas/world-models-middle-domain.org",
|
||||
"7a8b9c0d-1e2f-3a4b-5c6d-7e8f9a0b1c2d": "ideas/open-source-wolfram-lisp.org",
|
||||
"329a30cd-55fb-496d-a60b-91388c211bba": "ideas/_index.org",
|
||||
"e199190b-5eff-4a77-8aa5-b6c77647e316": "ideas/cl-modernization-initiative.org",
|
||||
"5c6d7e8f-9a0b-1c2d-3e4f-5a6b7c8d9e0f": "ideas/wider-implications-three-pronged.org",
|
||||
"8b9c0d1e-2f3a-4b5c-6d7e-8f9a0b1c2d3e": "ideas/passepartout-bootstrap-mathematica.org",
|
||||
"2f3a4b5c-6d7e-8f9a-0b1c-2d3e4f5a6b7c": "ideas/practical-powers-three-pronged.org"
|
||||
|
||||
76
ideas/faster-theorem-proving.org
Normal file
76
ideas/faster-theorem-proving.org
Normal file
@@ -0,0 +1,76 @@
|
||||
:PROPERTIES:
|
||||
:CREATED: [2026-06-03 Tue]
|
||||
:ID: 3129eae6-f9f2-40fe-a419-8c1af728c86d
|
||||
:END:
|
||||
#+title: Faster Theorem Proving — Engineering Approaches
|
||||
#+filetags: :theorem-proving:engineering:performance:ACL2:HOL:search:verification:
|
||||
|
||||
* Architecture
|
||||
|
||||
Proof engineering has two phases fundamentally different in cost:
|
||||
|
||||
- **Proof checking** — verifying that a candidate proof is correct. Polynomial in proof size, typically microseconds. Never the bottleneck.
|
||||
- **Proof search** — finding the proof. Ranges from polynomial (decidable fragments) to undecidable (general first-order logic). Always the bottleneck.
|
||||
|
||||
Every optimization targets search, not checking.
|
||||
|
||||
* Practical levers, ranked by impact
|
||||
|
||||
**1. Incremental verification.** Only re-prove what changed. Maintain a dependency graph of theorems. On code change, mark downstream theorems stale and re-prove only those. For Passepartout's self-modification loop — where most changes are small relative to the full codebase — this is the single biggest win. A theorem dependency graph is cheap to maintain (proved theorems record which axioms or earlier theorems they depend on) and reduces each verify cycle from "full proof search" to "diff search."
|
||||
|
||||
**2. ATP oracle (Sledgehammer pattern).** First-order automated theorem provers (E, Vampire, Z3) solve the majority of verification conditions (type safety, memory safety, simple invariants) in milliseconds. The architecture: translate the goal to first-order logic, send to ATPs in parallel, reconstruct the proof in the HOL kernel on success. Isabelle/HOL's Sledgehammer has proven this works at scale. For Passepartout, where most properties are first-order expressible, this alone covers 70-80% of verify calls.
|
||||
|
||||
**3. Decision procedures for decidable fragments.** Each decidable fragment gets a dedicated solver:
|
||||
|
||||
- Linear arithmetic → Presburger arithmetic or Z3
|
||||
- Equality with uninterpreted functions → congruence closure
|
||||
- Propositional logic → SAT solver or BDDs
|
||||
- Bit vectors → SAT with bit-blasting
|
||||
|
||||
These run in polynomial or near-polynomial time. The kernel must either trust the solver's output (verified oracle via reflection) or reconstruct the proof in kernel primitives (LCF-style, slower but fully trusted). Reflection — where the kernel itself runs the decision procedure and certifies the result — eliminates the reconstruction overhead entirely.
|
||||
|
||||
**4. Term rewriting DB.** Every proved equality becomes a rewrite rule at no extra cost. The DB is indexed by the outermost function symbol — O(1) lookup by symbol, then pattern match on the term structure. Over a library of thousands of proved theorems, the rewrite engine gets faster without any algorithmic improvement because the rewrite DB covers more cases with each addition.
|
||||
|
||||
The rewrite engine is the hottest inner loop in any prover. Its performance depends on pattern matching speed, which is pointer-chasing through a tree — the worst case for modern CPU cache hierarchies. The optimization: store terms in contiguous arrays (a vector of CONS cells, as Lisp already does via its own heap) so the pattern matcher walks linear memory rather than chasing pointers. SBCL's GC compacts automatically, which helps. A dedicated term store in a typed array would be faster.
|
||||
|
||||
**5. LLM for search guidance, not correctness.** The LLM suggests the next lemma or tactic (cheap, heuristic, runs on GPU). The kernel verifies the result (expensive, reliable, runs on CPU). This separates the problem: the LLM compensates for the prover's blindness, the prover compensates for the LLM's unreliability. Google's AlphaProof and the Thor project for Coq both work this way.
|
||||
|
||||
The LLM's suggestions improve over time as the proof library grows — embedding search over existing proofs finds the closest proved theorem and adapts its structure. This compounds because proved theorems are both the correctness foundation and the search guidance database.
|
||||
|
||||
**6. Parallel subgoal dispatch.** Independent subgoals are sent to separate cores. Each runs a full prover instance with its own term store and rewrite DB. No inter-core communication during search. Shared-memory CPU (AMD EPYC, Threadripper) is ideal here — the rewrite DB lives in L3 cache accessible by all cores without copying. A distributed-memory mesh (Tenstorrent P150) fights the term-rewriting hot loop because term trees are pointer-chasing and don't fit per-core SRAM. The proven split: GPU for LLM guidance, many-core shared-memory CPU for symbolic verification.
|
||||
|
||||
**7. Proof by analogy / structure reuse.** Closely related theorems share proof structure. Given a new goal, find the nearest proved theorem (embedding similarity on the proof library), instantiate its proof template, try the adapted proof. The LLM is a natural engine for this — it can recognize structural patterns that a syntactic search would miss. The agent says "this new property looks like the one we proved for the social protocol's identity layer" and reuses that proof structure with adjusted parameters.
|
||||
|
||||
**8. Cache-friendly data structures.** Pattern matching is pointer-chasing. Modern CPUs are optimized for linear access (prefetchers, wide cache lines). A discrimination tree or path index for rewrite rules — which walks the /rule store/ as a tree rather than the /term/ as a tree — brings the hot path into L1 cache. ACL2's rewrite system is not cache-optimized. A CL implementation using typed arrays for term storage and path-indexed rewrite lookup would see 10-100× speedup on the inner loop, on the same hardware, with no algorithmic change.
|
||||
|
||||
* Hardware split
|
||||
|
||||
The right hardware division for an LLM-guided prover:
|
||||
|
||||
| Component | Hardware | Workload |
|
||||
|-----------|----------|----------|
|
||||
| Search guidance | GPU (NVIDIA) | Neural inference — suggest next lemma, tactic, or proof structure |
|
||||
| Term rewriting | CPU (many-core, shared memory) | Symbolic — pattern matching, substitution, kernel operations |
|
||||
| Decision procedures | CPU (shared memory) or FPGA | SAT/SMT — propositional and arithmetic solving |
|
||||
| Parallel dispatch | CPU cores | Embarrassingly parallel — fan out independent subgoals |
|
||||
|
||||
The key insight: the GPU is for the LLM that suggests proofs. The CPU is for the prover that verifies them. One is neural and benefits from streaming throughput; the other is symbolic and benefits from shared cache and low-latency pointer access. Collapsing both into the same distributed-memory architecture (Tenstorrent mesh) helps the neural side at the cost of the symbolic side.
|
||||
|
||||
* Compounding effect
|
||||
|
||||
An LLM-guided prover gets faster over time in a way a conventional prover does not:
|
||||
|
||||
1. Every proved theorem adds a rewrite rule → rewrite engine covers more cases
|
||||
2. Every proved theorem adds an analogy candidate → LLM has more structural templates
|
||||
3. Every proved theorem adds a ground truth → verified KB grows, reducing future search space
|
||||
4. The prover's own correctness is used to optimize the prover (self-modification, verified by the prover)
|
||||
|
||||
This is not a claim about algorithmic improvement. It is a claim about /coverage compounding/ — the rewrite DB, analogy library, and verified rule set all grow monotonically, and each makes the next proof faster because there is more to build on. The first proof of a new domain is the hardest. The hundredth is nearly free.
|
||||
|
||||
* Relationship to CL Modernization
|
||||
|
||||
The CL Modernization project's Phase 0 (verified HOL kernel) is the smallest and most constrained instance of this problem — ~500-800 lines, well-specified, does not need most of these optimizations. Phase 4 (self-verifying CL stack) requires all of them, because the compiler verification bootstrapping problem is the hardest instance of proof search that exists. The approaches above are ordered by implementation feasibility within Passepartout's timeline: incremental verification and Sledgehammer integration (months), decision procedures and rewrite DB optimization (months to a year), LLM guidance (available now via existing LLMs, improves with domain-specific fine-tuning), cache-friendly data structures (available now, implementation effort), proof analogy (the hardest, because it requires a mature proof library to draw from).
|
||||
|
||||
References:
|
||||
- [[id:971cd9e7-2cc5-4743-8042-2469dbe4078f][CL Modernization]] — the project that builds the prover
|
||||
- [[id:84a537b4-4256-50c8-91f5-dd5b4538418f][Verification appliance]] — hardware for verification
|
||||
Reference in New Issue
Block a user