refactoring: neurosymbolic planner for large codebase changes

Six-stage workflow: codebase ingestion (AST as facts), goal translation (LLM, 10%), Screamer constraint satisfaction (80%), ACL2 plan verification, incremental execution with Merkle snapshots per step and rollback on test failure, final re-verification. Key limit: ACL2 cannot prove semantic equivalence of arbitrary programs. Gap filled by: tests as empirical verification, API contract checking (structural equivalence of public interfaces), human review with full provenance of semantic changes. Comparison with Claude Code: Passepartout trades higher up-front planning overhead for zero-token constraint checks, ACL2-verified scope control, instant per-step rollback, and a Merkle chain from before to after.
2026-05-21 18:32:19 +00:00
parent 18b7c2f06f
commit 852fcae4a6
1 changed files with 104 additions and 0 deletions
--- a/ideas/passepartout-economics.org
+++ b/ideas/passepartout-economics.org
@@ -594,6 +594,110 @@ context would not accept an unverified upgrade anyway.
  signed and verified against the hardware root of trust before
  applying.

+** Large refactoring in a neurosymbolic planner
+
+Large refactoring projects (extract module, rename API, split monolith)
+are the hardest test for any AI agent. Current approaches (Claude Code,
+Copilot) handle them probabilistically — every step costs tokens, and
+there is no formal guarantee the final system is consistent.
+
+Passepartout's Phase 7 planner (10-80-10) transforms this: the symbolic
+engine handles planning, ordering, and structural verification; the LLM
+handles only the code transformation itself.
+
+*** The workflow
+
+1. **Codebase ingestion.** The scanner walks the entire codebase, builds
+   an abstract syntax graph (not just flat files — imports, type
+   dependencies, function call chains, test coverage). Each module
+   becomes a fact in the fact store with `:provenance :codebase-scan`.
+
+2. **Goal translation (LLM, 10%).** "Extract authentication into its own
+   service" becomes a structured goal plist:
+   (:goal :extract-module
+    :source :auth
+    :target-service auth-service
+    :files (:affected (:app/auth/* :app/middleware/auth.clj :tests/*))
+    :constraints (:no-breaking-api (:public-api :unchanged))
+    :verification (:all-tests-pass :api-contract-preserved))
+
+3. **Constraint satisfaction (Screamer, 80%).** Screamer expresses the
+   refactoring as a constraint satisfaction problem:
+   - Variables: file modifications ordered by dependency (auth middleware
+     must be updated after auth module is extracted but before its callers)
+   - Constraints: no circular dependencies, no step that creates broken
+     intermediate state, no test file modified before its source
+   - Objective: minimize total steps while respecting all constraints
+
+   Screamer returns a viable plan or reports unsolvability with the
+   conflicting constraints — for example, "auth middleware and auth
+   module have a circular type dependency that must be resolved before
+   extraction."
+
+4. **Plan verification (ACL2, part of 80%).** ACL2 proves:
+   - No dependency cycles in the plan (A must run before B, B before C,
+     C before A → rejected)
+   - No deadlocks (two modules waiting on each other)
+   - Every planned write is within the refactoring scope (no stray
+     modifications to unrelated files)
+   - The gate stack will not reject any planned command (no blocked
+     patterns in the refactoring scripts)
+
+5. **Incremental execution.** The planner executes each step:
+   a. Take Merkle snapshot of the current state
+   b. LLM proposes the code transformation for this step
+   c. Gate stack checks the proposal (no forbidden file writes,
+      no shell commands outside the allowed refactoring scope)
+   d. Transformation is applied
+   e. Tests run. If they pass → commit the snapshot, update the
+      fact store with the new codebase graph. If they fail → roll
+      back to the previous snapshot, flag the LLM's proposal as
+      `:provenance :failed-transformation`, and attempt a corrected
+      proposal.
+
+6. **Final verification.** After all steps complete, ACL2 re-verifies
+   the entire codebase fact store — all dependencies, all public API
+   surfaces, all constraints. The result is a Merkle chain from
+   "before" to "after" with every intermediate state verified.
+
+*** What the symbolic engine cannot do
+
+The fundamental limit: **first-order logic cannot prove semantic
+equivalence of arbitrary programs.** ACL2 can verify that the
+refactoring plan has no structural flaws, that the dependency graph
+is acyclic, that every step is within scope, and that tests pass.
+It cannot prove "the extracted auth service behaves identically to
+the inline auth module from the caller's perspective" for a
+general-purpose language.
+
+This gap is filled by:
+- **Tests as empirical verification.** The planner runs the full test
+  suite after each step. A passing test suite is not a proof of
+  correctness, but combined with ACL2's structural verification, it
+  is strong empirical evidence.
+- **API contract checking.** For refactoring that preserves public APIs,
+  ACL2 can verify that the type signatures, argument counts, and
+  return types of the extracted module's public interface match
+  exactly — a structural equivalence that does not require semantic
+  reasoning.
+- **Human review of semantic concerns.** The planner flags steps that
+  involve semantic choices (e.g., "the extracted auth service now
+  handles token refresh differently"). These steps are presented to
+  the developer for review with full provenance: before state, after
+  state, and the diff. The developer's approval or rejection becomes
+  a Merkle fact with `:provenance :human-reviewed`.
+
+*** What makes this better than Claude Code
+
+| Dimension | Claude Code | Passepartout Planner |
+|-----------|-------------|---------------------|
+| Planning | Prompt-based, implicit | Screamer CSP, explicit, verified |
+| Step ordering | Greedy, every step costs tokens | Dependency-ordered, zero-token constraint check |
+| Rollback | Limited (git reset, no per-step) | Merkle snapshot per step, instant rollback |
+| Scope control | Prompt-based ("only touch auth files") | ACL2-verified write scope, cannot escape |
+| Cost | $0.50-$2 per refactoring session | Near-zero (CPU cycles for planning, LLM for code only) |
+| Final proof | None — you trust that tests caught everything | Merkle chain from before→after, ACL2-verified |
+
 A software ecosystem changing hardware economics has never happened before.
 Passepartout's most realistic path: verification appliances for regulated
 industries — RISC-V cores with Lisp microcode on FPGA, sold as hardened