refactoring: neurosymbolic planner for large codebase changes

Six-stage workflow: codebase ingestion (AST as facts), goal translation (LLM, 10%), Screamer constraint satisfaction (80%), ACL2 plan verification, incremental execution with Merkle snapshots per step and rollback on test failure, final re-verification. Key limit: ACL2 cannot prove semantic equivalence of arbitrary programs. Gap filled by: tests as empirical verification, API contract checking (structural equivalence of public interfaces), human review with full provenance of semantic changes. Comparison with Claude Code: Passepartout trades higher up-front planning overhead for zero-token constraint checks, ACL2-verified scope control, instant per-step rollback, and a Merkle chain from before to after.
2026-05-21 18:32:19 +00:00
parent 18b7c2f06f
commit 852fcae4a6
1 changed files with 104 additions and 0 deletions
--- a/ideas/passepartout-economics.org
+++ b/ideas/passepartout-economics.org
@@ -594,6 +594,110 @@ context would not accept an unverified upgrade anyway.
  signed and verified against the hardware root of trust before
  applying.
 ** Large refactoring in a neurosymbolic planner
 Large refactoring projects (extract module, rename API, split monolith)
 are the hardest test for any AI agent. Current approaches (Claude Code,
 Copilot) handle them probabilistically — every step costs tokens, and
 there is no formal guarantee the final system is consistent.
 Passepartout's Phase 7 planner (10-80-10) changes this: the symbolic
 engine handles planning, ordering, and structural verification, while the
 LLM handles only translating the goal and writing the code
 transformations themselves.
 *** The workflow
 1. **Codebase ingestion.** The scanner walks the entire codebase and builds
   an abstract syntax graph rather than a flat file list: imports, type
   dependencies, function call chains, and test coverage. Each module
   becomes a fact in the fact store with `:provenance :codebase-scan`.
 2. **Goal translation (LLM, 10%).** "Extract authentication into its own
   service" becomes a structured goal plist:
   (:goal :extract-module
    :source :auth
    :target-service :auth-service
    :files (:affected ("app/auth/*" "app/middleware/auth.clj" "tests/*"))
    :constraints (:no-breaking-api (:public-api :unchanged))
    :verification (:all-tests-pass :api-contract-preserved))
 3. **Constraint satisfaction (Screamer, 80%).** Screamer expresses the
   refactoring as a constraint satisfaction problem:
   - Variables: file modifications ordered by dependency (the auth
     middleware must be updated after the auth module is extracted but
     before its callers are updated)
   - Constraints: no circular dependencies, no step that leaves the
     codebase in a broken intermediate state, no test file modified
     before its source
   - Objective: minimize total steps while respecting all constraints
   Screamer either finds a viable plan or fails. On failure, the planner
   reports the conflicting constraints, for example: "auth middleware
   and auth module have a circular type dependency that must be
   resolved before extraction."
 4. **Plan verification (ACL2, part of 80%).** ACL2 proves:
   - No dependency cycles in the plan (A must run before B, B before C,
     C before A → rejected)
   - No deadlocks (two modules waiting on each other)
   - Every planned write is within the refactoring scope (no stray
     modifications to unrelated files)
   - The gate stack will not reject any planned command (no blocked
     patterns in the refactoring scripts)
 5. **Incremental execution.** The planner executes each step:
   a. Take a Merkle snapshot of the current state
   b. LLM proposes the code transformation for this step
   c. Gate stack checks the proposal (no forbidden file writes,
      no shell commands outside the allowed refactoring scope)
   d. Transformation is applied
   e. Tests run. If they pass → snapshot the new state as the next
      Merkle snapshot and update the fact store with the new codebase
      graph. If they fail → roll back to the step's starting snapshot,
      flag the LLM's proposal as `:provenance :failed-transformation`,
      and request a corrected proposal.
 6. **Final verification.** After all steps complete, ACL2 re-verifies
   the entire codebase fact store: all dependencies, all public API
   surfaces, all constraints. The result is a Merkle chain from
   "before" to "after" in which every intermediate state was
   snapshotted and passed the test suite.
 *** What the symbolic engine cannot do
 The fundamental limit: **semantic equivalence of arbitrary programs is
 undecidable** (a consequence of Rice's theorem), so no prover, ACL2
 included, can establish it in general. ACL2 can verify that the
 refactoring plan has no structural flaws, that the dependency graph
 is acyclic, and that every step is within scope; the test runner, not
 ACL2, confirms that the tests pass. Neither can prove "the extracted
 auth service behaves identically to the inline auth module from the
 caller's perspective" for a general-purpose language.
 This gap is filled by:
 - **Tests as empirical verification.** The planner runs the full test
  suite after each step. A passing test suite is not a proof of
  correctness, but combined with ACL2's structural verification, it
  is strong empirical evidence.
 - **API contract checking.** For refactorings that preserve public APIs,
  ACL2 can verify that the type signatures, argument counts, and
  return types of the extracted module's public interface exactly match
  those of the original inline module. This is a structural equivalence
  that does not require semantic reasoning.
 - **Human review of semantic concerns.** The planner flags steps that
  involve semantic choices (e.g., "the extracted auth service now
  handles token refresh differently"). These steps are presented to
  the developer for review with full provenance: before state, after
  state, and the diff. The developer's approval or rejection becomes
  a Merkle fact with `:provenance :human-reviewed`.
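The API contract check above can be sketched for Python modules with `inspect.signature`. A signature records the parameter names, kinds, defaults and annotations, plus the return annotation. This is a runtime comparison under several assumptions, not ACL2 verification:

- `public_interface` and `contract_diff` are hypothetical helpers.
- Only module-level functions whose names do not start with `_` count as public.
- The planner's real target language may differ.

```python
import inspect
from types import ModuleType


def public_interface(module: ModuleType) -> dict[str, inspect.Signature]:
    """Public module-level functions (no leading underscore) -> signatures."""
    return {
        name: inspect.signature(obj)
        for name, obj in vars(module).items()
        if not name.startswith("_") and inspect.isfunction(obj)
    }


def contract_diff(before: ModuleType, after: ModuleType) -> list[str]:
    """List public functions that were removed or whose signature changed."""
    old, new = public_interface(before), public_interface(after)
    problems = [f"removed: {name}" for name in sorted(old.keys() - new.keys())]
    for name in sorted(old.keys() & new.keys()):
        # Signature equality compares parameter names, kinds, defaults,
        # annotations, and the return annotation.
        if old[name] != new[name]:
            problems.append(f"changed: {name}: {old[name]} became {new[name]}")
    return problems
```

Functions that appear only in the new module are not flagged, since additions do not break existing callers. Functions imported into a module's namespace are counted as part of its interface; a stricter version could filter on `obj.__module__`.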
 *** What makes this better than Claude Code
 | Dimension | Claude Code | Passepartout Planner |
 |-----------|-------------|----------------------|
 | Planning | Prompt-based, implicit | Screamer CSP, explicit, ACL2-verified |
 | Step ordering | Greedy, every step costs tokens | Dependency-ordered, zero-token constraint check |
 | Up-front overhead | Lower | Higher (ingestion, constraint solving, ACL2 proofs) |
 | Rollback | Limited (git reset, no per-step) | Merkle snapshot per step, instant rollback |
 | Scope control | Prompt-based ("only touch auth files") | ACL2-verified write scope, enforced at runtime by the gate stack |
 | Cost | $0.50-$2 per refactoring session | Lower: CPU cycles for planning and checks; LLM tokens only for goal translation and code transformations |
 | Final verification | None beyond tests; you trust that they caught everything | Merkle chain from before→after; structure ACL2-verified, behavior checked by tests |
 A software ecosystem changing hardware economics has never happened before.
 Passepartout's most realistic path: verification appliances for regulated
 industries — RISC-V cores with Lisp microcode on FPGA, sold as hardened