diff --git a/ideas/passepartout-economics.org b/ideas/passepartout-economics.org index 33eac30..c9005ef 100644 --- a/ideas/passepartout-economics.org +++ b/ideas/passepartout-economics.org @@ -594,6 +594,110 @@ context would not accept an unverified upgrade anyway. signed and verified against the hardware root of trust before applying. +** Large refactoring in a neurosymbolic planner + +Large refactoring projects (extract module, rename API, split monolith) +are the hardest test for any AI agent. Current approaches (Claude Code, +Copilot) handle them probabilistically — every step costs tokens, and +there is no formal guarantee the final system is consistent. + +Passepartout's Phase 7 planner (10-80-10) transforms this: the symbolic +engine handles planning, ordering, and structural verification; the LLM +handles only the code transformation itself. + +*** The workflow + +1. **Codebase ingestion.** The scanner walks the entire codebase, builds + an abstract syntax graph (not just flat files — imports, type + dependencies, function call chains, test coverage). Each module + becomes a fact in the fact store with `:provenance :codebase-scan`. + +2. **Goal translation (LLM, 10%).** "Extract authentication into its own + service" becomes a structured goal plist: + (:goal :extract-module + :source :auth + :target-service auth-service + :files (:affected (:app/auth/* :app/middleware/auth.clj :tests/*)) + :constraints (:no-breaking-api (:public-api :unchanged)) + :verification (:all-tests-pass :api-contract-preserved)) + +3. **Constraint satisfaction (Screamer, 80%).** Screamer expresses the + refactoring as a constraint satisfaction problem: + - Variables: file modifications ordered by dependency (auth middleware + must be updated after auth module is extracted but before its callers) + - Constraints: no circular dependencies, no step that creates broken + intermediate state, no test file modified before its source + - Objective: minimize total steps while respecting all constraints + + Screamer returns a viable plan or reports unsolvability with the + conflicting constraints — for example, "auth middleware and auth + module have a circular type dependency that must be resolved before + extraction." + +4. **Plan verification (ACL2, part of 80%).** ACL2 proves: + - No dependency cycles in the plan (A must run before B, B before C, + C before A → rejected) + - No deadlocks (two modules waiting on each other) + - Every planned write is within the refactoring scope (no stray + modifications to unrelated files) + - The gate stack will not reject any planned command (no blocked + patterns in the refactoring scripts) + +5. **Incremental execution.** The planner executes each step: + a. Take Merkle snapshot of the current state + b. LLM proposes the code transformation for this step + c. Gate stack checks the proposal (no forbidden file writes, + no shell commands outside the allowed refactoring scope) + d. Transformation is applied + e. Tests run. If they pass → commit the snapshot, update the + fact store with the new codebase graph. If they fail → roll + back to the previous snapshot, flag the LLM's proposal as + `:provenance :failed-transformation`, and attempt a corrected + proposal. + +6. **Final verification.** After all steps complete, ACL2 re-verifies + the entire codebase fact store — all dependencies, all public API + surfaces, all constraints. The result is a Merkle chain from + "before" to "after" with every intermediate state verified. + +*** What the symbolic engine cannot do + +The fundamental limit: **first-order logic cannot prove semantic +equivalence of arbitrary programs.** ACL2 can verify that the +refactoring plan has no structural flaws, that the dependency graph +is acyclic, that every step is within scope, and that tests pass. +It cannot prove "the extracted auth service behaves identically to +the inline auth module from the caller's perspective" for a +general-purpose language. + +This gap is filled by: +- **Tests as empirical verification.** The planner runs the full test + suite after each step. A passing test suite is not a proof of + correctness, but combined with ACL2's structural verification, it + is strong empirical evidence. +- **API contract checking.** For refactoring that preserves public APIs, + ACL2 can verify that the type signatures, argument counts, and + return types of the extracted module's public interface match + exactly — a structural equivalence that does not require semantic + reasoning. +- **Human review of semantic concerns.** The planner flags steps that + involve semantic choices (e.g., "the extracted auth service now + handles token refresh differently"). These steps are presented to + the developer for review with full provenance: before state, after + state, and the diff. The developer's approval or rejection becomes + a Merkle fact with `:provenance :human-reviewed`. + +*** What makes this better than Claude Code + +| Dimension | Claude Code | Passepartout Planner | +|-----------|-------------|---------------------| +| Planning | Prompt-based, implicit | Screamer CSP, explicit, verified | +| Step ordering | Greedy, every step costs tokens | Dependency-ordered, zero-token constraint check | +| Rollback | Limited (git reset, no per-step) | Merkle snapshot per step, instant rollback | +| Scope control | Prompt-based ("only touch auth files") | ACL2-verified write scope, cannot escape | +| Cost | $0.50-$2 per refactoring session | Near-zero (CPU cycles for planning, LLM for code only) | +| Final proof | None — you trust that tests caught everything | Merkle chain from before→after, ACL2-verified | + A software ecosystem changing hardware economics has never happened before. Passepartout's most realistic path: verification appliances for regulated industries — RISC-V cores with Lisp microcode on FPGA, sold as hardened