refactoring: neurosymbolic planner for large codebase changes
Six-stage workflow: codebase ingestion (AST as facts), goal translation (LLM, 10%), Screamer constraint satisfaction (80%), ACL2 plan verification, incremental execution with Merkle snapshots per step and rollback on test failure, final re-verification. Key limit: ACL2 cannot prove semantic equivalence of arbitrary programs. Gap filled by: tests as empirical verification, API contract checking (structural equivalence of public interfaces), human review with full provenance of semantic changes. Comparison with Claude Code: Passepartout trades higher up-front planning overhead for zero-token constraint checks, ACL2-verified scope control, instant per-step rollback, and a Merkle chain from before to after.
This commit is contained in:
@@ -594,6 +594,110 @@ context would not accept an unverified upgrade anyway.
|
|||||||
signed and verified against the hardware root of trust before
|
signed and verified against the hardware root of trust before
|
||||||
applying.
|
applying.
|
||||||
|
|
||||||
|
** Large refactoring in a neurosymbolic planner
|
||||||
|
|
||||||
|
Large refactoring projects (extract module, rename API, split monolith)
|
||||||
|
are the hardest test for any AI agent. Current approaches (Claude Code,
|
||||||
|
Copilot) handle them probabilistically — every step costs tokens, and
|
||||||
|
there is no formal guarantee the final system is consistent.
|
||||||
|
|
||||||
|
Passepartout's Phase 7 planner (10-80-10) transforms this: the symbolic
|
||||||
|
engine handles planning, ordering, and structural verification; the LLM
|
||||||
|
handles only the code transformation itself.
|
||||||
|
|
||||||
|
*** The workflow
|
||||||
|
|
||||||
|
1. **Codebase ingestion.** The scanner walks the entire codebase, builds
|
||||||
|
an abstract syntax graph (not just flat files — imports, type
|
||||||
|
dependencies, function call chains, test coverage). Each module
|
||||||
|
becomes a fact in the fact store with `:provenance :codebase-scan`.
|
||||||
|
|
||||||
|
2. **Goal translation (LLM, 10%).** "Extract authentication into its own
|
||||||
|
service" becomes a structured goal plist:
|
||||||
|
(:goal :extract-module
|
||||||
|
:source :auth
|
||||||
|
:target-service auth-service
|
||||||
|
:files (:affected (:app/auth/* :app/middleware/auth.clj :tests/*))
|
||||||
|
:constraints (:no-breaking-api (:public-api :unchanged))
|
||||||
|
:verification (:all-tests-pass :api-contract-preserved))
|
||||||
|
|
||||||
|
3. **Constraint satisfaction (Screamer, 80%).** Screamer expresses the
|
||||||
|
refactoring as a constraint satisfaction problem:
|
||||||
|
- Variables: file modifications ordered by dependency (auth middleware
|
||||||
|
must be updated after auth module is extracted but before its callers)
|
||||||
|
- Constraints: no circular dependencies, no step that creates broken
|
||||||
|
intermediate state, no test file modified before its source
|
||||||
|
- Objective: minimize total steps while respecting all constraints
|
||||||
|
|
||||||
|
Screamer returns a viable plan or reports unsolvability with the
|
||||||
|
conflicting constraints — for example, "auth middleware and auth
|
||||||
|
module have a circular type dependency that must be resolved before
|
||||||
|
extraction."
|
||||||
|
|
||||||
|
4. **Plan verification (ACL2, part of 80%).** ACL2 proves:
|
||||||
|
- No dependency cycles in the plan (A must run before B, B before C,
|
||||||
|
C before A → rejected)
|
||||||
|
- No deadlocks (two modules waiting on each other)
|
||||||
|
- Every planned write is within the refactoring scope (no stray
|
||||||
|
modifications to unrelated files)
|
||||||
|
- The gate stack will not reject any planned command (no blocked
|
||||||
|
patterns in the refactoring scripts)
|
||||||
|
|
||||||
|
5. **Incremental execution.** The planner executes each step:
|
||||||
|
a. Take Merkle snapshot of the current state
|
||||||
|
b. LLM proposes the code transformation for this step
|
||||||
|
c. Gate stack checks the proposal (no forbidden file writes,
|
||||||
|
no shell commands outside the allowed refactoring scope)
|
||||||
|
d. Transformation is applied
|
||||||
|
e. Tests run. If they pass → commit the snapshot, update the
|
||||||
|
fact store with the new codebase graph. If they fail → roll
|
||||||
|
back to the previous snapshot, flag the LLM's proposal as
|
||||||
|
`:provenance :failed-transformation`, and attempt a corrected
|
||||||
|
proposal.
|
||||||
|
|
||||||
|
6. **Final verification.** After all steps complete, ACL2 re-verifies
|
||||||
|
the entire codebase fact store — all dependencies, all public API
|
||||||
|
surfaces, all constraints. The result is a Merkle chain from
|
||||||
|
"before" to "after" with every intermediate state verified.
|
||||||
|
|
||||||
|
*** What the symbolic engine cannot do
|
||||||
|
|
||||||
|
The fundamental limit: **first-order logic cannot prove semantic
|
||||||
|
equivalence of arbitrary programs.** ACL2 can verify that the
|
||||||
|
refactoring plan has no structural flaws, that the dependency graph
|
||||||
|
is acyclic, that every step is within scope, and that tests pass.
|
||||||
|
It cannot prove "the extracted auth service behaves identically to
|
||||||
|
the inline auth module from the caller's perspective" for a
|
||||||
|
general-purpose language.
|
||||||
|
|
||||||
|
This gap is filled by:
|
||||||
|
- **Tests as empirical verification.** The planner runs the full test
|
||||||
|
suite after each step. A passing test suite is not a proof of
|
||||||
|
correctness, but combined with ACL2's structural verification, it
|
||||||
|
is strong empirical evidence.
|
||||||
|
- **API contract checking.** For refactoring that preserves public APIs,
|
||||||
|
ACL2 can verify that the type signatures, argument counts, and
|
||||||
|
return types of the extracted module's public interface match
|
||||||
|
exactly — a structural equivalence that does not require semantic
|
||||||
|
reasoning.
|
||||||
|
- **Human review of semantic concerns.** The planner flags steps that
|
||||||
|
involve semantic choices (e.g., "the extracted auth service now
|
||||||
|
handles token refresh differently"). These steps are presented to
|
||||||
|
the developer for review with full provenance: before state, after
|
||||||
|
state, and the diff. The developer's approval or rejection becomes
|
||||||
|
a Merkle fact with `:provenance :human-reviewed`.
|
||||||
|
|
||||||
|
*** What makes this better than Claude Code
|
||||||
|
|
||||||
|
| Dimension | Claude Code | Passepartout Planner |
|
||||||
|
|-----------|-------------|---------------------|
|
||||||
|
| Planning | Prompt-based, implicit | Screamer CSP, explicit, verified |
|
||||||
|
| Step ordering | Greedy, every step costs tokens | Dependency-ordered, zero-token constraint check |
|
||||||
|
| Rollback | Limited (git reset, no per-step) | Merkle snapshot per step, instant rollback |
|
||||||
|
| Scope control | Prompt-based ("only touch auth files") | ACL2-verified write scope, cannot escape |
|
||||||
|
| Cost | $0.50-$2 per refactoring session | Near-zero (CPU cycles for planning, LLM for code only) |
|
||||||
|
| Final proof | None — you trust that tests caught everything | Merkle chain from before→after, ACL2-verified |
|
||||||
|
|
||||||
A software ecosystem changing hardware economics has never happened before.
|
A software ecosystem changing hardware economics has never happened before.
|
||||||
Passepartout's most realistic path: verification appliances for regulated
|
Passepartout's most realistic path: verification appliances for regulated
|
||||||
industries — RISC-V cores with Lisp microcode on FPGA, sold as hardened
|
industries — RISC-V cores with Lisp microcode on FPGA, sold as hardened
|
||||||
|
|||||||
Reference in New Issue
Block a user