refactoring: neurosymbolic planner for large codebase changes
Six-stage workflow: codebase ingestion (AST as facts), goal translation (LLM, 10%), Screamer constraint satisfaction (80%), ACL2 plan verification, incremental execution with Merkle snapshots per step and rollback on test failure, final re-verification. Key limit: ACL2 cannot prove semantic equivalence of arbitrary programs. Gap filled by: tests as empirical verification, API contract checking (structural equivalence of public interfaces), human review with full provenance of semantic changes. Comparison with Claude Code: Passepartout trades higher up-front planning overhead for zero-token constraint checks, ACL2-verified scope control, instant per-step rollback, and a Merkle chain from before to after.
This commit is contained in:
@@ -594,6 +594,110 @@ context would not accept an unverified upgrade anyway.
|
||||
signed and verified against the hardware root of trust before
|
||||
applying.
|
||||
|
||||
** Large refactoring in a neurosymbolic planner
|
||||
|
||||
Large refactoring projects (extract module, rename API, split monolith)
|
||||
are the hardest test for any AI agent. Current approaches (Claude Code,
|
||||
Copilot) handle them probabilistically — every step costs tokens, and
|
||||
there is no formal guarantee the final system is consistent.
|
||||
|
||||
Passepartout's Phase 7 planner (10-80-10) transforms this: the symbolic
|
||||
engine handles planning, ordering, and structural verification; the LLM
|
||||
handles only the code transformation itself.
|
||||
|
||||
*** The workflow
|
||||
|
||||
1. **Codebase ingestion.** The scanner walks the entire codebase, builds
|
||||
an abstract syntax graph (not just flat files — imports, type
|
||||
dependencies, function call chains, test coverage). Each module
|
||||
becomes a fact in the fact store with `:provenance :codebase-scan`.
|
||||
|
||||
2. **Goal translation (LLM, 10%).** "Extract authentication into its own
|
||||
service" becomes a structured goal plist:
|
||||
(:goal :extract-module
|
||||
:source :auth
|
||||
:target-service auth-service
|
||||
:files (:affected (:app/auth/* :app/middleware/auth.clj :tests/*))
|
||||
:constraints (:no-breaking-api (:public-api :unchanged))
|
||||
:verification (:all-tests-pass :api-contract-preserved))
|
||||
|
||||
3. **Constraint satisfaction (Screamer, 80%).** Screamer expresses the
|
||||
refactoring as a constraint satisfaction problem:
|
||||
- Variables: file modifications ordered by dependency (auth middleware
|
||||
must be updated after auth module is extracted but before its callers)
|
||||
- Constraints: no circular dependencies, no step that creates broken
|
||||
intermediate state, no test file modified before its source
|
||||
- Objective: minimize total steps while respecting all constraints
|
||||
|
||||
Screamer returns a viable plan or reports unsolvability with the
|
||||
conflicting constraints — for example, "auth middleware and auth
|
||||
module have a circular type dependency that must be resolved before
|
||||
extraction."
|
||||
|
||||
4. **Plan verification (ACL2, part of 80%).** ACL2 proves:
|
||||
- No dependency cycles in the plan (A must run before B, B before C,
|
||||
C before A → rejected)
|
||||
- No deadlocks (two modules waiting on each other)
|
||||
- Every planned write is within the refactoring scope (no stray
|
||||
modifications to unrelated files)
|
||||
- The gate stack will not reject any planned command (no blocked
|
||||
patterns in the refactoring scripts)
|
||||
|
||||
5. **Incremental execution.** The planner executes each step:
|
||||
a. Take Merkle snapshot of the current state
|
||||
b. LLM proposes the code transformation for this step
|
||||
c. Gate stack checks the proposal (no forbidden file writes,
|
||||
no shell commands outside the allowed refactoring scope)
|
||||
d. Transformation is applied
|
||||
e. Tests run. If they pass → commit the snapshot, update the
|
||||
fact store with the new codebase graph. If they fail → roll
|
||||
back to the previous snapshot, flag the LLM's proposal as
|
||||
`:provenance :failed-transformation`, and attempt a corrected
|
||||
proposal.
|
||||
|
||||
6. **Final verification.** After all steps complete, ACL2 re-verifies
|
||||
the entire codebase fact store — all dependencies, all public API
|
||||
surfaces, all constraints. The result is a Merkle chain from
|
||||
"before" to "after" with every intermediate state verified.
|
||||
|
||||
*** What the symbolic engine cannot do
|
||||
|
||||
The fundamental limit: **first-order logic cannot prove semantic
|
||||
equivalence of arbitrary programs.** ACL2 can verify that the
|
||||
refactoring plan has no structural flaws, that the dependency graph
|
||||
is acyclic, that every step is within scope, and that tests pass.
|
||||
It cannot prove "the extracted auth service behaves identically to
|
||||
the inline auth module from the caller's perspective" for a
|
||||
general-purpose language.
|
||||
|
||||
This gap is filled by:
|
||||
- **Tests as empirical verification.** The planner runs the full test
|
||||
suite after each step. A passing test suite is not a proof of
|
||||
correctness, but combined with ACL2's structural verification, it
|
||||
is strong empirical evidence.
|
||||
- **API contract checking.** For refactoring that preserves public APIs,
|
||||
ACL2 can verify that the type signatures, argument counts, and
|
||||
return types of the extracted module's public interface match
|
||||
exactly — a structural equivalence that does not require semantic
|
||||
reasoning.
|
||||
- **Human review of semantic concerns.** The planner flags steps that
|
||||
involve semantic choices (e.g., "the extracted auth service now
|
||||
handles token refresh differently"). These steps are presented to
|
||||
the developer for review with full provenance: before state, after
|
||||
state, and the diff. The developer's approval or rejection becomes
|
||||
a Merkle fact with `:provenance :human-reviewed`.
|
||||
|
||||
*** What makes this better than Claude Code
|
||||
|
||||
| Dimension | Claude Code | Passepartout Planner |
|
||||
|-----------|-------------|---------------------|
|
||||
| Planning | Prompt-based, implicit | Screamer CSP, explicit, verified |
|
||||
| Step ordering | Greedy, every step costs tokens | Dependency-ordered, zero-token constraint check |
|
||||
| Rollback | Limited (git reset, no per-step) | Merkle snapshot per step, instant rollback |
|
||||
| Scope control | Prompt-based ("only touch auth files") | ACL2-verified write scope, cannot escape |
|
||||
| Cost | $0.50-$2 per refactoring session | Near-zero (CPU cycles for planning, LLM for code only) |
|
||||
| Final proof | None — you trust that tests caught everything | Merkle chain from before→after, ACL2-verified |
|
||||
|
||||
A software ecosystem changing hardware economics has never happened before.
|
||||
Passepartout's most realistic path: verification appliances for regulated
|
||||
industries — RISC-V cores with Lisp microcode on FPGA, sold as hardened
|
||||
|
||||
Reference in New Issue
Block a user