refactoring: neurosymbolic planner for large codebase changes

Six-stage workflow: codebase ingestion (AST as facts), goal translation
(LLM, 10%), Screamer constraint satisfaction (80%), ACL2 plan verification,
incremental execution with Merkle snapshots per step and rollback on test
failure, final re-verification.

Key limit: ACL2 cannot prove semantic equivalence of arbitrary programs.
Gap filled by: tests as empirical verification, API contract checking
(structural equivalence of public interfaces), human review with full
provenance of semantic changes.

Comparison with Claude Code: Passepartout trades higher up-front planning
overhead for zero-token constraint checks, ACL2-verified scope control,
instant per-step rollback, and a Merkle chain from before to after.
Hermes
2026-05-21 18:32:19 +00:00
parent 18b7c2f06f
commit 852fcae4a6


@@ -594,6 +594,110 @@ context would not accept an unverified upgrade anyway.
signed and verified against the hardware root of trust before
applying.
** Large refactoring in a neurosymbolic planner
Large refactoring projects (extract module, rename API, split monolith)
are among the hardest tests for any AI agent. Current approaches (Claude
Code, Copilot) handle them probabilistically: every step costs tokens,
and there is no formal guarantee that the final system is consistent.
Passepartout's Phase 7 planner (10-80-10) changes this division of
labor. The symbolic engine handles planning, ordering, and structural
verification. The LLM handles only goal translation and the code
transformations themselves.
*** The workflow
1. **Codebase ingestion.** The scanner walks the entire codebase and
builds an abstract syntax graph. Rather than treating the code as flat
files, the graph records imports, type dependencies, function call
chains, and test coverage. Each module becomes a fact in the fact
store with `:provenance :codebase-scan`.
2. **Goal translation (LLM, 10%).** "Extract authentication into its own
service" becomes a structured goal plist:
(:goal :extract-module
:source :auth
:target-service :auth-service
:files (:affected (:app/auth/* :app/middleware/auth.clj :tests/*))
:constraints (:no-breaking-api (:public-api :unchanged))
:verification (:all-tests-pass :api-contract-preserved))
3. **Constraint satisfaction (Screamer, 80%).** The planner encodes the
refactoring as a constraint satisfaction problem in Screamer:
- Variables: the position of each file modification in the step
sequence, ordered by dependency (the auth middleware must be updated
after the auth module is extracted but before its callers are)
- Constraints: no circular dependencies, no step that leaves the
codebase in a broken intermediate state, no test file modified before
its source
- Objective: minimize the total number of steps while respecting all
constraints
Screamer either returns a viable plan or exhausts the search space. In
the second case, the planner reports unsolvability along with the
conflicting constraints, for example: "auth middleware and auth module
have a circular type dependency that must be resolved before
extraction."
4. **Plan verification (ACL2, part of the 80%).** ACL2 proves:
- No dependency cycles in the plan (A must run before B, B before C,
C before A → rejected)
- No deadlocks (two modules waiting on each other)
- Every planned write is within the refactoring scope (no stray
modifications to unrelated files)
- The gate stack will not reject any planned command (no blocked
patterns in the refactoring scripts)
5. **Incremental execution.** The planner executes each step:
a. Take a Merkle snapshot of the current state
b. The LLM proposes the code transformation for this step
c. The gate stack checks the proposal (no forbidden file writes,
no shell commands outside the allowed refactoring scope)
d. The transformation is applied
e. The tests run. If they pass → commit the new state as the next
snapshot and update the fact store with the new codebase graph. If
they fail → roll back to the previous snapshot, flag the LLM's
proposal as `:provenance :failed-transformation`, and attempt a
corrected proposal.
6. **Final verification.** After all steps complete, ACL2 re-verifies
the entire codebase fact store: all dependencies, all public API
surfaces, all constraints. The result is a Merkle chain from "before"
to "after". Every intermediate state in that chain was produced by a
verified plan step and passed the test suite.
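Steps 3 to 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the planner's implementation:
- A plain topological sort (Python's `graphlib`) stands in for the Screamer search. It orders steps and rejects cycles but does not minimize step count.
- Scope is a tuple of glob patterns modeled on the `:affected` paths in the goal plist.
- Each snapshot is a Merkle root over file contents, hashed together with its parent's hash.
- `propose` and `run_tests` are hypothetical stand-ins for the LLM and the test suite.

```python
import fnmatch
import hashlib
from graphlib import CycleError, TopologicalSorter


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def order_steps(deps: dict) -> list:
    """Order steps so each runs after its prerequisites (deps: step -> set of
    steps that must run first). Raises CycleError if the plan has a cycle."""
    return list(TopologicalSorter(deps).static_order())


def in_scope(path: str, scope: tuple) -> bool:
    """True if the path matches one of the glob patterns in the refactoring scope."""
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in scope)


def merkle_root(files: dict) -> str:
    """Merkle root over (path, content) leaves, sorted by path for determinism."""
    level = [sha256(f"{path}\0{text}".encode()) for path, text in sorted(files.items())]
    if not level:
        return sha256(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


def execute(plan, files, scope, propose, run_tests, max_attempts=2):
    """Apply plan steps one at a time with snapshot, gate, test, and rollback.

    propose(step, attempt) -> {path: new_text} stands in for the LLM and
    run_tests(files) -> bool for the test suite; both are hypothetical.
    Returns the final files, the snapshot hash chain, and the failed attempts.
    """
    state = dict(files)
    head = sha256(merkle_root(state).encode())  # first link: the "before" state
    chain, store, failed = [head], {head: dict(state)}, []
    for step in plan:
        for attempt in range(max_attempts):
            proposal = propose(step, attempt)
            if not all(in_scope(path, scope) for path in proposal):
                failed.append((step, attempt, "out-of-scope"))  # gate stack rejects
                continue
            state.update(proposal)
            if run_tests(state):
                # Link the new snapshot to its parent: hash(parent || merkle_root).
                head = sha256((chain[-1] + merkle_root(state)).encode())
                chain.append(head)
                store[head] = dict(state)
                break
            failed.append((step, attempt, "tests-failed"))  # :failed-transformation
            state = dict(store[chain[-1]])  # roll back to the last good snapshot
        else:
            raise RuntimeError(f"step {step!r} failed after {max_attempts} attempts")
    return state, chain, failed


if __name__ == "__main__":
    try:
        order_steps({"a": {"c"}, "b": {"a"}, "c": {"b"}})
    except CycleError as err:
        print("plan rejected, cycle:", err.args[1])
```

In this sketch, rollback is a lookup of the last good snapshot in the store rather than a replay of earlier steps. Out-of-scope proposals are rejected before they touch the working state.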
*** What the symbolic engine cannot do
The fundamental limit: **no logic, first-order or otherwise, can prove
semantic equivalence of arbitrary programs**, because program
equivalence is undecidable in general. ACL2 can verify that the
refactoring plan has no structural flaws, that the dependency graph is
acyclic, and that every step is within scope. The test runner
confirms that the tests pass. What ACL2 cannot prove in general is
"the extracted auth service behaves identically to the inline auth
module from the caller's perspective" for a program written in a
general-purpose language.
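The limit can be made concrete with a small Python illustration. This is hypothetical code, not part of the planner. Two versions of a `check_token` function yield identical structural facts of the kind a codebase scan records (definitions, parameters, calls, imports). Every structural check therefore passes, yet the two versions disagree when a token expires exactly at `now`:

```python
import ast

# Two hypothetical versions of an auth check: before and after a refactoring.
OLD = """
def check_token(token, now):
    return token["expires"] > now
"""

NEW = """
def check_token(token, now):
    return token["expires"] >= now
"""


def structural_facts(source: str) -> set:
    """Collect facts of the kind a codebase scan records: definitions, calls, imports."""
    facts = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            params = tuple(arg.arg for arg in node.args.args)
            facts.add(("defines", node.name, params))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            facts.add(("calls", node.func.id))
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            facts.update(("imports", alias.name) for alias in node.names)
    return facts


def load(source: str):
    """Execute the source and return its check_token function."""
    namespace = {}
    exec(source, namespace)
    return namespace["check_token"]


same_structure = structural_facts(OLD) == structural_facts(NEW)
token = {"expires": 100}
print(same_structure, load(OLD)(token, 100), load(NEW)(token, 100))
# prints: True False True
```

A boundary-value test would expose the difference. No structural comparison of the two fact sets can.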
This gap is filled by:
- **Tests as empirical verification.** The planner runs the full test
suite after each step. A passing test suite is not a proof of
correctness. Combined with ACL2's structural verification, however, it
is strong empirical evidence that behavior has been preserved.
- **API contract checking.** For refactorings that preserve public
APIs, ACL2 can verify that the type signatures, argument counts, and
return types of the extracted module's public interface match
exactly. This is a structural equivalence that does not require
semantic reasoning.
- **Human review of semantic concerns.** The planner flags steps that
involve semantic choices (e.g., "the extracted auth service now
handles token refresh differently"). These steps are presented to
the developer for review with full provenance: before state, after
state, and the diff. The developer's approval or rejection becomes
a Merkle fact with `:provenance :human-reviewed`.
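An API contract check of the kind described above can be sketched in Python with the standard `inspect` module. This is a stand-in for the ACL2 check, built on assumptions:
- The `contract` helper is hypothetical. It compares parameter names, kinds, and annotations plus the return annotation, the closest Python analogue to type signatures, argument counts, and return types.
- The before and after module contents are hypothetical.

```python
import inspect


def contract(func):
    """Structural contract of a function: parameter names, kinds and
    annotations, plus the return annotation. Behavior is not examined."""
    sig = inspect.signature(func)
    params = tuple((p.name, p.kind, p.annotation) for p in sig.parameters.values())
    return params, sig.return_annotation


def contract_violations(old_api: dict, new_api: dict) -> list:
    """Report public functions that vanished or whose contract changed."""
    problems = []
    for name, old_func in sorted(old_api.items()):
        if name.startswith("_"):
            continue  # private helpers are not part of the public API
        new_func = new_api.get(name)
        if new_func is None:
            problems.append(f"{name}: missing after refactoring")
        elif contract(old_func) != contract(new_func):
            problems.append(
                f"{name}: {inspect.signature(old_func)} became {inspect.signature(new_func)}"
            )
    return problems


# Hypothetical public interface of the inline auth module (before) ...
def login(user: str, password: str) -> bool: ...
def logout(session_id: str) -> None: ...

# ... and of the extracted auth service client (after).
def svc_login(user: str, password: str) -> bool: ...
def svc_logout(session_id: str, force: bool) -> None: ...


before = {"login": login, "logout": logout}
after = {"login": svc_login, "logout": svc_logout}
for problem in contract_violations(before, after):
    print(problem)
```

Here the check flags only `logout`, whose extracted version gained a required `force` parameter. It says nothing about what either function does, which is why it sits alongside tests and human review rather than replacing them.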
*** What makes this better than Claude Code
| Dimension | Claude Code | Passepartout Planner |
|-----------|-------------|---------------------|
| Planning | Prompt-based, implicit | Screamer CSP, explicit, verified |
| Step ordering | Greedy, every step costs tokens | Dependency-ordered, zero-token constraint check |
| Rollback | Limited (git reset, no per-step) | Merkle snapshot per step, instant rollback |
| Scope control | Prompt-based ("only touch auth files") | ACL2-verified write scope, cannot escape |
| Cost | $0.50-$2 per refactoring session | Near-zero for planning (CPU cycles only); LLM tokens only for goal translation and code |
| Final proof | None: you trust that tests caught everything | Merkle chain from before→after, ACL2-verified |
A software ecosystem changing hardware economics has never happened before.
Passepartout's most realistic path: verification appliances for regulated
industries — RISC-V cores with Lisp microcode on FPGA, sold as hardened