add note: architectural integration of three-pronged system

2026-05-25 00:38:48 +00:00
parent 73fc33f02f
commit 0b77ea0ac9
1 changed files with 106 additions and 0 deletions
--- a/ideas/architectural-integration-three-pronged.org
+++ b/ideas/architectural-integration-three-pronged.org
@@ -0,0 +1,106 @@
 :PROPERTIES:
 :CREATED:  [2026-05-25 Mon]
 :ID:       3a4b5c6d-7e8f-9a0b-1c2d-3e4f5a6b7c8d
 :END:
 #+title: Architectural Integration of the Three-Pronged System
 #+filetags: :ideas:passepartout:architecture:
 An analysis of how the deductive / provenance-tracked empirical / probabilistic oracle model fits into Passepartout's architecture, at what stage it becomes operational, and what it means for the existing subsystems.
 **The three prongs are not three engines.**
 The initial framing (deductive + provenance + probabilistic) implies three parallel reasoning systems. It is more accurate to say: two reasoning engines and one data layer.
 - **The symbolic engine** handles everything that can be formalized: deductive proofs, empirical equations, validity predicates, pipeline composition, uncertainty propagation. This is one engine — it reasons about symbols using rules that are either proven (ACL2) or well-defined (force field equations).
 - **The probabilistic oracle** (LLM) handles everything that cannot be formalized: parameter selection, model choice, interpretation of results in natural language, failure diagnosis, creative hypothesis generation. It proposes; the symbolic engine checks.
 - **The provenance store** is not an engine. It is a structured database that stores empirical parameter sets, validity envelopes, experimental benchmarks, and comparison histories. Neither engine reasons about it as a whole. The symbolic engine queries it for parameters and validity predicates. The LLM queries it for context and updates it with new data.
 Two reasoning engines. One curated data layer. This is a cleaner architecture than three parallel systems.
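 The split between the two engines and the data layer can be sketched in a few lines. This is a minimal illustration, not Passepartout's actual schema: the names `ParameterRecord`, `ProvenanceStore`, `symbolic_lookup`, and `oracle_context`, and every field name and value, are assumptions made for the example. The point it shows is that the store only holds and returns records, while each engine asks it a different kind of question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterRecord:
    """One empirical parameter plus its provenance (hypothetical field names)."""
    name: str        # key used by both engines
    value: float
    units: str
    source: str      # where the value came from
    ci95: tuple      # (low, high) confidence interval around value
    envelope: dict   # condition name -> (low, high) range the value is valid for


class ProvenanceStore:
    """A keyed store. It holds data and answers lookups; it does not reason."""

    def __init__(self):
        self._records = {}

    def put(self, record):
        self._records[record.name] = record

    def get(self, name):
        return self._records.get(name)


def symbolic_lookup(store, name):
    """What the symbolic engine asks for: the value and the envelope to check."""
    record = store.get(name)
    if record is None:
        raise KeyError(f"no provenance for parameter {name!r}")
    return record.value, record.envelope


def oracle_context(store, name):
    """What the LLM asks for: a short text description to place in its prompt."""
    record = store.get(name)
    if record is None:
        return f"{name}: no record in the provenance store"
    low, high = record.ci95
    return (f"{record.name} = {record.value} {record.units} "
            f"(95% CI {low} to {high}; source: {record.source})")


# Placeholder data, invented for the example.
store = ProvenanceStore()
store.put(ParameterRecord(
    name="bond_k/example",
    value=300.0,
    units="kcal/mol/A^2",
    source="hypothetical benchmark set",
    ci95=(290.0, 310.0),
    envelope={"temperature_K": (250.0, 350.0)},
))
```

 Keeping the validity check out of the store is deliberate in this sketch: the store returns the envelope, and deciding whether a condition falls inside it is left to the symbolic engine.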
 **Where it lives in the existing subsystems.**
 The current architecture has four subsystems: Environment, Knowledge, Verification, Social Protocol. The three-pronged model cross-cuts them:
 | Subsystem | Deductive role | Empirical role | Oracle role |
 |---|---|---|---|
 | Environment | Hosts the symbolic engine, runs ACL2 | Hosts the provenance store | Runs the LLM |
 | Knowledge | Stores formal theorems and proofs (symbolic index) | Stores empirical parameters and benchmarks (provenance store) | Neural index for semantic search |
 | Verification | ACL2 proof checking, formal gate rules | Validity envelope checks, parameter provenance checks | Gate policy interpretation (LLM evaluates natural-language rules) |
 | Social Protocol | Sharing verified proofs between instances | Sharing validation histories and benchmark results between instances | Sharing model selection strategies |
 The verification subsystem (the gate) is the integration point. Every action that reaches the gate is checked against:
 1. Security policy (is this action safe?)
 2. Scientific validity (is this model valid in this context?)
 3. Consistency (do the symbolic check and the oracle's assessment agree?)
 These three checks run as separate gate vectors with the same architecture as every other gate check. No new mechanism is needed, only new predicates with access to the provenance store.
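 The three checks above can be sketched as ordinary predicates behind one gate function. This is an assumption-laden illustration: the `Verdict` type, the shape of the `action` dictionary, the allow-list, and the function names are all invented here, and the real gate may represent vectors quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    vector: str   # which gate vector produced this verdict
    passed: bool
    reason: str


def security_check(action):
    # Hypothetical policy: only allow-listed action kinds pass.
    allowed = {"compute", "query"}
    kind = action.get("kind")
    ok = kind in allowed
    reason = "kind is allow-listed" if ok else f"kind {kind!r} is not allow-listed"
    return Verdict("security", ok, reason)


def validity_check(action):
    # Every condition must lie inside the model's validity envelope.
    conditions = action.get("conditions", {})
    for key, (low, high) in action.get("envelope", {}).items():
        value = conditions.get(key)
        if value is None or not (low <= value <= high):
            return Verdict("validity", False, f"{key}={value} outside [{low}, {high}]")
    return Verdict("validity", True, "all conditions inside envelope")


def consistency_check(action):
    # The symbolic result and the oracle's assessment must both exist and agree.
    symbolic = action.get("symbolic_ok")
    oracle = action.get("oracle_ok")
    ok = symbolic is not None and symbolic == oracle
    reason = "assessments agree" if ok else "assessments missing or in disagreement"
    return Verdict("consistency", ok, reason)


GATE_VECTORS = [security_check, validity_check, consistency_check]


def run_gate(action, vectors=GATE_VECTORS):
    """Run every vector and allow the action only if all of them pass."""
    verdicts = [vector(action) for vector in vectors]
    return all(v.passed for v in verdicts), verdicts
```

 Adding scientific validity to the gate then amounts to appending one more function to the list, which is the sense in which no new mechanism is required.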
 **At what stage it becomes operational.**
 The infrastructure is staged, not all-at-once:
 - **Stage 0 (now)** — The probabilistic oracle exists (the LLM). The provenance store does not. The deductive engine partially exists through Hermes skills (symbolic gate rules as Python, not ACL2). The empirical layer is invisible — the LLM reasons about chemistry, biology, and engineering using training data alone, without systematic provenance.
 - **Stage 1 (social protocol)** — The provenance store prototype can be introduced here as a side effect of signed messages and data exchange. When instances share a validated force field parameter, the message carries a signature and a source. The receiving instance stores it with provenance. This is a natural crawl before the full infrastructure.
 - **Stage 2 (gate as software)** — The provenance store becomes operational infrastructure. The gate needs to check validity envelopes to do its job properly. This is the correct stage to introduce it because: (a) the gate is being built anyway, (b) validity checking is a gate predicate like any other, and (c) the provenance store is just a structured knowledge base — the Knowledge subsystem already has the machinery for storing and querying structured data. The symbolic index (formal facts) and the provenance store (empirical parameters) differ in what they store, not in how they store it.
 - **Stage 3 (Lisp machine)** — The symbolic engine is native in one address space. ACL2 runs at hardware level. The provenance store becomes a native Lisp hash table with persistence. The empirical layer is fully integrated: the symbolic engine queries the provenance store directly, the gate checks validity predicates in the evaluation loop itself, and the LLM still proposes model selections but every proposal is verified against the provenance store before execution.
 - **Stage 4+ (in-process inference)** — The LLM moves in-process. The three components (symbolic engine, provenance store, LLM oracle) share one address space. No IPC between them. The query cycle is: LLM proposes a model → symbolic engine checks it against the provenance store → if valid, execute → if invalid, return to LLM with diagnostic. This loop runs at native speed.
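 The query cycle described for Stage 4+ can be sketched as a small retry loop. Everything named here is a stand-in: `stub_oracle` replaces the LLM, `VALIDATED_RANGE_K` replaces the provenance store, the model names and temperature ranges are invented, and the `max_rounds` cutoff is an assumption the note does not specify.

```python
# Stand-in for the provenance store: model name -> temperature range (K)
# for which the model is considered validated. Values are placeholders.
VALIDATED_RANGE_K = {
    "model_a": (273.0, 323.0),
    "model_b": (200.0, 500.0),
}


def stub_oracle(diagnostic):
    """Stand-in for the LLM: proposes model_a first, model_b after any rejection."""
    return "model_a" if diagnostic is None else "model_b"


def make_checker(temperature_k):
    """Build the symbolic check for one request: is the model valid at this temperature?"""
    def check(model):
        low, high = VALIDATED_RANGE_K[model]
        if low <= temperature_k <= high:
            return True, None
        return False, f"{model} validated for {low}-{high} K, requested {temperature_k} K"
    return check


def run_query(propose, check, execute, max_rounds=3):
    """Propose, check, then execute or hand the diagnostic back to the proposer."""
    diagnostic = None
    for _ in range(max_rounds):
        proposal = propose(diagnostic)
        ok, diagnostic = check(proposal)
        if ok:
            return execute(proposal)
    raise RuntimeError(
        f"no valid proposal after {max_rounds} rounds; last diagnostic: {diagnostic}"
    )


result = run_query(stub_oracle, make_checker(400.0), lambda model: f"ran {model}")
```

 In this run the first proposal is rejected because 400 K lies outside the range recorded for `model_a`, the diagnostic goes back to the proposer, and the second proposal passes. The loop is the same at every stage; what Stage 4+ changes is that none of the three calls crosses a process boundary.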
 **The empirical middle is not a separate kind of reasoning.**
 The deepest question in the set: does the empirical middle have both neural and symbolic aspects?
 Yes, and the split is clean.
 The equations that describe an empirical model — Hooke's law for bond stretching, the Lennard-Jones potential for van der Waals interactions, the Born equation for solvation — are formal symbolic expressions. They can be parsed, manipulated, differentiated, verified by ACL2, and composed into pipelines. This is symbolic engine territory.
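 For reference, the three equations named above in their usual textbook forms, together with one symbolic manipulation (the force as the negative derivative of the potential). The symbol choices are conventional rather than taken from this note: $k_b$ is the bond force constant, $r_0$ the equilibrium bond length, $\varepsilon$ the well depth, $\sigma$ the zero-crossing distance, $z$ the ion charge number, $a$ the ion radius, and $\varepsilon_r$ the relative permittivity of the solvent.

```latex
\begin{align*}
V_{\text{bond}}(r) &= \tfrac{1}{2}\, k_b \,(r - r_0)^2
  & F_{\text{bond}}(r) &= -\frac{dV_{\text{bond}}}{dr} = -k_b\,(r - r_0) \\[4pt]
V_{\text{LJ}}(r) &= 4\varepsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right]
  & F_{\text{LJ}}(r) &= -\frac{dV_{\text{LJ}}}{dr} = \frac{24\,\varepsilon}{r} \left[ 2\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right] \\[4pt]
\Delta G_{\text{solv}} &= -\frac{N_A\, z^2 e^2}{8 \pi \varepsilon_0\, a} \left( 1 - \frac{1}{\varepsilon_r} \right)
\end{align*}
```

 In each case the functional form is fixed and fully symbolic, while $k_b$, $r_0$, $\varepsilon$, $\sigma$, and $a$ are the fitted values that live in the provenance store.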
 The parameters in those equations — the spring constants, the well depths, the atomic radii — are derived from experimental data through optimization and fitting. They cannot be derived from the equations themselves. This is not reasoning; it is curation. The provenance store holds them with sources and confidence intervals.
 The selection of which model to apply to a given problem requires judgment about the domain, the available data, and the intended use of the result. The LLM handles this: it knows that a protein-protein docking problem needs a different force field than a small-molecule conformational search.
 The composition of models into a pipeline (compute this, pipe into that, plot the other) is a program. The symbolic engine runs the pipeline. The LLM may propose the pipeline structure, but the execution is deterministic.
 The diagnosis of failure — "this prediction was wrong and here is why" — is the hardest part and requires the most integration. The symbolic engine detects the validity envelope violation and reports the specific parameter that caused it. The LLM interprets the failure in context: "the bond angle term for this functional group was parameterized against small molecules; your molecule has bulky substituents that change the preferred angle."
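 The division of labour in failure diagnosis can be sketched as two functions: one that pinpoints which parameter left its validity envelope, and one that turns that finding into a prompt for the LLM. The `Violation` type, the envelope layout, the parameter name, and the `heavy_atom_count` condition are all hypothetical, chosen only to mirror the bond-angle example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Violation:
    parameter: str           # which parameter was used out of range
    condition: str           # which condition broke its envelope
    value: Optional[float]   # observed value, or None if it was not supplied
    allowed: tuple           # (low, high) range the parameter was fitted for
    source: str              # what the parameter was fitted against


def find_violations(envelopes, conditions):
    """Symbolic side: list every (parameter, condition) pair outside its envelope.

    envelopes maps parameter name -> {"source": str, "limits": {condition: (low, high)}}.
    A condition missing from `conditions` is reported as a violation.
    """
    found = []
    for parameter, entry in envelopes.items():
        for condition, (low, high) in entry["limits"].items():
            value = conditions.get(condition)
            if value is None or not (low <= value <= high):
                found.append(
                    Violation(parameter, condition, value, (low, high), entry["source"])
                )
    return found


def oracle_prompt(violations):
    """Oracle side: format the formal findings as context for the LLM to explain."""
    lines = ["The following parameters were used outside their validity envelope:"]
    for v in violations:
        lines.append(
            f"- {v.parameter}: {v.condition}={v.value}, "
            f"allowed {v.allowed[0]} to {v.allowed[1]} (fitted against: {v.source})"
        )
    lines.append("Explain in context why this could make the prediction wrong.")
    return "\n".join(lines)


# Placeholder envelope and conditions, invented for the example.
envelopes = {
    "angle_term/example": {
        "source": "small-molecule training set",
        "limits": {"heavy_atom_count": (1, 30)},
    },
}
violations = find_violations(envelopes, {"heavy_atom_count": 45})
prompt = oracle_prompt(violations)
```

 The first function produces facts that can be checked mechanically; the second hands those facts to the LLM, which supplies the contextual explanation.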
 | Aspect of the empirical middle | Handled by | Why |
 |---|---|---|
 | The equations themselves | Symbolic engine | They are symbolic expressions — verifiable, differentiable, composable |
 | The parameter values | Provenance store (data) | Fitted to data, not reasoned about |
 | Model selection | LLM oracle | Requires contextual judgment |
 | Pipeline composition | Symbolic engine (execute) + LLM (propose) | Execution is deterministic; design is creative |
 | Validity envelope checking | Symbolic engine | A logical predicate over known state |
 | Uncertainty propagation | Symbolic engine | A formula that composes component uncertainties |
 | Interpretation of results | LLM oracle | Requires natural language |
 | Failure diagnosis | Both: symbolic engine pinpoints the violation, LLM explains why | The factual cause is formal; the narrative cause is contextual |
 | Creative design (new molecules, new experiments) | LLM oracle | Requires open-ended generation |
 The empirical middle does not require a new kind of reasoning engine. It requires the two existing engines (symbolic and probabilistic) to cooperate on data (the provenance store) that is neither formal theorem nor raw text — it is curated empirical knowledge with structure, provenance, and uncertainty.
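 Uncertainty propagation, listed in the table as a formula that composes component uncertainties, can be illustrated with first-order propagation: for independent parameters, $\sigma_f^2 = \sum_i (\partial f / \partial x_i)^2 \sigma_i^2$. The sketch below estimates the partial derivatives numerically by central differences. The independence assumption, the `propagate` function, its step size, and the example values are all choices made for this illustration, not something the note prescribes.

```python
import math


def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones potential in its standard 12-6 form."""
    return 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)


def propagate(f, values, sigmas, rel_step=1e-6):
    """First-order uncertainty propagation, assuming independent parameters.

    values: every argument of f, by name.
    sigmas: standard deviation for each uncertain argument, by name.
    Returns (f(values), estimated standard deviation of f).
    """
    variance = 0.0
    for name, sigma_x in sigmas.items():
        # Central-difference estimate of the partial derivative df/dx.
        h = rel_step * max(abs(values[name]), 1.0)
        up = dict(values, **{name: values[name] + h})
        down = dict(values, **{name: values[name] - h})
        slope = (f(**up) - f(**down)) / (2.0 * h)
        variance += (slope * sigma_x) ** 2
    return f(**values), math.sqrt(variance)


# Placeholder values in reduced units: evaluate at the potential minimum.
values = {"r": 2.0 ** (1.0 / 6.0), "epsilon": 1.0, "sigma": 1.0}
sigmas = {"epsilon": 0.05, "sigma": 0.01}
energy, energy_sd = propagate(lennard_jones, values, sigmas)
```

 At the potential minimum the energy equals $-\varepsilon$ and is insensitive to $\sigma$ to first order, so the propagated uncertainty here comes almost entirely from the well depth. A symbolic engine could differentiate the expression exactly instead of numerically; the composition rule stays the same.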
 **Does it require "world models"?**
 The term "world model" is not necessary for the architecture. What the architecture requires is:
 1. A store of mathematical models (equations + parameters) with provenance
 2. A mechanism for checking validity envelopes (predicates over conditions)
 3. A mechanism for composing models into pipelines (the existing program execution)
 4. A mechanism for propagating uncertainty (formulas + tracked parameters)
 The provenance store, the validity predicates, the pipeline executor, and the uncertainty tracker do not need to be called "world model infrastructure." They are features of the existing subsystems:
 - The provenance store extends the Knowledge subsystem (it is a structured database, not a new category).
 - The validity predicates are gate rules (they check conditions before allowing computation).
 - The pipeline executor is the existing neurosymbolic loop (LLM proposes, symbolic engine executes).
 - The uncertainty tracker is a mathematical library (error propagation formulas, statistical calculations).
 Calling them "world models" is conceptually clarifying but architecturally optional. The infrastructure is the same either way.
 **The practical implementation takeaway.**
 Stage 2 is the correct entry point. The provenance store is built as a structured data extension to the Knowledge subsystem. Validity predicates are added as gate vectors. No new subsystems are needed — just new data types in the knowledge store, new predicates in the gate, and new functions in the symbolic engine for uncertainty propagation.
 The three-pronged model describes what the system does, not what it is built from. The system is still one machine, one address space, one gate, one memex. It just has a more sophisticated understanding of what it knows and how it knows it.