From 73fc33f02fb92196b0d6cfe7a72625c4741cbe4c Mon Sep 17 00:00:00 2001
From: Hermes <hermes@hermes.local>
Date: Mon, 25 May 2026 00:26:49 +0000
Subject: [PATCH] add note: 10 practical powers of the three-pronged
 architecture

---
 ideas/practical-powers-three-pronged.org | 88 ++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 ideas/practical-powers-three-pronged.org

diff --git a/ideas/practical-powers-three-pronged.org b/ideas/practical-powers-three-pronged.org
new file mode 100644
index 0000000..387dd79
--- /dev/null
+++ b/ideas/practical-powers-three-pronged.org
@@ -0,0 +1,88 @@
+:PROPERTIES:
+:CREATED:  [2026-05-24 Sun]
+:ID:       2f3a4b5c-6d7e-8f9a-0b1c-2d3e4f5a6b7c
+:END:
+#+title: Practical Powers of the Three-Pronged System
+#+filetags: :ideas:passepartout:architecture:
+
+What can Passepartout do with its three layers (deductive proofs, provenance-tracked empirical models, and a probabilistic oracle) that a conventional system cannot? This note catalogs the practical powers that fall out of the architecture, not as abstract potential but as concrete capabilities.
+
+**1. It can tell you how wrong every result might be.**
+
+This is the single most important power. Computational science routinely reports precise-looking numbers with no error bars, or with error bars that cover only statistical sampling noise. A molecular dynamics simulation outputs "binding free energy: −9.2 kcal/mol" and the number looks definitive. It is not. It depends on a chain of models (force field, solvation, sampling, scoring), each with its own uncertainty.
+
+Passepartout traces the chain automatically. It reports: "binding free energy: −9.2 ± 1.2 kcal/mol. Breakdown (contributions treated as independent and combined in quadrature): force field uncertainty ±0.8 kcal/mol, solvation model ±0.5 kcal/mol, conformational sampling ±0.3 kcal/mol, scoring function ±0.6 kcal/mol. Model validity regime: proteins in water at 298K ± 25K. Your conditions fall within this regime."
+
+Mainstream computational chemistry packages do not do this today. At most they report a statistical error estimate for the sampling step and leave the model uncertainty to the scientist's judgment.
+
+**2. It can prevent you from using a model outside its valid range.**
+
+A force field parameterized for soluble proteins at room temperature gives plausible-looking numbers for a membrane protein at body temperature, but those numbers are not physically meaningful. The simulation runs, produces output, and a human who does not know the force field's history may trust the result.
+
+Passepartout's gate catches this before the simulation runs: "This force field was validated for aqueous solutions of soluble proteins at 273-373K. Your simulation involves a lipid bilayer environment at 310K. Three of the lipid-specific parameters are outside their validated range. Recommendation: use a force field with validated lipid parameters (for example, CHARMM36m for the protein together with the CHARMM36 lipid set) instead. Confidence reduction: 40% if you proceed with current selection."
+
+This is a fundamentally new kind of safety. Not "is this action malicious?" but "is this computation sound?"
+
+**3. It can detect when a model is getting worse.**
+
+Empirical models degrade over time, not because their parameters change but because better-fitted successors replace them. A force field fitted to 1990s experimental data may be worse than a later version fitted to more data, but there is no automatic mechanism to detect this. A scientist who has been using the same force field for a decade may not realize it has been superseded.
+
+Passepartout tracks every model version. When it processes a new publication with updated parameters, it can compare: "The AMBER ff99 parameters you are using were superseded by ff14SB in 2014 and ff19SB in 2019. The newer parameter sets improve backbone dihedral prediction by 30% for the protein class you are simulating. Migrate to ff19SB?" It can do this because every parameter carries a timestamp, a source, and a validation record.
+
+**4. It can compare predictions to experiments automatically.**
+
+Every time Passepartout makes a computational prediction and receives (or the user provides) an experimental measurement for the same system, it records the comparison. Over hundreds of comparisons, it builds a systematic bias profile for each model: "This force field consistently underestimates binding affinity for charged ligands by 0.5-1.0 kcal/mol. This solvation model overestimates solubility for aromatic compounds."
+
+These bias profiles are not research papers. They are accumulated operational knowledge that makes future predictions more interpretable. No existing system does this because no existing system treats models as entities with provenance rather than as files on disk.
+
+**5. It can red-team its own reasoning.**
+
+The probabilistic oracle (LLM) proposes a conclusion. The deductive layer (ACL2) checks the formal steps. The provenance layer (empirical knowledge base) checks whether the models used are valid for the context. If all three agree, the conclusion is as reliable as the system can make it. If they disagree, the conflict itself is informative: "The formal mathematics checks out, but the models supporting it are outside their validated range. Your conclusion may be mathematically correct but physically unsupported."
+
+This is a kind of epistemic hygiene that no single-layer system can achieve. A purely probabilistic system (LLM alone) can be confidently wrong. A purely deductive system (prover alone) can only reason within its formal domain. A purely empirical system (database alone) cannot synthesize across domains. The three layers cross-check each other.
+
+**6. It can build a community knowledge graph of what works.**
+
+When multiple Passepartout instances use the same model in different conditions and compare to experimental data, the combined record extends the model's validity envelope. One instance validates a force field for ethanol. Another validates it for DMSO. A third validates it for mixed solvents. The model's validity envelope grows across the network without any single instance having to run all the experiments.
+
+The social protocol becomes the mechanism for this sharing: instances publish validation results as signed, provenance-tracked claims. The network aggregates them. A model that starts with a narrow validity envelope (water, 298K) gradually accumulates enough validation data to cover a wide range of conditions.
+
+No existing scientific software network does this. Journals publish individual validation studies; nobody aggregates them into a living validity map for each model.
+
+**7. It can generate a defensible record for regulatory submission.**
+
+If a pharmaceutical company uses Passepartout in a drug discovery pipeline, every simulation result carries a full provenance chain: force field version and source, solvation model parameters and validation benchmark, conformational sampling algorithm and integrator settings, gate checks that passed, uncertainty budget per component.
+
+This record is essentially a compliance document. It answers the question "how do I know this result is reliable?" with a traceable chain of evidence, not a scientist's assertion. For industries regulated by the FDA, EMA, or similar bodies, this can be the difference between a simulation being used for guidance and a simulation being accepted as evidence.
+
+**8. It can be wrong honestly.**
+
+This sounds trivial but it is the hardest thing for software to do. Every scientific software package presents its outputs with equal authority. A result from a high-quality QM calculation and a rough empirical estimate look the same in the output file — just numbers.
+
+Passepartout would output: "This result is deductively proven (ACL2-verified, level 0-7)." or "This result is computationally rigorous within an empirical model (provenance-tracked, level 8-14, validity envelope intact)." or "This result is an extrapolation outside the model's validated range. Confidence is low. Here is what would need to be measured to increase confidence."
+
+Honesty about uncertainty is a power because it changes what you can do with the result. A deductively proven result can be used as a building block for further proofs. An empirical result within its validity envelope can be used for design decisions with known risk. An extrapolation should only be used for hypothesis generation. Passepartout would know which is which and tell you.
+
+**9. It can refuse an unsound instruction.**
+
+Today, if you ask a computational chemistry package to run a simulation, it runs the simulation. It does not check whether the settings are physically meaningful. The error is not caught until a human reviews the output — if they ever do.
+
+Passepartout's gate can say: "I will not run this simulation without an explicit override. The requested temperature (500K) exceeds the force field's validated range (273-373K). The solvent (hexane) has no validated parameters in this force field version. The simulation will produce numerically precise but physically meaningless results. If you override and proceed, I will flag all output as extrapolation with a confidence score of 0.3 out of 1.0."
+
+The power is not that Passepartout prevents the simulation. It is that Passepartout makes the choice explicit: the human can override, but the override is recorded, and the result is tagged with its true reliability rather than appearing to be definitive.
+
+**10. It can connect mathematics to reality without faking it.**
+
+This is the deepest power. A conventional system either stays in the pure formal domain (proof assistants, CAS) or stays in the empirical domain (simulation software, ML). Passepartout bridges them by making the boundary explicit.
+
+A mathematician can prove a theorem (levels 0-3). An engineer can build a bridge using empirical models (levels 8-12). Passepartout can connect the two: "The finite element equations for this bridge are verified against classical mechanics (level 4). The material parameters come from ASTM standard tests on this specific steel alloy (levels 8-9, validity envelope: −20°C to 60°C, validated by 200+ measurements). The load calculations carry ±3% uncertainty based on material parameter variance." The bridge is not proven safe, because no software can prove a physical structure is safe, but the chain from mathematical foundation to empirical measurement is fully transparent.
+
+**Summary: three kinds of power.**
+
+| Layer | What it verifies | What it enables |
+|---|---|---|
+| Deductive proofs | Correctness against axioms | Autonomous generation of verified algorithms |
+| Provenance-tracked models | Implementation fidelity + data source | Scientific integrity, uncertainty budgets, audit trails |
+| Probabilistic oracle | N/A (generates hypotheses) | Synthesis, model selection, natural language interface |
+
+Alone, each layer is a tool. Together, they form a system that can reason formally, model empirically, communicate naturally — and tell you which mode is in effect for every result it produces.