hermes-brain/ideas/neurological-software-empirical-middle.org

:PROPERTIES:
:CREATED:  [2026-05-25 Mon]
:ID:       4b5c6d7e-8f9a-0b1c-2d3e-4f5a6b7c8d9e
:END:
#+title: Neurological Software in the Empirical Middle
#+filetags: :ideas:passepartout:architecture:world-models:

The empirical middle of the knowledge tree (layers 8-14) is increasingly dominated by neural networks trained on data rather than symbolic equations with fitted parameters. Examples include ANI, MACE, and SchNet for molecular energies and forces; AlphaFold for protein structure prediction; and neural docking scores, learned solvation models, QSAR neural nets, and RL-based molecular design agents. These are not traditional empirical models with interpretable parameters. They are learned function approximators with millions of inscrutable weights.

The three-pronged architecture must accommodate them. This note analyzes how.

**What changes when the model is a neural network.**

A traditional empirical model (force field, solvation equation, docking scoring function) has:

- A **symbolic expression** for the relationship between inputs and outputs (E = k_b(r - r_0)² + ...)
- **Interpretable parameters** that correspond to physical quantities (spring constant = 600 kcal/mol/Å²)
- **Known failure modes** from the equation's form (harmonic approximation fails at extreme bond lengths)

A neural network model has:

- A **learned function** with no simple symbolic expression
- **Inscrutable parameters** (weights) that do not correspond to physical quantities
- **Unknown failure modes** — neural networks interpolate well in-distribution and fail unpredictably out-of-distribution

From the architecture's perspective, the critical difference is not that neural networks are harder to verify (they are, but that is a secondary concern). The critical difference is that the provenance information shifts: instead of tracking where a parameter value came from and what it means, you track what the network was trained on, what it was validated against, and whether the current input resembles its training distribution.

**The provenance store handles the shift by tracking three things instead of one.**

A traditional empirical model's provenance entry:

```
Model: AMBER ff14SB
Equation: Harmonic bond + harmonic angle + Fourier torsion + LJ + Coulomb
Parameters:
  - k_b(C-C): 600 kcal/mol/Å², source: Cornell et al. (1995), validated: 50+ small molecules
  - r_0(C-C): 1.525 Å, source: Cornell et al. (1995), validated: 50+ small molecules
  - ...
Validity envelope:
  - Temperature: 273-373K
  - Solvents: water, methanol, ethanol
  - Molecule classes: proteins, nucleic acids
```

A neural network model's provenance entry:

```
Model: ANI-2x
Architecture: Ensemble of 8 ANI networks
Parameters: ~8 million weights — not interpretable individually
Training data:
  - Level of theory: ωB97X/6-31G(d) (DFT)
  - Molecules: ~8 million conformations from 63,000 organic molecules
  - Elements: H, C, N, O, F, S, Cl
  - Conformational coverage: ANI-2x conformational space (RDKit + stochastic sampling)
Validation benchmarks:
  - COMP6 benchmark (drug-like molecules): MAE 1.2 kcal/mol
  - Dihedral profiles: MAE 0.8 kcal/mol
  - Isomerization energies: MAE 0.9 kcal/mol
Validity envelope (domain check):
  - Elements: H, C, N, O, F, S, Cl only
  - Atomic charge range: not validated for charged species outside training distribution
  - Conformational novelty flag: activated if RMSD to nearest training point > threshold
```

The structure is the same: model → training/validation data → domain of applicability. The content differs: traditional models have interpretable parameters with experimental sources; neural networks have training dataset provenance and aggregated validation benchmarks.
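The shared shape can be made concrete with one record type whose evidence field varies by model kind. This is a minimal sketch, not a description of an existing store: the class names `ProvenanceEntry` and `ValidityEnvelope`, every field name, and both toy models are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ValidityEnvelope:
    """Domain of applicability (hypothetical field names)."""
    elements: frozenset                                   # elements the model covers
    temperature_k: Optional[Tuple[float, float]] = None   # (low, high) in kelvin
    novelty_threshold: Optional[float] = None             # max distance to training data


@dataclass(frozen=True)
class ProvenanceEntry:
    """Same three-part shape for every model: identity, evidence, envelope."""
    model: str
    kind: str        # "symbolic", "neural", or "hybrid"
    evidence: dict   # parameter sources, or training data plus benchmarks
    envelope: ValidityEnvelope


# Toy entries; all values are placeholders, not real model data.
toy_forcefield = ProvenanceEntry(
    model="toy-forcefield",
    kind="symbolic",
    evidence={"parameters": {"k_b": {"value": 600.0, "source": "fit to experiment"}}},
    envelope=ValidityEnvelope(frozenset({"H", "C", "N", "O"}),
                              temperature_k=(273.0, 373.0)),
)
toy_network = ProvenanceEntry(
    model="toy-nnp",
    kind="neural",
    evidence={"training_data": "hypothetical DFT set",
              "benchmarks": {"held_out_mae_kcal_mol": 1.0}},
    envelope=ValidityEnvelope(frozenset({"H", "C", "N", "O"}),
                              novelty_threshold=0.5),
)
```

Only the contents of `evidence` and the populated envelope fields differ between the two entries; anything that reads the store can treat them uniformly.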

**The gate checks the same things regardless of model type.**

The gate predicates for model validity are:

1. **Does the model support the elements/atoms/molecule types in the current input?** — This is the same check for a force field (does the force field have parameters for this atom type?) and a neural network (was this element in the training data?).

2. **Are the conditions within the model's validated range?** — Temperature, pressure, solvent, etc. Same predicate, same structure. The neural network's validated range may be narrower or less well-defined, but the check is the same.

3. **Is the input within the model's training/validation distribution?** — For traditional models, this is a direct validity envelope check. For neural networks, this is a **distribution match** — a statistical check that the current molecular conformation resembles the training set. If the input is far from the training distribution in latent space, the gate flags it regardless of whether the model predicts confidently.

The distribution match check is the new machinery that neural network models require. It is a standard technique in reliable ML (distance to training data, density estimation in latent space, conformal prediction). It integrates into the gate as a predicate: "input is within training distribution: PASS" or "input is outside training distribution: FLAG with confidence reduction."
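As an illustration of the simplest of these techniques, distance to training data, the sketch below scores an input by its mean distance to the k nearest training descriptors and flags it when the score exceeds a threshold calibrated on held-out points. The function names, the 95% quantile, and the random 4-dimensional descriptors are all assumptions; a real check would use molecular descriptors or latent embeddings from the model in question.

```python
import numpy as np


def knn_distance(train: np.ndarray, x: np.ndarray, k: int = 3) -> float:
    """Mean Euclidean distance from x to its k nearest training descriptors."""
    distances = np.linalg.norm(train - x, axis=1)
    return float(np.sort(distances)[:k].mean())


def calibrate_threshold(train: np.ndarray, held_out: np.ndarray,
                        quantile: float = 0.95, k: int = 3) -> float:
    """Threshold = chosen quantile of the scores of held-out in-distribution points."""
    scores = [knn_distance(train, point, k) for point in held_out]
    return float(np.quantile(scores, quantile))


def distribution_match(train: np.ndarray, x: np.ndarray,
                       threshold: float, k: int = 3) -> str:
    """Gate predicate: PASS inside the training distribution, FLAG outside."""
    return "PASS" if knn_distance(train, x, k) <= threshold else "FLAG"


# Stand-in descriptors: random points, not real molecular features.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 4))
held_out = rng.normal(size=(50, 4))
threshold = calibrate_threshold(train, held_out)

typical_input = np.zeros(4)       # centre of the training cloud
novel_input = np.full(4, 10.0)    # far outside it
print(distribution_match(train, typical_input, threshold))
print(distribution_match(train, novel_input, threshold))
```

The predicate never consults the model's own confidence, which is the point: an input far from the training data is flagged even if the network's prediction looks certain.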

**The symbolic engine does not need to understand the network.**

This is the key simplification. The symbolic engine — ACL2, the gate predicates, the formal reasoning — does not need to parse the neural network's weights or architecture. It needs to:

- Query the provenance store for the model's training data description
- Compute a distribution match score for the current input against the training data
- Compare the result to a threshold from the validity envelope
- Output: pass, flag, or block

None of these operations require understanding what the network does. They are metadata operations on the provenance store and geometric operations on the input space. The network itself is a black box — the symbolic engine treats it as a function with a known domain of applicability, the same way it treats a force field as a function with a known validity envelope.
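The four steps above can be sketched as a single function that reads only metadata and a precomputed novelty score. The store layout, the key names, and the policy of blocking on unsupported elements or out-of-range conditions while merely flagging a distribution miss are assumptions for this example; the note itself only fixes the three possible outputs.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FLAG = "flag"
    BLOCK = "block"


# Hypothetical in-memory provenance store keyed by model name.
STORE = {
    "toy-nnp": {
        "elements": {"H", "C", "N", "O"},
        "temperature_k": (273.0, 373.0),
        "novelty_threshold": 1.0,
    }
}


def gate(model: str, elements: set, temperature_k: float,
         novelty_score: float, store: dict = STORE) -> Verdict:
    """Decide pass/flag/block from provenance metadata alone."""
    entry = store.get(model)
    if entry is None:
        return Verdict.BLOCK                  # no provenance record at all
    if not elements <= entry["elements"]:
        return Verdict.BLOCK                  # element never seen by the model
    low, high = entry["temperature_k"]
    if not low <= temperature_k <= high:
        return Verdict.BLOCK                  # outside validated conditions
    if novelty_score > entry["novelty_threshold"]:
        return Verdict.FLAG                   # supported, but unlike training data
    return Verdict.PASS


print(gate("toy-nnp", {"C", "H"}, 298.0, novelty_score=0.4))   # Verdict.PASS
print(gate("toy-nnp", {"C", "H"}, 298.0, novelty_score=2.5))   # Verdict.FLAG
print(gate("toy-nnp", {"C", "Br"}, 298.0, novelty_score=0.4))  # Verdict.BLOCK
```

Nothing in `gate` imports or calls the network; the weights never enter the decision.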

**The oracle handles model selection.**

Which model to use for a given problem — traditional force field or learned neural network? The LLM oracle handles this, informed by the provenance store. The store tells the LLM what models are available, what they are validated for, and how they perform on relevant benchmarks. The LLM recommends. The gate checks the recommendation against the validity envelope before execution.

This is where the architecture connects to the real world of model selection that computational scientists face daily. There is no single best force field or neural network architecture for all problems. The choice depends on the molecule class, the property of interest, the required accuracy, and the computational budget. The LLM, with its broad knowledge of the literature, is well-suited to making this recommendation. It does so not by reasoning about the models from first principles, but by having learned from its own training corpus which models practitioners prefer for which use cases.
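The division of labour between oracle and gate can be sketched as "recommend, then verify". In the example below the oracle is a stub that ranks models by a benchmark error stored in the provenance records; it stands in for the LLM and is not a claim about how the LLM would decide. The store contents, the `benchmark_mae` key, and the function names are hypothetical.

```python
from typing import Callable, List, Optional

# Hypothetical provenance records; MAE values are placeholders in kcal/mol.
STORE = {
    "toy-forcefield": {"elements": {"H", "C", "N", "O", "S"}, "benchmark_mae": 2.0},
    "toy-nnp": {"elements": {"H", "C", "N", "O"}, "benchmark_mae": 1.0},
}


def stub_oracle(store: dict) -> List[str]:
    """Stand-in for the LLM: rank available models by benchmark error."""
    return sorted(store, key=lambda name: store[name]["benchmark_mae"])


def select_model(input_elements: set, store: dict,
                 oracle: Callable[[dict], List[str]]) -> Optional[str]:
    """Take the oracle's ranking, return the first model the gate accepts."""
    for name in oracle(store):
        if input_elements <= store[name]["elements"]:   # gate: element support
            return name
    return None                                         # nothing is applicable


print(select_model({"C", "H"}, STORE, stub_oracle))   # toy-nnp
print(select_model({"C", "S"}, STORE, stub_oracle))   # toy-forcefield
print(select_model({"Si"}, STORE, stub_oracle))       # None
```

The second call shows the gate overriding the oracle's first choice: the network ranks higher on the benchmark but has no coverage for sulfur, so the recommendation falls through to the force field.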

**The full picture: three kinds of empirical model.**

The provenance store now handles three model types:

| Model type | Example | Parameters | Validation method | Gate check |
|---|---|---|---|---|
| Symbolic equation + fitted parameters | AMBER force field | Interpretable (spring constants, partial charges) | Per-parameter: source experiment, confidence interval | Validity envelope: temperature, solvent, molecule class |
| Trained neural network | ANI-2x | Inscrutable (8M weights) | Per-dataset: benchmark MAE, held-out test set | Distribution match: is input like training data? |
| Hybrid (learned correction to symbolic model) | Δ-ML corrections to DFT | Partially interpretable corrections + network weights | Per-benchmark + per-component | Both envelope + distribution match |

All three are handled by the same provenance store, the same gate predicates, and the same LLM oracle. The only new infrastructure required is the **distribution match check** for neural network models — a piece of statistical machinery that computes how similar the current input is to the model's training distribution.
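The last column of the table amounts to a small dispatch table from model type to gate checks. The sketch below is one possible encoding, with hypothetical key names and thresholds; it reduces the validity envelope to a temperature range for brevity.

```python
# Which gate checks apply to which model type (follows the table's last column).
CHECKS_BY_KIND = {
    "symbolic": ("envelope",),
    "neural": ("distribution",),
    "hybrid": ("envelope", "distribution"),
}


def envelope_ok(entry: dict, inp: dict) -> bool:
    """Validity envelope, reduced here to a temperature range."""
    low, high = entry["temperature_k"]
    return low <= inp["temperature_k"] <= high


def distribution_ok(entry: dict, inp: dict) -> bool:
    """Distribution match: novelty score must not exceed the stored threshold."""
    return inp["novelty"] <= entry["novelty_threshold"]


PREDICATES = {"envelope": envelope_ok, "distribution": distribution_ok}


def failed_checks(entry: dict, inp: dict) -> list:
    """Names of the checks this model type requires and the input fails."""
    return [name for name in CHECKS_BY_KIND[entry["kind"]]
            if not PREDICATES[name](entry, inp)]


# Toy entries and input; all numbers are placeholders.
hybrid = {"kind": "hybrid", "temperature_k": (273.0, 373.0), "novelty_threshold": 1.0}
symbolic = {"kind": "symbolic", "temperature_k": (273.0, 373.0)}
hot_and_novel = {"temperature_k": 450.0, "novelty": 3.0}

print(failed_checks(hybrid, hot_and_novel))     # ['envelope', 'distribution']
print(failed_checks(symbolic, hot_and_novel))   # ['envelope']
```

Adding a fourth model type would mean adding one row to `CHECKS_BY_KIND`, not a new gate.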

**Where this fits in the stage plan.**

- **Stage 0-1**: The provenance store does not exist. Neural network models are loaded as black boxes with no systematic validity checking. This is current practice in computational science — the user is responsible for knowing whether a model applies to their problem.

- **Stage 2**: The provenance store begins operation. Initially it handles traditional symbolic-fitted models because they have clear provenance chains and validity envelopes. Neural network models require the distribution match infrastructure, which is a separate development track.

- **Stage 3**: The distribution match infrastructure is operational. The gate can check whether an input is within a neural network's training distribution. The provenance store holds training dataset descriptions, validation benchmarks, and distribution summary statistics for each supported neural network model.

- **Stage 4+**: Neural network models are loaded into the same address space as the symbolic engine and the provenance store. The distribution match check runs at the level of the evaluation loop itself. The gate's validity check becomes a fast native predicate — no querying a separate data store, just reading a hash table and computing a distance in the same process.

**The summary.**

Neural network models trained on empirical data are not a problem for the three-pronged architecture. They fit into the existing framework:

- **The provenance store** tracks training data sources, validation benchmarks, and distribution statistics — instead of parameter sources and confidence intervals.
- **The gate** applies the same validity predicates to every model and adds a training distribution check for neural networks, where a traditional model would have a parameter-regime check.
- **The symbolic engine** does not need to understand the network — it treats it as a black box with a known domain, the same way it treats a force field.
- **The LLM oracle** handles model selection — recommending which neural network or traditional model fits the user's problem, informed by the provenance store's benchmark records.

The new infrastructure required is not large — a distribution match function and a training dataset descriptor in the provenance store. Everything else is existing mechanism applied to a new data type.