hermes-brain/projects/passepartout/architecture/design/the-two-brains.org at 2ee60b5b4a9f96a18f9958695f2350ffd00c2d5e

amr/hermes-brain

Fork 0

Files

Hermes 2ee60b5b4a Split design-decisions.org into design/ subfolder (10 files, one per section)

2026-06-04 20:16:02 +00:00

12 KiB

Raw Blame History

The Two Brains

— title: The Two Brains type: reference tags: :passepartout:architecture: —

The Two Brains

The Probabilistic-Deterministic Split

The architecture divides cognition into two fundamentally different reasoning systems. This is not arbitrary engineering but a structural response to a fundamental truth: probabilistic systems will hallucinate, and you cannot build reliable autonomy on an unreliable foundation.

The Hallucination Problem

An LLM is a statistical engine trained on token sequences. It generates the most probable continuation of a prompt. Given sufficient context, that continuation is correct. Given novel context, it is often wrong in confident-sounding ways.

This is not a training deficiency. Hallucination is a fundamental property of probabilistic inference. You can reduce it with better models, longer contexts, and clever prompting, but you cannot eliminate it by making the LLM better. You eliminate it by not asking the LLM to do things that require certainty.

This is the architectural bet at the heart of Passepartout's neurosymbolic design. The LLM should not be the reasoning engine. It should be the creative engine — proposing possibilities, surfacing connections, translating between natural language and formal representation. The reasoning engine should be symbolic: deterministic, verification-grounded, provenance-tracked, and incapable of hallucination by construction.

The Division of Labor

An LLM is a statistical engine. It generates outputs based on patterns in training data. It is remarkable at translation, generation, pattern matching, and fuzzy reasoning. It can take messy human intent and produce structured queries. It can take structured results and produce natural language. It is, in the terminology of the system, the creative brain.

But it cannot be trusted. Not because it is poorly designed or insufficiently trained, but because hallucination is a fundamental property of probabilistic inference. The model generates the most likely continuation, not the correct one. Given sufficient context, the most likely continuation is correct. Given novel context, it is often wrong in confident-sounding ways.

The deterministic engine addresses this by being what the probabilistic engine is not: mathematically rigorous, formally verifiable, and incapable of hallucination by design. It operates on explicit symbolic representations — lists, property lists, knowledge graphs — not on floating-point activations. When it evaluates a path confinement check, it returns true or false, not a probability distribution.

The division of labor is architectural. The LLM handles the fuzzy interface between human language and structured representation. It translates what the user wants into what the system can reason about. The deterministic engine receives those structured representations and evaluates them against formal invariants. It decides whether to execute, not whether the translation was semantically plausible.

This separation is the source of Passepartout's safety guarantee. Other agents add "guardrails" as an afterthought — a layer of filtering around a dangerous core. Passepartout makes the division explicit: the LLM never touches the file system, never executes a command, never modifies memory. It generates proposals. The deterministic engine evaluates and executes. The dangerous operations are never in the probabilistic path.

The split also explains why the system gets safer over time without the LLM improving. The deterministic engine accumulates rules. The LLM proposes actions, the engine evaluates them against a growing rule set. Early versions block obvious dangers. Later versions block sophisticated attacks that were previously unknown. The safety grows logarithmically with the number of interactions, not linearly with model capability.

The 10-80-10 Architecture

The target for a coding agent: 10% neural for input translation (natural language → structured queries), 80% symbolic for reasoning (Screamer plans, ACL2 verifies, VivaceGraph retrieves facts), 10% neural for output formatting (structured results → natural language). The 80% that happens in the symbolic middle layer costs zero LLM tokens.

For the broader memex — literature, poetry, personal reflection, daily logs — the ratios are different and less important than the metaphor itself. The neuro is the brain — generative, associative, creative, comfortable with ambiguity. It produces insights that are provisional, connections that are speculative, hypotheses that may be wrong. The symbolic engine is the education — accumulated, verified, provenance-tracked knowledge that the brain draws on and is disciplined by. It doesn't think creatively. It remembers, checks, and constrains. It prevents the brain from being confidently wrong.

This framing resolves a tension in the original architecture. The 10-80-10 implies the symbolic engine replaces the neuro for reasoning. But a symbolic engine is terrible at creativity, ambiguity, and associative leaps across unrelated domains — exactly what you need for a memex that contains Pale Fire, a shopping list, and a project plan. The brain proposes that your sudden interest in unreliable narrators coincides with a week where your project retrospective used the word "deception." The education verifies: "those two diary entries are 4 days apart; the word 'deception' appears in both; here are the headings." The brain makes the leap. The education makes it trustworthy.

This means the symbolic engine never needs to be "complete." Education isn't complete knowledge — it's structured knowledge. You don't need a fact for every sentence in your diary. You need facts for what can be mechanically verified: dates, citations, entities, contradictions, temporal order. The brain handles the rest.

Core Knowledge: The Four Pillars of Agentic Reliability

Every reliable AI agent must possess four types of Core Knowledge — not as prompt instructions, but as encoded symbolic rules that the neural engine cannot override. These are the "laws of physics" for the agent's computational universe. Passepartout encodes each pillar as deterministic Lisp functions in the Dispatcher gate stack.

Digital Object Permanence & State. The agent must know what exists independently of its attention. Passepartout achieves this through the Merkle-tree memory: every memory-object carries a SHA-256 content hash. If the agent deletes a file, the hash proves it's gone. If an external process modifies it, the hash mismatch triggers a warning. The copy-on-write snapshot mechanism preserves the state at every decision point, enabling rollback if an action chain fails.
Causality and Temporal Logic. Actions must execute in order. Step B cannot run if Step A failed. Passepartout enforces this through the pipeline's depth counter (signals cannot recurse past depth 10, preventing infinite loops) and the sequential Perceive → Reason → Act ordering. The batch tool calls feature allows parallel execution of independent actions while enforcing sequential execution of dependent ones — actions that share a dependency are ordered; actions that don't are parallelized.
Agentic Boundaries (The "Self"). The agent must know where its authority ends and the host system begins. Passepartout encodes this through the Dispatcher gate stack: path protection blocks access to sensitive directories (~/.ssh, etc, ~.aws). Shell safety blocks destructive commands (rm -rf /, dd, injection vectors). Network exfiltration detection blocks unauthorized outbound connections. The permission table allows per-tool, per-path granularity. These are not prompt instructions — they are Lisp functions that execute unconditionally for every action. The self-build safety boundary extends this to the agent's own core pipeline files: the agent can modify skills and system modules freely, but cannot modify its own brain stem without human review.
Epistemic Certainty (Knowing How It Knows). The agent must distinguish between a verified fact, a retrieved memory, and an LLM prediction. Passepartout encodes this through the gate trace: every action carries a record of which gates passed, which blocked, and why. The provenance system (LOGBOOK entries on memory-objects) records who modified what and when. The Dispatcher's existence-check gate verifies that a file exists before allowing a read. The process-status gate verifies that a command completed before allowing its output to be used. The agent cannot "hallucinate" a file path or a process result because the Dispatcher checks each against the live state before execution.

These four pillars are not features. They are the definition of a reliable agent. Every agent architecture either provides them or compensates for their absence in ways that make the agent less trustworthy, more expensive, or both.

The Dispatcher as Learning System

The Dispatcher begins as a static guard — a set of rules that block obviously dangerous actions. But defining "obviously" is the hard problem. The agent encounters situations the rules do not anticipate. The Dispatcher must grow.

The human-in-the-loop exception is the seed. When the LLM proposes an action the Dispatcher does not recognize, the system does not default to blocking or allowing. It suspends. It writes the proposed action to an Org buffer in a format the human can read and understand. The human reviews and approves or denies. The Dispatcher observes the decision.

From this single observation, the Dispatcher extracts a rule. Not merely "allow this specific action" but "allow this class of actions parameterized by these dimensions." The human approved a write to ~/projects/myapp/src/core.clj. The Dispatcher generalizes: writes to ~/projects/*/src/*.lisp are approved for this session, or for this project, or indefinitely depending on the context and the user's pattern of decisions.

Shadow mode is where rules are tested before deployment. When the Dispatcher encounters a novel situation and is uncertain, it can run the proposed action in a simulated environment. It observes the side effects — what files would be modified, what processes would be spawned, what network calls would be made. If the simulation produces dangerous side effects, the rule is discarded. If it appears safe, the rule is added to the active set with a confidence rating.

Formal verification is where the learned rules are checked against invariants. The Dispatcher's rules are not merely patterns observed from human behavior. They are formulas in a logic that the system can reason about. A rule that would enable path traversal is not discarded because it was observed to be safe in prior instances — it is discarded because it violates the path-confinement invariant by construction.

The Dispatcher becomes, over time, not a guard that blocks bad actions but a reasoning system that understands why actions are good or bad. Early versions learn from human decisions. Later versions learn from their own logical analysis. The human's role transitions from approver to auditor to, eventually, unnecessary oversight.

This is the bootstrap. The system begins dependent on human judgment because it has no basis for judgment of its own. Through accumulated decisions, it constructs a model of what is permitted and why. That model is the foundation for the deterministic symbolic engine that in v1.0.0 takes over the reasoning that the Dispatcher learned to perform.

12 KiB Raw Blame History