5.9 KiB
— title: The Probabilistic-Deterministic Split type: reference tags: :passepartout:architecture: —
The Probabilistic-Deterministic Split
The architecture divides cognition into two fundamentally different reasoning systems. This is not arbitrary engineering but a structural response to a fundamental truth: probabilistic systems will hallucinate, and you cannot build reliable autonomy on an unreliable foundation.
The Hallucination Problem
An LLM is a statistical engine trained on token sequences. It generates the most probable continuation of a prompt. Given sufficient context, that continuation is correct. Given novel context, it is often wrong in confident-sounding ways.
This is not a training deficiency. Hallucination is a fundamental property of probabilistic inference. You can reduce it with better models, longer contexts, and clever prompting, but you cannot eliminate it by making the LLM better. You eliminate it by not asking the LLM to do things that require certainty.
This is the architectural bet at the heart of Passepartout's neurosymbolic design. The LLM should not be the reasoning engine. It should be the creative engine — proposing possibilities, surfacing connections, translating between natural language and formal representation. The reasoning engine should be symbolic: deterministic, verification-grounded, provenance-tracked, and incapable of hallucination by construction.
The Division of Labor
An LLM is a statistical engine. It generates outputs based on patterns in training data. It is remarkable at translation, generation, pattern matching, and fuzzy reasoning. It can take messy human intent and produce structured queries. It can take structured results and produce natural language. It is, in the terminology of the system, the creative brain.
But it cannot be trusted. Not because it is poorly designed or insufficiently trained, but because hallucination is a fundamental property of probabilistic inference. The model generates the most likely continuation, not the correct one. Given sufficient context, the most likely continuation is correct. Given novel context, it is often wrong in confident-sounding ways.
The deterministic engine addresses this by being what the probabilistic engine is not: mathematically rigorous, formally verifiable, and incapable of hallucination by design. It operates on explicit symbolic representations — lists, property lists, knowledge graphs — not on floating-point activations. When it evaluates a path confinement check, it returns true or false, not a probability distribution.
The division of labor is architectural. The LLM handles the fuzzy interface between human language and structured representation. It translates what the user wants into what the system can reason about. The deterministic engine receives those structured representations and evaluates them against formal invariants. It decides whether to execute, not whether the translation was semantically plausible.
This separation is the source of Passepartout's safety guarantee. Other agents add "guardrails" as an afterthought — a layer of filtering around a dangerous core. Passepartout makes the division explicit: the LLM never touches the file system, never executes a command, never modifies memory. It generates proposals. The deterministic engine evaluates and executes. The dangerous operations are never in the probabilistic path.
The split also explains why the system gets safer over time without the LLM improving. The deterministic engine accumulates rules. The LLM proposes actions, the engine evaluates them against a growing rule set. Early versions block obvious dangers. Later versions block sophisticated attacks that were previously unknown. The safety grows logarithmically with the number of interactions, not linearly with model capability.
The 10-80-10 Architecture
The target for a coding agent: 10% neural for input translation (natural language → structured queries), 80% symbolic for reasoning (Screamer plans, ACL2 verifies, VivaceGraph retrieves facts), 10% neural for output formatting (structured results → natural language). The 80% that happens in the symbolic middle layer costs zero LLM tokens.
For the broader memex — literature, poetry, personal reflection, daily logs — the ratios are different and less important than the metaphor itself. The neuro is the brain — generative, associative, creative, comfortable with ambiguity. It produces insights that are provisional, connections that are speculative, hypotheses that may be wrong. The symbolic engine is the education — accumulated, verified, provenance-tracked knowledge that the brain draws on and is disciplined by. It doesn't think creatively. It remembers, checks, and constrains. It prevents the brain from being confidently wrong.
This framing resolves a tension in the original architecture. The 10-80-10 implies the symbolic engine replaces the neuro for reasoning. But a symbolic engine is terrible at creativity, ambiguity, and associative leaps across unrelated domains — exactly what you need for a memex that contains Pale Fire, a shopping list, and a project plan. The brain proposes that your sudden interest in unreliable narrators coincides with a week where your project retrospective used the word "deception." The education verifies: "those two diary entries are 4 days apart; the word 'deception' appears in both; here are the headings." The brain makes the leap. The education makes it trustworthy.
This means the symbolic engine never needs to be "complete." Education isn't complete knowledge — it's structured knowledge. You don't need a fact for every sentence in your diary. You need facts for what can be mechanically verified: dates, citations, entities, contradictions, temporal order. The brain handles the rest.