memex: update passepartout submodule → v0.7.2, add notes

passepartout v0.7.2 (Gate Trace + HITL + Search + 11 more features):
- Gate trace visualization with Ctrl+G toggle
- HITL inline panels with styled collapse on approve/deny
- Agent identity file + /identity command
- Safe-tool read-only allowlist
- Message search mode with Up/Down nav and highlights
- Context budget visibility with section breakdown
- Session rewind /sessions /resume /rewind
- Undo/redo per operation
- Context debugging /context why /context dropped
- Tool hardening (timeouts, write verify, read-only cache)
- Tag stack severity tiers + trigger counts
- Merkle provenance audit + audit-verify
- Self-help /help <topic> reads USER_MANUAL.org
- Live CONFIG section in system prompts
- Pads: Page Up/Down scroll by 10 lines

Core 92/92 TUI Main 104/104 TUI View 29/29 Neuro 13/13
2026-05-08 21:56:11 -04:00
parent 8c64b18335
commit 4e9431ec1d
254 changed files with 55970 additions and 3 deletions
--- a/notes/passepartout-neurosymbolic-design-decisions-and-options.org
+++ b/notes/passepartout-neurosymbolic-design-decisions-and-options.org
@@ -0,0 +1,719 @@
+#+TITLE: Passepartout Neurosymbolic Engine — Design Decisions and Architecture Options
+#+AUTHOR: Agent
+#+FILETAGS: :notes:design-decisions:neurosymbolic:architecture:v3.0.0:
+#+CREATED: [2026-05-08 Fri]
+
+* The Hallucination Problem — Why Neurosymbolic
+
+An LLM is a statistical engine trained on token sequences. It generates the most
+probable continuation of a prompt. Given sufficient context, that continuation is
+usually correct. Given novel context, it is often wrong in confident-sounding ways.
+
+This is not a training deficiency. Hallucination is a fundamental property of
+probabilistic inference. You can reduce it with better models, longer contexts,
+and clever prompting, but you cannot eliminate it by making the LLM better. You
+eliminate it by not asking the LLM to do things that require certainty.
+
+This is the architectural bet at the heart of Passepartout's neurosymbolic design.
+The LLM should not be the reasoning engine. It should be the *creative* engine —
+proposing possibilities, surfacing connections, translating between natural
+language and formal representation. The *reasoning* engine should be symbolic:
+deterministic, verification-grounded, provenance-tracked, and incapable of
+hallucination by construction.
+
+This is not a rejection of neural methods. It is a division of labor. The neuro
+is the brain — generative, associative, creative, comfortable with ambiguity. It
+produces hypotheses. The symbolic engine is the education — accumulated, verified,
+provenance-tracked knowledge that the brain draws on and is disciplined by. It
+doesn't think. It remembers, checks, and constrains.
+
+The brain is always smarter than the education, but the education prevents the
+brain from being confidently wrong.
+
+** See also:
+
+- =passepartout/docs/DESIGN_DECISIONS.org=: "The Probabilistic-Deterministic Split"
+  for the gate-level version of this argument.
+- =notes/passepartout-whitehead.org=: Whitehead's ramified theory of types as
+  the structural guarantee against self-referential contradictions.
+- =notes/passepartout-symbolic-engine-exploration.org=: the full design space and
+  the lossiness problem at the neural-symbolic boundary.
+
+* The Five Architecture Options
+
+The symbolic engine must relate to the human memex. The relationship is not
+obvious because knowledge lives in two incompatible forms: natural language
+prose (what the human reads and writes) and formal facts (what the symbolic
+engine reasons about). The translation between them is lossy by nature. The
+architecture is defined by how it handles that lossiness.
+
+=notes/passepartout-symbolic-engine-exploration.org= explores five options. They are
+summarized here to make subsequent decisions legible.
+
+** Option 1: The Auto-Formalizer
+
+A separate knowledge graph stores symbolic facts. The LLM populates it by
+extracting triples from unstructured data — documentation, manuals, logs,
+session histories. The KG becomes co-authoritative with the human prose.
+
+This is the simplest to implement but inherits the dual-representation problem
+in its most acute form. The KG and the prose can disagree, and the architecture
+provides no mechanism for resolving disagreements. It also stores knowledge
+twice — once in the user's Org files, once in the KG — with no guarantee that
+they stay synchronized.
+
+** Option 2: Two Intentionally Separate Memexes
+
+The human memex contains prose: thoughts, diaries, decisions, documentation.
+The symbolic memex contains formal facts: constraints, rules, relationships,
+deductions. The archivist bridges between them but does not try to keep them
+synchronized. They are allowed to diverge because they serve different purposes.
+The prose captures what the human intended. The symbolic memex captures what
+the symbolic engine has proven.
+
+This is philosophically honest — it admits that no lossless translation between
+natural language and formal logic is possible. But it forces the user to reason
+about two separate knowledge stores and understand when to trust each.
+
+** Option 3: Tangled Fact Blocks in Org Files
+
+The tangle mechanism already handles the dual-representation problem for code.
+Lisp code lives in literate blocks within Org files (=#+begin_src lisp=). The
+tangle mechanism extracts these blocks and generates =.lisp= files. A new block
+type — =#+begin_src knowledge= — would contain symbolic facts in a formal
+language. The tangle mechanism would load these facts into the symbolic engine's
+in-memory store, just as it loads Lisp code into the SBCL image.
+
+This is aesthetically appealing because it unifies the format. One toolchain,
+one version control system, one Merkle tree. But the block language itself IS
+the knowledge representation language, and that language is the ontology we
+have not yet defined. The format is unified but the content is unspecified.
+
+** Option 4: One Memex, Two Indices
+
+The prose remains in human language in Org files. The prose is always the ground
+truth. Two indices sit on top of the prose as derived views:
+
+- The *neural index* uses vector embeddings to enable semantic search. The LLM
+  navigates the prose through embedding space, retrieving relevant headings.
+- The *symbolic index* stores formal assertions about what the prose says —
+  predicates, relations, constraints — each grounded to a specific heading or
+  block in the Org file.
+
+Each index serves its own side of the machine. They do not need to understand
+each other's representations. They only need to agree on which heading or block
+they are referring to. Because the prose is always the ground truth, the symbolic
+index can be thrown away and rebuilt from scratch if it becomes corrupted or
+stale. No information is lost — only the extracted assertions.
+
+** Option 5: Ephemeral Symbolic Facts
+
+No persistence, no serialization format, no knowledge graph stored on disk.
+VivaceGraph exists in memory during the session. Screamer derives facts from the
+prose as needed. When the session ends, the facts are discarded and re-derived
+from the prose on the next start.
+
+This punts the ontological design problem entirely. You never have to decide on
+a serialization format because you never serialize. The cost is compute
+(re-derivation on every restart) and the inability to accumulate facts across
+sessions. But it is the correct first step — a way to learn what kinds of facts
+are actually useful before committing to a storage format.
+
+* The Chosen Path: Option 4, Starting with Option 5
+
+The one-memex-two-indices architecture (Option 4) is the correct long-term
+architecture. The prose is the ground truth. The symbolic index is a derived
+view that can be rebuilt. The neural index handles what the symbolic index
+cannot — semantic search, fuzzy matching, associative leaps.
+
+But committing to a persistence format before knowing what facts are useful
+is premature. The practical path starts with Option 5 (ephemeral facts) as the
+Phase 1-4 implementation, then graduates to Option 4 with VivaceGraph
+persistence in Phase 5 when the fact language has been battle-tested (see
+=passepartout-neurosymbolic-roadmap.org=).
+
+** Why the dual index is permanent, not transitional
+
+In the coding domain, there is an aspiration that the symbolic index could
+eventually capture enough of the prose's propositional content to become a
+complete representation — the "flip" described in the architecture note. But
+for the broader memex (literature, poetry, personal reflection, daily logs),
+completeness is neither possible nor desirable. You cannot formalize what makes
+a poem beautiful. You cannot extract a triple that captures the emotional weight
+of a diary entry. The neural index will always be the gateway to the full
+richness of the prose. The symbolic index handles what can be mechanically
+verified: citations, entities, temporal order, contradictions, provenance.
+The division of labor between the two indices is permanent because the domains
+they serve are fundamentally different kinds of knowledge.
+
+* The Neuro as Brain, the Symbolic as Education
+
+The original 10-80-10 architecture (10% neural, 80% symbolic, 10% neural)
+describes the target ratios for a *coding* agent — a domain where most reasoning
+is formalizable. For the broader memex, the ratios are different and less
+important than the metaphor itself.
+
+The neuro is the *brain* — generative, associative, creative, comfortable with
+ambiguity. It produces insights that are provisional, connections that are
+speculative, hypotheses that may be wrong. It is the driver.
+
+The symbolic engine is the *education* — accumulated, verified,
+provenance-tracked knowledge that the brain draws on and is disciplined by. It
+doesn't think creatively. It remembers, checks, and constrains. It prevents the
+brain from being confidently wrong.
+
+This framing resolves a tension in the original architecture. The 10-80-10
+implies the symbolic engine /replaces/ the neuro for reasoning. But a symbolic
+engine is terrible at creativity, ambiguity, and associative leaps across
+unrelated domains — exactly what you need for a memex that contains /Pale Fire/,
+a shopping list, and a project plan. The brain proposes that your sudden interest
+in unreliable narrators coincides with a week where your project retrospective
+used the word "deception." The education verifies: "those two diary entries are
+4 days apart; the word 'deception' appears in both; here are the headings." The
+brain makes the leap. The education makes it trustworthy.
+
+This means the symbolic engine never needs to be "complete." Education isn't
+complete knowledge — it's structured knowledge. You don't need a fact for every
+sentence in your diary. You need facts for what can be mechanically verified:
+dates, citations, entities, contradictions, temporal order. The brain handles
+the rest.
+
+* The Gate-to-Fact Bootstrap — Extracting the First Ontology from Existing Code
+
+The Dispatcher gate stack already encodes an implicit ontology. Every gate
+vector asserts the existence of a category of things:
+
+- Gate vector 2 asserts there exists a class of files called /secrets/.
+- Gate vector 7 asserts there exists a class of commands called /destructive/.
+- Gate vector 8 asserts there exists a class of domains called /trusted/.
+- The self-build boundary asserts there exists a class of files called
+  /core-harness/ and a class called /skills/.
+
+These claims are currently expressed as code — Lisp functions that pattern-match
+against file paths, shell commands, and URLs. They are not facts the symbolic
+engine can query, derive from, or check for consistency. But they can be made
+explicit.
+
+The bootstrap makes every gate a set of initial symbolic facts:
+=(:file ".env" :member-of-class :secret-files :source gate-vector-2)=,
+=(:command "rm -rf /" :classified-as :catastrophic :source gate-vector-7)=,
+=(:domain "api.telegram.org" :classified-as :trusted :source gate-vector-8)=.
+
+This produces 50-70 entity classes directly from the existing gate stack,
+without any new infrastructure:
+
+| Source                                 | Count | Example categories                                 |
+|----------------------------------------+-------+----------------------------------------------------|
+| ~*dispatcher-protected-paths*~         | 11    | :secret-config-file, :ssh-key-file, :gpg-key-file  |
+| ~*dispatcher-shell-blocked*~           | 8     | :catastrophic-command, :injection-pattern           |
+| ~*dispatcher-network-whitelist*~       | 2     | :trusted-domain, :untrusted-domain                  |
+| Self-build boundary                    | 2     | :core-harness-file, :skill-file                    |
+| Privacy tags                           | 3     | :private-content, :financial-content                |
+| Permission table                       | 3     | :read-only-tool, :write-tool, :eval-tool            |
+| Cognitive tools                        | 6     | :code-search-tool, :file-io-tool, :shell-tool       |
+| Relations (all gates)                  | ~15   | :member-of-class, :classified-as, :depends-on        |
+| Qualities                              | ~8    | :catastrophic, :dangerous, :moderate, :harmless     |
+| Provenance sources                     | 4     | :gate-outcome, :human-authored, :deduced, :llm-proposed |
+|----------------------------------------+-------+----------------------------------------------------|
+
+This is the seed. It gives Screamer a domain to reason about immediately, without
+any LLM involvement. It proves the pattern — code becomes facts, facts enable
+reasoning — at the cost of approximately 30 lines of Lisp.
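+
+A minimal sketch of the bootstrap, not the actual implementation. The table
+=*example-protected-paths*=, its contents, and the helper =gate-patterns->facts=
+are hypothetical; the real =*dispatcher-protected-paths*= may be shaped
+differently. Only the plist layout follows the examples above.
+
+#+begin_src lisp
+;; Hypothetical stand-in for a gate pattern table: (pattern . class) pairs.
+(defparameter *example-protected-paths*
+  '((".env" . :secret-files)
+    ("id_rsa" . :ssh-key-file))
+  "Illustrative only; not the dispatcher's actual data.")
+
+(defun gate-patterns->facts (patterns subject-key relation source)
+  "Turn each (PATTERN . CLASS) pair into a fact plist tagged with SOURCE."
+  (mapcar (lambda (entry)
+            (list subject-key (car entry)
+                  relation (cdr entry)
+                  :source source
+                  :provenance :gate-outcome))
+          patterns))
+
+;; One fact per pattern, each traceable to the gate vector that produced it.
+(gate-patterns->facts *example-protected-paths*
+                      :file :member-of-class 'gate-vector-2)
+#+end_src
+
+Because the facts are regenerated from the pattern tables, a sketch like this
+could run on every skill load without any stored state.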
+
+* The LLM as Proposer — Verified Extraction
+
+The LLM cannot be trusted to populate the symbolic index directly. Its outputs are
+sampled, not proven. A probabilistic extraction feeding a deterministic engine
+defeats the purpose of being deterministic.
+
+But the LLM is still useful. It can surface facts that are obvious to a human
+reader of prose but would take the symbolic engine many deduction steps to reach
+independently. The solution is to demote the LLM from /extractor/ to /proposer/:
+
+1. The archivist reads a prose heading.
+2. The LLM proposes candidate triples.
+3. Screamer checks each triple for consistency against the existing fact store.
+4. Only consistent triples are admitted to the symbolic index, flagged with
+   =:provenance :llm-proposed= and grounded to the source heading.
+
+The LLM might hallucinate facts that don't correspond to the prose. It might
+extract facts that contradict existing knowledge. It might produce syntactically
+malformed triples. None of these failures contaminate the symbolic index because
+proposals are not admitted automatically. The admission gate (Screamer) is
+deterministic.
+
+This is the core architecture pattern. Everything else — the entity classes, the
+deduction engine, the persistence layer — follows from this single design decision:
+*the LLM proposes; the symbolic engine decides whether to accept.*
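+
+The four steps above can be sketched in plain Common Lisp. This is an
+illustration under stated assumptions: =admit-proposals=, =well-formed-triple-p=
+and =toy-consistent-p= are hypothetical names, and the toy predicate merely
+stands in for the Screamer check (it behaves like a strict, single-value rule).
+
+#+begin_src lisp
+(defun well-formed-triple-p (triple)
+  "True when TRIPLE is a plist carrying :entity, :relation and :value."
+  (and (listp triple)
+       (evenp (length triple))
+       (getf triple :entity)
+       (getf triple :relation)
+       (getf triple :value)
+       t))
+
+(defun toy-consistent-p (triple store)
+  "Stand-in for the real check: reject a second, different :value for
+the same :entity and :relation."
+  (notany (lambda (fact)
+            (and (eql (getf fact :entity) (getf triple :entity))
+                 (eql (getf fact :relation) (getf triple :relation))
+                 (not (equal (getf fact :value) (getf triple :value)))))
+          store))
+
+(defun admit-proposals (proposals store heading consistent-p)
+  "Return STORE extended with every proposal that is well formed and passes
+CONSISTENT-P, a function of (triple store). Rejected proposals are dropped."
+  (dolist (triple proposals store)
+    (when (and (well-formed-triple-p triple)
+               (funcall consistent-p triple store))
+      (push (append triple
+                    (list :provenance :llm-proposed :grounding heading))
+            store))))
+
+;; Three proposals: one admissible, one malformed, one contradictory.
+(admit-proposals
+ '((:entity :nabokov :relation :lectures-on :value :kafka)
+   (:entity :nabokov :relation :born-in)
+   (:entity :env-file :relation :classified-as :value :harmless))
+ '((:entity :env-file :relation :classified-as :value :secret))
+ "Reading log"
+ #'toy-consistent-p)
+#+end_src
+
+Only the first proposal reaches the store. The deterministic predicate is the
+sole path to admission, so a bad proposal costs a rejected candidate and
+nothing more.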
+
+* Three Contradiction Policies — Domain-Dependent Consistency
+
+Classical logic requires consistency. A contradiction implies everything
+(=ex contradictione quodlibet=). Screamer, as a constraint solver, also requires
+consistency — a contradictory constraint set has no solutions. But the symbolic
+engine operates across domains where the meaning of contradiction is fundamentally
+different.
+
+A single architecture serves all domains by applying different contradiction
+policies, scoped to the entity class:
+
+** Policy :exclusive — Contradiction Rejected at Admission
+
+For domains where the world is physically singular — a file either exists or it
+doesn't, a command either was blocked or it wasn't, a gate rule either applies or
+it doesn't. When a new fact contradicts an existing one in an :exclusive domain,
+the new fact is rejected. The existing fact is authoritative unless a human
+explicitly retracts it.
+
+Use for: security classifications, file system state, gate rules, code
+correctness, deterministic safety constraints.
+
+** Policy :coexistent — Contradiction Flagged, Both Retained
+
+For domains where multiple truths coexist — literary interpretations, historical
+accounts, personal beliefs held at different times, multi-source factual
+disagreement (Wikidata vs. DBpedia vs. your memex). When a new fact contradicts
+an existing one in a :coexistent domain, the contradiction is recorded with a
+cross-reference flag. Both facts are stored. Queries return all facts with
+provenance display.
+
+Use for: literature, history, personal knowledge evolution, scientific consensus
+shift, multi-author knowledge bases.
+
+** Policy :temporal — Contradiction Accepted as Version Change
+
+For domains where truth changes over time. When a new fact contradicts an old one
+in a :temporal domain, the old fact is marked =:superseded= but retained. The
+timeline is queryable: "You believed X on Tuesday, Y on Friday, Z on Sunday."
+
+Use for: personal belief evolution, project plan revisions, scientific
+consensus shift over time, any knowledge where the change itself is information.
+
+** Policy Assignment
+
+The policy is assigned when a category is defined. New categories default to
+=:coexistent= (never loses information). Core security categories are explicitly
+=:exclusive=. The gate stack's bootstrapped facts are =:exclusive= because they
+describe the actual filesystem, not perspectives.
+
+The Screamer admission gate does not reject all contradictions. It rejects
+contradictions in =:exclusive= domains and flags them in =:coexistent= and
+=:temporal= domains. The constraint solver still works because queries scope
+their constraint set to a single provenance domain. "Is X true according to my
+memex?" is a different query than "Is X true according to Wikidata?" Each has
+a self-consistent internal logic. The contradiction is between domains, not
+within them.
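+
+A sketch of how the three policies might be dispatched. The registry
+=*category-policies*=, the function names, and the =:contradicted= marker are
+assumptions made for illustration; only the policy keywords, the =:coexistent=
+default, and the =:superseded= marker come from the text above.
+
+#+begin_src lisp
+(defparameter *category-policies* (make-hash-table :test #'eq)
+  "Hypothetical registry mapping a category keyword to its policy.")
+
+(defun category-policy (category)
+  "Look up CATEGORY's policy; unregistered categories are :coexistent."
+  (gethash category *category-policies* :coexistent))
+
+(defun resolve-contradiction (category old-fact new-fact)
+  "Return the facts to store when NEW-FACT contradicts OLD-FACT in CATEGORY."
+  (ecase (category-policy category)
+    ;; The existing fact stays authoritative; the newcomer is rejected.
+    (:exclusive (list old-fact))
+    ;; Both facts are retained, each flagged as part of a contradiction.
+    (:coexistent (list (append old-fact (list :contradicted t))
+                       (append new-fact (list :contradicted t))))
+    ;; The old fact is kept but marked superseded; the new one is current.
+    (:temporal (list (append old-fact (list :status :superseded))
+                     new-fact))))
+
+(setf (gethash :secret-files *category-policies*) :exclusive
+      (gethash :project-plan *category-policies*) :temporal)
+
+(resolve-contradiction :project-plan
+                       '(:entity :release :relation :due :value "May")
+                       '(:entity :release :relation :due :value "June"))
+#+end_src
+
+Keeping the policy on the category rather than on the fact means a single
+lookup decides the outcome, and a new category loses no information until
+someone deliberately tightens it.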
+
+** Why This Matters for the Broader Memex
+
+In the coding domain, contradiction is rare and must be resolved — a gate can't
+both allow and block the same path. In the broader memex, contradiction is the
+product, not the error. Your poetry analysis contradicts your last diary entry
+on the same topic. Your reading of /Pale Fire/ changed between 2023 and 2025.
+Wikidata says Mount Everest is 8848m (the long-standing survey figure); DBpedia
+says 8849m (the 2020 China-Nepal remeasurement of 8848.86m, rounded up). The
+symbolic engine's job is not to decide which is right.
+It is to surface the tension with provenance — "these three sources disagree.
+Here is the chain for each."
+
+* How Categories Grow — The Organic Ontology
+
+Whitehead and Russell's /Principia Mathematica/ took over 300 pages to define the
+logical foundations before it could prove that one plus one equals two. Every category
+introduced carried a burden of justification. Every inference rule had to be
+demonstrated sound. This is the classical approach to ontology: define everything
+upfront, exhaustively, formally.
+
+Passepartout cannot afford this and does not need it. Its domain is bounded
+(software engineering, personal knowledge, literary engagement, daily life) and
+its ontology grows from the system's own operation:
+
+1. *The gate stack seeds the ontology.* Every gate vector is an implicit claim
+   about a category of things. The bootstrap makes these claims explicit. The
+   seed is 50-70 entity classes with no human authoring required — they are
+   mechanically extracted from the existing code.
+
+2. *New gate vectors add categories directly.* As the Dispatcher grows (new
+   shell patterns, new path protections, new tool classifications), the ontology
+   grows with it. Every new pattern in the gate stack becomes a fact on skill
+   load. No human effort. The gate stack grows, the ontology grows.
+
+3. *Screamer generalizes from gate outcomes.* After 37 shell commands are blocked
+   as destructive, Screamer extracts structural commonalities: "commands writing
+   to block devices," "commands recursively deleting outside the workspace."
+   These become new subcategories (=:block-device-command=,
+   =:workspace-external-delete=) that didn't exist in the original gate patterns.
+   The ontology deepens through observation.
+
+4. *The archivist proposes from prose.* The archivist reads a diary entry about
+   a book: "Nabokov's lectures on Kafka." The LLM proposes =(:entity :nabokov
+   :relation :lectures-on :value :kafka)=. Screamer checks consistency. Admitted.
+   The relation =:lectures-on= and the entity classes =:author= (for Nabokov) and
+   =:subject= (for Kafka) didn't exist before; they are created on first use.
+   This is the primary growth mechanism for the broader memex.
+
+5. *The human declares explicitly.* The human writes a declarative fact directly
+   into the symbolic index. No extraction step. No LLM involvement. The fact is
+   admitted with =:provenance :human-authored= — the highest trust level.
+
+6. *Temporal patterns crystallize into categories.* Every Sunday the memex gets a
+   retrospective heading. Every Monday a planning heading. The time-awareness
+   system observes the periodicity and proposes =:weekly-retrospective= and
+   =:weekly-planning= as fact types. Screamer verifies they don't contradict
+   existing categorizations. Admitted.
+
+7. *Cross-domain overlap produces parent categories.* Screamer notices that
+   =:secret-files= (from the gate stack) and =:private-content= (from privacy
+   tags) share members — =.env= is both a secret file and private content. It
+   proposes =:sensitive-material= as a parent with both as children. Taxonomy
+   building happens automatically through overlap detection.
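+
+The overlap detection in step 7 can be illustrated with ordinary set
+intersection. This is a sketch only: =propose-parent-categories= and the
+membership alist are hypothetical, and naming the parent (for example
+=:sensitive-material=) is left to whatever proposes and verifies the category.
+
+#+begin_src lisp
+(defun propose-parent-categories (memberships &key (min-shared 1))
+  "MEMBERSHIPS is an alist of (category . members). Return a list of
+(category-a category-b shared-members) for each pair of categories that
+shares at least MIN-SHARED members."
+  (loop for (a . others) on memberships
+        nconc (loop for b in others
+                    for shared = (intersection (cdr a) (cdr b) :test #'equal)
+                    when (>= (length shared) min-shared)
+                      collect (list (car a) (car b) shared))))
+
+;; .env belongs to both of the first two categories, so they form a candidate pair.
+(propose-parent-categories
+ '((:secret-files ".env" "id_rsa")
+   (:private-content ".env" "diary.org")
+   (:skill-file "search.lisp")))
+#+end_src
+
+Each returned pair is only a candidate; under the design described here it
+would still have to pass the same consistency check as any other new category.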
+
+** Growth is self-limiting by design
+
+Not every conceivable category is added. The system prunes through use:
+
+- New categories are admitted only through Screamer's consistency check. A
+  category that contradicts an existing classification is rejected.
+- A category that never gets queried costs nothing (a hash table entry) but
+  produces no value. It fades from use naturally.
+- Overly fine-grained categories (=.env.foo.bar.baz= as its own class) are
+  rejected because they are redundant with the wildcard pattern that already
+  covers them.
+- Overly broad categories that subsume meaningful distinctions ("everything is
+  a =:file=") produce contradictions when Screamer tries to apply existing rules.
+  Rejected.
+
+The system converges on a useful granularity through use, not through upfront
+design. The gate stack provides the seed. Gate outcomes, prose extraction,
+deduction, and human authoring grow the shoots. Screamer prunes contradictions.
+The ontology is a garden, not a building.
+
+* Semantic Wikipedia as Entity Backbone
+
+The gate stack provides 50-70 entity classes — adequate for a coding agent where
+the domain is bounded to files, commands, and code symbols. For a general-knowledge
+memex, 50-70 is starvation. Your memex mentions Nabokov, /Pale Fire/, Kinbote,
+Zembla, paranoid reading, unreliable narrators, postmodernism, butterfly
+migration, chess problems, and the Russian exile experience. The gate stack knows
+none of these. Organic growth through prose extraction would take years just to
+cover the entities in one person's engagement with a single novel.
+
+Wikidata has already done this work: approximately 2 million entity classes, over
+100 million entities, a decade of human curation. By loading the neighborhood of
+your memex into the symbolic index (entities referenced in your prose, plus their
+N-hop property net from Wikidata), the entity recognition problem vanishes. The
+archivist doesn't need to discover Nabokov from your diary. It needs to connect
+your heading to the existing Wikidata entity. That is a simpler task — reference
+resolution, not knowledge extraction.
+
+The LLM's role shrinks to three thin boundaries:
+
+1. *Input translation* — natural language question to structured query. "What do
+   I think about monorepos?" → =(fact-query :entity :monorepo :relation :opinion
+   :source :memex)=. Formulaic, ~100 tokens, any model sufficient.
+
+2. *Prose to candidate triple* — for personal memex entries that have no Wikidata
+   counterpart: your opinions, your day's events, your project plans. Proposals
+   are verified by Screamer before admission. This is the only extraction path
+   that still requires an LLM, and its scope is limited to what Wikidata cannot
+   provide — your subjective, personal, or novel content.
+
+3. *Result to prose* — structured answer to readable sentence. "Your 2023 diary
+   says 8848m. Wikidata (last edited Feb 2024) says 8849m. They disagree on
+   height." The reasoning is done; the LLM wraps the plist in grammar. ~100
+   tokens, any model sufficient, purely cosmetic. Users who prefer no LLM at all
+   can navigate through command-driven interaction (=/query=, =/contradictions=,
+   =/audit=, =/context why=).
+
+Everything else — the gate stack, the fact store, the constraint solver, the type
+hierarchy, the provenance tracking, the contradiction surfacing, the cross-domain
+comparison — is pure deterministic Lisp with zero LLM tokens.
+
+** The decisive simplification
+
+Without Semantic Wikipedia, the archivist must /discover/ entities from prose:
+extract a triple for every person, place, work, concept, and event mentioned in
+the memex. This is unbounded LLM work and the quality depends on extraction
+accuracy.
+
+With Wikidata loaded, the entity graph is pre-structured. The archivist's job
+changes from "discover that Nabokov wrote /Pale Fire/ and lectured on Kafka" to
+"verify that the Nabokov referenced in heading #47 is the same entity as Wikidata
+item Q36591." The second task is simpler, more reliable, and in many cases can
+be done without an LLM at all — a simple entity name match against the loaded
+Wikidata graph may suffice for unambiguous names.
+
+* The "Flip" — From Lossy Extraction to Deterministic Derivation
+
+The symbolic index begins its life as a lossy construct. The initial extraction
+from the prose — the first population of facts from LLM proposals verified by
+Screamer — is built from an uncertain foundation. Some facts are correct. Some
+are missing. Some are wrong.
+
+But the symbolic engine accumulates non-lossy facts through three independent
+mechanisms:
+
+1. *Gate outcomes* — every gate rejection is a fact. No LLM involved. These
+   accumulate at the rate of user interactions.
+2. *Screamer deductions* — new facts derived from existing facts. No LLM
+   involved. These accumulate whenever the fact store crosses a density threshold
+   where structural patterns emerge.
+3. *Human authoring* — the human explicitly declares facts. No LLM involved.
+
+At some point, the non-lossy facts constitute a sufficient foundation that the
+symbolic engine can reverse the flow: instead of the LLM extracting facts from
+prose, the symbolic engine reads prose through its own lens — its now-substantial
+ontology of categories, rules, and constraints — and asserts facts in its own
+language. The extraction mechanism ceases to be probabilistic and becomes
+deterministic.
+
+** The sufficiency criterion
+
+The architecture note (=notes/passepartout-symbolic-engine-exploration.org=) describes
+this "flip" as aspirational: "at some point, the non-lossy facts constitute a
+sufficient foundation." This design decision makes it operational:
+
+=(/ (count-provenance :gate-outcome :human-authored :deduced) total-facts)=
+
+When this ratio exceeds a configurable threshold (=SUFFICIENCY_THRESHOLD=,
+default 0.7), the system considers its foundation sufficient. The archivist
+switches from "LLM proposes, Screamer verifies" to "Screamer queries existing
+facts, applies to the new prose, and deduces new facts directly."
+
+The flip is visible to the user through the TUI sidebar or =/status= command:
+"Symbolic index: 847 facts (73% non-lossy, 12% LLM-proposed, 15% Wikidata).
+Sufficient foundation: YES."
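+
+A sketch of the criterion as a predicate. The names =*sufficiency-threshold*=,
+=*non-lossy-provenances*= and =sufficient-foundation-p= are assumptions, and
+this =count-provenance= takes the fact list as an explicit first argument,
+which the one-line expression above leaves implicit.
+
+#+begin_src lisp
+(defparameter *sufficiency-threshold* 0.7
+  "Assumed Lisp-side name for SUFFICIENCY_THRESHOLD.")
+
+(defparameter *non-lossy-provenances*
+  '(:gate-outcome :human-authored :deduced))
+
+(defun count-provenance (facts &rest provenances)
+  "Count the FACTS whose :provenance is one of PROVENANCES."
+  (count-if (lambda (fact) (member (getf fact :provenance) provenances))
+            facts))
+
+(defun sufficient-foundation-p (facts
+                                &optional (threshold *sufficiency-threshold*))
+  "True when the non-lossy share of FACTS exceeds THRESHOLD.
+An empty store is never sufficient."
+  (and facts
+       (> (/ (apply #'count-provenance facts *non-lossy-provenances*)
+             (length facts))
+          threshold)))
+
+;; Three of four facts are non-lossy: 3/4 exceeds the default 0.7.
+(sufficient-foundation-p
+ '((:provenance :gate-outcome)
+   (:provenance :deduced)
+   (:provenance :human-authored)
+   (:provenance :llm-proposed)))
+#+end_src
+
+The division yields an exact rational, so the comparison against the threshold
+involves no rounding of the ratio itself.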
+
+** The flip does not mean "complete"
+
+In the broader memex, completeness is neither possible nor desirable. The flip
+means "deterministic enough to be trustworthy," not "comprehensive enough to be
+self-sufficient." The neural index remains the gateway to the full richness of
+prose. The symbolic index handles what can be mechanically verified. The boundary
+is permanent.
+
+* Ephemeral First, Persistent Later
+
+The architecture note's Option 5 (ephemeral facts, no disk persistence) is the
+correct first implementation. Three reasons:
+
+1. *The fact language is unproven.* Representing facts as triples with provenance
+   and grounding is a hypothesis. It may be too simple for some domains, too
+   complex for others.
+   Committing to a serialization format before knowing what's useful is premature.
+
+2. *The ontology is emergent.* Categories are created on first use. What proves
+   useful stays; what doesn't fades. A persistent format would need a migration
+   story every time the category structure changes. Ephemeral avoids this entirely
+   — the facts are re-derived on each session start using the current (evolved)
+   ontology.
+
+3. *Rebuildability is the safety net.* Because all facts have a =:grounding= to
+   an Org heading, and gate-outcome facts are regenerated from the gate stack on
+   every load, the entire symbolic index can be thrown away and rebuilt from
+   scratch. The cost is compute, not data. This is the practical realization of
+   "the prose is always the ground truth."
+
+The transition to persistence (Phase 5: VivaceGraph) happens when two conditions
+are met: the fact language has stabilized through use, and the accumulated
+deductions across sessions provide value that justifies the serialization cost.
+
+* Whitehead's Four Operational Contributions
+
+=notes/passepartout-whitehead.org= extracts four concrete, engineerable ideas
+from Whitehead's /Principia Mathematica/ and /Process and Reality/. They are
+summarized here because each informs the neurosymbolic design.
+
+** Contribution 1: PM-Type-Level Gates
+
+PM's ramified theory of types solved Russell's paradox by assigning every
+propositional function a type level, making self-application syntactically
+invalid. Passepartout applies the same principle to prevent a request from
+modifying the rules that validate it. Every cognitive tool and gate vector
+carries a =:type-level= integer. Before any gate predicate runs, the dispatcher
+checks: if the signal's type level equals or exceeds the gate's type level, the
+signal is rejected. A request to modify dispatcher rules (type-level 5) cannot
+pass a gate of type-level 4 or lower. This is a structural prohibition, not a
+heuristic — self-modification of the safety layer is impossible by construction.
+
+Implementation: approximately 30 lines in the existing dispatcher. No new
+dependencies. Backward compatible. This is Phase 0 of the symbolic engine
+roadmap.
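+
+A minimal sketch of the check, assuming gates and signals are plists that carry
+=:type-level=. The function names, the =:predicate= key, and the returned
+status keywords are hypothetical and not taken from the dispatcher.
+
+#+begin_src lisp
+(defun type-level-permits-p (signal-level gate-level)
+  "A signal may be evaluated by a gate only when it sits strictly below it."
+  (< signal-level gate-level))
+
+(defun run-gate (gate signal)
+  "GATE and SIGNAL are plists with a :type-level; GATE also has a :predicate.
+The type-level check runs first, so a rejected signal never reaches the
+predicate."
+  (cond ((not (type-level-permits-p (getf signal :type-level)
+                                    (getf gate :type-level)))
+         :rejected-by-type-level)
+        ((funcall (getf gate :predicate) signal) :passed)
+        (t :rejected-by-predicate)))
+
+;; A level-5 request meets a level-4 gate and is refused outright;
+;; a level-2 request goes on to the gate's own predicate.
+(let ((gate (list :type-level 4 :predicate (constantly t))))
+  (list (run-gate gate '(:type-level 5 :action :modify-dispatcher-rules))
+        (run-gate gate '(:type-level 2 :action :read-file))))
+#+end_src
+
+The strict comparison is what rules out self-application: a signal at the
+gate's own level is refused just like one above it.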
+
+** Contribution 2: Theory of Descriptions → Reference Resolution
+
+PM's theory of descriptions addressed the problem of referring to nonexistent
+entities: "the current king of France is bald" is false, not meaningless, when
+there is no unique referent. Passepartout applies this to reference resolution:
+when the user says "the function that validates secrets," a cognitive tool checks
+uniqueness before resolving. Ambiguous references trigger a clarification prompt
+rather than a blind guess.
+
+Implementation: approximately 40 lines as a cognitive tool. When the knowledge
+graph ships, descriptions become native Prolog queries with uniqueness constraints.
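+
+The uniqueness check can be sketched without a knowledge graph. Here
+=resolve-description= and its status keywords are hypothetical, and the
+candidate function names are invented for the example.
+
+#+begin_src lisp
+(defun resolve-description (predicate candidates)
+  "Return (values referent status) for the description PREDICATE over
+CANDIDATES. STATUS is :unique, :ambiguous, or :no-referent."
+  (let ((matches (remove-if-not predicate candidates)))
+    (cond ((null matches) (values nil :no-referent))
+          ((rest matches) (values matches :ambiguous))
+          (t (values (first matches) :unique)))))
+
+;; Two names mention secrets, so the caller should ask for clarification.
+(resolve-description (lambda (name) (search "secret" name))
+                     '("validate-secrets" "scan-secret-paths" "load-config"))
+#+end_src
+
+Returning the status as a second value lets a caller tell a missing referent
+apart from an ambiguous one and phrase the clarification prompt accordingly.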
+
+** Contribution 3: Process and Reality → Architectural Vocabulary
+
+Whitehead's process ontology maps with surprising precision to Passepartout's
+pipeline architecture:
+
+- Prehension :: a gate grasping a signal.
+- Positive prehension :: a gate passing.
+- Negative prehension :: a gate rejecting.
+- Concrescence :: the pipeline process from input to output.
+- Satisfaction :: the final agent response.
+
+This vocabulary is precise, standard, and already mapped to the architecture. It
+provides the language for the =/why= command, the gate trace, and the ARCHITECTURE
+documentation. It is descriptive, not operational — the design would be correct
+without it, but it would lack the vocabulary to describe /why/ it is correct.
+
+** Contribution 4: VivaceGraph + PM Types → KG Type Hierarchy
+
+When the knowledge graph ships, every entity inherits PM's type hierarchy.
+Entities carry =:pm-type-level= metadata. Queries cannot return entities at or
+above the level of the querying function, mirroring the dispatcher's
+equals-or-exceeds check. Self-referential knowledge becomes structurally
+impossible: no entity can define its own type level. This is
+Contribution 1 applied to the knowledge layer rather than the execution layer.
+The dispatcher prevents self-referential /actions/; the KG prevents
+self-referential /facts/.
+
+* The Provenance Chain as Product
+
+In the coding domain, the value of the symbolic engine is the verified fact:
+"this command is safe." In the broader memex, the value is the provenance itself:
+"this claim originated in that diary entry on that date, has been referenced 7
+times across 4 different projects, was contradicted in a retrospective 6 months
+later, and was revised in a note 3 weeks after that."
+
+The symbolic engine doesn't tell you what is true. It tells you what you wrote,
+when, where, and how it connects to everything else you wrote — with a verifiable
+audit trail. It is a memory prosthesis that makes your own mind legible to you.
+
+Every fact carries:
+
+- =:grounding= — the specific Org heading from which it was extracted
+- =:provenance= — who or what produced it (gate-outcome, human-authored, deduced,
+  LLM-proposed)
+- =:timestamp= — when it was admitted to the symbolic index
+- =:referenced-by= — other facts that depend on or reference this one
+- =:contradicted-by= — other facts that disagree with this one (if any)
+- =:superseded-by= — if this fact was replaced by a newer version
+
+These fields make every fact auditable. The =/audit <node-id>= command renders
+the full provenance chain as an Org headline tree. The provenance is not a
+logging feature. It is the product.
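+
+As a rough illustration of how these fields could be stored and walked, the
+sketch below keeps facts as plists in an in-memory hash table and follows
+=:superseded-by= links. =*facts*=, =admit-fact=, =revision-chain=, and all
+sample values are hypothetical; this is the kind of traversal an audit command
+could build on, not its implementation:
+
+#+BEGIN_SRC lisp
+(defparameter *facts* (make-hash-table :test #'equal)
+  "Hypothetical in-memory fact store, keyed by node id.")
+
+(defun admit-fact (id triple &key grounding provenance timestamp
+                                  referenced-by contradicted-by superseded-by)
+  "Store TRIPLE under ID together with the six fields listed above."
+  (setf (gethash id *facts*)
+        (list :id id :triple triple
+              :grounding grounding :provenance provenance :timestamp timestamp
+              :referenced-by referenced-by
+              :contradicted-by contradicted-by
+              :superseded-by superseded-by)))
+
+(defun revision-chain (id)
+  "Follow :superseded-by links from ID to the newest revision.
+Assumes the links are acyclic; a cycle would loop forever."
+  (loop for fact = (gethash id *facts*)
+        while fact
+        collect (getf fact :id)
+        do (setf id (getf fact :superseded-by))))
+
+;; Usage with invented sample data.
+(admit-fact "fact-1" '(:monorepo :judged :harmful)
+            :grounding "notes/diary.org::*Monorepos" ; hypothetical heading
+            :provenance :human-authored
+            :timestamp "2026-01-10"
+            :superseded-by "fact-2")
+(admit-fact "fact-2" '(:monorepo :judged :acceptable)
+            :grounding "notes/retro.org::*Monorepos revisited" ; hypothetical
+            :provenance :human-authored
+            :timestamp "2026-05-08")
+(assert (equal (revision-chain "fact-1") '("fact-1" "fact-2")))
+#+END_SRC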
+
+* The Competitive Argument
+
+No competitor has this problem because no competitor has a symbolic engine. The
+55 systems surveyed in =notes/competitive-landscape.org= range from pure chat
+agents (Claude, ChatGPT) to agent harnesses (Claude Code, OpenCode, Hermes) to
+platform agents (OpenClaw). None of them encode knowledge as formal facts with
+provenance. None of them verify extractions against an existing knowledge base.
+None of them can prove properties about their own rulesets.
+
+Their safety is heuristic (prompt-based guardrails that consume LLM tokens and
+can be evaded with clever phrasing). Their memory is flat (JSONL transcripts
+without content-addressed identity or provenance chains). Their reasoning is
+entirely neural — when you ask "why did you decide that?", the answer is a
+regenerated LLM explanation, not a retrieved inference chain.
+
+Passepartout's architectural bet is that this problem is worth solving — that a
+system which can surface contradictions with provenance, derive new facts from
+observations, and verify claims against a provenanced knowledge graph is
+fundamentally different from a system that can only call an LLM and hope the
+response is correct.
+
+The cost is ontological work, which is genuinely difficult. The reward is a
+system that cannot hallucinate at the reasoning level, whose memory is provable
+rather than empirical, and whose knowledge accumulates across sessions through
+deduction rather than through LLM re-prompting. For a life's knowledge stored in
+a personal memex, this is not a performance advantage. It is a category difference.
+
+* Open Questions
+
+Several design questions are unresolved and should remain unresolved at this
+stage. They represent research decisions that require experience running the
+system.
+
+** What is the minimum viable fact language?
+
+Triples of the form =(:entity :relation :value)=, with provenance and
+grounding, are the current hypothesis. They are simple enough to parse,
+expressive enough to capture the gate stack's implicit claims, and extensible
+enough that Screamer can operate on them. But they may be too simple. Triples
+do not naturally express temporal relations ("was X before Y?"), modal claims
+("should not do X unless Y"), or counterfactuals, all of which may be essential
+for a symbolically aided memex. The right granularity depends on what queries
+actually need to be made, and that cannot be known in advance.
+
+** How does ontology refactoring work?
+
+If the seed produces 50 categories from gate extraction and later experience
+shows they are wrong — wrong granularity, missing cross-cutting concerns, conflated
+categories — how are they migrated without invalidating all existing deductions
+that cross the old category boundaries? The ephemeral-first approach (no
+persistence, rebuild from scratch) is a temporary answer. Once persistence is
+committed (VivaceGraph), refactoring the category hierarchy is a schema migration
+problem that deduction provenance makes harder — every deduced fact's chain may
+cross the old category boundary. This is not addressed in the current architecture.
+
+** What is the appropriate role of the human?
+
+The human can explicitly declare facts, write constraints, and correct wrong
+extractions. But how much of the ontology should the human need to maintain? If
+the human must write a definition for every new category the symbolic engine
+encounters, the overhead is prohibitive. If the symbolic engine can generalize
+from instances, the human role becomes supervision rather than authorship — review
+and approve proposed generalizations. The balance cannot be set without experience.
+
+** How much Wikidata is the right amount?
+
+Loading Wikidata entities referenced in the memex is the minimum. Loading all
+Wikidata entities within N hops of those references expands the graph
+exponentially. The right N depends on the memex's breadth — a memex focused on
+software engineering needs fewer hops than a memex spanning literature, history,
+philosophy, and science. The query performance and memory costs of a large
+Wikidata load are unknown.
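+
+To make the growth concrete, here is a small breadth-first sketch of "all
+entities within N hops", assuming a caller-supplied =neighbors= function. No
+real Wikidata API is used; =entities-within= and the toy graph are hypothetical:
+
+#+BEGIN_SRC lisp
+(defun entities-within (seeds n neighbors)
+  "Return all entities reachable from SEEDS in at most N hops.
+NEIGHBORS is a function from an entity to a list of adjacent entities."
+  (let ((seen (make-hash-table :test #'equal))
+        (frontier seeds))
+    (dolist (s seeds) (setf (gethash s seen) t))
+    (dotimes (hop n)
+      (let ((next '()))
+        (dolist (entity frontier)
+          (dolist (nb (funcall neighbors entity))
+            ;; Record each entity once so shared neighbors are not re-expanded.
+            (unless (gethash nb seen)
+              (setf (gethash nb seen) t)
+              (push nb next))))
+        (setf frontier next)))
+    (loop for entity being the hash-keys of seen collect entity)))
+
+;; Usage: a toy graph where every non-leaf entity links to two others.
+(let* ((graph '((:a :b :c) (:b :d :e) (:c :f :g) (:d) (:e) (:f) (:g)))
+       (neighbors (lambda (entity) (rest (assoc entity graph)))))
+  (assert (= (length (entities-within '(:a) 0 neighbors)) 1))
+  (assert (= (length (entities-within '(:a) 1 neighbors)) 3))
+  (assert (= (length (entities-within '(:a) 2 neighbors)) 7)))
+#+END_SRC
+
+With a branching factor of only two, the loaded set goes from 1 to 3 to 7
+entities as N goes from 0 to 2. Real Wikidata entities typically link to far
+more than two others, which is why the choice of N dominates the load size.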
+
+** Can the symbolic engine satisfy queries from the user without LLM involvement?
+
+The design aims for zero-LLM query answering: the user issues a structured
+command (=/query=, =/contradictions=, =/audit=), and the symbolic engine responds
+directly. But natural language questions ("what do I think about monorepos?")
+still require the LLM as a thin translation layer. Whether the structured command
+interface is sufficient for daily use, or whether users will demand natural
+language interaction, determines how much LLM involvement remains in the mature
+system.
+
+** Is the triplestore physically bounded or does it explode?
+
+A personal memex with years of diary entries, project notes, reading logs, and
+literary analyses could produce millions of triples. A naive hash table scales
+linearly, but VivaceGraph's Prolog-like queries may not. The performance
+characteristics of graph queries over a million-triple knowledge base have not
+been estimated.
+
+* Relation to Passepartout's Existing Architecture
+
+The neurosymbolic engine is an extension of the existing probabilistic-deterministic
+split, not a replacement for it. The current architecture divides cognition into
+LLM-driven proposals and Lisp-driven verification. The symbolic engine deepens the
+verification side from "is this action safe?" to "is this claim supported?" — the
+same architectural pattern applied to a broader domain.
+
+The self-repair criterion (a file belongs in core only if, when corrupted, the
+agent cannot fix it without human help) applies to every component of the symbolic
+engine. Screamer, VivaceGraph, the fact store, the archivist — all are skills,
+loaded at runtime, hot-reloadable, and recoverable from corruption. A corrupted
+symbolic engine degrades reasoning capability but does not kill the agent. The
+eight existing core ASDF files are unchanged.
+
+The symbolic engine is not v3.0.0 alone. It is the layer that sits between the
+existing gate stack (which it makes explicit as facts) and the existing skill
+system (which it extends with deduction, contradiction detection, and provenance
+tracking). It grows within the current architecture without replacing any existing
+component.
+
+See also:
+
+- =passepartout-neurosymbolic-roadmap.org= — the concrete phased implementation plan
+- =notes/passepartout-symbolic-engine-exploration.org= — the original architecture note
+- =notes/passepartout-whitehead.org= — the four Whitehead contributions
+- =passepartout/docs/DESIGN_DECISIONS.org= — the existing design decisions
+- =passepartout/docs/ARCHITECTURE.org= — the current pipeline architecture
+- =passepartout/docs/ROADMAP.org= — the feature roadmap through v0.13.0