3.8 KiB
— title: The Five Architecture Options type: reference tags: :passepartout:architecture: —
The Five Architecture Options
The symbolic engine must relate to the human memex. The relationship is not obvious because knowledge lives in two incompatible forms: natural language prose (what the human reads and writes) and formal facts (what the symbolic engine reasons about). The translation between them is lossy by nature. The architecture is defined by how it handles that lossiness.
Option 1: The Auto-Formalizer
A separate knowledge graph stores symbolic facts. The LLM populates it by extracting triples from unstructured data. The KG becomes co-authoritative with the human prose.
This is the simplest to implement but inherits the dual-representation problem in its most acute form. The KG and the prose can disagree, and the architecture provides no mechanism for resolving disagreements. It also stores knowledge twice — once in the user's Org files, once in the KG — with no guarantee that they stay synchronized.
Option 2: Two Intentionally Separate Memexes
The human memex contains prose: thoughts, diaries, decisions, documentation. The symbolic memex contains formal facts: constraints, rules, relationships, deductions. The archivist bridges between them but does not try to keep them synchronized. They are allowed to diverge because they serve different purposes.
This is philosophically honest — it admits that no lossless translation between natural language and formal logic is possible. But it forces the user to reason about two separate knowledge stores.
Option 3: Tangled Fact Blocks in Org Files
A new block type — #+begin_src knowledge — would contain symbolic facts in a formal language. The tangle mechanism would load these facts into the symbolic engine's in-memory store, just as it loads Lisp code into the SBCL image.
This is aesthetically appealing because it unifies the format. One toolchain, one version control system, one Merkle tree. But the block language itself IS the knowledge representation language, and that language is the ontology we have not yet defined.
Option 4: One Memex, Two Indices
The prose remains in human language in Org files. The prose is always the ground truth. Two indices sit on top of the prose as derived views:
- The neural index uses vector embeddings to enable semantic search. The LLM navigates the prose through embedding space, retrieving relevant headings.
- The symbolic index stores formal assertions about what the prose says — predicates, relations, constraints — each grounded to a specific heading or block in the Org file.
Each index serves its own side of the machine. They do not need to understand each other's representations. They only need to agree on which heading or block they are referring to. Because the prose is always the ground truth, the symbolic index can be thrown away and rebuilt from scratch if it becomes corrupted or stale. No information is lost — only the extracted assertions.
Option 5: Ephemeral Symbolic Facts
No persistence, no serialization format, no knowledge graph stored on disk. VivaceGraph exists in memory during the session. Screamer derives facts from the prose as needed. When the session ends, the facts are discarded and re-derived on the next start.
This punts the ontological design problem entirely. You never have to decide on a serialization format because you never serialize. The cost is compute (re-derivation on every restart) and the inability to accumulate facts across sessions. But it is the correct first step — a way to learn what kinds of facts are actually useful before committing to a storage format.