:PROPERTIES:
:END:
#+title: Open Questions
#+filetags: :passepartout:architecture:

* Open Questions

** Open Questions
:PROPERTIES:
:ID:       27c03e1d-283f-4e3d-9f32-6476aafde97e
:CUSTOM_ID: design-open-questions
:CREATED:  [2026-05-08 Fri]
:WEIGHT:  40
:END:

Several design questions are unresolved and should remain unresolved at this stage. They represent research decisions that require experience running the system.

*** What is the minimum viable fact language?

The current hypothesis is triples of the form =(:entity :relation :value)=, each carrying provenance and grounding. This form is simple enough to parse, expressive enough to capture the gate stack's implicit claims, and extensible enough that Screamer can operate on it. But it may be too simple. Triples do not naturally express temporal relations ("was X before Y?"), modal claims ("should not do X unless Y"), or counterfactuals, all of which may be essential for a symbolically aided memex. The right granularity depends on which queries actually need to be made, and that cannot be known in advance.
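
As a rough illustration of "simple enough to parse", the sketch below reads the textual triple form into a record. It is written in Python purely for illustration. The =Fact= class, the =parse_triple= helper, the shape of the =provenance= and =grounding= fields, and the example values are all assumptions, not part of the design.

#+begin_src python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One triple plus the metadata the text says it carries."""
    entity: str
    relation: str
    value: str
    provenance: Optional[str] = None  # assumed: a free-form source reference
    grounding: Optional[str] = None   # assumed: a free-form anchor for the claim


def parse_triple(text, provenance=None, grounding=None):
    """Parse '(:entity :relation :value)' into a Fact; reject any other shape."""
    stripped = text.strip()
    if not (stripped.startswith("(") and stripped.endswith(")")):
        raise ValueError(f"not a parenthesised triple: {text!r}")
    parts = stripped[1:-1].split()
    if len(parts) != 3 or not all(p.startswith(":") and len(p) > 1 for p in parts):
        raise ValueError(f"expected exactly three keywords, got: {parts!r}")
    entity, relation, value = (p[1:] for p in parts)
    return Fact(entity, relation, value, provenance, grounding)


# Hypothetical example values, chosen only to exercise the parser.
fact = parse_triple("(:monorepo :has-property :slow-ci)",
                    provenance="diary/2026-05-08")
#+end_src

Anything that is not exactly three keywords is rejected, which is where the limits discussed above would first surface: a claim needing a fourth slot (a time, a condition) has no place to put it without changing the fact language.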

*** How does ontology refactoring work?

This question is settled. See "Ontology Versioning" above. The category hierarchy is Merkle-hashed. Every fact stores its =:ontology-version=. Re-verification is heartbeat-driven. Worldviews are preserved, not overwritten. The shift is the artifact.
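
A minimal sketch of the idea, assuming the category hierarchy is a plain parent-to-children mapping and the version is a SHA-256 Merkle hash of the root. The function names, the dictionary layout of a fact, and the example categories are hypothetical; the actual hashing scheme is whatever "Ontology Versioning" specifies.

#+begin_src python
import hashlib


def merkle_hash(name, tree):
    """Hash a category together with the sorted hashes of its children."""
    child_hashes = sorted(merkle_hash(child, tree) for child in tree.get(name, []))
    payload = name + "(" + ",".join(child_hashes) + ")"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stale_facts(facts, current_version):
    """List facts recorded under another ontology version, without mutating them."""
    return [f for f in facts if f["ontology-version"] != current_version]


# Hypothetical hierarchy before and after adding a "compiler" category.
old_tree = {"thing": ["tool", "person"], "tool": ["editor"]}
new_tree = {"thing": ["tool", "person"], "tool": ["editor", "compiler"]}

v1 = merkle_hash("thing", old_tree)
v2 = merkle_hash("thing", new_tree)

facts = [
    {"triple": ("emacs", "is-a", "editor"), "ontology-version": v1},
    {"triple": ("gcc", "is-a", "compiler"), "ontology-version": v2},
]

# What a heartbeat pass might pick up for re-verification.
to_reverify = stale_facts(facts, v2)
#+end_src

Because each node's hash depends only on its own subtree, an untouched branch (here =person=) keeps its hash across versions, while the root hash changes. The stale fact keeps its original version stamp, in line with preserving worldviews rather than overwriting them.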

*** What is the appropriate role of the human?

The human can explicitly declare facts, write constraints, and correct wrong extractions. But how much of the ontology should the human need to maintain? If the human must write a definition for every new category the symbolic engine encounters, the overhead is prohibitive. If the symbolic engine can generalize from instances, the human role becomes supervision rather than authorship — review and approve proposed generalizations. The balance cannot be set without experience.

*** How much Wikidata is the right amount?

Query performance and memory costs are now bounded: 5 million entities ≈ 400 MB of RAM, O(1) hash lookups, and domain-scoped Screamer checks. A large Wikidata load is a capital cost, not a recurring bill (see "Performance" above).
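
The memory figure implies a per-entity budget, worked out below. The sketch reads "400MB" as decimal megabytes and extrapolates linearly to other load sizes; both are assumptions, and =estimated_ram_mb= is a hypothetical helper.

#+begin_src python
ENTITIES = 5_000_000
RAM_BYTES = 400 * 10**6  # assumption: "400MB" means 400 decimal megabytes

# Average memory per entity implied by the figures above.
bytes_per_entity = RAM_BYTES / ENTITIES


def estimated_ram_mb(n_entities):
    """Linear extrapolation from the 5-million-entity figure (an assumption)."""
    return n_entities * bytes_per_entity / 10**6


print(bytes_per_entity)              # 80.0
print(estimated_ram_mb(20_000_000))  # 1600.0
#+end_src

At roughly 80 bytes per entity, quadrupling the load to 20 million entities would cost about 1.6 GB under the same linear assumption.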

What remains open is N, the number of hops to load outward from the entities referenced in the memex; it depends on the memex's breadth. A software-engineering memex needs ~1 hop; a literary memex needs 3-4 hops (Nabokov → Kafka → expressionism → modernism → Baudelaire). The right value is empirical, testable, and user-specific, so it cannot be fixed in the architecture.
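
The hop budget can be pictured as a breadth-first expansion from the memex's own entities. The sketch below uses a toy adjacency map built from the example chain above; =within_hops= and the graph layout are hypothetical, and a real Wikidata neighbourhood would fan out far more widely.

#+begin_src python
from collections import deque


def within_hops(graph, seeds, max_hops):
    """Return {entity: hop distance} for everything within max_hops of the seeds."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # budget exhausted along this path
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Toy graph: only the single chain from the text, one edge per arrow.
graph = {
    "Nabokov": ["Kafka"],
    "Kafka": ["expressionism"],
    "expressionism": ["modernism"],
    "modernism": ["Baudelaire"],
}

one_hop = within_hops(graph, ["Nabokov"], 1)
three_hops = within_hops(graph, ["Nabokov"], 3)
four_hops = within_hops(graph, ["Nabokov"], 4)
#+end_src

With a budget of 1 only Kafka is loaded; Baudelaire appears only at a budget of 4. Making =max_hops= a per-user setting rather than a constant is one way to keep the value empirical.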

*** Can the symbolic engine satisfy queries from the user without LLM involvement?

The design aims for zero-LLM query answering: the user issues a structured command (=/query=, =/contradictions=, =/audit=), and the symbolic engine responds directly. But natural-language questions ("what do I think about monorepos?") still require the LLM as a thin translation layer. Whether the structured command interface is sufficient for daily use, or whether users will demand natural-language interaction, determines how much LLM involvement remains in the mature system.
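
The split between the two paths can be sketched as a small router. The =route= function, the handler labels, and the tuple it returns are hypothetical; only the three command names come from the text above.

#+begin_src python
STRUCTURED_COMMANDS = {"/query", "/contradictions", "/audit"}


def route(user_input):
    """Return (handler, command, argument) for one line of user input.

    handler is "symbolic" when the input starts with a known structured
    command, and "llm-translate" when it needs translation first.
    """
    text = user_input.strip()
    head, _, rest = text.partition(" ")
    if head in STRUCTURED_COMMANDS:
        return ("symbolic", head, rest.strip())
    return ("llm-translate", None, text)


print(route("/query monorepo"))
# ('symbolic', '/query', 'monorepo')
print(route("what do I think about monorepos?"))
# ('llm-translate', None, 'what do I think about monorepos?')
#+end_src

In this sketch anything that is not a recognised command, including an unknown slash command, falls through to the translation path. Logging how often each branch is taken would give a direct measure of how much LLM involvement daily use actually requires.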

*** Is the triplestore physically bounded or does it explode?

A personal memex with years of diary entries, project notes, reading logs, and literary analyses could produce millions of triples. A naive hash table scales linearly with the number of triples, but VivaceGraph's Prolog-like queries may not. The performance characteristics of graph queries over a million-triple knowledge base have not yet been estimated.
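
The difference between a point lookup and a pattern query can be seen even in a toy index. The sketch below does not model VivaceGraph; =TripleIndex=, =two_step_join=, and the relation names are hypothetical and only illustrate where the work goes.

#+begin_src python
from collections import defaultdict


class TripleIndex:
    """Toy in-memory index: one dict keyed by (subject, relation), one by relation."""

    def __init__(self):
        self.by_subject_relation = defaultdict(list)  # (s, r) -> [o, ...]
        self.by_relation = defaultdict(list)          # r -> [(s, o), ...]

    def add(self, s, r, o):
        self.by_subject_relation[(s, r)].append(o)
        self.by_relation[r].append((s, o))

    def objects(self, s, r):
        """Point lookup: a single dictionary access."""
        return self.by_subject_relation.get((s, r), [])


def two_step_join(index, r1, r2):
    """Find (x, y, z) with (x r1 y) and (y r2 z); also count point lookups made."""
    results, lookups = [], 0
    for x, y in index.by_relation.get(r1, []):
        lookups += 1  # one point lookup per triple matching the first pattern
        for z in index.objects(y, r2):
            results.append((x, y, z))
    return results, lookups


index = TripleIndex()
index.add("nabokov", "influenced-by", "kafka")
index.add("kafka", "part-of", "expressionism")
index.add("me", "read", "nabokov")

matches, lookups = two_step_join(index, "influenced-by", "part-of")
#+end_src

A single =objects= call stays constant-time on average regardless of store size, but the join performs one lookup per triple matching the first pattern, and each added pattern multiplies the intermediate bindings. Running this kind of count against a synthetic million-triple store would be one way to get the missing estimate.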