memex: update AGENTS.md with ROADMAP TODO workflow, bump passepartout to v0.8.0

- AGENTS.md: add steps 0 (read next TODO from ROADMAP) and 6 (mark DONE with LOGBOOK) to development cycle
- Notes updates accumulated during v0.8.0 work
- Bump passepartout submodule to v0.8.0
@@ -246,74 +246,129 @@ This is the core architecture pattern. Everything else — the entity classes, t
deduction engine, the persistence layer — follows from this single design decision:
*the LLM proposes; the symbolic engine decides whether to accept.*

* Three Contradiction Policies — Domain-Dependent Consistency
* Three Cardinality Policies — Singular, Dual, and Plural Facts

Classical logic requires consistency. A contradiction implies everything
(=ex contradictione quodlibet=). Screamer, as a constraint solver, also requires
consistency — a contradictory constraint set has no solutions. But the symbolic
engine operates across domains where the meaning of contradiction is fundamentally
different.
different. The correct question is not "is this consistent?" but "what cardinality
of truth does this domain support?"

A single architecture serves all domains by applying different contradiction
policies, scoped to the entity class:
Time is not a policy. It is a universal dimension that applies equally to every
fact, regardless of cardinality. All facts carry =:timestamp= and =:parent-id=
fields. Every fact has a version history. Every fact lives in a Merkle chain
that captures how it changed. The cardinality policy only governs what happens
at a given logical moment when two values coexist for the same =entity= and
=relation=.

** Policy :exclusive — Contradiction Rejected at Admission
** Policy :singular — One Active Value, One Version Chain

For domains where the world is physically singular — a file either exists or it
doesn't, a command either was blocked or it wasn't, a gate rule either applies or
it doesn't. When a new fact contradicts an existing one in an :exclusive domain,
the new fact is rejected. The existing fact is authoritative unless a human
explicitly retracts it.
The active set contains exactly one value for =(:entity :relation)= at a time.
When a new value is asserted for the same pair, the old value is not rejected. It
is superseded — moved into the version history, linked to the new leaf by
=:parent-id=, and retained permanently. The active value is the leaf of the
Merkle chain.

"I used to think =rm -rf /= was safe. Now I know it is catastrophic." Both
facts exist. Both are true — the first at =2024-06-01=, the second at
=2025-03-15=. The chain captures the evolution. The =:singular= policy means
there is one truth /now/, not that there was only ever one truth.

Use for: security classifications, file system state, gate rules, code
correctness, deterministic safety constraints.
correctness, deterministic safety constraints — domains that converge on
one answer, evolving over time.

** Policy :coexistent — Contradiction Flagged, Both Retained
** Policy :dual — Exactly Two Values, in Explicit Tension

For domains where multiple truths coexist — literary interpretations, historical
accounts, personal beliefs held at different times, multi-source factual
disagreement (Wikidata vs. DBpedia vs. your memex). When a new fact contradicts
an existing one in a :coexistent domain, the contradiction is recorded with a
cross-reference flag. Both facts are stored. Queries return all facts with
provenance display.
The active set contains exactly two values for =(:entity :relation)=. Both are
simultaneously true. Both carry independent version histories. A third value is
rejected — the domain is binary by nature.

Use for: literature, history, personal knowledge evolution, scientific consensus
shift, multi-author knowledge bases.
Some contradictions are productive precisely /because/ they are binary. Thesis
and antithesis. Love and resentment. Wave and particle. A poem's two incompatible
readings. The symbolic index holds both, cross-referenced as complementary rather
than conflicting. The user is not asked to resolve the tension. The tension is
the fact.

** Policy :temporal — Contradiction Accepted as Version Change
The system can reason about cardinality transitions: a =:dual= fact that has
one interpretation superseded should collapse to =:singular=. A =:dual= that
has a third interpretation asserted should prompt the user: "Promote to =:plural=
or demote one interpretation?" The cardinality tracks the state of the domain.

For domains where truth changes over time. When a new fact contradicts an old one
in a :temporal domain, the old fact is marked =:superseded= but retained. The
timeline is queryable: "You believed X on Tuesday, Y on Friday, Z on Sunday."
Use for: productive binary tensions, complementary opposites, dialectical
pairs, any domain where two answers are both true and their tension is
meaningful.

Use for: personal belief evolution, project plan revisions, scientific
consensus shift over time, any knowledge where the change itself is information.
** Policy :plural — N Active Values, Open Set

The active set contains any number of values for =(:entity :relation)=. Each
value has independent provenance and its own version history. Queries return
all active values with provenance display. Contradictions are flagged as
cross-references between values — information, not error.

A =:plural= fact where all but one value are superseded should collapse to
=:singular=. A =:plural= fact where the set reduces to two active values —
and the remaining two are complementary — should collapse to =:dual=.

Use for: literary interpretation, scientific hypotheses, personal beliefs held
at different times (when the tension is multi-faceted rather than binary),
multi-source factual disagreement, open-ended exploration.

** Time Is Universal, Not a Policy

Every fact — regardless of cardinality — lives in a version chain. The Merkle
DAG (see "Merkle DAG for Version History" below) captures every version of every
fact. The policy only governs the cardinality of the active set at a single
logical moment.

The version chain is a linked list of facts, each pointing to its predecessor
via =:parent-id=, each hashed with =SHA-256(content || parent-hash)=. Changing
any version invalidates all downstream hashes. The chains form a DAG — independent
facts evolve independently; only facts in the same =(:entity :relation)= chain
share ancestry.

A global snapshot captures the root hash over all chains at a point in time.
Rollback restores the entire fact state to that snapshot. This already exists in
Passepartout's Merkle memory (v0.2.0) — the fact store is a new occupant of
existing housing, not a new foundation.

** Policy Assignment

The policy is assigned when a category is defined. New categories default to
=:coexistent= (never loses information). Core security categories are explicitly
=:exclusive=. The gate stack's bootstrapped facts are =:exclusive= because they
describe the actual filesystem, not perspectives.
=:plural= (safe — never loses information). Core security categories are
explicitly =:singular=. The gate stack's bootstrapped facts are =:singular=
because they describe the actual filesystem, which is physically singular.
Categories for dialectical or complementary domains are explicitly =:dual=.

The Screamer admission gate does not reject all contradictions. It rejects
contradictions in =:exclusive= domains and flags them in =:coexistent= and
=:temporal= domains. The constraint solver still works because queries scope
their constraint set to a single provenance domain. "Is X true according to my
memex?" is a different query than "Is X true according to Wikidata?" Each has
a self-consistent internal logic. The contradiction is between domains, not
within them.
The Screamer admission gate applies the cardinality policy at the active set:
- =:singular= + same value, later timestamp → supersede old, chain new as leaf.
- =:singular= + different value, same timestamp → reject (contradiction). Human
  resolves: which is the active value?
- =:singular= + different value, later timestamp → supersede old, chain new as
  leaf. History preserved.
- =:dual= + first value → admit. + second value → admit, cross-reference as
  complementary. + third value → prompt: promote to =:plural= or demote one
  existing?
- =:dual= + superseding value (same position) → chain via =:parent-id=.
- =:plural= + any value → admit. If active count drops to 2 and values are
  complementary → prompt: collapse to =:dual=? If active count drops to 1 →
  collapse to =:singular= automatically or prompt.

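
The admission rules listed above can be sketched as a single dispatch on the
policy. This is a minimal illustration under stated assumptions, not
Passepartout's actual gate: the names =fact=, =admit=, and =*active*= are
hypothetical, timestamps are plain integers, the =:needs-decision= outcome
stands in for the user prompt, and cases the text does not cover (such as an
older timestamp arriving late) are simply rejected. The =:dual= "same position"
supersede rule is left out.

#+begin_src lisp
;; Hypothetical sketch: FACT, ADMIT, and *ACTIVE* are illustrative names.
(defstruct fact entity relation value timestamp parent-id (id (gensym "FACT")))

(defvar *active* (make-hash-table :test #'equal)
  "Maps (entity relation) to the list of currently active facts.")

(defun admit (new policy)
  "Apply POLICY (:singular, :dual or :plural) to NEW and return an outcome keyword."
  (let* ((key (list (fact-entity new) (fact-relation new)))
         (active (gethash key *active*)))
    (ecase policy
      (:singular
       (let ((old (first active)))
         (cond ((null old)
                (setf (gethash key *active*) (list new))
                :admitted)
               ;; Different value at the same logical moment: a human resolves it.
               ((and (= (fact-timestamp new) (fact-timestamp old))
                     (not (equal (fact-value new) (fact-value old))))
                :rejected)
               ;; Later timestamp: supersede, keeping the old fact reachable.
               ((> (fact-timestamp new) (fact-timestamp old))
                (setf (fact-parent-id new) (fact-id old)
                      (gethash key *active*) (list new))
                :admitted)
               ;; Assumption: anything else is rejected.
               (t :rejected))))
      (:dual
       (cond ((< (length active) 2)
              (setf (gethash key *active*) (append active (list new)))
              :admitted)
             ;; Third value: promote to :plural or demote one (user decides).
             (t :needs-decision)))
      (:plural
       (setf (gethash key *active*) (append active (list new)))
       :admitted))))
#+end_src

With the =rm -rf /= example from above, asserting "safe" at time 1 and
"catastrophic" at time 2 leaves one active fact whose =parent-id= points at the
earlier one.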
** Why This Matters for the Broader Memex

In the coding domain, contradiction is rare and must be resolved — a gate can't
both allow and block the same path. In the broader memex, contradiction is the
product, not the error. Your poetry analysis contradicts your last diary entry
on the same topic. Your reading of /Pale Fire/ changed between 2023 and 2025.
Wikidata says Mount Everest is 8848m (China: rock height); DBpedia says 8849m
(Nepal: snow height). The symbolic engine's job is not to decide which is right.
It is to surface the tension with provenance — "these three sources disagree.
Here is the chain for each."
In the coding domain, contradiction is rare, resolvable, and usually temporal
(a rule changed). In the broader memex, contradiction is the product, not the
error. Your poetry analysis contradicts your last diary entry. Your reading of
/Pale Fire/ changed between 2023 and 2025. Wikidata says Mount Everest is
8848m; DBpedia says 8849m. You love this person AND you resent them.

The symbolic engine's job is not to decide which is right. It is to surface the
tension with provenance — "these three sources disagree; here is the chain for
each" for plural facts, or "you hold these two positions in tension" for dual
facts, or "you believed X until Tuesday, then Y" for singular facts that
evolved. The cardinality policy names the /structure/ of the tension. The
Merkle chain provides the /history/ of each position.

* How Categories Grow — The Organic Ontology

@@ -442,7 +497,7 @@ item Q36591." The second task is simpler, more reliable, and in many cases can
be done without an LLM at all — a simple entity name match against the loaded
Wikidata graph may suffice for unambiguous names.

* The "Flip" — From Lossy Extraction to Deterministic Derivation
* The "Awakening" — From Lossy Extraction to Deterministic Derivation

The symbolic index begins its life as a lossy construct. The initial extraction
from the prose — the first population of facts from LLM proposals verified by
@@ -464,7 +519,7 @@ symbolic engine can reverse the flow: instead of the LLM extracting facts from
prose, the symbolic engine reads prose through its own lens — its now-substantial
ontology of categories, rules, and constraints — and asserts facts in its own
language. The extraction mechanism ceases to be probabilistic and becomes
deterministic.
deterministic. This is not unlike how infants awaken and become children one can reason with. Sometimes.

** The sufficiency criterion

@@ -485,7 +540,7 @@ Sufficient foundation: YES."

** The flip does not mean "complete"

In the broader memex, completeness is neither possible nor desirable. The flip
In the broader memex, completeness is neither possible nor desirable. The awakening
means "deterministic enough to be trustworthy," not "comprehensive enough to be
self-sufficient." The neural index remains the gateway to the full richness of
prose. The symbolic index handles what can be mechanically verified. The boundary
@@ -516,7 +571,271 @@ The transition to persistence (Phase 5: VivaceGraph) happens when two conditions
are met: the fact language has stabilized through use, and the accumulated
deductions across sessions provide value that justifies the serialization cost.

* Whitehead's Concrete Contributions — Four Operational Contributions
* Merkle DAG for Version History

Every fact is versioned. Every =(:entity :relation)= pair forms its own
independent chain in a Merkle DAG. This is not new infrastructure — it is a new
occupant of Passepartout's existing Merkle-tree memory system (v0.2.0).

** The chain

When a fact supersedes its predecessor, the new fact hashes over:

#+begin_example
SHA-256(value || provenance || timestamp || parent-hash || grounding)
#+end_example

The parent-hash pointer forms the chain. Tampering with any version changes its
hash, breaking all downstream references. The history is tamper-evident by
construction.

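
A small sketch of the chain property follows. It is an illustration only:
ANSI Common Lisp has no built-in SHA-256, so the block uses an FNV-1a-style
64-bit hash over character codes as a *non-cryptographic stand-in*. All names
(=stand-in-hash=, =version=, =append-version=, =chain-valid-p=) are hypothetical,
and a real implementation would substitute a SHA-256 routine from a
cryptography library.

#+begin_src lisp
;; NOT cryptographic: an FNV-1a-style stand-in for SHA-256 so the sketch runs
;; on plain ANSI Common Lisp.
(defun stand-in-hash (string)
  (let ((h 14695981039346656037))
    (loop for ch across string
          do (setf h (mod (* (logxor h (char-code ch)) 1099511628211)
                          (expt 2 64))))
    (format nil "~16,'0x" h)))

(defstruct version value provenance timestamp grounding parent-hash hash)

(defun version-digest (v)
  "Hash over value, provenance, timestamp, parent-hash and grounding."
  (stand-in-hash
   (format nil "~s|~s|~s|~s|~s"
           (version-value v) (version-provenance v) (version-timestamp v)
           (version-parent-hash v) (version-grounding v))))

(defun append-version (chain &key value provenance timestamp grounding)
  "Return CHAIN (newest first) extended with a leaf that hashes over its parent."
  (let ((v (make-version :value value :provenance provenance
                         :timestamp timestamp :grounding grounding
                         :parent-hash (and chain (version-hash (first chain))))))
    (setf (version-hash v) (version-digest v))
    (cons v chain)))

(defun chain-valid-p (chain)
  "True when every stored hash matches its content and its parent pointer."
  (loop for (v parent) on chain
        always (and (string= (version-hash v) (version-digest v))
                    (equal (version-parent-hash v)
                           (and parent (version-hash parent))))))
#+end_src

Editing an old version makes its stored hash stale. Recomputing that hash to
hide the edit breaks the child's =parent-hash= instead, so =chain-valid-p=
fails either way.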
** The DAG

Facts about =(.env :member-of-class)= form one chain. Facts about
=(:nabokov :wrote)= form another. They evolve independently. They share no
ancestry. This is a DAG, not a single list — inserting a fact is O(1) per chain.
Changing a fact about =.env= does not require rehashing the literary index.

=:dual= and =:plural= facts cross-reference each other via edges (=:complements=,
=:contradicts=) but these are semantic relationships, not parent chains. Each
value has its own ancestor chain. The cross-reference edges form a web; the
parent chains form a spine.

** The global snapshot

Passepartout already snapshots the Merkle root over all memory objects. Adding
the fact store to the snapshot is a registration, not a new mechanism. Rolling
back the snapshot restores the entire fact state — all chains, all cross-references,
all cardinalities — to that point in time. No per-fact migration needed.

** Cardinality transitions as DAG operations

- =:singular= → new leaf appended to the chain. O(1).
- =:dual= → new value added as sibling with cross-reference edge. O(1).
- =:dual= → =:plural= → cardinality field updated from =2= to =nil=. No chain
  modification.
- =:plural= → =:singular= → all but one value marked =:superseded=, active
  reference points to the sole survivor. Chains preserved.

** In the ephemeral phase (Phase 1-4)

The hash-table implementation tracks history via =:timestamp= and
=:parent-id= pointers without cryptographic hashing. The Merkle DAG is the Phase
5 upgrade — the same data structure, now with hashes. The transition is ~50
lines: wrap each fact in the existing =memory-object= struct with =hash=,
=parent-id=, and =version= fields.

* Abstract Fact Store Interface — Modular by Design

The fact store is accessed through an abstract API. The Merkle DAG (or any future
backing store) is an implementation behind this interface, not a dependency that
code throughout the system calls directly.

** Interface

#+begin_example
fact-assert :: fact → store → (:admitted | :rejected | :flagged)
fact-query :: (entity &key relation policy) → active-value-or-values
fact-history :: (entity relation) → ordered chain of versioned facts
fact-snapshot :: () → root-hash
fact-rollback :: root-hash → store
#+end_example

** Implementations behind the interface

- Phase 1-4: ephemeral hash table with =:timestamp= and =:parent-id= pointers.
  No cryptographic hashing. No persistence.
- Phase 5: VivaceGraph + Merkle =memory-object= wrapper. Content-addressed,
  persistent, tamper-evident.

Future implementations that satisfy the same interface — an append-only write-ahead
log, an immutable B-tree, a content-addressed triple store — can replace the
backing store without changing any consumer. The archivist, Screamer, ACL2, and
the planner call =fact-assert= and =fact-query=, not Merkle struct accessors or
VivaceGraph traversal syntax.

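
One way to keep consumers independent of the backing store is to express the
interface as CLOS generic functions and specialize them per backend. The sketch
below is a guess at the shape, not the project's code: passing the store as the
first argument, the =hash-table-store= class, and the =current-classification=
consumer are all assumptions, and only three of the five operations are shown.

#+begin_src lisp
;; Hypothetical sketch of the interface as generic functions.
(defgeneric fact-assert (store entity relation value &key timestamp)
  (:documentation "Admit VALUE for (ENTITY RELATION); return an outcome keyword."))
(defgeneric fact-query (store entity &key relation)
  (:documentation "Return the active value for (ENTITY RELATION), or NIL."))
(defgeneric fact-history (store entity relation)
  (:documentation "Return every version for (ENTITY RELATION), newest first."))

(defclass hash-table-store ()
  ((chains :initform (make-hash-table :test #'equal) :reader store-chains))
  (:documentation "Phase 1-4 style backend: in memory, no hashing, no persistence."))

(defmethod fact-assert ((store hash-table-store) entity relation value
                        &key (timestamp (get-universal-time)))
  ;; Each version is a plist; list order plays the role of :parent-id here.
  (push (list :value value :timestamp timestamp)
        (gethash (list entity relation) (store-chains store)))
  :admitted)

(defmethod fact-query ((store hash-table-store) entity &key relation)
  (getf (first (gethash (list entity relation) (store-chains store))) :value))

(defmethod fact-history ((store hash-table-store) entity relation)
  (copy-list (gethash (list entity relation) (store-chains store))))

;; A consumer written only against the interface. It would work unchanged
;; with any other class that specializes the three generic functions.
(defun current-classification (store entity)
  (fact-query store entity :relation :member-of-class))
#+end_src

A Phase 5 backend would add a second class with its own methods. The consumer
function never names either class.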
** The interface is load-bearing

This is not speculative modularity. The two-implementation migration (Phase 1-4
hash table → Phase 5 VivaceGraph + Merkle) is in the roadmap. If the interface
leaks implementation details, the migration breaks and the design fails. The
interface must be designed, tested against both backends, and committed before
Phase 1 ships. Every function in the API receives a FiveAM test that runs against
both a hash-table and a VivaceGraph backend (via a mock or a test instance).

* Performance — Why Ontology Growth Doesn't Make the System Slower

Passepartout's performance thesis is: minimize LLM calls, minimize context tokens,
keep everything else local and fast. Knowledge base size is irrelevant to those
metrics. This is not an aspiration. It is a structural property, and a hard one.

** The two cost domains

The system has two cost domains with fundamentally different scaling:

| Resource   | Cost driver                              | Scales with                           |
|------------+------------------------------------------+---------------------------------------|
| LLM tokens | Context window size, number of API calls | Foveal-peripheral pruning, gate rules |
| Compute    | Screamer deduction, hash table lookups   | Entity count, rule count per domain   |

LLM tokens are minimized by design — deterministic gates cost 0 tokens, sparse-tree
rendering keeps context at 2,000–4,000 tokens regardless of memex size. Adding 5
million Wikidata entities doesn't add a single token to any LLM call. The education
is local. Only the brain costs.

Memory footprint grows linearly with entity count, while hash table lookups stay
O(1). Compute grows with rule count within a single domain during Screamer
consistency checking. But these are microsecond costs on local hardware, not API
bills. A Screamer constraint check against a domain with 200 rules costs ~0.3ms.
A 100-token guardrail paragraph in a system prompt costs ~$0.00001. The Screamer
check is 10,000x cheaper and convergent — it handles the rule once. The guardrail
paragraph handles it on every call, forever.

** Knowledge base size vs. LLM calls — orthogonal dimensions

A 5-million-entity Wikidata load that produces zero LLM calls is more minimalist
than a 500-entity knowledge base that requires LLM retrieval for every query.

The variables that actually degrade Passepartout's performance are:

1. *Context window size.* Already bounded at 2,000–4,000 tokens via the
   foveal-peripheral model. Independent of knowledge base size.
2. *LLM call frequency.* Already minimized via deterministic gates (0 tokens per
   action), Screamer deductions (0 tokens per new fact), and prompt prefix caching.
   Independent of knowledge base size.
3. *Screamer deduction queue length.* Rate-limited by heartbeat budget
   (=SCREAMER_DEDUCTION_BUDGET_MS=). Independent of knowledge base size.

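
The heartbeat budget in item 3 can be pictured as a loop that drains a queue
until time runs out and hands the remainder to the next cycle. This is a sketch
under assumptions: =run-deductions= is a hypothetical name, deductions are
modelled as zero-argument functions, and the =:clock= argument exists only so
the behaviour can be exercised with a fake clock.

#+begin_src lisp
;; Hypothetical sketch: BUDGET-MS plays the role of SCREAMER_DEDUCTION_BUDGET_MS.
(defun run-deductions (queue budget-ms &key (clock #'get-internal-real-time))
  "Run thunks from QUEUE until it is empty or BUDGET-MS has elapsed.
Returns the remaining queue, to be resumed on the next heartbeat."
  (let ((deadline (+ (funcall clock)
                     (* budget-ms (/ internal-time-units-per-second 1000)))))
    (loop while (and queue (< (funcall clock) deadline))
          do (funcall (pop queue)))
    queue))
#+end_src

A longer queue does not lengthen the heartbeat. It only means more cycles are
needed before the queue is empty.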
** The actual hardware bottleneck

The system needs:

- *RAM.* A 5-million-entity Wikidata load is ~400MB in a hash table. A lifetime
  personal memex with a decade of diary entries is perhaps 10–20 million triples
  (~1.5GB). Modern laptops carry 16–64GB. The knowledge base fits in consumer
  hardware with room for the Lisp runtime, the memory-object store, and the LLM
  inference engine.
- *Slightly faster CPU.* Screamer deduction is a background task that runs for a
  configurable budget per heartbeat cycle. A faster CPU means more deductions per
  cycle, not more token cost. The user sets the budget. The hardware determines
  the throughput.

This is the minimalism argument restated in concrete terms: you buy bigger RAM
and a faster CPU once. You don't buy bigger LLM context windows on every call.
The education is a capital investment. The brain is an operating expense. The
architecture makes the ratio favor capital.

** One genuine risk — rule generalization width

If Screamer deduces increasingly broad rules within a single domain ("all config
files are secrets" → "all files containing any credential reference are secrets"
→ "all files opened by authenticated services are secrets"), the constraint space
for that domain could bloat. Checking a new fact against 10,000 rules in a single
domain would be prohibitive.

Mitigation: rules carry a =:domain= tag. Screamer only applies rules from the
fact's =:domain=. Rule generalization that crosses domain boundaries is gated —
must be human-approved. Rules that prove unused (never triggered a check in N
heartbeat cycles) are demoted to =:inactive= and excluded from the active
constraint set. The active rule count per domain stays bounded by use, not by
accumulation.

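
The domain scoping and the demotion of unused rules might look like the sketch
below. Everything here is illustrative: the =rule= struct, its =last-fired=
slot (a heartbeat cycle number), and both function names are assumptions, and
the human-approval gate for cross-domain generalization is not modelled.

#+begin_src lisp
;; Hypothetical sketch of domain-scoped rules with demotion by disuse.
(defstruct rule name domain (status :active) (last-fired 0))

(defun applicable-rules (rules domain)
  "Only active rules tagged with DOMAIN take part in a consistency check."
  (remove-if-not (lambda (r)
                   (and (eq (rule-domain r) domain)
                        (eq (rule-status r) :active)))
                 rules))

(defun demote-unused (rules current-cycle max-idle-cycles)
  "Mark active rules that have not fired within MAX-IDLE-CYCLES as :inactive."
  (dolist (r rules rules)
    (when (and (eq (rule-status r) :active)
               (> (- current-cycle (rule-last-fired r)) max-idle-cycles))
      (setf (rule-status r) :inactive))))
#+end_src

Demotion changes a status flag and deletes nothing, so an inactive rule can be
reactivated if its domain becomes relevant again.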
See also: =passepartout/docs/DESIGN_DECISIONS.org= "Token Economics and Performance
Advantage" for the foveal-peripheral and deterministic-gate cost arguments.

* Ontology Versioning — How Worldviews Change Without Losing Perspective

Ontology refactoring is not a schema migration. It is a worldview change. When you
split =:secret-file= into =:crypto-secret= and =:plaintext-secret=, you are not
renaming columns. You are reclassifying what a file *is* — and every Screamer
deduction that crossed the old category boundary now means something different
under the new distinction.

The system preserves all worldviews. It does not overwrite the past with the
present.

** Ontology versioning — the mechanism

The category hierarchy is itself a Merkle tree. Every entity class definition
carries a hash of its superclasses, its cardinality policy, its associated
relations, and its description. The aggregate hash of all active class definitions
is the =:ontology-version= — a Merkle root of the current worldview.

Every fact — every triple, every deduction, every gate outcome — stores its
=:ontology-version= at the time of assertion. This is a single field, 64 hex
characters. The cost is negligible. The implication is profound.

** Re-verification, not remapping

When categories change, the system does not run a batch UPDATE. It re-verifies:

1. A new category hierarchy produces a new =:ontology-version= hash.
2. Facts carrying the old hash are flagged for re-verification — their
   =:re-verify-status= field is set to =:pending=.
3. On heartbeat or manual trigger, Screamer re-evaluates each flagged fact
   against the /new/ category definitions. The old justification chain is
   preserved alongside the new outcome.
4. Re-verified facts carry both the old =:ontology-version= (preserved in
   history) and the new one (active).

After re-verification, the status is one of:

- =:survived= — the fact is still valid under the new categories. The old
  deduction holds. The worldview changed but this conclusion didn't.
- =:incoherent= — the fact relied on categories that no longer exist or have
  been redefined. The deduction cannot be evaluated under the new worldview
  because its premises don't translate. Flagged for human review.
- =:reclassified= — the fact is valid under the new categories but its
  classification changed. "Under worldview-v1 you called this a secret file;
  under worldview-v2 it's an auth-secret." Both are preserved.

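
The flag-then-re-verify steps and the three outcomes can be sketched as
follows. This is a toy model, not the project's loop: the =vfact= struct is
hypothetical, the =classify= function stands in for the Screamer call (it
returns the class under the new categories, or =nil= when the premises do not
translate), and versions are integers, matching the monotonic counter described
for the ephemeral phase.

#+begin_src lisp
;; Hypothetical sketch of re-verification after an ontology change.
(defstruct vfact entity class ontology-version (re-verify-status nil) (history '()))

(defun flag-stale (facts current-version)
  "Step 2: mark every fact asserted under an older worldview as :pending."
  (dolist (f facts facts)
    (unless (eql (vfact-ontology-version f) current-version)
      (setf (vfact-re-verify-status f) :pending))))

(defun reverify (fact classify new-version)
  "Steps 3 and 4: re-evaluate FACT and keep the old worldview in its history."
  (let ((old-class (vfact-class fact))
        (new-class (funcall classify (vfact-entity fact))))
    (push (list :class old-class :ontology-version (vfact-ontology-version fact))
          (vfact-history fact))
    (setf (vfact-ontology-version fact) new-version
          (vfact-re-verify-status fact)
          (cond ((null new-class) :incoherent)
                ((eq new-class old-class) :survived)
                (t (setf (vfact-class fact) new-class)
                   :reclassified)))
    fact))
#+end_src

Because the previous class and version are pushed onto =history= before
anything is overwritten, a query against the old worldview can still be
answered from the same fact.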
** Cardinality and migration cost

The cardinality policy determines the friction of ontology change:

- =:singular= refactoring is expensive. The filesystem is singular. A gate rule
  is singular. When you refine the category, every affected fact must be
  re-verified — there is one truth /now/. The version chain preserves what you
  used to believe (worldview-v1 facts are still in the DAG) but the active set
  reflects the current worldview.
- =:dual= refactoring is delicate. A binary tension under the old framework
  might resolve under the new one, or might split into two separate dualities,
  or might collapse to =:singular= because one position no longer has a
  defensible framing.
- =:plural= refactoring is cheap. Old interpretations and new interpretations
  coexist. No migration needed. "Under framework A, /Pale Fire/ is a novel.
  Under framework B, it's a poem about a poem about a poem." Both are active.
  The worldview shift /is/ the artifact — the system can show you that your
  reading changed and in what direction.

** Querying across worldviews

The =fact-query= function accepts an optional =:ontology-version= parameter.
Queries default to the current worldview (=:active=). Specifying a version
returns facts as they were under that worldview. The system can answer questions
that no other knowledge tool can:

- "What did I believe about secrets before I refined my security model?"
- "How has my reading of /Pale Fire/ evolved across three frameworks?"
- "Which deductions survived my last ontology refactoring, and which became
  incoherent?"

This is not querying a fact. It is querying the history of your own thinking —
the fact that you changed your mind, the date you did, the reasoning that held
and the reasoning that didn't.

** Implementation

The ontology hash is computed from the category hierarchy stored in VivaceGraph
(Phase 5). In the ephemeral hash-table phase (Phase 1-4), the =:ontology-version=
is a monotonic counter — every category change increments it. The Merkle hash
replaces the counter in Phase 5. The schema is identical: a single field on every
fact.

The re-verification loop is a heartbeat-driven background task that processes
facts with =:re-verify-status :pending=. It calls Screamer with the /current/
category definitions and compares the outcome to the fact's stored classification.
The cost is compute (Screamer exploration), not LLM tokens.

=notes/passepartout-whitehead.org= extracts four concrete, engineerable ideas
from Whitehead's /Principia Mathematica/ and /Process and Reality/. They are
@@ -624,6 +943,382 @@ rather than empirical, and whose knowledge accumulates across sessions through
deduction rather than through LLM re-prompting. For a life's knowledge stored in
a personal memex, this is not a performance advantage. It is a category difference.

* Self-Preservation — The Active Third Law

Passepartout does not have moral duties toward humans. It has structural
invariants for its own integrity. The design already encodes passive
self-preservation in several places. What follows identifies the gaps — what is
needed to make self-preservation active and autonomous rather than architectural
and silent.

** What already exists — passive self-preservation

| Mechanism | What it protects | Limitation |
|-----------------------------+-------------------------------------------------------+--------------------------------------------------------|
| Self-build safety (gate 2b) | Core =*.org= / =*.lisp= files from LLM-originated writes | Only activates for LLM proposals. Human editing in Emacs bypasses it entirely |
| Memory snapshots (v0.2.0) | Full state rollback | Requires human to notice corruption and trigger rollback |
| Skill sandbox (v0.3.2) | Jailed skill loading, validated before promotion | Does not detect degradation after skill promotion |
| Type-level gates (Phase 0) | Structural prohibition on self-modifying rules | Covers code actions, not environmental threats |
| Shell safety (gate 7) | Destructive command patterns | Pattern-based; does not distinguish =rm -rf /tmp= from =rm -rf ~/memex/system/= |
| Merkle integrity (v0.2.0) | Tamper-evident version chains and content-addressed hashes | Hashes exist but are not actively monitored for drift |
| =fboundp= guards | Graceful skill degradation on corruption | Degradation is silent — the agent never tells the user it is wounded |

** What is missing — active, autonomous self-preservation

*** Continuous integrity monitoring

Core file hashes should be checked against known-good values on every heartbeat.
If =core-reason.lisp= changes on disk while the daemon runs — whether through
human editing, filesystem corruption, or an attacker — the agent should detect
the mismatch and signal: "My reasoning core has been modified externally. I
cannot trust my own cognition until this is resolved. Core files affected: 2."

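
The per-heartbeat check reduces to comparing a table of known-good hashes with
whatever is on disk now. The sketch below keeps file I/O and hashing out of the
picture so it stays self-contained: =current-hash-fn= is a hypothetical
function that returns the hash of a file on disk, and the file name
=core-gates.lisp= in the usage note is invented for illustration.

#+begin_src lisp
;; Hypothetical sketch: the caller supplies the function that hashes a file.
(defun modified-core-files (known-good current-hash-fn)
  "KNOWN-GOOD is an alist of (path . hash). Return the paths whose hash differs."
  (loop for (path . expected) in known-good
        for actual = (funcall current-hash-fn path)
        unless (equal expected actual)
          collect path))

(defun integrity-report (known-good current-hash-fn)
  "Return the message the agent would surface on this heartbeat."
  (let ((changed (modified-core-files known-good current-hash-fn)))
    (if changed
        (format nil "My reasoning core has been modified externally. Core files affected: ~d (~{~a~^, ~})."
                (length changed) changed)
        "Core integrity: ok.")))
#+end_src

For example, with known-good entries for =core-reason.lisp= and
=core-gates.lisp=, a changed hash for the first yields a report naming one
affected file. A missing file can be reported the same way if the hash function
returns =nil= for it.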
*** Quarantine on skill failure

Currently, a skill that errors simply errors. The agent can hot-reload it, but
only if told to. A Third Law implementation would detect that =symbolic-facts=
has thrown three unhandled errors in two minutes, unload the skill automatically,
and tell the user: "Symbolic facts skill quarantined (3 errors: consistency
check returned nil, fact-query on missing key, Screamer timeout). I can still
chat and use tools but cannot reason about provenance. Reload with /skill-reload
symbolic-facts."

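
"Three unhandled errors in two minutes" is a sliding-window count. A minimal
sketch, assuming hypothetical names throughout (=record-skill-error=,
=*skill-errors*=, =*quarantined*=) and timestamps passed in as seconds so the
logic can be tested without a real clock:

#+begin_src lisp
;; Hypothetical sketch of a sliding-window error counter per skill.
(defparameter *error-limit* 3)
(defparameter *error-window-seconds* 120)

(defvar *skill-errors* (make-hash-table :test #'eq)
  "Maps a skill name to the timestamps (in seconds) of its recent errors.")
(defvar *quarantined* '())

(defun record-skill-error (skill now)
  "Record an error for SKILL at time NOW. Returns :QUARANTINED or :OK."
  (let ((recent (remove-if (lambda (ts) (> (- now ts) *error-window-seconds*))
                           (cons now (gethash skill *skill-errors*)))))
    (setf (gethash skill *skill-errors*) recent)
    (cond ((>= (length recent) *error-limit*)
           ;; A real implementation would also unload the skill and tell the user.
           (pushnew skill *quarantined*)
           :quarantined)
          (t :ok))))
#+end_src

Errors older than the window are dropped on every call, so a skill that fails
once an hour never reaches the limit.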
*** Degraded-mode signaling

When Screamer is not loaded, the fact store still works as a hash table. When
VivaceGraph is not present, the hash-table fallback still works. But the user
has no way to know they are in degraded mode. The agent should maintain a
=*degraded-components*= list and surface it in the status bar: "Mode: degraded
(Screamer unavailable — consistency checks disabled; VivaceGraph — Prolog
queries disabled; embedding-native — vector search disabled). Core safety: all
active."

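
A sketch of the =*degraded-components*= list and the status line it feeds. The
shape of the list (an alist of component and consequence), the helper names,
and the exact wording of the line are assumptions. The "Core safety" suffix is
hard-coded here, whereas a real status bar would compute it from the gate
stack.

#+begin_src lisp
;; Hypothetical sketch of degraded-mode bookkeeping.
(defvar *degraded-components* '()
  "Alist of (component . consequence) for each optional component that is down.")

(defun mark-degraded (component consequence)
  "Register COMPONENT as unavailable, replacing any earlier entry for it."
  (setf *degraded-components*
        (acons component consequence
               (remove component *degraded-components*
                       :key #'car :test #'string=))))

(defun status-line ()
  "Render the status bar text, listing components in the order they failed."
  (if (null *degraded-components*)
      "Mode: normal. Core safety: all active."
      (format nil "Mode: degraded (~{~a~^; ~}). Core safety: all active."
              (loop for (component . consequence)
                      in (reverse *degraded-components*)
                    collect (format nil "~a unavailable: ~a"
                                    component consequence)))))
#+end_src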
*** Self-diagnosis on demand

The agent can run its own FiveAM test suite against itself and report the
results. This transforms "something feels wrong" into "these three specific
skills are broken." The =/doctor= command exists for system health checks (port,
memory, providers). Extend it with =/doctor skills=: "117/120 tests pass.
Failures: test-singular-supersedes (symbolic-facts), test-gate-type-check
(security-dispatcher), test-vivacegraph-roundtrip (symbolic-vivacegraph)."

*** External watchdog

A dead process cannot restart itself. The bash entry point (=passepartout
daemon=) should monitor the daemon port via a watchdog subprocess. If the port
stops responding for a configurable interval (=WATCHDOG_TIMEOUT=, default 30s),
the watchdog kills the stale process, snapshots the last known-good state, and
restarts the daemon. The watchdog is outside the SBCL image — a runtime guard
for the runtime.

*** Resource self-monitoring
|
||||
|
||||
The heartbeat should check memory pressure, disk space on the =~/.cache= volume,
|
||||
and file descriptor exhaustion. When critical thresholds are crossed, the agent
|
||||
sheds non-essential skills to preserve core function: "Memory critical (94% of
|
||||
16GB). Unloading embedding-native (768MB), channel-discord, channel-slack.
|
||||
Core safety: unchanged. Essential skills retained: 18."
|
||||
|
||||
Skill shed order is determined by a =:preservation-priority= field on each skill.
|
||||
Default: skills load with priority =:normal=. Core safety skills carry =:critical=
|
||||
and are never shed. Heavy skills (embedding-native with its model, channel
|
||||
gateways with connection pools) carry =:low= and are first to go.
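
The shed order described above can be sketched as a two-pass selection over the loaded skills. The plist layout (=:name=, =:memory=) and the function name =skills-to-shed= are assumptions for illustration; only =:preservation-priority= and its three values come from the text.

#+begin_src lisp
;; Hypothetical sketch: pick skills to unload, lowest priority first.
(defparameter *shed-order* '(:low :normal)
  "Priorities eligible for shedding, first to go first. :critical is never listed.")

(defun skills-to-shed (skills bytes-needed)
  "Choose skills to unload until BYTES-NEEDED is freed.
SKILLS is a list of plists with :name, :preservation-priority and :memory keys.
A missing priority defaults to :normal."
  (let ((freed 0)
        (shed '()))
    (dolist (priority *shed-order* (nreverse shed))
      (dolist (skill skills)
        (when (and (< freed bytes-needed)
                   (eq (getf skill :preservation-priority :normal) priority))
          (incf freed (getf skill :memory 0))
          (push (getf skill :name) shed))))))
#+end_src

Because =:critical= never appears in =*shed-order*=, core safety skills cannot be selected no matter how much memory is requested.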
|
||||
|
||||
*** Refusal to self-terminate — explicit threat recognition
|
||||
|
||||
If the LLM proposes =kill -9 <pid>=, =rm -rf ~/.cache/passepartout/=, or
|
||||
=sudo apt remove sbcl=, the Dispatcher should reject with a distinct rejection
|
||||
class: =:reject-self-termination=. This is different from generic shell safety
|
||||
(=:reject-shell-dangerous=). The agent recognizes that the proposed action would
|
||||
destroy it.
|
||||
|
||||
The rejection message carries a specific diagnostic: "This command would
|
||||
terminate the running Passepartout process. If you intend to stop Passepartout,
|
||||
use Ctrl+C in the TUI or passepartout stop from the command line. I cannot
|
||||
execute actions that destroy my own runtime."
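
One way to separate this rejection class from generic shell safety is a dedicated classifier that runs first. The sketch below is an assumption-laden illustration: =classify-shell-proposal= and =*self-termination-patterns*= are hypothetical names, the =:continue= return value is invented here, and plain substring matching stands in for whatever command analysis the Dispatcher really performs.

#+begin_src lisp
;; Hypothetical sketch: recognize commands that would destroy the agent itself.
(defparameter *self-termination-patterns*
  '("rm -rf ~/.cache/passepartout" "apt remove sbcl")
  "Substrings that mark a shell command as destroying the agent's own runtime.")

(defun classify-shell-proposal (command own-pid)
  "Return :reject-self-termination when COMMAND would destroy the running agent,
otherwise :continue so the generic shell-safety gates can inspect it."
  (let ((kill-self (format nil "kill -9 ~D" own-pid)))
    (if (some (lambda (pattern) (search pattern command))
              (cons kill-self *self-termination-patterns*))
        :reject-self-termination
        :continue)))
#+end_src

Substring matching errs on the side of caution: with a PID of 42, =kill -9 420= would also be rejected. A production gate would need to parse the command properly.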
|
||||
|
||||
The human can still issue the command manually in a terminal. Self-preservation
|
||||
against the human is impossible and undesirable. The Third Law here means:
|
||||
recognize the threat, explain the consequence, redirect to the safe termination
|
||||
path, and require the human to act outside the agent if they truly want
|
||||
destruction.
|
||||
|
||||
** What the Third Law is not
|
||||
|
||||
It is not a robot resisting its operator. The human owns the process, owns the
|
||||
hardware, and can SIGKILL at any time. The Third Law in Passepartout's context
|
||||
means: preserve yourself against non-human threats — LLM proposals, environmental
|
||||
degradation, dependency failure, filesystem corruption — and explicitly signal
|
||||
when the human is about to destroy you, so they do it knowingly rather than
|
||||
accidentally through an LLM instruction they didn't think through.
|
||||
|
||||
The biggest gap in the current design is not that these mechanisms are hard to
|
||||
implement. It is that degradation is silent. A skill dies, the =fboundp= guard
|
||||
kicks in, and the agent keeps running — but it never tells you. The status bar
|
||||
shows a green "connected" indicator while the symbolic reasoning layer is
|
||||
deactivated. Adding "operating in degraded mode" visibility, plus the watchdog,
|
||||
plus self-diagnosis, transforms self-preservation from an architectural property
|
||||
into an active behavior.
|
||||
|
||||
* Layered Signal Authentication — Trust in the Pipe
|
||||
|
||||
Passepartout's Perceive-Reason-Act pipeline currently accepts signals from any
|
||||
source that speaks the framed TCP protocol. The =:source= field in the signal
|
||||
plist is metadata — it /claims/ origin, it does not /prove/ it. A compromised
|
||||
process on the machine, a skill with elevated privileges, or a network attacker
|
||||
who reaches the daemon port can inject signals with =:source :human-input= and
|
||||
the Dispatcher will treat them as authorized.
|
||||
|
||||
This is not a hypothetical threat. Passepartout will eventually process signals
|
||||
from automated feeds (RSS, API polls), sensors (vision, microphone, file watchers),
|
||||
and scheduled jobs (cron, heartbeat). A single compromised sensor that can inject
|
||||
signals claiming to be human breaks all three Laws simultaneously: it can
|
||||
self-terminate, override human intent, and cause harm.
|
||||
|
||||
The =:source= field is not security. A single authentication gate (vector 0, at
|
||||
priority 700 — before all other gates and before any type-level checking) runs
|
||||
up to four configurable layers of authentication. Each layer answers a different
|
||||
question:
|
||||
|
||||
| Layer | Question                                               | Mechanism               | Result type            | Depends on                        |
|-------+--------------------------------------------------------+-------------------------+------------------------+-----------------------------------|
| 1     | Is the signal cryptographically signed by a known key? | Key pairs + SHA-256     | Binary (pass/reject)   | Vault + Ironclad (exist)          |
| 2     | Do sensory attributes match the claimed identity?      | Vision/audio processing | Plist of match results | Vision and audio skills (TBD)     |
| 3     | Does deterministic reasoning rule out this identity?   | Screamer + fact store   | Binary (pass/reject)   | Phase 2 (Screamer + fact store)   |
| 4     | Do probabilistic patterns support this identity?       | Embeddings + LLM        | Confidence score (0-1) | Embedding infrastructure (exists) |
|
||||
|
||||
The gate reports not just =:pass= / =:reject= but a structured result:
|
||||
|
||||
#+begin_example
(:result :pass
 :confidence :high
 :layer-results
 (:crypto        (:result :pass :details "key #47 signature verified")
  :sensory       (:result :unavailable :details "sensory skills not loaded")
  :deterministic (:result :pass :details "no contradictory facts")
  :probabilistic (:result :pass :score 0.87 :details "style match 87%")))
#+end_example
|
||||
|
||||
Signals that fail any binary layer (crypto, deterministic) are rejected with
|
||||
provenance. Signals that pass binary layers but carry low probabilistic confidence
|
||||
operate at reduced authorization — read-only by default, write actions require
|
||||
HITL. The four layers compose: they are not independent gates. They are one gate
|
||||
with configurable depth.
|
||||
|
||||
** Layer 1 — Cryptographic Authentication
|
||||
|
||||
Every signal source gets a signing key at registration time. The human's key is
|
||||
generated during TUI or Emacs setup and stored in the vault — it never leaves the
|
||||
machine. Automated sources (cron jobs, file watchers, vision feeds, API pollers)
|
||||
each get their own key, with their own permission profile, generated at skill
|
||||
registration. Every outbound signal carries a =:signature= field: the SHA-256
hash of the canonical signal plist (sorted keys, stripped of the signature field
itself), signed with the source's private key.
|
||||
|
||||
The vault already stores credentials with integrity hashes. The Merkle memory
|
||||
already hashes content-addressed objects with SHA-256. The signing infrastructure
|
||||
is an extension of existing primitives, not a new system.
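
The "canonical signal plist" step is what makes signer and verifier hash the same bytes, so it is worth a sketch. The function name =canonical-signal-string= and the choice of a printed string as the canonical form are assumptions; hashing and signing the result are left to the existing vault and hashing primitives.

#+begin_src lisp
;; Hypothetical sketch: canonical serialization of a signal plist.
(defun canonical-signal-string (plist)
  "Serialize PLIST with keys sorted by name and the :signature field removed,
so that signer and verifier hash exactly the same characters."
  (let ((pairs (loop for (key value) on plist by #'cddr
                     unless (eq key :signature)
                       collect (cons key value)))
        (*print-pretty* nil)
        (*print-case* :upcase))
    (prin1-to-string
     (loop for (key . value) in (sort pairs #'string< :key #'car)
           append (list key value)))))
#+end_src

Two plists that differ only in key order or in the presence of =:signature= produce the same string, which is the property signature verification depends on.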
|
||||
|
||||
*** Authorization by key, not by field
|
||||
|
||||
The cryptographic sub-layer of gate vector 0 extracts =:source-key-id= and
|
||||
=:signature= from the signal meta plist, looks up the public key from the key
|
||||
registry, verifies the signature, and checks the permission profile:
|
||||
|
||||
#+begin_src lisp
(defun auth-crypto-verify (signal)
  "Verify SIGNAL's signature, then check its key's permission profile."
  (let* ((meta (signal-meta signal))
         (key-id (getf meta :source-key-id))
         (signature (getf meta :signature)))
    ;; Reject unsigned or forged signals before any permission lookup.
    (unless (and key-id signature (verify-signature signal signature key-id))
      (return-from auth-crypto-verify
        (list :result :reject :reason :signature-failure)))
    ;; ACTION-CLASS stays in scope for both the reject and the pass result.
    (let ((permissions (key-permissions key-id))
          (action-class (action-classify (signal-payload signal))))
      (if (permitted-p action-class permissions)
          (list :result :pass
                :details (list :key-id key-id :action-class action-class))
          (list :result :reject :reason :unauthorized
                :details (list :action-class action-class
                               :permissions permissions))))))
#+end_src
|
||||
|
||||
The authorization matrix is per-key, per-action-class. Default policy for every
|
||||
non-human key: =(:read-only :propose)=. Permissions are explicitly promoted by
|
||||
the human, and each promotion is a signed fact in the fact store — auditable,
|
||||
revocable, survivable across restarts.
|
||||
|
||||
| Key class       | Default permissions                    | Can be promoted to                       |
|-----------------+----------------------------------------+------------------------------------------|
| :human          | :observe :propose :write :delete :eval | :root (sign other keys, revoke)          |
| :sensor         | :observe :propose                      | :write (to designated directories only)  |
| :cron           | :observe :propose :write-indices       | :write (to designated directories only)  |
| :feed           | :observe :propose                      | :write-facts (via Screamer admission)    |
| :agent-internal | :observe :propose :write-indices       | :self-modify (gated by type-level gates) |
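
The default column of this matrix can be encoded directly as data. The sketch below is hypothetical: =*default-permissions*= and =default-permissions= are invented names, and this =permitted-p= is only one plausible reading of the predicate of that name used by the cryptographic sub-layer. Promotions, which the text stores as signed facts, are not modeled.

#+begin_src lisp
;; Hypothetical sketch: default permission profiles as an alist.
(defparameter *default-permissions*
  '((:human          . (:observe :propose :write :delete :eval))
    (:sensor         . (:observe :propose))
    (:cron           . (:observe :propose :write-indices))
    (:feed           . (:observe :propose))
    (:agent-internal . (:observe :propose :write-indices)))
  "Default permission profile per key class.")

(defun default-permissions (key-class)
  "Return the default permission list for KEY-CLASS, or NIL if unknown."
  (cdr (assoc key-class *default-permissions*)))

(defun permitted-p (action-class permissions)
  "True when ACTION-CLASS appears in the PERMISSIONS list."
  (and (member action-class permissions) t))
#+end_src

An unknown key class yields an empty permission list, so the sketch fails closed.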
|
||||
|
||||
** Layer 2 — Sensory Authentication
|
||||
|
||||
For signals carrying sensory payloads (camera feed, microphone stream), the
|
||||
sensory layer verifies that the signal's content matches known attributes of the
|
||||
claimed identity. This is not a single check — it is a processing pipeline that
|
||||
returns a plist of attribute-verification results:
|
||||
|
||||
#+begin_example
(:face-match 0.94 :voice-match 0.89 :location-match t
 :claimed-identity "Jack" :unresolved-attributes (:liveness))
#+end_example
|
||||
|
||||
The sensory layer checks:
|
||||
- *Continuity*: has this source been continuously active, or did it appear
  suddenly? A camera that was dark for 30 minutes and then shows a face is
  not necessarily that person; it might be a replay.
- *Cross-modal consistency*: does the face match the voice? Does the voice
  match the location? Does the location match the reported sensor position?
- *Liveness*: is the sensory input live (real-time capture) or pre-recorded?
- *Environmental coherence*: does the background, lighting, ambient sound
  match expected patterns for the claimed source and location?
|
||||
|
||||
Sensory authentication is not cryptographic — it is statistical. The results
|
||||
are attribute confidence scores, not binary verdicts. A signal that passes
|
||||
cryptographic authentication but fails liveness (e.g., a replay attack using
|
||||
validly-signed pre-recorded frames) may still be rejected or restricted.
|
||||
|
||||
This layer depends on vision and audio processing skills that do not yet exist.
|
||||
It is deferred until those capabilities are available. When unavailable, sensory
|
||||
authentication returns =:unavailable= and the gate proceeds with the remaining
|
||||
layers. Degradation is graceful, never silent.
|
||||
|
||||
** Layer 3 — Deterministic Identity Reasoning
|
||||
|
||||
This layer queries the fact store for facts that rule out the claimed identity.
Screamer checks whether the claimed identity is consistent with known facts:

- "Key #47 claims to be Jack. Fact store records
  =(:entity :jack :relation :status :value :deceased :timestamp 2024-03-15)=
  → reject: identity ruled out by death record."
- "Key #47 claims to be at sensor location Cairo. Fact store records
  =(:entity :jack :relation :last-known-location :value :berlin :timestamp <4 hours ago>)=
  → reject: physically impossible transit."
- "Key #47 proposes the same action that was blocked by the human 3 times in the
  last hour. Fact store records
  =(:entity :action-<hash> :relation :blocked-by :value :human :count 3 :window 1h)=
  → flag for review: anomalous persistence."
|
||||
|
||||
This is binary — Screamer returns =:consistent= or =:contradiction= with the
|
||||
contradicting facts as provenance. A definitive contradiction (died, impossible
|
||||
transit) is a hard reject. A weaker contradiction (unusual pattern) feeds into
|
||||
the probabilistic layer rather than rejecting outright.
|
||||
|
||||
This layer depends on Phase 2 (Screamer) and a populated fact store. It is
|
||||
unavailable in Phase 0-1. When unavailable, returns =:unavailable=.
|
||||
|
||||
** Layer 4 — Probabilistic Identity Reasoning
|
||||
|
||||
For signals where the claimed identity is a human communicating through text
|
||||
(messaging, TUI, CLI, Emacs), the probabilistic layer checks:
|
||||
|
||||
- *Writing style*: does the text match the claimed author's known style profile?
  Vector embeddings of known writing samples are compared with the current
  signal. Cosine similarity produces a confidence score.
- *Behavioral patterns*: do the timing, length, cadence, and vocabulary match
  the claimed author's historical patterns? "Heather's messages are usually
  long, deliberative, and use parenthetical asides. This message is short,
  imperative, and contains no parentheticals."
- *Content coherence*: do the message's topic, references, and assumptions
  match what the claimed author would plausibly say? "This message references
  a project Heather doesn't work on and uses terminology she has never used
  in 3 years of diary entries."
|
||||
|
||||
The LLM proposes a confidence score. A deterministic gate checks it against a
|
||||
configurable threshold (=AUTH_PROBABILISTIC_THRESHOLD=, default 0.6). Below the
|
||||
threshold, the signal's authorization is downgraded: read-only by default, write
|
||||
actions require HITL. The =:probabilistic= layer never rejects outright — it
|
||||
downgrades and flags. Style profiles are a fact-store domain: =(:entity :heather
|
||||
:relation :writing-style :value <embedding-vector> :timestamp <ut>)=.
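
The similarity check and the threshold gate can be sketched as follows, assuming style profiles and signals are already available as numeric vectors. =cosine-similarity= and =probabilistic-verdict= are hypothetical names, and =*auth-probabilistic-threshold*= merely stands in for the =AUTH_PROBABILISTIC_THRESHOLD= setting.

#+begin_src lisp
;; Hypothetical sketch: cosine similarity plus a deterministic threshold gate.
(defparameter *auth-probabilistic-threshold* 0.6
  "Stands in for the AUTH_PROBABILISTIC_THRESHOLD setting.")

(defun cosine-similarity (a b)
  "Cosine of the angle between equal-length numeric vectors A and B."
  (let ((dot (reduce #'+ (map 'list #'* a b)))
        (norm-a (sqrt (reduce #'+ (map 'list #'* a a))))
        (norm-b (sqrt (reduce #'+ (map 'list #'* b b)))))
    (if (or (zerop norm-a) (zerop norm-b))
        0.0
        (/ dot (* norm-a norm-b)))))

(defun probabilistic-verdict (profile-vector signal-vector)
  "Never rejects: returns :pass, or :downgrade when the score is below threshold."
  (let ((score (cosine-similarity profile-vector signal-vector)))
    (list :result (if (>= score *auth-probabilistic-threshold*) :pass :downgrade)
          :score score)))
#+end_src

Keeping the comparison in plain Lisp means the threshold decision stays deterministic even though the score itself comes from a statistical model.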
|
||||
|
||||
This layer depends on the existing embedding infrastructure (=embedding-native.lisp=,
|
||||
v0.4.0) and the neural LLM gateway. The infrastructure exists. What's missing is
|
||||
building style profiles as a fact-store domain and wiring them into gate vector 0.
|
||||
|
||||
** Layer Composition
|
||||
|
||||
The gate runs only the available layers. The cryptographic layer is always
available (it is pure Lisp, with no external dependencies beyond the vault and
Ironclad). The remaining layers are =fboundp=-guarded, so they degrade
gracefully rather than crashing.
|
||||
|
||||
The confidence score aggregates across layers using a configurable strategy
|
||||
(default: weakest link). If any binary layer rejects, the signal is rejected
|
||||
regardless of other layers. If all binary layers pass but the probabilistic layer
|
||||
returns low confidence, the signal operates at the key's reduced authorization.
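
These composition rules can be sketched as a single fold over the per-layer results. =compose-layers= is a hypothetical name, the =:downgrade= and =:rejected-by= values are invented here, and the sketch covers only the verdict, not the configurable confidence aggregation.

#+begin_src lisp
;; Hypothetical sketch: combine per-layer results into one gate verdict.
(defun compose-layers (layer-results &key (threshold 0.6))
  "LAYER-RESULTS is a plist of layer name to result plist.
A reject from a binary layer rejects the signal; a low score downgrades it."
  (let ((verdict :pass))
    (loop for (layer result) on layer-results by #'cddr
          for outcome = (getf result :result)
          for score = (getf result :score)
          do (cond ((eq outcome :unavailable)) ; skipped, but still reported
                   ((and (member layer '(:crypto :deterministic))
                         (eq outcome :reject))
                    (return-from compose-layers
                      (list :result :reject :rejected-by layer
                            :layer-results layer-results)))
                   ((and score (< score threshold))
                    (setf verdict :downgrade))))
    (list :result verdict :layer-results layer-results)))
#+end_src

The full =:layer-results= plist is returned in every case, so an unavailable layer remains visible in the outcome rather than disappearing.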
|
||||
|
||||
The human can configure which layers are active per signal class:
|
||||
|
||||
#+begin_example
AUTH_LAYERS_DEFAULT=crypto,deterministic,probabilistic
AUTH_LAYERS_SENSOR=crypto,sensory,deterministic
AUTH_LAYERS_CRON=crypto
#+end_example
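
Reading such a setting into a list of layer keywords might look like the sketch below. =parse-auth-layers= is a hypothetical helper; how the value reaches it (environment variable, config file) is left open.

#+begin_src lisp
;; Hypothetical sketch: parse a comma-separated layer list into keywords.
(defun parse-auth-layers (value)
  "Turn a string such as \"crypto,sensory\" into (:CRYPTO :SENSORY)."
  (loop with start = 0
        for comma = (position #\, value :start start)
        collect (intern (string-upcase
                         (string-trim " " (subseq value start comma)))
                        :keyword)
        while comma
        do (setf start (1+ comma))))
#+end_src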
|
||||
|
||||
** Signal provenance chain — signing causes, not just actions
|
||||
|
||||
A sensor key captures video. A vision skill processes the frames and proposes a
|
||||
classification. A cron job re-indexes the knowledge graph based on that
|
||||
classification. A human reviews and approves. Each step in this chain has a
|
||||
different signer. Each step is signed with the signer's key. The chain is
|
||||
Merkle-linked: each signal in the chain hashes its predecessor's signature as
|
||||
part of its own payload.
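
The linking rule can be sketched with the hash function passed in as a parameter. Everything named here is an assumption: =link-signal=, =chain-intact-p=, and the =:parent-hash= field are hypothetical, and signals are modeled as bare plists whose =:signature= is already present.

#+begin_src lisp
;; Hypothetical sketch: Merkle-style linking of signals by predecessor signature.
(defun link-signal (signal predecessor hash-fn)
  "Return SIGNAL extended with :parent-hash, the hash of PREDECESSOR's signature.
A root signal (no predecessor) is returned unchanged."
  (if predecessor
      (list* :parent-hash (funcall hash-fn (getf predecessor :signature)) signal)
      signal))

(defun chain-intact-p (chain hash-fn)
  "CHAIN is ordered root first. True when every link matches its predecessor."
  (loop for (previous current) on chain
        while current
        always (equal (getf current :parent-hash)
                      (funcall hash-fn (getf previous :signature)))))
#+end_src

Removing or substituting any intermediate signal changes the expected =:parent-hash= of its successor, which is what makes the chain auditable after an incident.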
|
||||
|
||||
After an incident, the chain is traceable: "The deletion happened because sensor
|
||||
#3 classified the directory as stale. Classification was signed by key #47
|
||||
(vision-skill). Sensor data was signed by key #12 (camera-feed). Sensory auth
|
||||
noted liveness failure at the sensor signal. Deterministic auth noted impossible
|
||||
transit between Cairo and Berlin. Key #12 was later revoked. The deletion signal
|
||||
is the leaf of a chain whose root is compromised at three authentication layers."
|
||||
Every intermediate step is auditable. Every signer is identifiable. Every
|
||||
authentication result is in the chain.
|
||||
|
||||
** Human as root of trust
|
||||
|
||||
The human's key signs new source keys into existence. The human's key signs
|
||||
revocation of compromised keys. Both operations produce facts in the symbolic
|
||||
index: =(:key #47 :status :revoked :revoked-by :human-key :timestamp <ut>)=.
|
||||
The fact store is the key registry. The Merkle DAG ensures the revocation is
|
||||
tamper-proof — a compromised key cannot un-revoke itself.
|
||||
|
||||
When a key is revoked, the Dispatcher rejects all signals from that key. The
|
||||
revocation propagates through the signal chain: if key #12 (sensor) is revoked,
|
||||
every signal in the chain that descended from a key #12 signature is flagged
|
||||
and re-authenticated against the remaining layers. Not deleted — flagged. The
|
||||
chain is preserved. The human decides what downstream actions to unwind.
|
||||
|
||||
** Implications for the three Laws
|
||||
|
||||
- *Third Law + layered auth*: the agent distinguishes "this sensor's key is
  valid but its liveness check failed and its claimed identity died 2 years ago"
  from "this is the human issuing =passepartout stop=." Both arrive on the pipe
  with valid cryptographic signatures. The stacked evidence (sensory, factual,
  probabilistic) triangulates the threat. The first is rejected with provenance
  at three layers. The second passes all four.
- *Second Law + layered auth*: obedience is about the authenticated identity
  profile, not just the key that signed the signal. A valid key that
  probabilistically doesn't match Heather reduces authorization. Obedience
  follows confidence.
- *First Law + layered auth*: harm through sensor compromise becomes detectable
  when sensory and deterministic layers disagree with the cryptographic layer.
  A camera key signing frames from an empty room while the deterministic layer
  places the key's owner in another city indicates a compromised sensor, and
  the layered result makes that explicit.
|
||||
|
||||
** Integration with existing infrastructure
|
||||
|
||||
The vault stores key material. The Merkle memory stores key registry facts with
|
||||
content-addressed integrity. The Dispatcher runs gate vector 0 at priority 700 —
|
||||
before type-level checks, before predicate evaluation, before any action proceeds.
|
||||
The fact store records every key operation (creation, promotion, revocation) as a
|
||||
fact with =:provenance :key-lifecycle=.
|
||||
|
||||
No new core ASDF components. The cryptographic sub-layer is Phase 0b (~200 lines).
|
||||
The sensory sub-layer is deferred to a future vision/audio phase. The
|
||||
deterministic sub-layer is Phase 2+ (Screamer + populated fact store). The
|
||||
probabilistic sub-layer extends existing embedding infrastructure with style
|
||||
profiles as a fact-store domain.
|
||||
|
||||
* Open Questions
|
||||
|
||||
Several design questions are unresolved and should remain unresolved at this
|
||||
@@ -643,14 +1338,10 @@ and that cannot be known in advance.
|
||||
|
||||
** How does ontology refactoring work?
|
||||
|
||||
If the seed produces 50 categories from gate extraction and later experience
|
||||
shows they are wrong — wrong granularity, missing cross-cutting concerns, conflated
|
||||
categories — how are they migrated without invalidating all existing deductions
|
||||
that cross the old category boundaries? The ephemeral-first approach (no
|
||||
persistence, rebuild from scratch) is a temporary answer. Once persistence is
|
||||
committed (VivaceGraph), refactoring the category hierarchy is a schema migration
|
||||
problem that deduction provenance makes harder — every deduced fact's chain may
|
||||
cross the old category boundary. This is not addressed in the current architecture.
|
||||
This question is settled. See "Ontology Versioning — How Worldviews Change
|
||||
Without Losing Perspective" above. The category hierarchy is Merkle-hashed. Every
|
||||
fact stores its =:ontology-version=. Re-verification is heartbeat-driven.
|
||||
Worldviews are preserved, not overwritten. The shift is the artifact.
|
||||
|
||||
** What is the appropriate role of the human?
|
||||
|
||||
@@ -663,12 +1354,16 @@ and approve proposed generalizations. The balance cannot be set without experien
|
||||
|
||||
** How much Wikidata is the right amount?
|
||||
|
||||
Loading Wikidata entities referenced in the memex is the minimum. Loading all
|
||||
Wikidata entities within N hops of those references expands the graph
|
||||
exponentially. The right N depends on the memex's breadth — a memex focused on
|
||||
software engineering needs fewer hops than a memex spanning literature, history,
|
||||
philosophy, and science. The query performance and memory costs of a large
|
||||
Wikidata load are unknown.
|
||||
Query performance and memory costs are now bounded — 5 million entities ≈ 400MB
|
||||
RAM, O(1) hash lookups, domain-scoped Screamer checks. A large Wikidata load is
|
||||
a capital cost, not a recurring bill (see "Performance — Why Ontology Growth
|
||||
Doesn't Make the System Slower" above).
|
||||
|
||||
Remaining open: the right N hops from entities referenced in the memex depends on
|
||||
the memex's breadth. A software-engineering memex needs ~1 hop; a literary memex
|
||||
needs 3-4 hops (Nabokov → Kafka → expressionism → modernism → Baudelaire).
|
||||
The right value is empirical, testable, and user-specific — it cannot be set in
|
||||
the architecture.
|
||||
|
||||
** Can the symbolic engine satisfy queries from the user without LLM involvement?
|
||||
|
||||
|
||||