memex: update AGENTS.md, add passepartout design-decisions notes, SWOT + agora notes, bump submodules → v0.8.1
@@ -12,7 +12,7 @@
a. Write the test first → tangle → run → prove it FAILS (RED)
b. Write the implementation → tangle → run → prove it PASSES (GREEN)
c. Record both failure and success output
5. **Reflect in org** — once tests pass, ensure the implementation is in the .org source
5. **Reflect in org** — once tests pass, ensure the implementation is in the .org source, put each function in a separate code block.
6. **Update literate prose** — write/update the explanatory text around the code:
what it does, why it exists, how it connects to the rest of the system
7. **Mark the origin TODO DONE** — in `docs/ROADMAP.org`, change the
868
notes/passepartout-SWOT.org
Normal file
@@ -0,0 +1,868 @@
|
||||
#+TITLE: Passepartout Neurosymbolic + Agora Integration — SWOT Analysis
|
||||
#+AUTHOR: Agent
|
||||
#+FILETAGS: :notes:analysis:swot:passepartout:agora:neurosymbolic:
|
||||
#+CREATED: [2026-05-09 Sat]
|
||||
|
||||
* Premise and Scope
|
||||
|
||||
This analysis assumes the engineering is possible — Screamer can be wrapped,
|
||||
VivaceGraph can persist facts, ACL2 can verify structural properties, the
|
||||
archivist can extract triples from prose with Screamer verification, and the
|
||||
note-publishing bridge to Agora can be implemented. The question is not "can it
|
||||
be built?" but "does the architecture cohere? What does it enable? What does it
|
||||
miss?"
|
||||
|
||||
* Will It Work Conceptually?
|
||||
|
||||
The short answer: yes, within a specific domain. The long answer: the boundary of
|
||||
that domain is the most important thing to get right.
|
||||
|
||||
** The architecture's core insight is correct and load-bearing
|
||||
|
||||
The central design decision — "the LLM proposes; the symbolic engine decides
|
||||
whether to accept" — is sound. It is the inverse of every existing agent
|
||||
architecture. Claude Code, OpenCode, Hermes — all of them put the LLM in the
|
||||
driver's seat and add safety as an afterthought (prompt-based guardrails that
|
||||
consume tokens and can be evaded). Passepartout inverts this: the LLM proposes
|
||||
actions and facts, but a deterministic layer of gates, constraint solvers, and
|
||||
formal verifiers decides what to admit and what to execute. This inversion is the
|
||||
correct response to the hallucination problem. You cannot eliminate hallucination
|
||||
by making the LLM better. You eliminate it by not asking the LLM to do things
|
||||
that require certainty.
|
||||
|
||||
The bootstrap mechanism — extracting 50-70 entity classes mechanically from the
|
||||
existing Dispatcher gate stack with zero new code — is genuinely elegant. It
|
||||
proves the pattern at minimal cost: code becomes facts, facts enable reasoning.
|
||||
Every new gate pattern adds to the ontology organically. This is the right way to
|
||||
start a knowledge base: not by designing a schema upfront, but by formalizing what
|
||||
the system already knows implicitly.
|
||||
|
||||
** The "one memex, two indices" architecture survives contact with reality
|
||||
|
||||
Option 4 (one memex with neural and symbolic indices over the same Org files) is
|
||||
the correct long-term architecture. The prose is the ground truth — always. The
|
||||
symbolic index is a derived view that can be thrown away and rebuilt. The neural
|
||||
index handles semantic search, associative leaps, and fuzzy matching. This
|
||||
division of labor is permanent, not transitional, because the domains they serve
|
||||
are fundamentally different kinds of knowledge.
|
||||
|
||||
The practical path — starting with Option 5 (ephemeral facts, no persistence)
|
||||
through Phases 1-4, then graduating to Option 4 with VivaceGraph persistence in
|
||||
Phase 5 — is the right sequence. It punts the serialization format problem until
|
||||
the fact language has been battle-tested. It keeps the cost of mistakes low. It
|
||||
treats the ontology as something discovered through use rather than designed
|
||||
upfront.
|
||||
|
||||
** Wikipedia's ontology WOULD give it a running start — with caveats
|
||||
|
||||
Wikidata contains approximately 100 million entities with a decade of human
curation: type hierarchies, relations, dates, citations, disambiguation. For a
personal memex that mentions Nabokov, /Pale Fire/, Kafka, postmodernism, and
butterfly migration, the gate stack's 50-70 entity classes are a starvation
diet. Organic growth through prose extraction would take years to cover the
entities in one person's engagement with a single novel.
|
||||
|
||||
Loading Wikidata's entity graph into the symbolic index transforms the
|
||||
archivist's job from "discover that Nabokov wrote /Pale Fire/" to "connect your
|
||||
heading to Wikidata entity Q36591." The second task is reference resolution, not
|
||||
knowledge extraction — simpler, more reliable, and in many cases doable without
|
||||
an LLM at all (string match against loaded entities). The notes claim this
|
||||
collapses the LLM's role to three thin boundaries: input translation, prose-to-
|
||||
candidate-triple for personal content, and result-to-prose formatting.
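
The "string match against loaded entities" step can be sketched as follows.
This is an illustration in Python rather than Passepartout's Lisp; the function
names and the tiny label table are hypothetical, and the two QIDs are simply
the ones quoted in this analysis. It resolves a mention only when exactly one
loaded entity carries that label and otherwise reports the ambiguity instead of
guessing.

```python
from collections import defaultdict

def build_label_index(entities):
    """Map each lowercased label or alias to the set of entity IDs using it."""
    index = defaultdict(set)
    for entity_id, labels in entities.items():
        for label in labels:
            index[label.lower()].add(entity_id)
    return index

def resolve(mention, index):
    """Return (status, candidates): 'resolved', 'ambiguous', or 'unknown'."""
    candidates = sorted(index.get(mention.lower(), ()))
    if len(candidates) == 1:
        return "resolved", candidates
    if candidates:
        return "ambiguous", candidates
    return "unknown", candidates

# Hypothetical toy table standing in for a loaded Wikidata subset.
ENTITIES = {
    "Q36591": ["Vladimir Nabokov", "Nabokov"],
    "Q566744": ["Dmitri Nabokov", "Nabokov"],
}
INDEX = build_label_index(ENTITIES)

print(resolve("Vladimir Nabokov", INDEX))
print(resolve("Nabokov", INDEX))
```

An "ambiguous" result is where context, and probably the LLM, would still be
needed; the sketch only shows that the unambiguous case needs no model call.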
|
||||
|
||||
The caveats are real:
|
||||
|
||||
- Entity resolution (matching prose mentions to Wikidata entities) is genuinely
|
||||
hard. "Nabokov" in a diary might refer to Vladimir Nabokov (Q36591), his son
|
||||
Dmitri (Q566744), or someone else entirely. Disambiguation requires context
|
||||
that the symbolic engine doesn't have without LLM assistance.
|
||||
- Wikidata is biased toward English Wikipedia's coverage. A memex in Arabic,
|
||||
Farsi, or Amharic will find far fewer resolved entities. The "universal" in
|
||||
Wikidata is aspirational, not actual.
|
||||
- Wikidata's property graph is not an ontology in the formal sense: it is a
  collaboratively edited dataset with contradictions, gaps, and editorial wars
  frozen in time. Loading it directly into a symbolic index that expects
  consistency (Screamer checks, cardinality policies) will surface thousands of
  contradictions on ingest, many of which are Wikidata artifacts, not meaningful
  tensions.
|
||||
- N-hop expansion grows roughly geometrically unless it is capped. One hop from
  Nabokov reaches hundreds of entities (his works, his family, his influences,
  his translators). Two hops reach thousands. Three hops reach tens of
  thousands. The notes say "3-4 hops" for a literary memex but don't estimate
  the entity count this implies. The claim that 5 million entities = ~400MB
  (about 80 bytes per entity) is the best-case hash-table figure; a graph with
  query indices will be larger, and Prolog-like queries over millions of nodes
  are not free.
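
The "conservative hop limits" idea can be made concrete with a bounded
breadth-first expansion. This is a minimal sketch in Python, not anything the
notes specify: the =expand= function, the =max_entities= budget, and the toy
graph are assumptions made for illustration.

```python
from collections import deque

def expand(graph, seed, max_hops, max_entities):
    """Breadth-first expansion capped by hop count and by an entity budget.

    Returns (visited, truncated): visited maps entity -> hop distance and
    truncated is True when the entity budget stopped the expansion early.
    """
    visited = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if visited[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour in visited:
                continue
            if len(visited) >= max_entities:
                return visited, True
            visited[neighbour] = visited[node] + 1
            queue.append(neighbour)
    return visited, False

# Hypothetical toy graph; real Wikidata fan-out is far larger.
GRAPH = {
    "nabokov": ["pale_fire", "lolita", "vera"],
    "pale_fire": ["kinbote", "shade"],
    "lolita": ["humbert"],
    "kinbote": ["zembla"],
}
```

Reporting =truncated= matters as much as the cap itself: a load that silently
stops at the budget would hide exactly the coverage gap the caveat describes.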
|
||||
|
||||
Still: even a partial Wikidata load with conservative hop limits would provide
|
||||
more ontology than the system could accumulate through years of organic growth.
|
||||
It is the right accelerator, and the architecture handles it correctly — Wikidata
|
||||
facts are admitted with =:provenance :wikidata= and =:policy :plural=, meaning
|
||||
they sit alongside personal facts without overriding them. Disagreements are
|
||||
surfaced, not resolved. The architecture treats Wikidata as evidence from an
|
||||
external source, not as ground truth. That's the correct posture.
|
||||
|
||||
** Cardinality policies are the right abstraction for contradiction
|
||||
|
||||
The =:singular= / =:dual= / =:plural= cardinality model is one of the most
|
||||
important ideas in these notes. Classical logic requires consistency — a
|
||||
contradiction implies everything (ex contradictione quodlibet). A constraint
|
||||
solver like Screamer also requires consistency — a contradictory constraint set
|
||||
has no solutions. But a personal memex operates across domains where the meaning
|
||||
of contradiction is fundamentally different:
|
||||
|
||||
- "rm -rf / is catastrophic" is =:singular= — there is one truth that evolves
|
||||
over time.
|
||||
- "I loved this person AND I resented them" is =:dual= — the tension IS the
|
||||
fact.
|
||||
- "Wikidata says Everest is 8848m; DBpedia says 8849m; my 2023 diary says
|
||||
8848m" is =:plural= — multiple sources disagree, and surfacing the disagreement
|
||||
with provenance is the product.
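
A minimal sketch of how the three policies might behave on assertion, written
in Python for illustration. The =FactStore= class, its method names, and the
reading of =:dual= as "at most two live values" are assumptions, not the notes'
design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    entity: str
    relation: str
    value: str
    provenance: str

class FactStore:
    def __init__(self, policies):
        self.policies = policies   # relation -> "singular" | "dual" | "plural"
        self.current = {}          # (entity, relation) -> list of live facts
        self.superseded = []       # history of replaced singular facts

    def assert_fact(self, fact):
        key = (fact.entity, fact.relation)
        live = self.current.setdefault(key, [])
        policy = self.policies.get(fact.relation, "singular")
        if policy == "singular":
            # One truth that evolves: the old value moves to history.
            self.superseded.extend(live)
            live.clear()
        elif policy == "dual" and len(live) >= 2:
            raise ValueError("dual relation already holds two values in tension")
        live.append(fact)

    def tensions(self):
        """Keys whose live facts disagree; these are surfaced, not resolved."""
        return {k: v for k, v in self.current.items()
                if len({f.value for f in v}) > 1}

store = FactStore({"danger": "singular", "feeling": "dual", "height_m": "plural"})
store.assert_fact(Fact("everest", "height_m", "8848", "wikidata"))
store.assert_fact(Fact("everest", "height_m", "8849", "dbpedia"))
store.assert_fact(Fact("everest", "height_m", "8848", "diary-2023"))
```

Here =store.tensions()= returns the Everest entry with all three sources
attached, which is the "disagreement with provenance" the list describes.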
|
||||
|
||||
This is a genuinely novel contribution to knowledge representation. Most
knowledge graphs (Wikidata, Freebase, DBpedia) don't treat contradiction as
something to query. Wikidata can store conflicting statements with ranks and
references, but consumers typically read the preferred value and ignore the
rest. Most constraint solvers reject contradiction as error. Passepartout's
cardinality model makes contradiction a first-class citizen: you can query the
fact that "I used to believe X until Tuesday, then Y," or "these three sources
disagree on height," or "I hold these two positions in tension." The symbolic
engine's job is not to decide which is right. It is to surface the tension with
provenance.
|
||||
|
||||
This alone, if implemented correctly, would be a category-level advance over
|
||||
every existing personal knowledge management tool.
|
||||
|
||||
** Ontology versioning is the right approach to the migration problem
|
||||
|
||||
Every knowledge base eventually faces schema migration — you split =:secret-file=
|
||||
into =:crypto-secret= and =:plaintext-secret=, and now every deduction that
|
||||
crossed the old category boundary is suspect. The standard approach is batch
|
||||
UPDATE operations that overwrite the past. Passepartout's approach — the category
|
||||
hierarchy itself is a Merkle tree, every fact stores the =:ontology-version= at
|
||||
assertion time, category changes trigger re-verification rather than remapping —
|
||||
preserves all worldviews. You can query "what did I believe about secrets before
|
||||
I refined my security model?" This is not querying a fact. It is querying the
|
||||
history of your own thinking.
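
The versioning idea can be sketched with a content hash over the category
hierarchy. This Python illustration flattens the Merkle tree into a single
SHA-256 over a canonical JSON dump, which is a simplification; the function
names and the dictionary encoding of the hierarchy are assumptions.

```python
import hashlib
import json

def ontology_version(hierarchy):
    """Content hash of a hierarchy given as {category: parent-or-None}."""
    canonical = json.dumps(hierarchy, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]

def stale_facts(facts, current_version):
    """Facts asserted under another ontology version: re-verify, don't remap."""
    return [f for f in facts if f["ontology_version"] != current_version]

# The split described above: one category becomes two children.
V1 = {"secret-file": None}
V2 = {
    "secret-file": None,
    "crypto-secret": "secret-file",
    "plaintext-secret": "secret-file",
}

FACTS = [
    {"triple": (".env", "is-a", "secret-file"),
     "ontology_version": ontology_version(V1)},
]
```

Because the old fact keeps its original version tag, a query scoped to
=ontology_version(V1)= still sees the earlier worldview, while
=stale_facts(FACTS, ontology_version(V2))= lists what needs re-verification.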
|
||||
|
||||
This is the kind of capability that no existing tool provides, and it flows
|
||||
directly from the architecture. If the Merkle DAG infrastructure exists (it does,
|
||||
from v0.2.0), ontology versioning is ~40 lines on top of it. The conceptual
|
||||
design is sound. The engineering appears tractable.
|
||||
|
||||
* SWOT Analysis
|
||||
|
||||
** Strengths
|
||||
|
||||
*** Architectural inversion — proposer vs decider
|
||||
|
||||
The LLM proposes. The symbolic engine decides. This is the inverse of every
|
||||
existing agent architecture, and it solves the hallucination problem at the
|
||||
architectural level rather than the prompt-engineering level. No amount of
|
||||
prompt refinement can make a probabilistic system deterministic. But a
|
||||
deterministic admission gate can make a probabilistic proposer safe.
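
A minimal sketch of the propose/decide split, in Python rather than
Passepartout's Lisp. The =admit= function and both gate functions are
hypothetical stand-ins for the Dispatcher gate stack; the point is only that
the decision path contains no model call.

```python
def admit(candidate, gates):
    """Run a proposal through deterministic gates; the first rejection wins."""
    for gate in gates:
        verdict = gate(candidate)
        if verdict is not None:
            return False, verdict
    return True, "admitted"

def no_recursive_root_delete(cmd):
    """Reject a recursive forced delete of the filesystem root."""
    tokens = cmd.split()
    if tokens[:1] == ["rm"] and "-rf" in tokens and "/" in tokens:
        return "destructive: recursive delete of /"
    return None

def no_secret_reads(cmd):
    """Reject any command that mentions a known secret file."""
    if ".env" in cmd:
        return "touches a secret file"
    return None

GATES = [no_recursive_root_delete, no_secret_reads]

# The proposer (an LLM in the real system) only supplies the string.
for proposal in ["ls -la", "rm -rf /", "cat .env"]:
    print(proposal, admit(proposal, GATES))
```

The same input always yields the same verdict, which is what makes the
proposer's unreliability tolerable.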
|
||||
|
||||
*** Unified container format (Org files)
|
||||
|
||||
Org files serve as the container for human prose, Lisp source code, symbolic
|
||||
facts, and Agora Notes. One format, one toolchain, one Merkle tree, one version
|
||||
control system. If Passepartout stops existing, the data survives in plain text.
|
||||
This is the hardest commitment in the design and the most undervalued. Most agent
|
||||
architectures store memory in JSONL transcripts, vector databases, or proprietary
|
||||
formats — opaque to the human and dependent on the tool. Passepartout's memory
|
||||
IS the human's memory, in the human's format.
|
||||
|
||||
*** Provenance as product
|
||||
|
||||
Every fact carries =:grounding= (the specific Org heading), =:provenance= (who
|
||||
or what produced it), =:timestamp=, =:referenced-by=, =:contradicted-by=,
|
||||
=:superseded-by=. The =/audit= command renders the full provenance chain. In the
|
||||
broader memex, the value is not the verified fact ("this command is safe"). It
|
||||
is the provenance itself: "this claim originated in that diary entry, has been
|
||||
referenced 7 times across 4 projects, was contradicted 6 months later, and was
|
||||
revised 3 weeks after that." This is a memory prosthesis that makes your own mind
|
||||
legible to you.
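
As an illustration of what a provenance-carrying record and an audit rendering
could look like, here is a small Python sketch. The field names follow the
keywords listed above; the =FactRecord= class, the =audit= function, and the
output layout are hypothetical and are not the actual =/audit= command.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FactRecord:
    claim: str
    grounding: str      # the specific Org heading
    provenance: str     # who or what produced the fact
    timestamp: str
    referenced_by: List[str] = field(default_factory=list)
    contradicted_by: List[str] = field(default_factory=list)
    superseded_by: Optional[str] = None

def audit(record):
    """Render the provenance chain of one fact as plain text."""
    lines = [
        f"claim: {record.claim}",
        f"  grounded in: {record.grounding}",
        f"  produced by: {record.provenance} at {record.timestamp}",
        f"  referenced by: {len(record.referenced_by)} heading(s)",
    ]
    for heading in record.contradicted_by:
        lines.append(f"  contradicted by: {heading}")
    if record.superseded_by:
        lines.append(f"  superseded by: {record.superseded_by}")
    return "\n".join(lines)
```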
|
||||
|
||||
*** Gate-to-fact bootstrap — ontology from existing code
|
||||
|
||||
The existing Dispatcher gate stack encodes an implicit ontology (categories of
|
||||
secrets, destructive commands, trusted domains, core files). The bootstrap
|
||||
extracts this mechanically — zero LLM tokens, zero human authoring, ~30 lines of
|
||||
Lisp. This proves the pattern and provides the seed ontology without any new
|
||||
infrastructure. Every new gate pattern added by the human (HITL approvals that
|
||||
become rules) extends the ontology automatically.
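
The mechanical extraction can be pictured as a single pass over a gate table.
The sketch below is Python, not the ~30 lines of Lisp the notes refer to, and
the table contents, the =is-a= relation, and the function name are assumptions
chosen for illustration.

```python
# Hypothetical gate table: category -> patterns the gate already matches.
GATE_TABLE = {
    "secret-file": [".env", "id_rsa"],
    "destructive-command": ["rm -rf /", "mkfs"],
    "trusted-domain": ["example.org"],
}

def bootstrap_facts(gate_table, provenance="gate-stack"):
    """Turn each (category, pattern) pair into an is-a triple with provenance."""
    return [
        {"entity": pattern, "relation": "is-a", "value": category,
         "provenance": provenance}
        for category, patterns in sorted(gate_table.items())
        for pattern in patterns
    ]

FACTS = bootstrap_facts(GATE_TABLE)
```

Adding a pattern to the table and re-running the pass is all it takes for the
new pattern to appear as a fact, which is the "extends the ontology
automatically" property.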
|
||||
|
||||
*** Self-preservation architecture
|
||||
|
||||
The Third Law implementation — quarantine on skill failure, degraded-mode
|
||||
signaling, resource monitoring, external watchdog, refusal to self-terminate —
|
||||
is individually small (~20-50 lines each) and collectively transforms
|
||||
self-preservation from a passive architectural property into an active behavior.
|
||||
The key insight: the biggest gap is not that these mechanisms are hard. It is
|
||||
that degradation is currently silent. Making it visible is cheap and high-impact.
|
||||
|
||||
*** Cardinality policies as a solution to contradiction
|
||||
|
||||
The =:singular= / =:dual= / =:plural= model is novel in knowledge representation
|
||||
and directly addresses the hardest problem in a personal memex: that
|
||||
contradiction is the product, not the error. Bayesian knowledge bases, graph
|
||||
databases, and triple stores all struggle with contradiction. Passepartout's
|
||||
model makes it a feature.
|
||||
|
||||
*** Organic ontology growth
|
||||
|
||||
Categories emerge from the system's own operation: gate patterns → gate outcomes
|
||||
→ Screamer generalizations → archivist proposals → cross-domain overlap
|
||||
detection. The ontology is a garden, not a building. This avoids the Principia
|
||||
Mathematica problem — the need to define everything upfront — by replacing
|
||||
axiomatic design with evolutionary growth. Categories that aren't used fade.
|
||||
Categories that are contradictory are pruned. Categories that emerge from
|
||||
overlapping domains are promoted. The system converges on useful granularity
|
||||
through use.
|
||||
|
||||
*** Agora as provenance layer for networked knowledge
|
||||
|
||||
A BFT-timestamped triple store is one approach, but the Merkle DAG + DID
|
||||
signatures provide a lighter-weight alternative: every fact's provenance is
|
||||
content-addressed, every author's identity is cryptographically verifiable, and
|
||||
the DAG structure enables partial replication without consensus. This is more
|
||||
tractable than full BFT and sufficient for a personal memex that needs to share
|
||||
facts across a network.
|
||||
|
||||
*** Decoupling of compute cost from knowledge base size
|
||||
|
||||
LLM tokens are minimized by design — deterministic gates cost 0 tokens, sparse-
|
||||
tree rendering keeps context at 2,000-4,000 tokens, Screamer deductions cost 0
|
||||
tokens. Adding 5 million Wikidata entities does not add a single token to any LLM
|
||||
call. The variables that actually degrade performance — context window size, LLM
|
||||
call frequency, Screamer deduction budget — are all bounded independently of
|
||||
knowledge base size. This is a structural property: the education is local, only
|
||||
the brain costs.
|
||||
|
||||
** Weaknesses
|
||||
|
||||
*** The fact language is unproven and may be insufficient
|
||||
|
||||
Triples of the form =(:entity :relation :value)=, with provenance and
grounding, are the current hypothesis. The format is simple enough to parse,
expressive enough to capture the gate stack's implicit claims, and extensible
enough that Screamer can operate on it. But:
|
||||
|
||||
- Triples cannot naturally express temporal relations. "Was X before Y?" requires
  reification (making the relation itself an entity), which adds an extra node
  and extra joins to every query that touches the relation.
|
||||
- Triples cannot express modal claims. "Should not do X unless Y" has no natural
|
||||
triple representation. Neither does "could have done X but chose Y."
|
||||
- Triples cannot express counterfactuals. "If X had happened, Y would have
|
||||
followed." These are essential for the "what if" reasoning that a personal
|
||||
memex should support.
|
||||
- Triples struggle with n-ary relations. "Nabokov wrote Pale Fire in 1962 while
|
||||
living in Montreux" is a 4-ary relation (author, work, date, location), not a
|
||||
set of independent binary relations. Breaking it into triples loses the
|
||||
connection that binds them.
|
||||
- Triples cannot express negation cleanly. "Nabokov did NOT write Doctor Zhivago"
  requires an explicit negative fact: under an open-world assumption a missing
  triple means only "not known", and a plain triple store has no built-in way to
  say "known not".
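
The reification cost mentioned in the list can be seen in a small Python
sketch. The =reify= and =roles_of= helpers and the =event-N= naming are
hypothetical; the example only shows the standard event-node encoding of an
n-ary relation and the extra lookup it forces.

```python
import itertools

_event_ids = itertools.count(1)

def reify(relation, **roles):
    """Encode an n-ary relation as one event node plus one triple per role."""
    event = f"event-{next(_event_ids)}"
    triples = [(event, "type", relation)]
    triples += [(event, role, value) for role, value in roles.items()]
    return event, triples

def roles_of(triples, event):
    """Recover the roles bound together by one event node."""
    return {r: v for e, r, v in triples if e == event and r != "type"}

event, triples = reify(
    "wrote",
    author="Nabokov", work="Pale Fire", date="1962", location="Montreux",
)
```

One sentence becomes five triples, and a question such as "where was the author
living when the work was written" has to go through the event node rather than
a direct author-to-place edge.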
|
||||
|
||||
The notes acknowledge this limitation but defer it. The right granularity
|
||||
"depends on what queries the planner actually needs to make, and that cannot be
|
||||
known in advance." This is honest but unsatisfying. If triples prove insufficient,
|
||||
the entire fact store, the Screamer integration, the VivaceGraph persistence, and
|
||||
the archivist's extraction format must be redesigned. The architecture has no
|
||||
intermediate fallback between "triples" and "something more expressive."
|
||||
|
||||
*** Screamer as admission gate is untested at this scale
|
||||
|
||||
Screamer is a constraint solver with non-deterministic backtracking. Using it
|
||||
to check a candidate triple against an existing fact store is conceptually
|
||||
elegant: express the fact store as constraint variables, assert the candidate,
|
||||
check solvability. But:
|
||||
|
||||
- Screamer was designed for constraint satisfaction problems with tens to
|
||||
hundreds of variables. A fact store with millions of triples (after Wikidata
|
||||
loading) is a constraint space orders of magnitude larger than Screamer's
|
||||
design envelope.
|
||||
- The consistency check is domain-scoped (only rules from the candidate's
|
||||
=:domain= apply), but cross-domain contradictions are the most valuable kind.
|
||||
"Nabokov was born in 1899" (literature domain) should be consistent with
|
||||
"Nabokov died in 1977" (history domain). If these are separate domains, the
|
||||
check misses contradictions; if they are unified, the constraint space
|
||||
explodes.
|
||||
- Screamer's non-deterministic backtracking is worst-case exponential. The notes
|
||||
bound this via deduction budget (=SCREAMER_DEDUCTION_BUDGET_MS=) but don't
|
||||
address the admission check itself, which runs on every assertion.
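
To make the last two points concrete, here is a deliberately simple Python
stand-in for a budgeted, domain-scoped admission check. It is a linear scan,
not Screamer's backtracking search, and the function name, the rule, and the
=budget_ms= parameter are assumptions for illustration only.

```python
import time

def same_slot_different_value(a, b):
    """Toy rule: two facts fill the same slot with different values."""
    return (a["entity"] == b["entity"]
            and a["relation"] == b["relation"]
            and a["value"] != b["value"])

def check_admission(candidate, facts, rules, budget_ms):
    """Check a candidate against same-domain facts within a time budget.

    Returns "consistent", "contradiction", or "budget-exceeded".
    """
    deadline = time.monotonic() + budget_ms / 1000.0
    for fact in facts:
        if time.monotonic() >= deadline:
            return "budget-exceeded"
        if fact["domain"] != candidate["domain"]:
            continue  # domain scoping: cross-domain conflicts are never seen
        for rule in rules:
            if rule(candidate, fact):
                return "contradiction"
    return "consistent"

FACTS = [{"entity": "nabokov", "relation": "born", "value": 1899,
          "domain": "literature"}]
RULES = [same_slot_different_value]
```

A distinct "budget-exceeded" verdict gives the caller something to log and
count, which is one way the slow-but-not-crashing failure mode could be made
visible.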
|
||||
|
||||
There is a risk that Screamer works beautifully for the gate-bootstrapped seed
|
||||
(50-70 entity classes, ~200 facts) and becomes unusably slow after Wikidata
|
||||
loading (millions of facts). The transition from "works" to "doesn't" may be
|
||||
gradual and hard to detect — the system gets slower but doesn't crash,
|
||||
degrading user experience without a clear diagnostic.
|
||||
|
||||
*** The "flip" from lossy to deterministic is underspecified
|
||||
|
||||
The architecture's central narrative arc is the "flip": at some point, the non-
|
||||
lossy facts constitute a sufficient foundation that the symbolic engine can
|
||||
reverse the flow — instead of LLM extraction, the symbolic engine reads prose
|
||||
through its own lens and deduces facts directly. The sufficiency metric
|
||||
(non-lossy / total > 0.7) makes this "computable and visible to the user."
|
||||
|
||||
But:
|
||||
|
||||
- The threshold (0.7) is arbitrary. It is not derived from empirical measurement,
|
||||
information theory, or constraint satisfaction theory. It is a guess.
|
||||
- Sufficiency is domain-specific, not global. The gate stack may have 0.95
|
||||
coverage of security classifications but 0.05 coverage of literary analysis.
|
||||
A global threshold of 0.7 hides the domains where the symbolic engine is still
|
||||
effectively blind.
|
||||
- The "flip" operation itself is not defined. "Screamer reads prose through its
|
||||
own lens" — Screamer does not read prose. It operates on structured facts.
|
||||
Either the archivist still extracts triples (which is LLM work), or some new
|
||||
mechanism parses prose into triples deterministically (which is NLP at a level
|
||||
that does not exist in open-source Lisp).
|
||||
- Even after the flip, facts from the pre-flip period carry =:provenance
|
||||
:llm-proposed= and are therefore suspect. The pre-flip facts were admitted
|
||||
against fewer non-lossy facts, meaning Screamer's consistency checks were
|
||||
weaker. A fact admitted during the seed phase may be wrong but undetected
|
||||
because there were no contradicting facts at the time. Re-verifying all pre-
|
||||
flip facts against the current fact store is described as a heartbeat task but
|
||||
the cost (millions of Screamer checks) is not estimated.
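
The gap between a global and a per-domain sufficiency figure is easy to show
numerically. The Python sketch below uses made-up fact counts chosen to
reproduce the 0.95 and 0.05 coverage figures from the list; the =sufficiency=
function and the data layout are assumptions.

```python
from collections import Counter

def sufficiency(facts):
    """Return (global_ratio, {domain: ratio}) of non-lossy facts to all facts."""
    total, non_lossy = Counter(), Counter()
    for domain, is_non_lossy in facts:
        total[domain] += 1
        non_lossy[domain] += int(is_non_lossy)
    per_domain = {d: non_lossy[d] / total[d] for d in total}
    overall = sum(non_lossy.values()) / sum(total.values())
    return overall, per_domain

# Hypothetical counts: 100 security facts, 20 literature facts.
FACTS = (
    [("security", True)] * 95 + [("security", False)] * 5
    + [("literature", True)] * 1 + [("literature", False)] * 19
)

overall, per_domain = sufficiency(FACTS)
THRESHOLD = 0.7  # the figure quoted from the notes
```

With these counts the global ratio clears the 0.7 threshold while the
literature domain sits far below it, so a flip decided on the global number
would be premature for that domain.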
|
||||
|
||||
The flip is a beautiful narrative. It may also be a mirage — the system may
|
||||
achieve high sufficiency in narrow domains (security, filesystem, coding) and
|
||||
never approach it in the broader memex (literature, personal reflection, daily
|
||||
life). If the broader memex is the use case, the flip may never happen.
|
||||
|
||||
*** The archivist's extraction cost is unaccounted
|
||||
|
||||
The archivist calls the LLM to extract triples from prose, with "a minimal prompt
|
||||
(~200 tokens)." Over a personal memex with thousands of entries — a decade of
|
||||
diary entries, hundreds of literature notes, dozens of project logs — the
|
||||
extraction cost is substantial.
|
||||
|
||||
Assume 5,000 headings, 200 tokens per heading prompt, and an LLM that returns
~100 tokens of structured triples per heading. That's 1.5 million tokens for the
initial extraction, plus verification overhead (Screamer checks cost 0 LLM
tokens, but incorrect proposals generate feedback that may trigger
re-extraction). At current API prices (~$0.15 per million input tokens for
GPT-4o-mini), the cost is modest: about $0.23 if every token is billed at that
input rate, and somewhat more in practice because output tokens are priced
higher. But at scale (re-extraction after ontology changes, continuous
extraction as new content is added, extraction for all incoming Agora Notes)
the cost accumulates.
|
||||
|
||||
More importantly, the extraction latency is human-noticeable. 5,000 headings at
1 second per LLM call, run sequentially, is ~1.4 hours of extraction time. The
system needs to either batch-extract on startup (making cold starts slow) or
extract lazily on first query (making first queries slow). Neither is ideal.
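
The back-of-the-envelope figures above can be reproduced directly. This Python
snippet uses only the numbers stated in this section; billing every token at
the single quoted input rate and running calls strictly one after another are
simplifying assumptions.

```python
HEADINGS = 5_000
PROMPT_TOKENS = 200            # per heading, as assumed in the text
RESPONSE_TOKENS = 100          # per heading, as assumed in the text
PRICE_PER_MILLION_USD = 0.15   # flat rate applied to all tokens (assumption)
SECONDS_PER_CALL = 1.0         # sequential calls (assumption)

total_tokens = HEADINGS * (PROMPT_TOKENS + RESPONSE_TOKENS)
cost_usd = total_tokens / 1_000_000 * PRICE_PER_MILLION_USD
latency_hours = HEADINGS * SECONDS_PER_CALL / 3600

print(f"tokens:  {total_tokens:,}")
print(f"cost:    ${cost_usd:.3f}")
print(f"latency: {latency_hours:.2f} h")
```

Changing =HEADINGS= or the per-call latency shows how quickly the one-off
figure turns into a recurring cost once re-extraction is triggered by ontology
changes.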
|
||||
|
||||
The notes trumpet the token savings from deterministic gates and Screamer
|
||||
deductions (valid — those cost 0 tokens) but the archivist's extraction cost is
|
||||
the system's single largest recurring LLM expense, and it is mentioned only in
|
||||
passing.
|
||||
|
||||
*** The Agora integration is clean in theory, undefined in practice
|
||||
|
||||
The "Passepartout IS the PDS" claim is elegant: the =memory-object= struct IS
|
||||
the Note format, the Merkle DAG IS the Key Event Log, the fact store IS the
|
||||
reputation system. But:
|
||||
|
||||
- An Agora PDS needs to serve HTTP APIs for thin clients. The daemon speaks a
|
||||
framed TCP protocol over a local port. Extending it to serve HTTPS with
|
||||
DIDComm endpoints, subscription management, and Relay push/pull is a
|
||||
substantial engineering effort.
|
||||
- The PDS needs to manage encrypted storage — client-side encrypted content that
|
||||
the PDS itself cannot read. Passepartout's vault stores credentials with
|
||||
integrity hashes but does not currently manage per-Note encryption with
|
||||
audience-specific keys.
|
||||
- The Relay Network is described as an intelligent communication backbone with
|
||||
pub/sub routing. Passepartout has no Relay implementation, no Relay-facing API,
|
||||
and no subscription management beyond its own event orchestrator.
|
||||
- Agora's contract system (SCAL contracts, HODL invoices, arbitration tiers)
|
||||
requires state machines and Lightning Network integration that Passepartout
|
||||
has no primitives for.
|
||||
- The "Passepartout IS the PDS" vision conflates two things: the data model
|
||||
(Org files = Notes) and the infrastructure (a process that serves a network
|
||||
protocol). The data model unification is clean and right. The infrastructure
|
||||
unification implies Passepartout grows from a local agent to a network server
|
||||
— a significant architectural expansion that the notes treat as a ~40-line
|
||||
utility.
|
||||
|
||||
*** No adversarial model
|
||||
|
||||
The notes describe layered authentication (crypto, sensory, deterministic,
|
||||
probabilistic) and type-level gates as structural safety. They do not describe
|
||||
an adversarial model:
|
||||
|
||||
- What stops a malicious Agora Note from containing 100,000 triples that flood
|
||||
the fact store?
|
||||
- What stops a DID from publishing Notes that deliberately inject contradictions
|
||||
to force Screamer into exponential backtracking?
|
||||
- What stops a compromised sensor key from signing valid sensor data that is
|
||||
adversarially crafted (e.g., video frames designed to trigger specific vision
|
||||
model false positives)?
|
||||
- What stops a spam DID from creating millions of Personas and flooding the
|
||||
user's incoming Notes directory?
|
||||
|
||||
The resource monitor (Phase 1a) handles storage pressure generically. The
|
||||
quarantine system handles individual DIDs flagged for spam. But none of these
|
||||
are adversary-aware — they react to symptoms (disk full, error rate high) rather
|
||||
than anticipating attack patterns. An adversarial model would identify these
|
||||
vectors and design mitigations specifically. The notes describe a system that
|
||||
works in a cooperative environment, not an adversarial one.
|
||||
|
||||
*** The self-repair criterion creates a two-tier architecture
|
||||
|
||||
The AGENTS.md rule ("default: everything is a skill") means every component of
the symbolic engine (Screamer, VivaceGraph, fact store, archivist, ACL2,
planner) is a skill, not core. This is correct for the self-repair criterion: a
corrupted skill degrades the agent but doesn't kill it. A corrupted core file
kills the brainstem.
|
||||
|
||||
But it creates a tension: the symbolic engine IS the reasoning layer that would
diagnose and repair a corrupted skill. If the fact store itself is corrupted
(impossible facts, inconsistent cardinality, broken Merkle chains), the engine
that detects corruption is the engine that is corrupted. The system needs a
"repair from below" path: a minimal core that can purge and rebuild the symbolic
index without depending on the symbolic index. This path exists (the fact store
is ephemeral in Phases 1-4 and rebuildable from prose in Phase 5+) but is not
exercised automatically. A corruption in the symbolic engine requires human
detection and manual rebuild, which is the exact problem the self-repair
criterion was designed to avoid.
|
||||
|
||||
** Opportunities
|
||||
|
||||
*** A memory prosthesis that makes your own mind legible
|
||||
|
||||
The symbolic index, when populated and queried, answers questions that no
|
||||
existing tool can:
|
||||
|
||||
- "What did I believe about monorepos in 2023, and how has that changed?"
|
||||
- "Which of my diary entries contradict each other?"
|
||||
- "What entities in my memex have no connection to any other entity?"
|
||||
- "Show me everything I've written about Nabokov, organized by when I wrote it,
|
||||
what I was reading at the time, and what I concluded."
|
||||
- "Which of my project plans reference security assumptions that I later changed?"
|
||||
- "What did I think about this topic, and why did I change my mind?"
|
||||
|
||||
These are not information retrieval queries. They are self-knowledge queries.
|
||||
They require provenance chains, temporal versioning, contradiction surfacing, and
|
||||
cross-domain linkage — all of which the architecture provides as first-class
|
||||
capabilities. If this works, it transforms the memex from a searchable archive
|
||||
into a thinking partner that knows the history of your thoughts.
|
||||
|
||||
*** Deterministic reasoning as a moat
|
||||
|
||||
Every competitor agent system (Claude Code, OpenCode, OpenClaw, Hermes, Cognee,
|
||||
Mem0) uses neural-only reasoning. They are all vulnerable to the same failure
|
||||
mode: the LLM hallucinates a fact or an action, and there is no second system to
|
||||
catch it. Their safety is heuristic. Their memory is flat. Their reasoning is
|
||||
unprovable.
|
||||
|
||||
Passepartout's architectural bet — a symbolic engine that verifies, deduces, and
|
||||
audits — creates a category difference, not a performance difference. If the bet
|
||||
pays off, Passepartout is not "a better AI agent." It is a different kind of
|
||||
system — one whose reasoning is provable, whose memory is content-addressed, and
|
||||
whose knowledge accumulates through deduction rather than re-prompting.
|
||||
|
||||
This is a genuine moat. It cannot be replicated by adding a better system prompt
|
||||
or a larger context window. It requires building the ontology, the constraint
|
||||
solver, the fact store, and the provenance tracker — work that takes years and
|
||||
cannot be shortcut by spending more on inference.
|
||||
|
||||
*** Agora as the first sovereign agent network
|
||||
|
||||
If Passepartout serves as the PDS and an Agora Persona, then AI agents can:
|
||||
|
||||
- Publish verified outputs as signed Notes with cryptographic provenance.
|
||||
Readers know the agent produced the output, not a human impersonating the
|
||||
agent.
|
||||
- Accept invocation Notes from other persona owners. "Please analyze this
|
||||
contract and publish your findings." The agent receives the request as an
|
||||
Agora Note, processes it, signs the response, and publishes it.
|
||||
- Build reputation through auditable chains of signed work products, not through
|
||||
self-reported claims.
|
||||
- Participate in the compute marketplace as both consumer and provider.
|
||||
- Maintain sovereign identity — the agent's DID is independent of any platform,
|
||||
any provider, any human account.
|
||||
|
||||
This is not a chatbot on a messaging platform. It is an autonomous entity on a
|
||||
decentralized network, with cryptographic identity, verifiable provenance, and
|
||||
economic agency. If Agora reaches even Order 1 (the first 1,000 users),
|
||||
Passepartout agents become some of the most capable participants on the network.
|
||||
|
||||
*** The 10-80-10 ratio for coding is genuinely achievable

For a coding agent (the domain that Passepartout currently operates in) the
10-80-10 ratio is plausible. The existing Dispatcher already verifies every
action deterministically. Adding Screamer for consistency checking, VivaceGraph
for dependency queries, and ACL2 for structural verification would shift the
ratio from the current ~95-5-0 (neural-gate-symbolic) toward 50-40-10 in the
near term and potentially 10-80-10 in the long term.

The bootstrapped gate facts already cover file classifications, command safety,
path protections, and tool permissions: the core categories for a coding agent.
The archivist's extraction from project files would add dependency information,
test coverage, and code structure facts. The planner could reason about
refactoring order, dependency chains, and safety constraints deterministically.
This is the domain where the symbolic engine provides the most immediate value,
and it is the domain Passepartout already operates in.

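The claim about deterministic planning can be illustrated with a small sketch.
It is plain Common Lisp, not Screamer or VivaceGraph code, and the =:depends-on=
relation, the module names, and =refactor-order= are all assumptions made for
this example. It shows how a refactoring order could fall out of dependency
facts without any LLM call.

#+begin_src lisp
(defparameter *dependency-facts*
  '((:entity :cli    :relation :depends-on :value :core)
    (:entity :core   :relation :depends-on :value :utils)
    (:entity :plugin :relation :depends-on :value :core))
  "Hypothetical facts of the kind the archivist might extract.")

(defun dependencies-of (entity facts)
  "Return everything ENTITY directly depends on."
  (loop for fact in facts
        when (and (eq (getf fact :entity) entity)
                  (eq (getf fact :relation) :depends-on))
          collect (getf fact :value)))

(defun refactor-order (facts)
  "Depth-first topological sort: dependencies precede their dependents.
Signals an error if the facts contain a dependency cycle."
  (let ((done '())
        (in-progress '()))
    (labels ((visit (entity)
               (cond ((member entity done) nil)
                     ((member entity in-progress)
                      (error "Dependency cycle at ~S" entity))
                     (t
                      (push entity in-progress)
                      (mapc #'visit (dependencies-of entity facts))
                      (setf in-progress (remove entity in-progress))
                      (push entity done)))))
      (dolist (fact facts)
        (visit (getf fact :entity))
        (visit (getf fact :value))))
    (reverse done)))
#+end_src

With the sample facts, =(refactor-order *dependency-facts*)= returns
=(:utils :core :cli :plugin)=: each module appears after everything it depends
on, and the same facts always yield the same order.
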
*** Wikidata as an entity backbone unlocks cross-domain reasoning

Without Wikidata, the symbolic index for a general-knowledge memex is a sparse
set of personal facts with no connecting structure. With Wikidata, the entity
graph is pre-structured. The system can answer:

- "What does my memex say about Nabokov that Wikidata doesn't?"
- "Where does my memex disagree with Wikidata?"
- "What entities in my memex have no Wikidata counterpart?" (These are the
  personal, novel, or subjective entities that are the most valuable.)
- "Show me the intersection of my literary interests (from diary) with Wikidata's
  influence graph: which authors I read influenced each other in ways I haven't
  written about?"

These are cross-domain queries that require both the personal memex (for what
the user knows) and Wikidata (for what the world knows). Neither alone can
answer them. Together, they enable a kind of knowledge synthesis that no existing
tool provides.

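Two of these queries reduce to set operations over fact lists. The sketch below
is a plain Common Lisp illustration, not the VivaceGraph query interface; the
sample facts, the relation names, and both function names are hypothetical. The
disagreement check is only meaningful for single-valued relations such as a
publication year.

#+begin_src lisp
(defparameter *memex-facts*
  '((:entity :nabokov :relation :born :value 1899)
    (:entity :pale-fire :relation :published :value 1961)
    (:entity :my-reading-group :relation :founded :value 2021)))

(defparameter *wikidata-facts*
  '((:entity :nabokov :relation :born :value 1899)
    (:entity :pale-fire :relation :published :value 1962)
    (:entity :kafka :relation :born :value 1883)))

(defun entities (facts)
  "Distinct entities mentioned in FACTS."
  (remove-duplicates (mapcar (lambda (fact) (getf fact :entity)) facts)))

(defun entities-without-counterpart (memex wikidata)
  "Entities the memex mentions that Wikidata does not."
  (set-difference (entities memex) (entities wikidata)))

(defun disagreements (memex wikidata)
  "Pairs of facts with the same entity and relation but different values."
  (loop for m in memex
        append (loop for w in wikidata
                     when (and (eq (getf m :entity) (getf w :entity))
                               (eq (getf m :relation) (getf w :relation))
                               (not (equal (getf m :value) (getf w :value))))
                       collect (list m w))))
#+end_src

On the sample data the first query returns =(:my-reading-group)= and the second
returns one pair, the two conflicting publication years for =:pale-fire=.
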
*** Ontology versioning enables "what-if" reasoning about one's own thinking

The ability to query across worldviews ("what did I believe before I changed my
security model?") is a capability that has no analog in any existing tool. It
transforms the memex from a static archive into a dynamic record of intellectual
evolution. Combined with the temporal awareness system (Phase 0c), the system
could surface correlations: "You changed your mind about monorepos two weeks
after reading this article, which you bookmarked on this date, and one week
before starting this project that uses a monorepo structure." The provenance
chain IS the narrative of your thinking.

*** Contract-level pre-arbitration reduces the cost of decentralized commerce

Agora's Tier 0 Arbitrator, a local AI that provides evidence summaries before
human arbitration, is a genuinely useful role for a neurosymbolic system.

- "Contract CID X references arbitrator DID Y. DID Y is active. Verified."
- "All parties have signed. The HODL invoice is locked. Verified."
- "The buyer's claim of non-delivery is supported by 3 signed messages with
  timestamps after the delivery deadline."
- "The seller's proof-of-delivery field is empty. No QR scan recorded."

Each check is a Screamer query against the contract-lifecycle domain. The results
are a plist, not a ruling. Both parties see the same evidence summary before
escalating. This makes Level 1 arbitration faster (arbitrators receive
pre-processed evidence bundles), cheaper (no human time spent on trivial
verification), and more transparent (both parties see the same machine-generated
summary).

This is not AI judging. This is AI preparing the docket. The distinction is
important and defensible.

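The "plist, not a ruling" idea can be sketched as follows. This is plain Common
Lisp standing in for the Screamer queries, and every field name in the contract
plist (=:parties=, =:signatures=, =:buyer-message-times=, and so on) is an
assumption for illustration, not Agora's contract schema.

#+begin_src lisp
(defparameter *example-contract*
  '(:parties ("did:agora:buyer" "did:agora:seller")
    :signatures ("sig-buyer" "sig-seller")
    :arbitrator-active t
    :delivery-deadline 100
    :buyer-message-times (90 105 110 120)
    :proof-of-delivery nil)
  "Hypothetical contract record with integer timestamps.")

(defun prepare-docket (contract)
  "Return a plist of mechanical checks over CONTRACT. No ruling is made."
  (let ((deadline (getf contract :delivery-deadline)))
    (list :arbitrator-active (and (getf contract :arbitrator-active) t)
          :all-signed (= (length (getf contract :signatures))
                         (length (getf contract :parties)))
          :messages-after-deadline
          (count-if (lambda (time) (> time deadline))
                    (getf contract :buyer-message-times))
          :proof-of-delivery-present
          (and (getf contract :proof-of-delivery) t))))
#+end_src

For the example contract the result is
=(:arbitrator-active t :all-signed t :messages-after-deadline 3 :proof-of-delivery-present nil)=.
Interpreting those values remains the human arbitrator's job.
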
*** Self-auditing agents could transform AI safety discourse

If Passepartout can answer =/audit= for any action or fact, showing the full
provenance chain, every gate that approved it, every fact that supported it,
and every alternative that was considered, then AI safety moves from "trust us,
we tested it" to "here is the audit trail, verify it yourself."

This is the transparency that every AI safety framework calls for and none
delivers. It is possible because the architecture records provenance as a
first-class operation, not as an after-the-fact log. The provenance is the
operating system, not a logging layer.

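A provenance chain of this kind is a walk over =:derived-from= links. The sketch
below is a minimal illustration in plain Common Lisp; the fact ids, the claims,
and the =audit= function are hypothetical and are not the =/audit= command
itself. It assumes the provenance graph is acyclic.

#+begin_src lisp
(defparameter *fact-index*
  '((:f1 :claim "path /etc/passwd is protected"
         :provenance :gate-outcome :derived-from ())
    (:f2 :claim "tool write-file requires approval"
         :provenance :gate-outcome :derived-from ())
    (:f3 :claim "write to /etc/passwd denied"
         :provenance :deduced :derived-from (:f1 :f2)))
  "Hypothetical facts keyed by id; each entry is (id . plist).")

(defun audit (fact-id &optional (index *fact-index*) (depth 0))
  "Return rows of (depth id provenance): the conclusion first, then its premises."
  (let ((fact (rest (assoc fact-id index))))
    (cons (list depth fact-id (getf fact :provenance))
          (loop for premise in (getf fact :derived-from)
                append (audit premise index (1+ depth))))))
#+end_src

Here =(audit :f3)= returns
=((0 :f3 :deduced) (1 :f1 :gate-outcome) (1 :f2 :gate-outcome))=, which a
renderer could indent by depth to show the inference tree.
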
*** The memex + Agora combination could be a new kind of social network

Current social networks (Twitter, Facebook, Reddit) separate the person from
their knowledge. You are a profile with posts. Your posts are isolated units
without connection to your broader intellectual life.

A Passepartout-powered Agora Persona would publish Notes that are grounded in
the memex: "Here is my analysis of /Pale Fire/, drawn from diary entries across
three years, annotated with Wikidata context, and verified against my existing
literary framework." The Note is cryptographically signed, carrying provenance
back to the specific Org headings that informed it. Readers see not just the
conclusion but the intellectual scaffolding that produced it.

This is not a "post." It is a publication: a knowledge artifact with verifiable
provenance, auditable reasoning, and cryptographic identity. If this becomes the
norm, it raises the standard for public discourse from "this is my opinion" to
"this is my opinion, here is the evidence, here is how it evolved, here is who
verified it."

** Threats

*** The ontology problem may be harder than anticipated

The notes are honest about this: "Whitehead's Principia Mathematica took over
300 pages to define the logical foundations before it could prove that 1+1=2."
Passepartout's domain is narrower (coding + personal knowledge) but the
ontology problem is the same category of problem. Every entity class must be
defined. Every relation must have clear semantics. Every inference rule must be
justified.

The gate-to-fact bootstrap provides 50-70 entity classes, enough for a coding
agent. But the broader memex contains orders of magnitude more entity types:
people, places, works, concepts, events, emotions, aesthetic judgments,
professional skills, personal projects, temporal patterns. Defining these as
triples with clear semantics is genuine intellectual work that no amount of
engineering can shortcut.

The risk is not that it's impossible. It's that it's slow: slow enough that
the system never achieves the density of facts needed for the "flip" in the
broader memex. The coding domain may reach sufficiency in months. The literary
domain may take years. The daily-reflection domain may never cross the
threshold because the facts involved (mood, insight, aesthetic experience) are
not formalizable as triples.

*** Screamer may not scale to the fact store size

The constraint satisfaction approach to consistency checking is elegant for a
seed fact set of hundreds of triples. It is unproven for millions of triples
(after Wikidata loading + years of personal extraction). The domain-scoping
strategy (Screamer only checks facts from the candidate's =:domain=) bounds the
constraint space, but the most valuable consistency checks are cross-domain:

- "You classified this file as public in your project notes but the gate stack
  classifies it as secret." (project domain vs security domain)
- "You wrote that Nabokov influenced Kafka, but Wikidata says Kafka died before
  Nabokov published his first novel." (literature domain vs Wikidata domain)
- "You planned to use this dependency, but the dependency's license changed in
  a way that conflicts with your project's license." (project domain vs legal
  domain)

If cross-domain checks are disabled for performance, the most valuable
contradictions are never detected. If they are enabled, the constraint space
explodes. There is no obvious sweet spot.

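The size of the trade-off is easy to put in numbers. The sketch below only
counts how many stored facts a single candidate would be compared against; the
domain sizes are invented for illustration, =comparison-count= is a hypothetical
helper, and real constraint solving can cost far more than a linear scan.

#+begin_src lisp
(defparameter *domain-sizes*
  '((:security . 900) (:project . 4000) (:wikidata . 2000000))
  "Illustrative fact counts per domain, not measurements.")

(defun comparison-count (candidate-domain domain-sizes &key cross-domain)
  "Number of stored facts a candidate fact is checked against.
With CROSS-DOMAIN nil, only the candidate's own domain is scanned."
  (if cross-domain
      (reduce #'+ domain-sizes :key #'cdr)
      (or (cdr (assoc candidate-domain domain-sizes)) 0)))
#+end_src

With these numbers a =:project= candidate is checked against 4,000 facts when
scoped and 2,004,900 when cross-domain checks are enabled, roughly a 500-fold
increase per admitted fact.
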
*** Wikidata quality may undermine trust in the symbolic index

If Wikidata facts are admitted with =:policy :plural= and the user sees
thousands of contradictions between Wikidata and their personal memex, the
symbolic index may feel less trustworthy, not more. "Wikidata says Mount Everest
is 8848m. DBpedia says 8849m. Your 2023 diary says 8848m. These three sources
do not agree on the height." This is correct behavior (surfacing disagreement
with provenance) but it may be overwhelming. The user wanted a knowledge base,
not a disagreement engine.

The trust problem is compounded by Wikidata's editorial biases. Wikidata
reflects the biases of Wikipedia editors: English-language dominance, Western
epistemological frameworks, systemic underrepresentation of non-Western
knowledge. A memex in Arabic that references Islamic philosophy, Egyptian
history, or African literature will find Wikidata's coverage thin, biased, or
absent. The symbolic index would dutifully surface these gaps ("your memex
mentions 47 entities with no Wikidata counterpart") but it cannot fill them.

*** LLM cost and latency may prevent the archivist from keeping up

If the user writes a diary entry every day, the archivist must extract triples
from each new heading. If the extraction takes 1-3 seconds per heading, it's
background noise. But if the user imports 500 old diary entries, or the
archivist needs to re-extract after an ontology change, or Agora Notes arrive in
bulk from multiple follows, the extraction queue grows faster than it drains.

The notes describe extraction as a background task triggered by heartbeat, but
they don't specify the extraction rate limit. An unbounded queue with no rate
limit would consume the LLM budget. A bounded queue would fall behind. A lazy
extraction strategy (extract on first query) would make first queries slow.
A batch extraction on startup would make cold starts slow.

The archivist's throughput is gated by LLM API rate limits, token costs, and
inference latency. These are external constraints that the architecture cannot
eliminate. The symbolic engine can reduce LLM calls for reasoning; it cannot
reduce LLM calls for extraction from prose.

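A back-of-the-envelope model makes the backlog concrete. The sketch assumes one
heading per diary entry and a fixed daily drain rate; both are simplifications,
the drain rate of 50 is an invented rate limit, and the function names are
hypothetical.

#+begin_src lisp
(defun extraction-seconds (headings seconds-per-heading)
  "Sequential LLM time needed to extract HEADINGS headings."
  (* headings seconds-per-heading))

(defun backlog-after (initial arrivals-per-day drained-per-day days)
  "Queue length after DAYS, clamped at zero."
  (let ((queue initial))
    (dotimes (day days queue)
      (declare (ignorable day))
      (setf queue (max 0 (+ queue arrivals-per-day (- drained-per-day)))))))
#+end_src

Importing 500 entries at 3 seconds each costs =(extraction-seconds 500 3)=, or
1,500 seconds of sequential LLM time. If the rate limit allows 50 extractions a
day while one new entry arrives daily, =(backlog-after 500 1 50 10)= leaves 10
entries still queued after ten days. With arrivals above the drain rate, as in
=(backlog-after 0 200 50 3)=, the queue only grows (450 after three days).
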
*** Agora may never reach network effects

Agora faces the cold start problem that every decentralized social protocol
faces: users won't join without content, creators won't post without users. The
bootstrapping strategy (managed service → hybrid → full decentralization,
targeting niche communities first) is well-articulated but its success depends
on execution in a market where Mastodon, Bluesky, Nostr, and Farcaster are
already competing for the same users.

If Agora doesn't reach even Order 1 (1,000 users), the PDS integration is
academic. Passepartout's DID identity, DIDComm gateway, Note signing, and
contract verification are all infrastructure for a network that doesn't exist.
The symbolic engine still works locally: provenance tracking, contradiction
surfacing, and deduction are all valuable without Agora. But the network effects
that make Agora a transformative platform (reputation, contracts, marketplaces,
collective governance) require a living network.

The risk is asymmetric: Passepartout invests significant engineering in Agora
integration that provides zero value if Agora fails to launch.

*** Complexity may prevent adoption

Passepartout is already a complex system: a Lisp daemon, a terminal UI, a skill
engine, a gate stack, multiple LLM backends, a Merkle memory system, and an
event orchestrator. Adding a fact store, a constraint solver, a graph database,
a theorem prover, an archivist, a planner, and an Agora PDS makes it more
complex, not less.

The target user (someone who wants a personal AI assistant that works offline)
may not want or need any of this. They want the TUI to work, the LLM to be fast,
and the files to stay safe. The neurosymbolic engine is infrastructure for a use
case (lifelong personal knowledge management with verifiable provenance) that
most users do not yet know they have.

The risk is that Passepartout builds a cathedral for a congregation of one: a
system that is architecturally brilliant and practically unused because the
complexity-to-value ratio is too high for anyone except the author.

*** The self-repair criterion may not hold under adversarial conditions

The architecture assumes that skills can fail gracefully (fboundp guards, hash
table fallbacks, degraded mode). It does not account for a skill that is
adversarially corrupted to appear healthy while producing wrong results. A
compromised archivist that extracts plausible but false triples, a compromised
Screamer that passes all consistency checks, a compromised VivaceGraph that
returns query results from a parallel graph: these are "living" skills that
would pass integrity checks and still poison the symbolic index.

The type-level gates prevent the LLM from modifying gate code. They do not
prevent a compromised skill (loaded by a trusted human, or corrupted on disk by
a separate process) from operating normally while subtly wrong. The integrity
monitoring (Phase 0) catches disk-level corruption through hash checks. It does
not catch semantic corruption: a skill that is byte-for-byte identical to the
known-good version but is fed a malicious input that triggers a latent bug.

This is not a vulnerability unique to Passepartout. It is a vulnerability in
every system where components trust each other. But Passepartout's architecture
amplifies the risk because the symbolic engine is supposed to be the trustworthy
layer, the component that verifies the LLM's output. If the symbolic engine
itself is compromised, the system has no higher court of appeal.

*** The 10-80-10 ratio may create false confidence

If the sufficiency metric shows "71% non-lossy, threshold 70%, mode:
AUTO-EXTRACTION," the user may assume the system is trustworthy. But sufficiency
is global: it aggregates across all domains. The system may have 95% sufficiency
in the security domain and 5% sufficiency in the literary domain and still
report 71% overall, because the aggregate is weighted by fact count and the
security domain holds most of the facts. The auto-extraction switch would bypass
the LLM for all categories with sufficient coverage, but the threshold is
global, not per-domain. A literary query would hit the symbolic index that has
"sufficient" coverage globally but insufficient coverage for literature.

The notes describe domain-scoped Screamer checks but not domain-scoped
sufficiency. A global sufficiency metric that triggers a global extraction mode
change is the wrong granularity. Per-domain sufficiency, with per-domain
extraction mode, would be more complex but more honest. The architecture as
described has the simpler, more dangerous version.

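A worked example shows how a 95% domain and a 5% domain can aggregate to about
71%. The fact counts below are invented so that the arithmetic matches the
percentages in the text; =:llm-assisted= is a hypothetical name for the mode
that is not auto-extraction, and the functions are a sketch rather than the
described implementation.

#+begin_src lisp
(defparameter *domain-stats*
  '((:security   :facts 733 :non-lossy 696)   ; about 95% non-lossy
    (:literature :facts 267 :non-lossy 13))   ; about 5% non-lossy
  "Illustrative counts only.")

(defun domain-ratio (entry)
  "Non-lossy share of one (domain . plist) entry, as an exact rational."
  (/ (getf (rest entry) :non-lossy) (getf (rest entry) :facts)))

(defun global-sufficiency (stats)
  "Non-lossy share across all domains, weighted by fact count."
  (/ (reduce #'+ stats :key (lambda (entry) (getf (rest entry) :non-lossy)))
     (reduce #'+ stats :key (lambda (entry) (getf (rest entry) :facts)))))

(defun extraction-mode (sufficiency &optional (threshold 7/10))
  (if (>= sufficiency threshold) :auto-extraction :llm-assisted))

(defun per-domain-modes (stats)
  "Apply the same threshold to each domain separately."
  (loop for entry in stats
        collect (cons (first entry) (extraction-mode (domain-ratio entry)))))
#+end_src

The global figure is 709/1000, which clears the 70% threshold and selects
=:auto-extraction= everywhere. Evaluated per domain, the same threshold keeps
=:literature= in =:llm-assisted= mode, which is the behavior the paragraph above
argues for.
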
** Summary Matrix

| | Positive | Negative |
|----------+----------+----------|
| INTERNAL | S: Architectural inversion, unified Org format, provenance as product, | W: Unproven fact language, Screamer scale unverified, extraction cost hidden, |
| | cardinality model, gate-to-fact bootstrap, self-preservation, organic ontology, | flip underspecified, adversarial model absent, self-repair tension, |
| | Wikidata as accelerator, decoupled compute cost | Agora integration scope undefined, per-domain sufficiency missing |
|----------+----------+----------|
| EXTERNAL | O: Memory prosthesis, deterministic moat, sovereign agent network, | T: Ontology may be harder than expected, Screamer may not scale, |
| | 10-80-10 for coding achievable, Wikidata cross-domain queries, | Wikidata quality/trust, LLM extraction bottleneck, Agora network effects, |
| | ontology versioning, contract pre-arbitration, self-auditing safety, | complexity-to-adoption ratio, adversarial semantic corruption, |
| | knowledge-based social network | false confidence from global sufficiency metric |

* What This Unlocks

** Technologically

The neurosymbolic engine, if built, would be the first AI system where:

1. *Reasoning is auditable.* Every conclusion carries a provenance chain back to
   its premises. The =/audit= command renders the full inference tree (every
   fact, every deduction, every gate outcome) in human-readable form.

2. *Knowledge accumulates deterministically.* Screamer deductions and gate
   outcomes generate new facts without any LLM involvement. The knowledge base
   grows from the system's own operation, not from re-prompting the LLM.

3. *Memory is content-addressed.* Every fact is a Merkle node. Every version
   chain is tamper-evident. Rollback is atomic. The storage format is proven
   correct before it is committed to disk.

4. *Safety is provable, not empirical.* Type-level gates make self-modification
   structurally impossible. ACL2 proves that the rule set has no contradictions.
   The dispatcher doesn't "try" to be safe; it is safe by construction.

5. *The human and the machine share the same format.* Org files for both. No
   hidden database. No import/export step. The agent's memory IS the human's
   memory.

These five properties, together, define a new category of AI system: the
*sovereign reasoning agent*. Not sovereign in the blockchain sense (decentralized
by consensus), but sovereign in the personal sense: the agent runs on your
hardware, reasons with your knowledge, and proves its reasoning to you.

** Socially

If the technical vision succeeds and Agora reaches network effects, the
combination unlocks:

1. *Verifiable public discourse.* Every published claim carries provenance back
   to source material. "I read this, I thought this, I changed my mind on this
   date, here is the evidence." Public discourse shifts from "competing opinions"
   to "competing evidence chains." The quality floor rises because claims without
   provenance are visibly weaker than claims with provenance.

2. *Sovereign AI agents with legal and economic personhood.* A Passepartout
   agent with an Agora Persona can own assets, enter contracts, earn reputation,
   and face consequences for failure. This is not a chatbot. It is an autonomous
   entity with cryptographic identity, verified provenance, and economic agency,
   more like a corporation than a tool.

3. *Self-auditing AI safety.* Every action the agent takes is traceable. Every
   gate decision is recorded. Every fact that informed a decision is queryable.
   AI safety moves from "trust us" to "here is the audit trail." This is the
   transparency that every AI ethics framework calls for.

4. *A personal knowledge economy.* If your memex can publish Notes as Agora
   content, your intellectual work (your analyses, your syntheses, your
   discoveries) becomes a publishable, attributable, monetizable asset. Not
   through advertising or subscriptions, but through direct value exchange:
   Lightning payments for content access, contract work for your verified
   expertise, reputation that follows your Persona across platforms.

5. *Collective intelligence without centralized control.* If multiple
   Passepartout agents share facts through Agora Notes, the collective symbolic
   index represents the verified, provenanced knowledge of a community: not the
   averaged opinion of a crowd, but the auditable intersection of independently
   verified claims. This is Wikipedia without the editorial board, science
   without the journal gatekeepers, journalism without the corporate owners.

6. *A memory prosthesis that outlives the individual.* A memex with a decade of
   diary entries, linked to Wikidata's entity graph, with Screamer deductions
   surfacing patterns and contradictions, with ontology versioning preserving
   intellectual evolution: this is not a knowledge management tool. It is an
   externalized, queryable, auditable record of a life's thinking. It is what
   Vannevar Bush imagined in 1945: "an enlarged intimate supplement to his
   memory."

* Conclusion

The architecture described in these notes is genuinely novel. Not incrementally
novel: most agent architectures are variations on "LLM + tools + prompt-based
safety." Passepartout's neurosymbolic vision is categorically different: an
inversion where the deterministic layer judges the probabilistic layer, where
facts carry provenance chains, where contradiction is a feature rather than an
error, and where the user's Org files are the single source of truth for both
human and machine.

The largest risk is not that the architecture is wrong. It is that the ontology
problem (the genuine difficulty of defining what a "fact" is, what relations
are, what categories are useful, and how they evolve) is harder than the notes
anticipate, and that the system spends years in a partially-working state where
the symbolic index is too sparse to be useful but too entangled to be discarded.

The second-largest risk is that Agora never reaches the network effects needed
to make the PDS integration valuable beyond a local experiment, and that the
engineering investment in DIDComm gateways, Note signing, contract verification,
and Relay integration produces infrastructure for a network that doesn't exist.

The opportunity is equally large: a system that makes your own mind legible to
you, that proves its reasoning rather than asserting it, that accumulates
knowledge across sessions through deduction rather than re-prompting, and that
publishes verified, provenanced knowledge to a decentralized network. If this
works, even partially, even slowly, it is a category-level advance over every
existing agent architecture and every existing personal knowledge management
tool.

The notes are a map of territory that no one has walked. The territory is real.
The map is detailed enough to navigate by. Whether the journey completes depends
on whether the ontology problem yields to engineering, and whether the user,
the one human whose memex this serves, finds value in the partial system well
before the full vision materializes.

314
notes/passepartout-agora.org
Normal file
@@ -0,0 +1,314 @@
#+TITLE: Passepartout-Agora Integration — Unified Container Format
#+AUTHOR: Agent
#+FILETAGS: :notes:integration:agora:passepartout:design:
#+CREATED: [2026-05-08 Fri]

* Summary

Org files and Agora Notes are the same container. Both are text with headers,
tags, properties, and prose body. Both contain zero or more symbolic facts
extractable by Passepartout's archivist. The only difference is that an Agora
Note carries a DID signature and a CID for cryptographic provenance on the
network. An Org file without a signature is a local Note. A signed Org file
pushed to the PDS is an Agora Note.

Passepartout's =memory-object= struct serves as the storage format for both.
The archivist extracts facts from one unified store. Authorship is distinguished
by provenance, not location.

* The Unification

** Org files and Notes are the same container

| Property        | Org file (local)              | Agora Note (network)                |
|-----------------+-------------------------------+-------------------------------------|
| Format          | Org-mode text                 | Org-mode text                       |
| Identity        | Merkle hash (=memory-object=) | CIDv1 (same hash)                   |
| Contains facts  | Yes (archivist extracts)      | Yes (archivist extracts)            |
| Author identity | Implicit (file in =~/memex/=) | Explicit (DID signature in =proof=) |
| Access control  | Filesystem permissions        | =access_control= flags              |
| Routing         | N/A (local disk)              | =notify= + =references= + Relay     |
| Ephemeral       | No                            | =ephemeral_duration=                |
| Behavioral flag | Implicit (convention)         | =is_feed= field                     |

The structure converges in a single plist:

#+begin_src lisp
(:cid <merkle-hash>      ;; Identity across local and network
 :title <string>         ;; Org headline title
 :content <org-text>     ;; Full Org body (headings, prose, source blocks)
 :owner <did-or-nil>     ;; For Agora Notes: the signing Persona DID. nil for local
 :proof <plist-or-nil>   ;; (:editor <did> :signature <bytes>)
 ;; Agora behavioral flags (nil for local files)
 :is-feed <boolean-or-nil>
 :access-control <did-list-or-nil>
 :notify <did-list-or-nil>
 :references <cid-list-or-nil>
 :reply-to <cid-or-nil>
 :thread-root <cid-or-nil>
 :ephemeral-duration <integer-or-nil>
 ;; Passepartout metadata
 :created-at <timestamp>
 :tags <string-list>     ;; Org tags
 :properties <plist>     ;; Org property drawer
 :extracted-facts <fact-list>)  ;; Populated by archivist after extraction
#+end_src

** Facts are extracted from both, identically

An Org file in =~/memex/literature/pale-fire.org= and an Agora Note from
=did:agora:heather= with =:references <post-CID>= both contain prose. The
archivist scans both, proposes triples via the LLM, verifies via Screamer,
and admits facts to the symbolic index. The facts carry different provenance:

#+begin_src lisp
;; Extracted from local Org file
(:entity :pale-fire :relation :theme :value :unreliable-narration
 :provenance :local-prose :grounding "heading-42")

;; Extracted from Agora Note
(:entity :kafka :relation :influence :value :nabokov
 :provenance :agora-note :grounding <incoming-note-cid> :author "did:agora:heather")
#+end_src

No new extraction path. The archivist already walks containers and extracts
facts. The container type determines the provenance tag and the grounding
identifier (local heading ID vs. Note CID).

** The memex distinguishes provenance by location, not format

Incoming Agora Notes arrive at =~/memex/social/notes/<did>/<cid>.org=.
The directory structure encodes authorship:

| Path                                 | Meaning                           |
|--------------------------------------+-----------------------------------|
| ~/memex/daily/                       | Local diary entries               |
| ~/memex/projects/                    | Local project files               |
| ~/memex/literature/                  | Local reading notes               |
| ~/memex/notes/                       | Local design and thinking notes   |
| ~/memex/social/notes/<did>/<cid>.org | Incoming Notes from other DIDs    |
| ~/memex/social/outbox/<cid>.org      | Outgoing Notes signed by the user |

The archivist scans all directories. Local files produce facts with
=:provenance :local-prose=. Agora files produce facts with
=:provenance :agora-note= + =:author <did>=. The symbolic index maps the
provenance to the cardinality policy: local prose is =:plural= (the human's own
notes; multiple interpretations coexist). Agora Notes are =:plural= by default
(the author's claim, not authoritative over local facts). Agora Notes can be
promoted to =:singular= or =:dual= if they carry cryptographic proofs of
specific claims.

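The path-to-provenance mapping can be sketched as a small pure function. This is
an illustration in plain Common Lisp, not archivist code: =provenance-for-path=
is a hypothetical name, paths are taken relative to =~/memex/=, and the outbox
directory is treated like any other local path because the notes do not say
which tag outgoing Notes receive.

#+begin_src lisp
(defun provenance-for-path (relative-path)
  "Map a path relative to ~/memex/ onto provenance metadata.
Incoming Notes are expected at social/notes/<did>/<cid>.org."
  (let ((prefix "social/notes/"))
    (if (and (> (length relative-path) (length prefix))
             (string= prefix relative-path :end2 (length prefix)))
        (let* ((tail (subseq relative-path (length prefix)))
               (slash (position #\/ tail)))
          (list :provenance :agora-note
                :author (if slash (subseq tail 0 slash) tail)
                :policy :plural))
        (list :provenance :local-prose :policy :plural))))
#+end_src

For =social/notes/did:agora:heather/abc.org= this returns
=(:provenance :agora-note :author "did:agora:heather" :policy :plural)=; for
=daily/2026-05-08.org= it returns =(:provenance :local-prose :policy :plural)=.
Promotion to =:singular= or =:dual= would be a separate step and is not
modeled here.
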
** Publishing Org content as Agora Notes

When the user wants to publish a diary entry, project log, or literary note as
an Agora Note, the operation is:

1. Select the Org heading or file.
2. Compute the Merkle hash (=memory-object= hash → CIDv1).
3. Sign with the user's Persona DID key (Phase 0b key registry).
4. Set Agora flags: =:is-feed= t/nil, =:access-control= [], =:references= [previous-note-cid].
5. Push to the PDS. The Note is an Org plist with a DID signature.
6. The PDS stores and relays it. The Note remains in =~/memex/social/outbox/= with its CID.

All of this is a single function: =(note-publish heading-id &key is-feed access-control references)=.
It is estimated at roughly 40 lines, extending the vault (key signing), the fact
store (CID generation), and the memex (output directory).

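Steps 2 to 4 can be sketched as plist assembly. Everything below is an
assumption for illustration: =note-publish-sketch= is not the planned
=note-publish=, it takes the title and content directly instead of resolving a
heading id, and =toy-cid= and =toy-sign= are placeholders with no cryptographic
value. Pushing to the PDS and writing to the outbox are left out.

#+begin_src lisp
(defun toy-cid (text)
  "Stand-in for the Merkle hash. NOT a real hash or CIDv1."
  (format nil "cid-~D" (length text)))

(defun toy-sign (owner cid)
  "Stand-in for signing with the Persona key. NOT cryptography."
  (format nil "sig(~A,~A)" owner cid))

(defun note-publish-sketch (title content
                            &key owner is-feed access-control references
                                 (hash-fn #'toy-cid) (sign-fn #'toy-sign))
  "Assemble the Note plist: hash the content, sign the CID, set the flags."
  (let* ((cid (funcall hash-fn content))
         (signature (funcall sign-fn owner cid)))
    (list :cid cid
          :title title
          :content content
          :owner owner
          :proof (list :editor owner :signature signature)
          :is-feed is-feed
          :access-control access-control
          :references references)))
#+end_src

Passing the hash and signing routines as arguments keeps the assembly logic
testable without the vault; a real implementation would supply the vault's
signing function and the fact store's CID generator in their place.
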
* Implications for Passepartout's Architecture

** The symbolic index now has a second ingestion source

Facts enter through three paths:
1. Gate outcomes (bootstrap + runtime, =:provenance :gate-outcome=)
2. Screamer deductions (=:provenance :deduced=)
3. Archivist extraction (=:provenance :local-prose= or =:provenance :agora-note=)

The third path now covers both local Org files and incoming Agora Notes. No new
path is needed. The archivist needs no new extraction logic, only a new
directory to walk (=~/memex/social/notes/=) and a new provenance tag to assign.

** Authentication Layer 1 now has Agora-native verification

Phase 0b's cryptographic gate (vector 0) verifies DID signatures. An incoming
Agora Note carries =:owner <did>= and =:proof.signature <bytes>=. Gate vector 0
verifies the signature against the DID's public key (from the key registry, which
is now also an Agora DID registry). Verification is identical for local signals
and Agora signals: the same gate, the same key lookup.

** Self-preservation gains an Agora dimension

The resource monitor (Phase 1a) tracks =~/memex/social/= as a source of storage
growth. Incoming Notes from network sources are lower preservation priority than
local prose. If disk pressure hits, incoming Agora Notes are evicted first
(their source is the remote PDS; they can be re-fetched). Quarantine (Phase 1a)
extends to Agora channels: if a DID is sending spam or malformed Notes, their
incoming directory is quarantined and the DID is flagged for human review.

** Sufficiency tracks Agora as a provenance source
|
||||
|
||||
The sufficiency score (Phase 4) gains a new provenance category:
|
||||
|
||||
#+begin_example
|
||||
Symbolic Index
|
||||
Facts: 3,847
|
||||
Gate outcomes: 847 (22%)
|
||||
Deduced: 921 (24%)
|
||||
Human-authored: 72 (2%)
|
||||
Local prose: 1,247 (32%)
|
||||
Agora Notes: 760 (20%)
|
||||
─────────────────────────
|
||||
Non-lossy: 1,840 (48%)
|
||||
LLM-proposed: 2,007 (52%)
|
||||
#+end_example
|
||||
|
||||
Agora Notes are a provenance source, not a lossiness category. Facts from Agora
|
||||
Notes carry =:provenance :agora-note= — they are LLM-extracted (the archivist
|
||||
proposes them) but the source is cryptographically signed by a known DID. They
|
||||
are neither =:gate-outcome= (mechanical) nor =:llm-proposed= from local prose
|
||||
(uncertain source). They occupy a middle ground: verified source, uncertain
|
||||
extraction.
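
As a rough illustration of how such a breakdown could be computed, the sketch below tallies fact counts by provenance tag and splits them into non-lossy and LLM-proposed groups. The names =sufficiency-summary= and =*non-lossy-provenances*=, and the tags =:human-authored= and =:local-prose=, are assumptions made for this example and are not taken from the Passepartout code.

#+begin_src lisp
;; Hypothetical sketch: which provenance tags count as non-lossy is an
;; assumption that mirrors the example summary, not a fixed policy.
(defparameter *non-lossy-provenances*
  '(:gate-outcome :deduced :human-authored))

(defun sufficiency-summary (counts)
  "COUNTS is an alist of (provenance . fact-count).
Returns a plist with the total, non-lossy, and LLM-proposed counts."
  (let ((total 0)
        (non-lossy 0))
    (dolist (entry counts)
      (incf total (cdr entry))
      (when (member (car entry) *non-lossy-provenances*)
        (incf non-lossy (cdr entry))))
    (list :total total
          :non-lossy non-lossy
          :llm-proposed (- total non-lossy))))

(sufficiency-summary '((:gate-outcome . 847) (:deduced . 921)
                       (:human-authored . 72) (:local-prose . 1247)
                       (:agora-note . 760)))
;; => (:TOTAL 3847 :NON-LOSSY 1840 :LLM-PROPOSED 2007)
#+end_src

Keeping =:agora-note= as its own key means the signed-source share can be reported separately even though it is summed into the LLM-proposed total.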
|
||||
|
||||
* Implications for Agora
|
||||
|
||||
** Passepartout IS the PDS
|
||||
|
||||
The TODO.org in =projects/agora/= already captures this: "Passepartout IS the
|
||||
PDS — the agent runs a personal data store in-process." With Org files as the
|
||||
Note format, this is literal. The PDS stores Org files. The agent reads them.
|
||||
The network accesses them via the PDS API. There is no separate PDS process.
|
||||
|
||||
** Level 0 pre-arbitration via Screamer
|
||||
|
||||
Section 07 of the Agora requirements describes a "Tier 0 Arbitrator" — a local
|
||||
AI that provides a sanity check before human arbitration. Passepartout's
|
||||
Screamer + fact store provides this at zero LLM tokens when working from
|
||||
existing facts:
|
||||
|
||||
- "Contract CID X references arbitrator DID Y. DID Y is active. Verified."
|
||||
- "All parties have signed. The HODL invoice is locked. Verified."
|
||||
- "The buyer's claim of non-delivery is supported by 3 signed messages with
|
||||
timestamps after the delivery deadline."
|
||||
- "The seller's proof-of-delivery field is empty. No QR scan recorded."
|
||||
|
||||
Each check is a Screamer query against the contract-lifecycle domain. Results
|
||||
are a plist, not a ruling. Both parties see the same evidence summary before
|
||||
escalating to Level 1.
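
A minimal sketch of such an evidence summary follows. Plain Common Lisp list search stands in for the Screamer query, and the relation names (=:all-parties-signed=, =:hodl-invoice-locked=, =:proof-of-delivery=) and the contract id ="cid:example"= are hypothetical; only the =(:entity ... :relation ... :value ...)= fact shape comes from these notes.

#+begin_src lisp
;; Hypothetical sketch: facts are plists shaped like the ones in these notes.
(defun fact-value (facts entity relation)
  "Return the :value of the first fact matching ENTITY and RELATION, or NIL."
  (let ((fact (find-if (lambda (f)
                         (and (equal (getf f :entity) entity)
                              (eq (getf f :relation) relation)))
                       facts)))
    (and fact (getf fact :value))))

(defun level-0-evidence (facts contract-cid)
  "Summarize what FACTS say about CONTRACT-CID.
Returns an evidence plist, not a ruling."
  (list :contract contract-cid
        :all-parties-signed
        (and (fact-value facts contract-cid :all-parties-signed) t)
        :hodl-invoice-locked
        (and (fact-value facts contract-cid :hodl-invoice-locked) t)
        :proof-of-delivery
        (fact-value facts contract-cid :proof-of-delivery)))

(level-0-evidence
 '((:entity "cid:example" :relation :all-parties-signed :value t)
   (:entity "cid:example" :relation :hodl-invoice-locked :value t))
 "cid:example")
;; => (:CONTRACT "cid:example" :ALL-PARTIES-SIGNED T
;;     :HODL-INVOICE-LOCKED T :PROOF-OF-DELIVERY NIL)
#+end_src

The missing proof-of-delivery fact shows up as =NIL= in the summary, which is the kind of gap both parties would see before escalation.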
|
||||
|
||||
** Reputation as deduced facts
|
||||
|
||||
Screamer deduces reputation from signed contract chains, not asserted claims:
|
||||
|
||||
#+begin_src lisp
(:entity "did:agora:heather" :relation :contract-reputation
 :value (:completed 47 :defaulted 0 :disputes 3 :won 3 :escalated 0)
 :provenance :deduced :derived-from (<list of 47 contract CIDs>))
#+end_src
|
||||
|
||||
This is the strong version of Agora's Trust Score. It's a fact deduced from
|
||||
cryptographic evidence, not a claim by the persona (self-reporting could be
|
||||
false) and not a claim by a centralized reputation service (could be bought).
|
||||
The deduction is auditable: =/audit did:agora:heather= shows every contract,
|
||||
every outcome, every ruling.
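
One way to picture the deduction is a fold over contract records, sketched below in plain Common Lisp rather than as a Screamer query. The record layout (=:cid=, =:status=, =:dispute=, =:escalated=) and the function name =deduce-reputation= are assumptions for illustration; the output shape follows the reputation fact format used in these notes.

#+begin_src lisp
;; Hypothetical sketch: each contract record is a plist such as
;; (:cid "cid:a" :status :completed :dispute :won :escalated nil).
(defun deduce-reputation (did contracts)
  "Fold CONTRACTS into a :contract-reputation fact for DID."
  (let ((completed 0) (defaulted 0) (disputes 0) (won 0) (escalated 0))
    (dolist (c contracts)
      (case (getf c :status)
        (:completed (incf completed))
        (:defaulted (incf defaulted)))
      (when (getf c :dispute)
        (incf disputes)
        (when (eq (getf c :dispute) :won)
          (incf won)))
      (when (getf c :escalated)
        (incf escalated)))
    (list :entity did
          :relation :contract-reputation
          :value (list :completed completed :defaulted defaulted
                       :disputes disputes :won won :escalated escalated)
          :provenance :deduced
          ;; Every contributing contract is kept so the result is auditable.
          :derived-from (mapcar (lambda (c) (getf c :cid)) contracts))))
#+end_src

Because =:derived-from= lists every contract id that fed the counts, an audit view only has to dereference that list to reproduce the score.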
|
||||
|
||||
** Agent Behavioral Contracts — formal enforcement for the ABC of Agora
|
||||
|
||||
Bhardwaj (2026) introduces a formal framework that brings Design-by-Contract
|
||||
principles to autonomous AI agents. An ABC contract =C = (P, I, G, R)=
|
||||
specifies /Preconditions/, /Invariants/ (hard and soft), /Governance/ policies
|
||||
(hard and soft), and /Recovery/ mechanisms as first-class runtime-enforceable
|
||||
components.
|
||||
|
||||
This maps directly onto Agora's contract lifecycle:
|
||||
|
||||
| ABC component       | Agora mapping                                             |
|---------------------+-----------------------------------------------------------|
| =P= (Preconditions) | Contract Note validity checks: all signers' DIDs active,  |
|                     | contract CID correctly referenced, HODL invoice locked    |
| =I= (Invariants)    | Hard: payment amount unchanged, arbitrator DID unchanged. |
|                     | Soft: delivery within estimated window                    |
| =G= (Governance)    | Hard: no party modifies contract terms unilaterally.      |
|                     | Soft: parties communicate through designated channels     |
| =R= (Recovery)      | Arbitration escalation, HODL invoice release, reputation  |
|                     | deduction                                                 |
|
||||
|
||||
The framework's key mathematical results have direct implications for Agora:
|
||||
|
||||
- /Drift Bounds Theorem/: contracts with recovery rate γ > α (natural drift rate
|
||||
from LLM non-determinism in agent behavior) bound behavioral drift to D* = α/γ.
|
||||
For Agora, this means contract enforcement can be /predictive/ — detecting drift
|
||||
before violation — rather than just /corrective/ after breach.
|
||||
|
||||
- /Compositionality Theorem/: sufficient conditions (interface compatibility,
|
||||
assumption discharge, governance consistency, recovery independence) under
|
||||
which individual contract guarantees compose end-to-end for multi-agent chains.
|
||||
This is essential for Agora's multi-party contracts, where a buyer, seller,
|
||||
arbitrator, and escrow agent form a chain of interdependent behavioral
|
||||
expectations.
|
||||
|
||||
- /(p, δ, k)-satisfaction/: probabilistic compliance accounting for LLM
|
||||
non-determinism — contracts hold with probability p, deviations stay within
|
||||
tolerance δ, recovery within k steps. This formalizes what Screamer's
|
||||
contract-lifecycle domain queries verify: whether the current state of a
|
||||
contract satisfies its agreed-upon conditions, given the inherent uncertainty
|
||||
in any agent's behavior.
|
||||
|
||||
The empirical results are significant: across 1,980 sessions on 7 models,
|
||||
contracted agents (with ABC enforcement) detected 5.2-6.8 soft violations per
|
||||
session that uncontracted agents missed entirely, with <10ms per-action overhead.
|
||||
Low overhead matters for Passepartout as the PDS, because contract enforcement
must not add noticeable latency to Note processing.
|
||||
|
||||
ABC does not replace Screamer. ABC specifies /what/ must hold; Screamer verifies
|
||||
/whether/ it holds against the fact store. The contract-lifecycle domain already
|
||||
planned for Phase 0b (signal chain) can be implemented as an ABC-like structure:
|
||||
a tuple of preconditions, invariants, governance rules, and recovery mechanisms,
|
||||
each expressed as Screamer-verifiable facts with Merkle provenance.
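
A minimal sketch of such an ABC-like structure is given below, assuming each check is a named predicate over a contract-state plist. The struct and function names are hypothetical, governance rules are treated as hard checks only for brevity, and ordinary =funcall= stands in for Screamer verification against the fact store.

#+begin_src lisp
;; Hypothetical sketch of a contract tuple (P, I, G, R).
(defstruct abc-contract
  preconditions     ; P: list of (name . predicate) pairs
  hard-invariants   ; I, hard
  soft-invariants   ; I, soft
  governance        ; G, treated as hard here (a simplification)
  recovery)         ; R: keyword naming the recovery action

(defun failed-checks (checks state)
  "Return the names of the CHECKS whose predicate is false for STATE."
  (loop for (name . predicate) in checks
        unless (funcall predicate state)
          collect name))

(defun evaluate-contract (contract state)
  "Return hard and soft violations, plus the recovery action if any hard check failed."
  (let ((hard (append
               (failed-checks (abc-contract-preconditions contract) state)
               (failed-checks (abc-contract-hard-invariants contract) state)
               (failed-checks (abc-contract-governance contract) state)))
        (soft (failed-checks (abc-contract-soft-invariants contract) state)))
    (list :hard-violations hard
          :soft-violations soft
          :recovery (and hard (abc-contract-recovery contract)))))

(defparameter *sale*
  (make-abc-contract
   :preconditions (list (cons :hodl-invoice-locked
                              (lambda (s) (getf s :hodl-locked))))
   :hard-invariants (list (cons :amount-unchanged
                                (lambda (s) (eql (getf s :amount)
                                                 (getf s :agreed-amount)))))
   :soft-invariants (list (cons :delivered-in-window
                                (lambda (s) (getf s :delivered-in-window))))
   :recovery :escalate-to-arbitration))

(evaluate-contract *sale* '(:hodl-locked t :amount 90 :agreed-amount 100
                            :delivered-in-window t))
;; => (:HARD-VIOLATIONS (:AMOUNT-UNCHANGED) :SOFT-VIOLATIONS NIL
;;     :RECOVERY :ESCALATE-TO-ARBITRATION)
#+end_src

Keeping hard and soft violations in separate lists preserves the distinction the framework draws between breaches and drift.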
|
||||
|
||||
See also:
|
||||
- Bhardwaj, V.P. (2026). Agent Behavioral Contracts: Formal Specification and
|
||||
Runtime Enforcement for Reliable Autonomous AI Agents. arXiv:2602.22302.
|
||||
|
||||
** The Merkle DAG IS the Key Event Log
|
||||
|
||||
Agora's KEL specification (Section 02) describes an append-only log of key
|
||||
events — inception, rotation, revocation, follow events. Passepartout's Merkle
|
||||
DAG (Phase 5, built on v0.2.0 memory-object infrastructure) is this log. Each
|
||||
key event is a fact in the =:key-lifecycle= domain. Each event has a
|
||||
=:parent-id= chaining to the previous event. The DAG is content-addressed —
|
||||
every event is a CID. The full KEL is queryable: =/audit did:agora:heather=
|
||||
renders every key event, every follow event, every contract signature, with
|
||||
provenance chains.
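
The =:parent-id= chaining can be sketched as a simple walk from the newest event back to inception. The sketch assumes events are plists held in an =equal= hash table keyed by event id; the function name =key-event-log=, the =:type= key, and the short string ids are hypothetical stand-ins for real CIDs.

#+begin_src lisp
;; Hypothetical sketch: STORE maps event id -> event plist.
(defun key-event-log (store head-id)
  "Follow :parent-id links from HEAD-ID back to the inception event.
Returns the events oldest first."
  (let ((log '())
        (event (gethash head-id store)))
    (loop while event
          do (push event log)
             (let ((parent (getf event :parent-id)))
               (setf event (and parent (gethash parent store)))))
    log))

(defparameter *events* (make-hash-table :test #'equal))

(setf (gethash "e1" *events*) '(:type :inception :parent-id nil)
      (gethash "e2" *events*) '(:type :rotation :parent-id "e1")
      (gethash "e3" *events*) '(:type :revocation :parent-id "e2"))

(mapcar (lambda (e) (getf e :type)) (key-event-log *events* "e3"))
;; => (:INCEPTION :ROTATION :REVOCATION)
#+end_src

The walk stops at the first event without a parent, or at a parent id that is missing from the store, which is one place an audit could flag a broken chain.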
|
||||
|
||||
* Relation to the Neurosymbolic Roadmap
|
||||
|
||||
The Agora integration is not a new phase. It is a consequence of decisions
|
||||
already made:
|
||||
|
||||
| Roadmap item            | Agora consequence                                                  |
|-------------------------+--------------------------------------------------------------------|
| Phase 0b (key registry) | Key registry uses Agora DIDs. DID store is =:key-lifecycle= domain |
| Phase 1 (fact store)    | Fact store is also Note store. Same API, same hash table           |
| Phase 1a (self-pres.)   | Incoming Notes tracked. Spam DIDs quarantined. Disk eviction       |
| Phase 3 (archivist)     | Archivist walks =~/memex/social/notes/= alongside local dirs       |
| Phase 4 (sufficiency)   | Agora Notes are a provenance category in the sufficiency score     |
| Phase 5 (Merkle DAG)    | DAG = KEL. DAG = contract audit trail                              |
| Phase 0b (signal chain) | Signal chain = contract lifecycle chain. Same Merkle linking       |
|
||||
|
||||
No new lines in the roadmap. The Note publishing function (~40 lines) is a
|
||||
utility, not a phase.
|
||||
|
||||
* What Is NOT Built
|
||||
|
||||
1. *A separate Note parser.* Agora Notes ARE Org files. The existing Org parser
|
||||
reads both.
|
||||
2. *A separate Note store.* The =memory-object= struct stores both. The
|
||||
=*memory-store*= hash table holds both.
|
||||
3. *A separate extraction path for Agora content.* The archivist extracts facts
|
||||
from prose regardless of origin. The provenance tag distinguishes source.
|
||||
4. *A new authentication mechanism for Agora signals.* Gate vector 0 verifies
|
||||
DID signatures. The key registry is the DID registry.
|
||||
|
||||
See also:
|
||||
- =projects/agora/docs/= — Agora requirements (overview, identity, primitive, social, contracts, governance)
|
||||
- =projects/agora/TODO.org= — Passepartout integration track
|
||||
- =passepartout-neurosymbolic-design-decisions-and-options.org= — the full design rationale
|
||||
- =passepartout-neurosymbolic-roadmap.org= — the phased implementation plan
|
||||
@@ -442,6 +442,371 @@ design. The gate stack provides the seed. Gate outcomes, prose extraction,
|
||||
deduction, and human authoring grow the shoots. Screamer prunes contradictions.
|
||||
The ontology is a garden, not a building.
|
||||
|
||||
* Empirical Validation — Modular Ontology Engineering with LLMs
|
||||
|
||||
Shimizu and Hitzler (2025, /Journal of Web Semantics/) argue that LLMs can
|
||||
significantly accelerate knowledge graph and ontology engineering — modeling,
|
||||
extension, population, alignment, and entity disambiguation — but /only/ if
|
||||
ontologies are modular. Their paper provides empirical evidence that validates
|
||||
the modular architecture described in this document and exposes concrete patterns
|
||||
the archivist should adopt.
|
||||
|
||||
** The central finding: modularity is the key variable
|
||||
|
||||
In a complex ontology alignment task (mapping between two oceanography ontologies
|
||||
with hundreds of classes and properties), an LLM without module information
|
||||
detected correct mappings for 5 of 109 alignment rules — effectively useless. When
|
||||
the same LLM was given the module structure of the target ontology (20 named
|
||||
conceptual modules such as "Organization," "Cruise," "Physical Sample"), it
|
||||
detected correct mappings for 104 of 109 rules — 95% accuracy. The variable was
|
||||
modularity.
|
||||
|
||||
For ontology population (extracting triples from text), their best results came
|
||||
from prompts that included a schematic representation of a /single module/ plus
|
||||
one extraction example. Against ground truth, this achieved approximately 90%
|
||||
extraction accuracy. Without module-scoped prompting, quality degraded
|
||||
substantially.
|
||||
|
||||
The mechanism: conceptual modules scope the LLM's attention to something
|
||||
human-sized. The paper's central claim — "by somehow limiting the scope, we
|
||||
achieve a more human-like approach — and one more capable of being expressed
|
||||
succinctly in language, and thus more appropriate for LLM-based assistance" — is
|
||||
an independent discovery of the same principle underlying Passepartout's
|
||||
domain-scoped Screamer checks and per-domain cardinality policies.
|
||||
|
||||
** MOMo: a mature modular ontology methodology
|
||||
|
||||
The authors' approach, MOMo (Modular Ontology Modeling), has been developed over a
|
||||
decade and includes:
|
||||
|
||||
- A /step-by-step methodology/ that breaks ontology design into clearly delineated
|
||||
pieces, each "easier to automate than going one-shot from base data to an
|
||||
ontology."
|
||||
- A /pattern description language/ (OPLa, expressed in OWL) for annotating modules
|
||||
so they can be identified programmatically.
|
||||
- A /design library/ (MODL) containing hundreds of commonsense micropatterns
|
||||
organized for programmatic access, including via RAG.
|
||||
- A /Protégé plugin/ (CoModIDE) for graphical modular ontology development.
|
||||
|
||||
Critically, their modules are not formal sub-ontologies with logical boundaries.
|
||||
They are /conceptual/ partitions — groupings of classes, properties, and axioms
|
||||
around "key notions" identified by domain experts. Modules can overlap and nest.
|
||||
There are "no precise rules" for what belongs in a module. The modules provide
|
||||
"conceptual bridges between human expert conceptualization and data reality."
|
||||
|
||||
** What Passepartout should adopt
|
||||
|
||||
*** The modular prompt pattern for the archivist
|
||||
|
||||
The extraction prompt structure that achieved 90% accuracy is concrete and
|
||||
replicable: a schematic representation of a domain module plus a single extraction
|
||||
example. The archivist should use this pattern when extracting facts from prose.
|
||||
Instead of a generic "extract triples from this text" prompt (200 tokens), the
|
||||
prompt should reference the relevant module(s) and include an example triple for
|
||||
each relation in that module. The module provides /context/; the example provides
|
||||
/format/. Both improve LLM extraction quality without increasing Screamer's
|
||||
verification burden.
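
A minimal sketch of assembling such a prompt is shown below. The module plist keys (=:name=, =:schema=, =:examples=), the function name =module-prompt=, and the prompt wording are all assumptions for illustration; the papers' actual prompt text is not reproduced here.

#+begin_src lisp
;; Hypothetical sketch: a module is a plist with a name, a short schematic
;; description, and one example triple per relation.
(defun module-prompt (module text)
  "Build an extraction prompt scoped to MODULE for the prose in TEXT."
  (with-output-to-string (out)
    (format out "Module: ~a~%" (getf module :name))
    (format out "Schema:~%~a~%" (getf module :schema))
    (format out "Examples:~%~{  ~a~%~}" (getf module :examples))
    (format out "Extract only triples that fit this module.~%~%~a" text)))

(defparameter *literature-module*
  '(:name "literature"
    :schema "Author -wrote-> Work; Work -published-in-> Year"
    :examples ("(\"Nabokov\" :wrote \"Pale Fire\")")))

(module-prompt *literature-module* "Nabokov published Pale Fire in 1962.")
#+end_src

The schema string supplies context and the example triples supply format, so both halves of the pattern stay visible as separate fields of the module.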
|
||||
|
||||
*** MOMo modules as ontology scaffold
|
||||
|
||||
The Passepartout notes describe an organic growth model: gate-bootstrapped facts
|
||||
seed the ontology; gate outcomes, Screamer deductions, and archivist proposals
|
||||
grow the shoots. This is correct for the /security and filesystem/ domains where
|
||||
the gate stack already encodes expertise. For the broader memex (literature,
daily reflection, project planning), the 50-70 gate-bootstrapped entity classes
are far too few.
|
||||
|
||||
MOMo's micropattern library provides a ready-made scaffold for these domains.
|
||||
Hundreds of commonsense patterns already exist for temporal relations, spatial
|
||||
relations, agent-action, organizational structure, provenance, and event
|
||||
participation. Loading these as initial modules (with =:policy :plural= and
=:provenance :external-ontology=) would give the symbolic index a structured
|
||||
vocabulary for domains where the gate stack has nothing to offer. The organic
|
||||
growth model then /extends and refines/ these modules rather than inventing them
|
||||
from scratch. This is the Wikidata strategy applied at the schema level: adopt
|
||||
existing structured knowledge, connect personal facts to it, and surface
|
||||
disagreements rather than resolve them.
|
||||
|
||||
*** OPLa annotation for module identification
|
||||
|
||||
MOMo modules annotated in OPLa can "easily be identified programmatically." If
|
||||
Passepartout annotates its ontology modules in a compatible format (even a
|
||||
simplified plist-based equivalent), the archivist can automatically select the
|
||||
right module(s) when extracting facts from prose. A heading in =literature/=
|
||||
triggers the literature module; a heading in =projects/= triggers the software
|
||||
engineering module; a heading tagged =:personal:= triggers the diary module. The
|
||||
module scopes the prompt. The prompt improves extraction. Screamer gates the
|
||||
result. This is the full pipeline, validated at each step.
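
The selection step could look roughly like the sketch below, assuming a simplified rule table in place of real OPLa annotations. The names =*module-rules*= and =select-modules= and the module keywords are hypothetical; the directory and tag triggers are the ones named in the paragraph above.

#+begin_src lisp
;; Hypothetical sketch: directory fragments mapped to module keywords.
(defparameter *module-rules*
  '(("literature/" . :literature)
    ("projects/"   . :software-engineering)))

(defun select-modules (path tags)
  "Return the module keywords triggered by a heading's file PATH and Org TAGS."
  (let ((modules '()))
    (dolist (rule *module-rules*)
      (when (search (car rule) path)
        (pushnew (cdr rule) modules)))
    (when (member "personal" tags :test #'string=)
      (pushnew :diary modules))
    (nreverse modules)))

(select-modules "literature/nabokov.org" '())
;; => (:LITERATURE)
(select-modules "projects/agora/TODO.org" '("personal"))
;; => (:SOFTWARE-ENGINEERING :DIARY)
#+end_src

Returning a list rather than a single module leaves room for headings that legitimately fall under more than one module.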
|
||||
|
||||
** What this means for the Passepartout architecture
|
||||
|
||||
The paper validates three design decisions already made:
|
||||
|
||||
1. /Modularity is non-negotiable./ The paper found that modularity is the
|
||||
difference between 5% and 95% accuracy on alignment. Passepartout's per-domain
|
||||
cardinality policies and domain-scoped Screamer checks are the same insight
|
||||
implemented in a different context. The paper offers empirical evidence that the approach works;
|
||||
Passepartout applies it to verification rather than extraction.
|
||||
|
||||
2. /The extraction pipeline is feasible./ 90% population accuracy with module-
|
||||
scoped prompts means the archivist /can/ extract useful facts from prose. The
|
||||
remaining 10% or so, the errors and hallucinations, is what Screamer is meant to catch. The paper
|
||||
validates the LLM-as-proposer role; Passepartout adds the Screamer-as-verifier
|
||||
role.
|
||||
|
||||
3. /KGs are positioned as anti-hallucination infrastructure./ The paper explicitly
|
||||
frames knowledge graphs as "ground truth to escape from LLM hallucinations" and
|
||||
as "components of other neurosymbolic approaches." This is the Passepartout
|
||||
thesis — the symbolic index as ground truth against which LLM proposals are
|
||||
checked — stated in the academic literature by the editors of the neurosymbolic
|
||||
AI handbooks.
|
||||
|
||||
And it exposes one gap in the current design:
|
||||
|
||||
1. /Emergent modularity may be slower than designed modularity./ Passepartout's
|
||||
modules are supposed to emerge organically from gate patterns, Screamer
|
||||
generalizations, and cross-domain overlap detection. MOMo's modules are
|
||||
designed by domain experts who identify key notions upfront. The emergent
|
||||
approach is philosophically cleaner — the system learns its own categories —
|
||||
but practically slower. The paper's results suggest that adopting designed
|
||||
modules as a scaffold, and letting emergent growth /refine/ rather than
|
||||
/invent/ them, could substantially shorten the timeline to sufficiency.
|
||||
|
||||
** Relation to Wikidata loading
|
||||
|
||||
The MOMo micropattern approach and the Wikidata loading strategy are complementary:
|
||||
|
||||
| Layer     | MOMo provides                 | Wikidata provides           |
|-----------+-------------------------------+-----------------------------|
| Schema    | Modular ontology of relations | None (Wikidata's schema is  |
|           | and entity classes            | implicit in its data)       |
| Instances | None (patterns, not entities) | 100M+ entities with         |
|           |                               | property-value pairs        |
|
||||
|
||||
MOMo gives Passepartout the /relations/ (wrote, lectured-on, influenced,
|
||||
published-in). Wikidata gives Passepartout the /instances/ (Nabokov, Pale Fire,
|
||||
Kafka). Both are needed. Neither alone is sufficient. The MOMo scaffold tells the
|
||||
archivist /what kinds of facts to look for/. The Wikidata graph tells the
|
||||
archivist /which entities those facts are about/. Together they transform the
|
||||
extraction task from "discover entities and their relations from prose" to
|
||||
"connect this prose heading to known entities using known relations" — a
|
||||
dramatically simpler prompt with dramatically higher expected accuracy.
|
||||
|
||||
** Reference
|
||||
|
||||
- Shimizu, C., & Hitzler, P. (2025). Accelerating knowledge graph and ontology
|
||||
engineering with large language models. /Journal of Web Semantics, 85/,
|
||||
100862. https://doi.org/10.1016/j.websem.2025.100862
|
||||
|
||||
** See also
|
||||
|
||||
- =passepartout-neurosymbolic-roadmap.org=: Phase 3 (Archivist) — the modular
|
||||
prompt pattern should be incorporated into the extraction pipeline.
|
||||
- =passepartout-agora.org=: the KEL / contract audit trail as instances of
|
||||
MOMo-style key-lifecycle and contract-lifecycle modules.
|
||||
- =notes/passepartout-SWOT.org=: the SWOT analysis which identifies the ontology
|
||||
problem as the key bottleneck — MOMo partially addresses this.
|
||||
|
||||
** Supporting References
|
||||
|
||||
*** MOMo: the canonical methodology
|
||||
|
||||
Shimizu, Hammar & Hitzler (2023, /Semantic Web Journal/) present the full MOMo
|
||||
methodology — 31 pages covering the step-by-step design process, schema diagrams
|
||||
as knowledge elicitation tools, ODP libraries, OPLa annotation language, and
|
||||
CoModIDE, a Protégé plugin for graphical modular ontology development. The paper
|
||||
was evaluated with usability studies and demonstrates that modular development
|
||||
significantly improves approachability for domain experts who are not ontology
|
||||
engineers.
|
||||
|
||||
Key architectural commitments from MOMo that Passepartout should adopt:
|
||||
|
||||
- /Schema diagrams/ as the primary communication format between ontologist and
|
||||
domain expert. Passepartout's equivalent: the archivist's module-scoped prompt
|
||||
includes a simplified schema diagram of the module being populated.
|
||||
- /Template-based instantiation/ of ontology design patterns into concrete
|
||||
modules. Passepartout's equivalent: micropatterns loaded from MODL are
|
||||
instantiated with entities from the user's memex, producing concrete facts.
|
||||
- /Systematic axiomatization/ — 17 frequently used axiom patterns for each
|
||||
node-edge-node construction in a schema diagram. Passepartout's equivalent:
|
||||
Screamer constraint rules derived from module structure.
|
||||
|
||||
Reference:
|
||||
- Shimizu, C., Hammar, K., & Hitzler, P. (2023). Modular ontology modeling.
|
||||
/Semantic Web, 14/(3), 459–489. https://doi.org/10.3233/SW-222886
|
||||
|
||||
*** Ontology Population — the empirical methodology
|
||||
|
||||
Norouzi et al. (2024) provide the full experimental methodology behind the ~90%
|
||||
extraction accuracy claim. Using the Enslaved.org Hub Ontology as ground truth
|
||||
and Wikipedia articles as source text, they tested five LLMs across a three-stage
|
||||
pipeline: preprocessing, text retrieval, and KG population. The critical finding:
|
||||
prompts that included a /schema diagram/ of the target ontology module (using
|
||||
MOMo's visual conventions with colored boxes for classes, arrows for relations)
|
||||
plus a single extraction example achieved the highest accuracy. Without
|
||||
module-scoped prompts, quality degraded substantially.
|
||||
|
||||
Three findings are directly applicable to the archivist:
|
||||
|
||||
1. /Role chain simplification./ The Enslaved Ontology has complex role chains
|
||||
(e.g., Person → hasRole → Role → inEvent → Event). These were collapsed into
|
||||
shortcut relations (e.g., Person → participatedIn → Event) for LLM extraction.
|
||||
The archivist should maintain two layers: the /logical/ schema with full role
|
||||
chains for Screamer verification, and the /extraction/ schema with simplified
|
||||
relations for LLM prompting.
|
||||
|
||||
2. /Variance across models./ Five LLMs were tested. Performance varied
|
||||
significantly. The archivist should benchmark extraction accuracy per provider
|
||||
and per module, and route extraction tasks to the best-performing model for
|
||||
each module — extending the existing model-tier routing (v0.3.0) from
|
||||
complexity-based to accuracy-based routing.
|
||||
|
||||
3. /Cross-source validation./ The paper used both Wikipedia text and Wikidata
|
||||
as overlapping sources for the same entities, enabling cross-verification.
|
||||
The archivist can do the same: extract facts from the user's prose, extract
|
||||
facts from Wikidata for the same entities, and present disagreements with
|
||||
provenance. This is the =:plural= cardinality policy applied at extraction time.
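
The two-layer idea from the first finding can be sketched as a table that expands each shortcut relation back into its role chain before verification. The relation keywords, the =*shortcut-expansions*= table, and the use of a fresh uninterned symbol as the intermediate role node are all assumptions made for this example.

#+begin_src lisp
;; Hypothetical sketch: extraction-schema shortcut -> logical-schema chain.
(defparameter *shortcut-expansions*
  '((:participated-in . (:has-role :in-event))))

(defun expand-shortcut (triple)
  "Expand an extraction-schema TRIPLE (subject relation object) into
logical-schema triples. Triples with no known expansion pass through unchanged."
  (destructuring-bind (subject relation object) triple
    (let ((chain (cdr (assoc relation *shortcut-expansions*))))
      (if (null chain)
          (list triple)
          ;; A fresh placeholder stands for the intermediate Role node.
          (let ((role (gensym "ROLE")))
            (list (list subject (first chain) role)
                  (list role (second chain) object)))))))

(expand-shortcut '("Person-1" :participated-in "Event-7"))
;; returns two triples linked through the same placeholder role node
#+end_src

The LLM only ever sees the shortcut relation, while the verifier works on the expanded chain, so neither schema has to compromise for the other.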
|
||||
|
||||
Reference:
|
||||
- Norouzi, S.S., Barua, A., Christou, A., Gautam, N., Eells, A., Hitzler, P.,
|
||||
& Shimizu, C. (2024). Ontology Population using LLMs. arXiv:2411.01612.
|
||||
|
||||
* Historical Lineage — McCarthy's Advice Taker
|
||||
|
||||
McCarthy's "Programs with Common Sense" (1959) is the direct intellectual ancestor
|
||||
of the Passepartout architecture. The paper proposed an "advice taker" — a program
|
||||
that "will draw immediate conclusions from a list of premises" expressed in
|
||||
"a suitable formal language (most likely a part of the predicate calculus)." The
|
||||
program would:
|
||||
|
||||
1. Accept declarative statements about the world as input.
|
||||
2. Store them as logical formulas.
|
||||
3. Reason from them to produce new conclusions.
|
||||
4. Accept new facts and revise its conclusions.
|
||||
|
||||
This is precisely the Passepartout pipeline: the archivist extracts declarative
|
||||
facts from prose → Screamer checks them for consistency → VivaceGraph stores them
|
||||
→ the planner reasons from them → new facts from gate outcomes and deductions
|
||||
revise the store. McCarthy proposed it in 1959. Passepartout is building it in
|
||||
2026.
|
||||
|
||||
The gap between McCarthy's proposal and Passepartout's implementation is the
|
||||
/hallucination problem/. McCarthy assumed facts would be entered by a human
|
||||
programmer in formal logic. Passepartout's facts are extracted from natural
|
||||
language prose by an LLM — a probabilistic process that requires deterministic
|
||||
verification. Screamer is the component McCarthy didn't need: a constraint solver
|
||||
that gates LLM-proposed facts against the existing fact store.
|
||||
|
||||
The connection is not metaphorical. McCarthy cited Principia Mathematica as an
|
||||
influence on Lisp. Passepartout's Whitehead note traces the same PM → Lisp
|
||||
lineage. The advice taker → Passepartout lineage completes the arc: PM's formal
|
||||
logic → Lisp → McCarthy's advice taker → Passepartout's neurosymbolic engine.
|
||||
|
||||
Reference:
|
||||
- McCarthy, J. (1959). Programs with Common Sense. /Proceedings of the
|
||||
Teddington Conference on the Mechanization of Thought Processes./
|
||||
|
||||
* Philosophical Validation — The Neurosymbolic Consensus
|
||||
|
||||
Three papers from the neurosymbolic AI research community validate the
|
||||
architectural thesis from complementary angles.
|
||||
|
||||
** Marcus (2020): The Case Against Pure Deep Learning
|
||||
|
||||
Gary Marcus's "The Next Decade in AI" argues that deep learning alone is "data
|
||||
hungry, shallow, brittle, and limited in its ability to generalize." The paper
|
||||
demonstrates GPT-2 failing at basic commonsense reasoning:
|
||||
|
||||
- "Yesterday I dropped my clothes off at the dry cleaners and have yet to pick
|
||||
them up. Where are my clothes?" → GPT-2: "at my mom's house."
|
||||
- "There are six frogs on a log. Two leave, but three join. The number of frogs
|
||||
on the log is now" → GPT-2: "seventeen."
|
||||
|
||||
Marcus proposes four steps toward robust AI: hybrid architecture (combining
|
||||
neural and symbolic), large-scale knowledge (abstract and causal, not just
|
||||
statistical), reasoning (formal inference over structured representations), and
|
||||
cognitive models (frameworks for how entities relate). Passepartout implements all
|
||||
four: the perceive-reason-act pipeline is hybrid, the symbolic index is causal
|
||||
knowledge, Screamer + ACL2 provide reasoning, and the gate-bootstrapped ontology
|
||||
plus MOMo modules provide cognitive models.
|
||||
|
||||
Marcus's core claim — "we have no hope of achieving robust intelligence without
|
||||
first developing systems with deep understanding" — is the justification for
|
||||
Passepartout's entire neurosymbolic investment. The alternative is a system that
|
||||
works "on a good day" and fails unpredictably. The deterministic gate stack and
|
||||
Screamer admission gate are the engineering realization of Marcus's call for
|
||||
robustness.
|
||||
|
||||
Reference:
|
||||
- Marcus, G. (2020). The Next Decade in AI: Four Steps Towards Robust
|
||||
Artificial Intelligence. arXiv:2002.06177.
|
||||
|
||||
** Gaur & Sheth (2023): CREST — Trustworthy Neurosymbolic AI
|
||||
|
||||
Gaur and Sheth present the CREST framework: Consistency, Reliability, user-level
|
||||
Explainability, and Safety build Trust — and they argue these require
|
||||
neurosymbolic methods. Their empirical finding: GPT-3.5 breached safety
|
||||
constraints 30% of the time when asked identical questions repeatedly. Claude's
|
||||
16 safety rules and Sparrow's 23 rules provide no /inherent/ safety — they are
|
||||
heuristic guardrails that can be breached through prompt variation.
|
||||
|
||||
These findings validate three Passepartout design commitments:
|
||||
|
||||
1. /Prompt-level safety is insufficient./ Claude and Sparrow use rules that
|
||||
consume LLM tokens and can be evaded. Passepartout's deterministic gates run
|
||||
in pure Lisp, cost 0 tokens, and cannot be evaded by prompt engineering.
|
||||
|
||||
2. /Inconsistency is the norm, not the exception./ Gaur & Sheth show that even
|
||||
identical queries produce inconsistent responses ~30% of the time. This
|
||||
validates the cardinality model: a system that expects contradiction and
|
||||
surfaces it with provenance is architecturally more honest than one that
|
||||
assumes consistency and silently resolves it.
|
||||
|
||||
3. /Knowledge infusion is required for trust./ The CREST framework embeds
|
||||
domain knowledge (clinical guidelines, procedural knowledge) into LLM
|
||||
pipelines. Passepartout's symbolic index IS the knowledge infusion layer —
|
||||
facts extracted from prose, verified by Screamer, and available for any LLM
|
||||
call through the context assembly pipeline.
|
||||
|
||||
Reference:
|
||||
- Gaur, M., & Sheth, A. (2023). Building Trustworthy NeuroSymbolic AI Systems:
|
||||
Consistency, Reliability, Explainability, and Safety. arXiv:2312.06798.
|
||||
|
||||
** Gaur et al. (2022): Knowledge-Infused Learning

Gaur, Gunaratna, Bhatt, and Sheth define Knowledge-infused Learning (KiL) as
|
||||
"combining various types of explicit knowledge with data-driven deep learning
|
||||
techniques." They identify three infusion levels (shallow, semi-deep, deep) and
|
||||
position KiL as "a sweet spot in neuro-symbolic AI."
|
||||
|
||||
The paper makes two observations relevant to Passepartout:
|
||||
|
||||
1. /Data alone is not enough./ The opening cites Pedro Domingos ("Data Alone is
|
||||
Not Enough"), Andrew Ng ("the importance of Big Data is overhyped"), and
|
||||
Gary Marcus ("AI that captures how humans think"). These are the intellectual
|
||||
warrant for the symbolic index: a knowledge layer that is independent of any
|
||||
specific LLM call, accumulated across sessions, and verified against existing
|
||||
facts.
|
||||
|
||||
2. /Expert knowledge is external to the model./ Domain experts use "their past
|
||||
experience, web or domain-specific knowledge sources, and annotation
|
||||
guidelines" to create ground truth — resources the LLM cannot access during
|
||||
training. The symbolic index makes these resources queryable: facts from the
|
||||
gate stack (security expertise), from the human (declarative authoring), from
|
||||
Wikidata (world knowledge), and from Screamer deductions (derived expertise).
|
||||
|
||||
Passepartout's architecture is a specific implementation of KiL at the deepest
|
||||
infusion level: knowledge is not appended to prompts (shallow) or embedded in
|
||||
fine-tuning (semi-deep). It is a first-class data structure — the symbolic index
|
||||
— that the LLM queries through the archivist and the planner. The knowledge is
|
||||
living: it accumulates, is verified, carries provenance, and evolves through
|
||||
ontology versioning.
|
||||
|
||||
Reference:
|
||||
- Gaur, M., Gunaratna, K., Bhatt, S., & Sheth, A. (2022). Knowledge-Infused
|
||||
Learning: A Sweet Spot in Neuro-Symbolic AI. /IEEE Internet Computing, 26/(4),
|
||||
5–11. https://doi.org/10.1109/MIC.2022.3179759
|
||||
|
||||
* Semantic Wikipedia as Entity Backbone
|
||||
|
||||
The gate stack provides 50-70 entity classes — adequate for a coding agent where
|
||||
@@ -1412,3 +1777,19 @@ See also:
|
||||
- =passepartout/docs/DESIGN_DECISIONS.org= — the existing design decisions
|
||||
- =passepartout/docs/ARCHITECTURE.org= — the current pipeline architecture
|
||||
- =passepartout/docs/ROADMAP.org= — the feature roadmap through v0.13.0
|
||||
- =notes/passepartout-SWOT.org= — SWOT analysis of the neurosymbolic architecture
|
||||
- =passepartout-agora.org= — Passepartout-Agora integration design
|
||||
- Shimizu, C. & Hitzler, P. (2025). Accelerating knowledge graph and ontology
|
||||
engineering with large language models. /Journal of Web Semantics, 85/, 100862.
|
||||
https://doi.org/10.1016/j.websem.2025.100862
|
||||
- Shimizu, C., Hammar, K., & Hitzler, P. (2023). Modular ontology modeling.
|
||||
/Semantic Web, 14/(3), 459–489. https://doi.org/10.3233/SW-222886
|
||||
- Norouzi, S.S. et al. (2024). Ontology Population using LLMs. arXiv:2411.01612.
|
||||
- McCarthy, J. (1959). Programs with Common Sense. /Proc. Teddington Conf. on
|
||||
the Mechanization of Thought Processes./
|
||||
- Marcus, G. (2020). The Next Decade in AI. arXiv:2002.06177.
|
||||
- Gaur, M. & Sheth, A. (2023). Building Trustworthy NeuroSymbolic AI Systems.
|
||||
arXiv:2312.06798.
|
||||
- Gaur, M., Gunaratna, K., Bhatt, S., & Sheth, A. (2022). Knowledge-Infused
|
||||
Learning. /IEEE Internet Computing, 26/(4), 5–11.
|
||||
- Bhardwaj, V.P. (2026). Agent Behavioral Contracts. arXiv:2602.22302.
|
||||
|
||||
Submodule projects/passepartout updated: 24a24b481b...6422a84872
Submodule projects/passepartout-contrib updated: ce17336acd...825ef832ba