#+TITLE: Passepartout Neurosymbolic + Agora Integration — SWOT Analysis
#+AUTHOR: Agent
#+FILETAGS: :notes:analysis:swot:passepartout:agora:neurosymbolic:
#+CREATED: [2026-05-09 Sat]

* Premise and Scope

This analysis assumes the engineering is possible: Screamer can be wrapped,
VivaceGraph can persist facts, ACL2 can verify structural properties, the
archivist can extract triples from prose with Screamer verification, and the
note-publishing bridge to Agora can be implemented. The question is not "can it
be built?" but "does the architecture cohere? What does it enable? What does it
miss?"

* Will It Work Conceptually?

The short answer: yes, within a specific domain. The long answer: the boundary of
that domain is the most important thing to get right.

** The architecture's core insight is correct and load-bearing

The central design decision ("the LLM proposes; the symbolic engine decides
whether to accept") is sound. It is the inverse of the agent architectures in
common use. Claude Code, OpenCode, and Hermes all put the LLM in the driver's
seat and add safety as an afterthought (prompt-based guardrails that consume
tokens and can be evaded). Passepartout inverts this: the LLM proposes actions
and facts, but a deterministic layer of gates, constraint solvers, and formal
verifiers decides what to admit and what to execute. This inversion is the
correct response to the hallucination problem. You cannot eliminate hallucination
by making the LLM better. You eliminate it by not asking the LLM to do things
that require certainty.

The bootstrap mechanism, which extracts 50-70 entity classes mechanically from
the existing Dispatcher gate stack with almost no new code (about 30 lines of
Lisp), is genuinely elegant. It proves the pattern at minimal cost: code becomes
facts, facts enable reasoning. Every new gate pattern adds to the ontology
organically. This is the right way to start a knowledge base: not by designing a
schema upfront, but by formalizing what the system already knows implicitly.

** The "one memex, two indices" architecture survives contact with reality

Option 4 (one memex with neural and symbolic indices over the same Org files) is
the correct long-term architecture. The prose is the ground truth, always. The
symbolic index is a derived view that can be thrown away and rebuilt. The neural
index handles semantic search, associative leaps, and fuzzy matching. This
division of labor is permanent, not transitional, because the domains they serve
are fundamentally different kinds of knowledge.

The practical path is the right sequence: start with Option 5 (ephemeral facts,
no persistence) through Phases 1-4, then graduate to Option 4 with VivaceGraph
persistence in Phase 5. It punts the serialization format problem until the fact
language has been battle-tested. It keeps the cost of mistakes low. It treats
the ontology as something discovered through use rather than designed upfront.

** Wikipedia's ontology WOULD give it a running start — with caveats

Wikidata contains approximately 100 million entities with a decade of human
curation: type hierarchies, relations, dates, citations, disambiguation. For a
personal memex that mentions Nabokov, /Pale Fire/, Kafka, postmodernism, and
butterfly migration, the gate stack's 50-70 entity classes are starvation
rations. Organic growth through prose extraction would take years to cover the
entities in one person's engagement with a single novel.

Loading Wikidata's entity graph into the symbolic index transforms the
archivist's job from "discover that Nabokov wrote /Pale Fire/" to "connect your
heading to Wikidata entity Q36591." The second task is reference resolution, not
knowledge extraction: simpler, more reliable, and in many cases doable without
an LLM at all (string match against loaded entities). The notes claim this
collapses the LLM's role to three thin boundaries: input translation,
prose-to-candidate-triple for personal content, and result-to-prose formatting.

The caveats are real:

- Entity resolution (matching prose mentions to Wikidata entities) is genuinely
  hard. "Nabokov" in a diary might refer to Vladimir Nabokov (Q36591), his son
  Dmitri (Q566744), or someone else entirely. Disambiguation requires context
  that the symbolic engine doesn't have without LLM assistance.
- Wikidata is biased toward English Wikipedia's coverage. A memex in Arabic,
  Farsi, or Amharic will find far fewer resolved entities. The "universal" in
  Wikidata is aspirational, not actual.
- Wikidata's property graph is not an ontology in the formal sense. It is a
  collaboratively edited dataset with contradictions, gaps, and editorial wars
  frozen in time. Loading it directly into a symbolic index that expects
  consistency (Screamer checks, cardinality policies) will surface thousands of
  contradictions on ingest, many of which are Wikidata artifacts, not meaningful
  tensions.
- N-hop expansion grows roughly geometrically and has no natural bound. One hop
  from Nabokov hits hundreds of entities (his works, his family, his influences,
  his translators). Two hops hits thousands. Three hops hits tens of thousands.
  The notes say "3-4 hops" for a literary memex but don't estimate the entity
  count this implies. The claim that 5 million entities = ~400MB is the
  best-case hash-table figure; a graph with query indices will be larger, and
  Prolog-like queries over millions of nodes are not free.

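
The hop-limit caveat above can be made concrete with a bounded breadth-first
expansion. This is a minimal sketch in Python (used here only for illustration;
the project itself is Lisp). The function name =expand=, the =max_entities= cap,
and the toy graph are all assumptions, not part of the notes or of any Wikidata
API.

```python
from collections import deque


def expand(graph, seed, max_hops, max_entities):
    """Breadth-first expansion from `seed`, bounded by hops and by total size."""
    seen = {seed: 0}              # entity -> hop distance at which it was reached
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue              # do not expand past the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour in seen:
                continue
            if len(seen) >= max_entities:
                return seen       # hard cap reached: stop loading
            seen[neighbour] = seen[node] + 1
            queue.append(neighbour)
    return seen


# Toy graph with illustrative names (not real Wikidata identifiers).
GRAPH = {
    "nabokov": ["pale_fire", "lolita", "dmitri"],
    "pale_fire": ["postmodernism"],
    "lolita": ["postmodernism", "kubrick"],
    "kubrick": ["film"],
}

one_hop = expand(GRAPH, "nabokov", max_hops=1, max_entities=100)
two_hops = expand(GRAPH, "nabokov", max_hops=2, max_entities=100)
capped = expand(GRAPH, "nabokov", max_hops=3, max_entities=5)
```

Recording the hop distance per entity makes it possible to report how many
entities each additional hop would add before committing to a load. For scale,
the quoted 400MB for 5 million entities works out to about 80 bytes per entity,
which leaves little room for indices.
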
Still: even a partial Wikidata load with conservative hop limits would provide
more ontology than the system could accumulate through years of organic growth.
It is the right accelerator, and the architecture handles it correctly: Wikidata
facts are admitted with =:provenance :wikidata= and =:policy :plural=, meaning
they sit alongside personal facts without overriding them. Disagreements are
surfaced, not resolved. The architecture treats Wikidata as evidence from an
external source, not as ground truth. That's the correct posture.

** Cardinality policies are the right abstraction for contradiction

The =:singular= / =:dual= / =:plural= cardinality model is one of the most
important ideas in these notes. Classical logic requires consistency: a
contradiction implies everything (ex contradictione quodlibet). A constraint
solver like Screamer also requires consistency: a contradictory constraint set
has no solutions. But a personal memex operates across domains where the meaning
of contradiction is fundamentally different:

- "rm -rf / is catastrophic" is =:singular=: there is one truth that evolves
  over time.
- "I loved this person AND I resented them" is =:dual=: the tension IS the
  fact.
- "Wikidata says Everest is 8848m; DBpedia says 8849m; my 2023 diary says
  8848m" is =:plural=: multiple sources disagree, and surfacing the disagreement
  with provenance is the product.

This is a genuinely novel contribution to knowledge representation. Most
knowledge graphs (Wikidata, Freebase, DBpedia) do not treat contradiction as
something to query. Wikidata does allow several ranked statements for one
property, but consumers typically read only the preferred value. Most
constraint solvers reject contradiction as error. Passepartout's cardinality
model makes contradiction a first-class citizen: you can query the fact that "I
used to believe X until Tuesday, then Y," or "these three sources disagree on
height," or "I hold these two positions in tension." The symbolic engine's job
is not to decide which is right. It is to surface the tension with provenance.

This alone, if implemented correctly, would be a category-level advance over
every existing personal knowledge management tool.

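
The three policies can be sketched as admission rules on a small in-memory
store. This is an illustrative Python sketch, not the project's Lisp fact store:
the =Fact= and =FactStore= names, the choice to cap a =:dual= slot at two values,
and the decision to keep superseded =:singular= values as history are all
assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    entity: str
    relation: str
    value: str
    provenance: str
    policy: str  # "singular", "dual", or "plural"


class FactStore:
    def __init__(self):
        self.current = {}     # (entity, relation) -> list of live facts
        self.superseded = []  # older :singular values, kept as history

    def admit(self, fact):
        live = self.current.setdefault((fact.entity, fact.relation), [])
        if fact.policy == "singular":
            # One truth that evolves: the new value replaces the old one.
            self.superseded.extend(live)
            live[:] = [fact]
        elif fact.policy == "dual":
            # The tension is the fact: hold at most two values side by side.
            if fact in live:
                return
            if len(live) >= 2:
                raise ValueError("a dual slot holds at most two values")
            live.append(fact)
        elif fact.policy == "plural":
            # Sources may disagree: keep every distinct (value, provenance).
            if fact not in live:
                live.append(fact)
        else:
            raise ValueError(f"unknown policy: {fact.policy}")

    def disagreements(self, entity, relation):
        """Return {value: [provenance, ...]} if more than one value is held."""
        by_value = {}
        for fact in self.current.get((entity, relation), []):
            by_value.setdefault(fact.value, []).append(fact.provenance)
        return by_value if len(by_value) > 1 else {}


store = FactStore()
store.admit(Fact("everest", "height_m", "8848", "wikidata", "plural"))
store.admit(Fact("everest", "height_m", "8849", "dbpedia", "plural"))
store.admit(Fact("everest", "height_m", "8848", "diary-2023", "plural"))
report = store.disagreements("everest", "height_m")
```

The point of =disagreements= is that it returns the conflict with its sources
attached instead of picking a winner, which is the behavior the notes describe
for =:plural= facts.
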
** Ontology versioning is the right approach to the migration problem

Every knowledge base eventually faces schema migration: you split =:secret-file=
into =:crypto-secret= and =:plaintext-secret=, and now every deduction that
crossed the old category boundary is suspect. The standard approach is batch
UPDATE operations that overwrite the past. Passepartout's approach preserves all
worldviews: the category hierarchy itself is a Merkle tree, every fact stores
the =:ontology-version= at assertion time, and category changes trigger
re-verification rather than remapping. You can query "what did I believe about
secrets before I refined my security model?" This is not querying a fact. It is
querying the history of your own thinking.

This is the kind of capability that no existing tool provides, and it flows
directly from the architecture. If the Merkle DAG infrastructure exists (it does,
from v0.2.0), ontology versioning is ~40 lines on top of it. The conceptual
design is sound. The engineering appears tractable.

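
A rough sketch of the versioning idea, in Python for illustration only. It
simplifies the Merkle tree described above to a single content hash over the
whole category hierarchy, and the names =ontology_version=, =assert_fact=, and
=needs_reverification= are hypothetical.

```python
import hashlib
import json


def ontology_version(hierarchy):
    """Content hash of a category hierarchy given as {child: parent}."""
    canonical = json.dumps(hierarchy, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def assert_fact(store, triple, hierarchy):
    # Every fact records the ontology version in force when it was asserted.
    store.append({"triple": triple,
                  "ontology_version": ontology_version(hierarchy)})


def needs_reverification(store, hierarchy):
    # Facts asserted under an older hierarchy are re-checked, not rewritten.
    current = ontology_version(hierarchy)
    return [fact for fact in store if fact["ontology_version"] != current]


v1 = {"secret-file": "file"}
store = []
assert_fact(store, ("id_rsa", "is-a", "secret-file"), v1)

# The category is split; old facts keep their original version stamp.
v2 = {"secret-file": "file",
      "crypto-secret": "secret-file",
      "plaintext-secret": "secret-file"}
assert_fact(store, ("notes.txt", "is-a", "plaintext-secret"), v2)
stale = needs_reverification(store, v2)
```

Because old facts are never rewritten, filtering the store by a past version
string answers "what did I believe under the old model?" directly.
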
* SWOT Analysis

** Strengths

*** Architectural inversion — proposer vs decider

The LLM proposes. The symbolic engine decides. This is the inverse of every
existing agent architecture, and it solves the hallucination problem at the
architectural level rather than the prompt-engineering level. No amount of
prompt refinement can make a probabilistic system deterministic. But a
deterministic admission gate can make a probabilistic proposer safe.

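
The proposer/decider split can be illustrated with a toy admission gate. This is
a Python sketch under stated assumptions, not the Dispatcher's actual rule set:
the =DESTRUCTIVE= and =PROTECTED_PATHS= tables and the =gate= function are
invented for the example.

```python
import shlex

# Hypothetical rule tables; a real gate stack would be far richer.
DESTRUCTIVE = {"rm", "mkfs", "dd"}
PROTECTED_PATHS = {"/", "/etc", "/boot"}


def gate(command):
    """Deterministic admission check. Returns (admitted, reason)."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return (False, "unparseable command")
    if not argv:
        return (False, "empty command")
    program, args = argv[0], argv[1:]
    if program in DESTRUCTIVE and any(arg in PROTECTED_PATHS for arg in args):
        return (False, f"{program} on a protected path")
    return (True, "no rule matched")


def review(proposals):
    """The model proposes; each proposal is paired with the gate's verdict."""
    return [(command, gate(command)) for command in proposals]


verdicts = review(["ls /etc", "rm -rf /", "rm -rf ./build"])
```

The same input always yields the same verdict, and no tokens are spent reaching
it. That is the property the paragraph above relies on.
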
*** Unified container format (Org files)

Org files serve as the container for human prose, Lisp source code, symbolic
facts, and Agora Notes. One format, one toolchain, one Merkle tree, one version
control system. If Passepartout stops existing, the data survives in plain text.
This is the hardest commitment in the design and the most undervalued. Most agent
architectures store memory in JSONL transcripts, vector databases, or proprietary
formats that are opaque to the human and dependent on the tool. Passepartout's
memory IS the human's memory, in the human's format.

*** Provenance as product

Every fact carries =:grounding= (the specific Org heading), =:provenance= (who
or what produced it), =:timestamp=, =:referenced-by=, =:contradicted-by=, and
=:superseded-by=. The =/audit= command renders the full provenance chain. In the
broader memex, the value is not the verified fact ("this command is safe"). It
is the provenance itself: "this claim originated in that diary entry, has been
referenced 7 times across 4 projects, was contradicted 6 months later, and was
revised 3 weeks after that." This is a memory prosthesis that makes your own mind
legible to you.

*** Gate-to-fact bootstrap — ontology from existing code

The existing Dispatcher gate stack encodes an implicit ontology (categories of
secrets, destructive commands, trusted domains, core files). The bootstrap
extracts this mechanically: zero LLM tokens, zero human authoring, ~30 lines of
Lisp. This proves the pattern and provides the seed ontology without any new
infrastructure. Every new gate pattern added by the human (HITL approvals that
become rules) extends the ontology automatically.

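
As a rough picture of what "code becomes facts" could look like, here is a
Python sketch (illustrative only; the actual bootstrap is described as Lisp).
The shape of =GATE_RULES= and the relation names =is-a= and =gated-by= are
assumptions, since the notes do not show the gate table format.

```python
# Hypothetical gate table; the real Dispatcher format is not shown in the notes.
GATE_RULES = [
    {"category": "secret-file",
     "patterns": [".env", "id_rsa"],
     "action": "deny-read"},
    {"category": "destructive-command",
     "patterns": ["rm -rf", "mkfs"],
     "action": "require-approval"},
]


def bootstrap_facts(rules):
    """Turn each gate rule into triples: one per category, one per pattern."""
    facts = []
    for rule in rules:
        facts.append((rule["category"], "gated-by", rule["action"]))
        for pattern in rule["patterns"]:
            facts.append((pattern, "is-a", rule["category"]))
    return facts


facts = bootstrap_facts(GATE_RULES)
# The entity classes are whatever categories the rules already name.
classes = sorted({obj for _, relation, obj in facts if relation == "is-a"})
```

No model call appears anywhere in this path, which is why the seed ontology
costs zero tokens.
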
*** Self-preservation architecture

The Third Law implementation (quarantine on skill failure, degraded-mode
signaling, resource monitoring, external watchdog, refusal to self-terminate)
consists of individually small pieces (~20-50 lines each) that collectively
transform self-preservation from a passive architectural property into an
active behavior. The key insight: the biggest gap is not that these mechanisms
are hard. It is that degradation is currently silent. Making it visible is
cheap and high-impact.

*** Cardinality policies as a solution to contradiction

The =:singular= / =:dual= / =:plural= model is novel in knowledge representation
and directly addresses the hardest problem in a personal memex: that
contradiction is the product, not the error. Bayesian knowledge bases, graph
databases, and triple stores all struggle with contradiction. Passepartout's
model makes it a feature.

*** Organic ontology growth

Categories emerge from the system's own operation: gate patterns → gate outcomes
→ Screamer generalizations → archivist proposals → cross-domain overlap
detection. The ontology is a garden, not a building. This avoids the Principia
Mathematica problem (the need to define everything upfront) by replacing
axiomatic design with evolutionary growth. Categories that aren't used fade.
Categories that are contradictory are pruned. Categories that emerge from
overlapping domains are promoted. The system converges on useful granularity
through use.

*** Agora as provenance layer for networked knowledge

A BFT-timestamped triple store is one approach, but the Merkle DAG + DID
signatures provide a lighter-weight alternative: every fact's provenance is
content-addressed, every author's identity is cryptographically verifiable, and
the DAG structure enables partial replication without consensus. This is more
tractable than full BFT and sufficient for a personal memex that needs to share
facts across a network.

*** Decoupling of compute cost from knowledge base size

LLM tokens are minimized by design: deterministic gates cost 0 tokens,
sparse-tree rendering keeps context at 2,000-4,000 tokens, and Screamer
deductions cost 0 tokens. Adding 5 million Wikidata entities does not add a
single token to any LLM call. The variables that actually degrade performance
(context window size, LLM call frequency, Screamer deduction budget) are all
bounded independently of knowledge base size. This is a structural property:
the education is local, only the brain costs.

** Weaknesses

*** The fact language is unproven and may be insufficient

The triple format, =(:entity :relation :value)= with provenance and grounding,
is the current hypothesis. It is simple enough to be parseable, expressive
enough to capture the gate stack's implicit claims, and extensible enough that
Screamer can operate on it. But:

- Triples cannot naturally express temporal relations. "Was X before Y?" requires
  reification (making the relation itself an entity), which adds a join for
  every reified relation and makes queries considerably more verbose.
- Triples cannot express modal claims. "Should not do X unless Y" has no natural
  triple representation. Neither does "could have done X but chose Y."
- Triples cannot express counterfactuals. "If X had happened, Y would have
  followed." These are essential for the "what if" reasoning that a personal
  memex should support.
- Triples struggle with n-ary relations. "Nabokov wrote Pale Fire in 1962 while
  living in Montreux" is a 4-ary relation (author, work, date, location), not a
  set of independent binary relations. Breaking it into triples loses the
  connection that binds them.
- Triples cannot express negation cleanly. "Nabokov did NOT write Doctor Zhivago"
  requires an explicit negative fact. Under an open-world assumption a missing
  triple means only "not known", so "known not" has to be asserted separately;
  under a closed-world assumption the two are conflated.

The notes acknowledge this limitation but defer it. The right granularity
"depends on what queries the planner actually needs to make, and that cannot be
known in advance." This is honest but unsatisfying. If triples prove insufficient,
the entire fact store, the Screamer integration, the VivaceGraph persistence, and
the archivist's extraction format must be redesigned. The architecture has no
intermediate fallback between "triples" and "something more expressive."

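
The n-ary case listed above shows what reification costs in practice. The
following Python sketch (illustrative, not the project's fact language) turns
the 4-ary Nabokov statement into triples that share a synthetic event node. The
helper names =reify= and =roles_of= are made up for the example.

```python
import itertools

_event_ids = itertools.count(1)


def reify(relation, **roles):
    """Turn one n-ary statement into triples that share an event node."""
    event = f"event-{next(_event_ids)}"
    triples = [(event, "type", relation)]
    triples += [(event, role, value) for role, value in roles.items()]
    return event, triples


def roles_of(triples, event):
    """Reassemble the statement: every lookup must go through the event node."""
    return {role: value
            for subject, role, value in triples
            if subject == event and role != "type"}


event, triples = reify("wrote",
                       author="Nabokov",
                       work="Pale Fire",
                       date="1962",
                       location="Montreux")
statement = roles_of(triples, event)
```

One sentence becomes five triples, and any query about the author now needs a
hop through the event node. That indirection is the verbosity the weakness
describes, though it does keep the four roles bound together.
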
*** Screamer as admission gate is untested at this scale

Screamer is a constraint solver with non-deterministic backtracking. Using it
to check a candidate triple against an existing fact store is conceptually
elegant: express the fact store as constraint variables, assert the candidate,
check solvability. But:

- Screamer was designed for constraint satisfaction problems with tens to
  hundreds of variables. A fact store with millions of triples (after Wikidata
  loading) is a constraint space orders of magnitude larger than Screamer's
  design envelope.
- The consistency check is domain-scoped (only rules from the candidate's
  =:domain= apply), but cross-domain contradictions are the most valuable kind.
  "Nabokov was born in 1899" (literature domain) should be consistent with
  "Nabokov died in 1977" (history domain). If these are separate domains, the
  check misses contradictions; if they are unified, the constraint space
  explodes.
- Screamer's non-deterministic backtracking is worst-case exponential. The notes
  bound this via deduction budget (=SCREAMER_DEDUCTION_BUDGET_MS=) but don't
  address the admission check itself, which runs on every assertion.

There is a risk that Screamer works beautifully for the gate-bootstrapped seed
(50-70 entity classes, ~200 facts) and becomes unusably slow after Wikidata
loading (millions of facts). The transition from "works" to "doesn't" may be
gradual and hard to detect: the system gets slower but doesn't crash,
degrading user experience without a clear diagnostic.

*** The "flip" from lossy to deterministic is underspecified

The architecture's central narrative arc is the "flip": at some point, the
non-lossy facts constitute a sufficient foundation that the symbolic engine can
reverse the flow. Instead of LLM extraction, the symbolic engine reads prose
through its own lens and deduces facts directly. The sufficiency metric
(non-lossy / total > 0.7) makes this "computable and visible to the user."

But:

- The threshold (0.7) is arbitrary. It is not derived from empirical measurement,
  information theory, or constraint satisfaction theory. It is a guess.
- Sufficiency is domain-specific, not global. The gate stack may have 0.95
  coverage of security classifications but 0.05 coverage of literary analysis.
  A global threshold of 0.7 hides the domains where the symbolic engine is still
  effectively blind.
- The "flip" operation itself is not defined. The notes say "Screamer reads
  prose through its own lens", but Screamer does not read prose. It operates on
  structured facts. Either the archivist still extracts triples (which is LLM
  work), or some new mechanism parses prose into triples deterministically
  (which is NLP at a level that does not exist in open-source Lisp).
- Even after the flip, facts from the pre-flip period carry
  =:provenance :llm-proposed= and are therefore suspect. The pre-flip facts were
  admitted against fewer non-lossy facts, meaning Screamer's consistency checks
  were weaker. A fact admitted during the seed phase may be wrong but undetected
  because there were no contradicting facts at the time. Re-verifying all
  pre-flip facts against the current fact store is described as a heartbeat task
  but the cost (millions of Screamer checks) is not estimated.

The flip is a beautiful narrative. It may also be a mirage: the system may
achieve high sufficiency in narrow domains (security, filesystem, coding) and
never approach it in the broader memex (literature, personal reflection, daily
life). If the broader memex is the use case, the flip may never happen.

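
The domain-versus-global objection can be shown with a few lines of arithmetic.
This Python sketch is illustrative only: the =sufficiency= function and the fact
counts are invented, chosen so the per-domain ratios match the 0.95 and 0.05
figures used above.

```python
from collections import Counter

THRESHOLD = 0.7  # the threshold quoted in the notes


def sufficiency(facts):
    """facts: iterable of (domain, is_non_lossy). Returns (global, per-domain)."""
    total, solid = Counter(), Counter()
    for domain, non_lossy in facts:
        total[domain] += 1
        solid[domain] += 1 if non_lossy else 0
    if not total:
        return 0.0, {}
    overall = sum(solid.values()) / sum(total.values())
    per_domain = {domain: solid[domain] / total[domain] for domain in total}
    return overall, per_domain


# Invented counts: a well-covered security domain and a sparse literary one.
facts = ([("security", True)] * 95 + [("security", False)] * 5
         + [("literature", True)] * 1 + [("literature", False)] * 19)

overall, per_domain = sufficiency(facts)
blind = [domain for domain, score in per_domain.items() if score < THRESHOLD]
```

With these counts the global ratio is 96/120 = 0.8, which passes the threshold,
while the literature domain sits at 0.05. Reporting the per-domain figures
alongside the global one would make that gap visible.
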
*** The archivist's extraction cost is unaccounted

The archivist calls the LLM to extract triples from prose, with "a minimal prompt
(~200 tokens)." Over a personal memex with thousands of entries (a decade of
diary entries, hundreds of literature notes, dozens of project logs) the
extraction cost is substantial.

Assume 5,000 headings, 200 tokens per heading prompt, and an LLM that returns
~100 tokens of structured triples per heading. That's 1.5 million tokens for the
initial extraction, plus verification tokens (Screamer checks cost 0 LLM tokens,
but incorrect proposals generate feedback that may trigger re-extraction). At
current API prices (~$0.15 per million input tokens for GPT-4o-mini), the cost
is modest (~$0.25 if all 1.5 million tokens are priced at the input rate; output
tokens are usually priced higher, so the real figure is somewhat larger). But at
scale (re-extraction after ontology changes, continuous extraction as new
content is added, extraction for all incoming Agora Notes) the cost accumulates.

More importantly, the extraction latency is human-noticeable. 5,000 headings at
1 second per LLM call is ~1.4 hours of extraction time if the calls run
sequentially. The system needs to either batch-extract on startup (making cold
starts slow) or extract lazily on first query (making first queries slow).
Neither is ideal.

The notes trumpet the token savings from deterministic gates and Screamer
deductions (valid, since those cost 0 tokens) but the archivist's extraction
cost is the system's single largest recurring LLM expense, and it is mentioned
only in passing.

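
The estimate above is easy to reproduce and vary. The Python sketch below uses
the figures from the text; the output-token price is an assumed parameter that
does not come from the notes, so treat the second cost figure as illustrative.

```python
HEADINGS = 5_000
PROMPT_TOKENS = 200           # per heading, from the text
RESPONSE_TOKENS = 100         # per heading, from the text
INPUT_PRICE = 0.15            # USD per million tokens, from the text
ASSUMED_OUTPUT_PRICE = 0.60   # USD per million tokens: an assumption
SECONDS_PER_CALL = 1.0        # from the text

input_tokens = HEADINGS * PROMPT_TOKENS
output_tokens = HEADINGS * RESPONSE_TOKENS
total_tokens = input_tokens + output_tokens

# The text's figure: every token priced at the input rate.
flat_cost = total_tokens / 1e6 * INPUT_PRICE

# Same workload with output tokens priced separately.
split_cost = (input_tokens / 1e6 * INPUT_PRICE
              + output_tokens / 1e6 * ASSUMED_OUTPUT_PRICE)

# Sequential latency for one full pass over the memex.
hours = HEADINGS * SECONDS_PER_CALL / 3600


def recurring_cost(passes):
    """Cost of repeating the full extraction, e.g. after ontology changes."""
    return passes * split_cost
```

A single pass is cheap either way. The two numbers that matter are the count of
passes, since re-extraction multiplies the cost linearly, and the roughly 1.4
hours of sequential latency, which pricing does not change.
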
*** The Agora integration is clean in theory, undefined in practice

The "Passepartout IS the PDS" claim is elegant: the =memory-object= struct IS
the Note format, the Merkle DAG IS the Key Event Log, the fact store IS the
reputation system. But:

- An Agora PDS needs to serve HTTP APIs for thin clients. The daemon speaks a
  framed TCP protocol over a local port. Extending it to serve HTTPS with
  DIDComm endpoints, subscription management, and Relay push/pull is a
  substantial engineering effort.
- The PDS needs to manage encrypted storage: client-side encrypted content that
  the PDS itself cannot read. Passepartout's vault stores credentials with
  integrity hashes but does not currently manage per-Note encryption with
  audience-specific keys.
- The Relay Network is described as an intelligent communication backbone with
  pub/sub routing. Passepartout has no Relay implementation, no Relay-facing API,
  and no subscription management beyond its own event orchestrator.
- Agora's contract system (SCAL contracts, HODL invoices, arbitration tiers)
  requires state machines and Lightning Network integration that Passepartout
  has no primitives for.
- The "Passepartout IS the PDS" vision conflates two things: the data model
  (Org files = Notes) and the infrastructure (a process that serves a network
  protocol). The data model unification is clean and right. The infrastructure
  unification implies Passepartout grows from a local agent to a network
  server, a significant architectural expansion that the notes treat as a
  ~40-line utility.

*** No adversarial model

The notes describe layered authentication (crypto, sensory, deterministic,
probabilistic) and type-level gates as structural safety. They do not describe
an adversarial model:

- What stops a malicious Agora Note from containing 100,000 triples that flood
  the fact store?
- What stops a DID from publishing Notes that deliberately inject contradictions
  to force Screamer into exponential backtracking?
- What stops a compromised sensor key from signing valid sensor data that is
  adversarially crafted (e.g., video frames designed to trigger specific vision
  model false positives)?
- What stops a spam DID from creating millions of Personas and flooding the
  user's incoming Notes directory?

The resource monitor (Phase 1a) handles storage pressure generically. The
quarantine system handles individual DIDs flagged for spam. But neither is
adversary-aware: both react to symptoms (disk full, error rate high) rather
than anticipating attack patterns. An adversarial model would identify these
vectors and design mitigations specifically. The notes describe a system that
works in a cooperative environment, not an adversarial one.

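
Two of the four vectors above (oversized Notes and high-volume senders) have
cheap, well-known mitigations. The Python sketch below is one hypothetical
shape for them. The limits, the =IngestGuard= class, and its return strings are
assumptions, and it does nothing for the backtracking or sensor-spoofing cases.

```python
import time
from collections import defaultdict, deque

# Hypothetical limits; real values would need tuning against actual traffic.
MAX_TRIPLES_PER_NOTE = 500
MAX_NOTES_PER_MINUTE = 30


class IngestGuard:
    """Per-Note size cap plus a sliding-window rate limit per sender DID."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.recent = defaultdict(deque)  # DID -> timestamps of accepted Notes

    def check(self, did, triples):
        if len(triples) > MAX_TRIPLES_PER_NOTE:
            return "rejected: too many triples"
        now = self.clock()
        window = self.recent[did]
        # Forget acceptances that are a minute old or older.
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= MAX_NOTES_PER_MINUTE:
            return "rejected: rate limit"
        window.append(now)
        return "accepted"


guard = IngestGuard()
verdict = guard.check("did:example:alice", [("everest", "height_m", "8848")])
```

Both checks run before any triple reaches the fact store, so they bound the
work an unknown sender can impose. A sender that mints a fresh DID per Note
evades the rate limit, which is why the Persona-flooding question still needs
its own answer.
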
*** The self-repair criterion creates a two-tier architecture

The AGENTS.md rule ("default: everything is a skill") means the symbolic
engine (Screamer, VivaceGraph, fact store, archivist, ACL2, planner) is all
skills, not core. This is correct for the self-repair criterion: a corrupted
skill degrades the agent but doesn't kill it. A corrupted core file kills the
brainstem.

But it creates a tension: the symbolic engine IS the reasoning layer that would
diagnose and repair a corrupted skill. If the fact store itself is corrupted
(impossible facts, inconsistent cardinality, broken Merkle chains), the engine
that detects corruption is the engine that is corrupted. The system needs a
"repair from below" path: a minimal core that can purge and rebuild the symbolic
index without depending on the symbolic index. This path exists (the fact store
is ephemeral in Phases 1-4 and rebuildable from prose in Phase 5+) but is not
exercised automatically. A corruption in the symbolic engine requires human
detection and manual rebuild, the exact problem the self-repair criterion was
designed to avoid.

** Opportunities

*** A memory prosthesis that makes your own mind legible

The symbolic index, when populated and queried, answers questions that no
existing tool can:

- "What did I believe about monorepos in 2023, and how has that changed?"
- "Which of my diary entries contradict each other?"
- "What entities in my memex have no connection to any other entity?"
- "Show me everything I've written about Nabokov, organized by when I wrote it,
  what I was reading at the time, and what I concluded."
- "Which of my project plans reference security assumptions that I later changed?"
- "What did I think about this topic, and why did I change my mind?"

These are not information retrieval queries. They are self-knowledge queries.
They require provenance chains, temporal versioning, contradiction surfacing, and
cross-domain linkage, all of which the architecture provides as first-class
capabilities. If this works, it transforms the memex from a searchable archive
into a thinking partner that knows the history of your thoughts.

*** Deterministic reasoning as a moat

Every competitor agent system (Claude Code, OpenCode, OpenClaw, Hermes, Cognee,
Mem0) uses neural-only reasoning. They are all vulnerable to the same failure
mode: the LLM hallucinates a fact or an action, and there is no second system to
catch it. Their safety is heuristic. Their memory is flat. Their reasoning is
unprovable.

Passepartout's architectural bet (a symbolic engine that verifies, deduces, and
audits) creates a category difference, not a performance difference. If the bet
pays off, Passepartout is not "a better AI agent." It is a different kind of
system: one whose reasoning is provable, whose memory is content-addressed, and
whose knowledge accumulates through deduction rather than re-prompting.

This is a genuine moat. It cannot be replicated by adding a better system prompt
or a larger context window. It requires building the ontology, the constraint
solver, the fact store, and the provenance tracker, work that takes years and
cannot be shortcut by spending more on inference.

*** Agora as the first sovereign agent network

If Passepartout serves as the PDS and an Agora Persona, then AI agents can:

- Publish verified outputs as signed Notes with cryptographic provenance.
  Readers know the agent produced the output, not a human impersonating the
  agent.
- Accept invocation Notes from other persona owners. "Please analyze this
  contract and publish your findings." The agent receives the request as an
  Agora Note, processes it, signs the response, and publishes it.
- Build reputation through auditable chains of signed work products, not through
  self-reported claims.
- Participate in the compute marketplace as both consumer and provider.
- Maintain sovereign identity: the agent's DID is independent of any platform,
  any provider, any human account.

This is not a chatbot on a messaging platform. It is an autonomous entity on a
decentralized network, with cryptographic identity, verifiable provenance, and
economic agency. If Agora reaches even Order 1 (the first 1,000 users),
Passepartout agents become some of the most capable participants on the network.

*** The 10-80-10 ratio for coding is genuinely achievable

For a coding agent (the domain that Passepartout currently operates in) the
10-80-10 ratio is plausible. The existing Dispatcher already verifies every
action deterministically. Adding Screamer for consistency checking, VivaceGraph
for dependency queries, and ACL2 for structural verification would shift the
ratio from the current ~95-5-0 (neural-gate-symbolic) toward 50-40-10 in the
near term and potentially 10-80-10 in the long term.

The bootstrapped gate facts already cover file classifications, command safety,
path protections, and tool permissions: the core categories for a coding agent.
The archivist's extraction from project files would add dependency information,
test coverage, and code structure facts. The planner could reason about
refactoring order, dependency chains, and safety constraints deterministically.
This is the domain where the symbolic engine provides the most immediate value,
and it is the domain Passepartout already operates in.

*** Wikidata as an entity backbone unlocks cross-domain reasoning

Without Wikidata, the symbolic index for a general-knowledge memex is a sparse
set of personal facts with no connecting structure. With Wikidata, the entity
graph is pre-structured. The system can answer:

- "What does my memex say about Nabokov that Wikidata doesn't?"
- "Where does my memex disagree with Wikidata?"
- "What entities in my memex have no Wikidata counterpart?" (These are the
  personal, novel, or subjective entities that are the most valuable.)
- "Show me the intersection of my literary interests (from diary) with Wikidata's
  influence graph: which authors I read influenced each other in ways I haven't
  written about?"

These are cross-domain queries that require both the personal memex (for what
the user knows) and Wikidata (for what the world knows). Neither alone can
answer them. Together, they enable a kind of knowledge synthesis that no existing
tool provides.

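
The first three questions above reduce to set operations once both sources are
expressed as facts. A minimal Python sketch follows, with toy data standing in
for both sides. It is not a Wikidata client: the =compare= function is
hypothetical, and keying each store by (entity, relation) with a single value
is a simplification of the =:plural= model.

```python
def compare(memex, external):
    """Both arguments map (entity, relation) -> value."""
    # Facts the memex holds that the external source does not.
    only_memex = {key: value for key, value in memex.items()
                  if key not in external}
    # Facts present in both where the values differ.
    disagree = {key: (memex[key], external[key])
                for key in memex.keys() & external.keys()
                if memex[key] != external[key]}
    # Entities that appear only in the memex.
    unlinked = {entity for entity, _ in memex} - {entity for entity, _ in external}
    return only_memex, disagree, unlinked


# Toy data for illustration; not taken from Wikidata.
memex = {
    ("nabokov", "wrote"): "Pale Fire",
    ("everest", "height_m"): "8849",
    ("aunt-vera", "lives-in"): "Lyon",
}
external = {
    ("nabokov", "wrote"): "Pale Fire",
    ("everest", "height_m"): "8848",
    ("nabokov", "born"): "1899",
}

only_memex, disagree, unlinked = compare(memex, external)
```

The =unlinked= set is the interesting output: it lists the entities with no
external counterpart, which the text identifies as the most personal and
therefore most valuable part of the memex.
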
*** Ontology versioning enables "what-if" reasoning about one's own thinking

The ability to query across worldviews ("what did I believe before I changed my
security model?") is a capability that has no analog in any existing tool. It
transforms the memex from a static archive into a dynamic record of intellectual
evolution. Combined with the temporal awareness system (Phase 0c), the system
could surface correlations: "You changed your mind about monorepos two weeks
after reading this article, which you bookmarked on this date, and one week
before starting this project that uses a monorepo structure." The provenance
chain IS the narrative of your thinking.

*** Contract-level pre-arbitration reduces the cost of decentralized commerce

Agora's Tier 0 Arbitrator, a local AI that provides evidence summaries before
human arbitration, is a genuinely useful role for a neurosymbolic system. Its
findings would look like this:

- "Contract CID X references arbitrator DID Y. DID Y is active. Verified."
- "All parties have signed. The HODL invoice is locked. Verified."
- "The buyer's claim of non-delivery is supported by 3 signed messages with
  timestamps after the delivery deadline."
- "The seller's proof-of-delivery field is empty. No QR scan recorded."

Each check is a Screamer query against the contract-lifecycle domain. The results
are a plist, not a ruling. Both parties see the same evidence summary before
escalating. This makes Level 1 arbitration faster (arbitrators receive
pre-processed evidence bundles), cheaper (no human time spent on trivial
verification), and more transparent (both parties see the same machine-generated
summary).

This is not AI judging. This is AI preparing the docket. The distinction is
important and defensible.

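The "findings, not a ruling" distinction can be made concrete. The sketch below
is Python standing in for the Screamer queries and the plist described above;
the contract fields, the message fields, and the key names are hypothetical:

#+begin_src python
from datetime import datetime


def evidence_summary(contract, messages):
    """Return findings as plain data. There is deliberately no 'ruling' key."""
    deadline = contract["delivery_deadline"]
    late = [m for m in messages if m["signed"] and m["sent_at"] > deadline]
    return {
        "arbitrator_active": contract["arbitrator_status"] == "active",
        "all_parties_signed": set(contract["parties"]) <= set(contract["signatures"]),
        "invoice_locked": contract["invoice_state"] == "locked",
        "signed_messages_after_deadline": len(late),
        "proof_of_delivery_present": bool(contract.get("proof_of_delivery")),
    }


# Hypothetical dispute matching the four findings listed above.
contract = {
    "parties": ["buyer", "seller"],
    "signatures": ["buyer", "seller"],
    "arbitrator_status": "active",
    "invoice_state": "locked",
    "delivery_deadline": datetime(2026, 5, 1),
    "proof_of_delivery": "",
}
messages = [
    {"signed": True, "sent_at": datetime(2026, 5, 2)},
    {"signed": True, "sent_at": datetime(2026, 5, 3)},
    {"signed": True, "sent_at": datetime(2026, 5, 4)},
    {"signed": False, "sent_at": datetime(2026, 5, 5)},  # unsigned, not counted
]
summary = evidence_summary(contract, messages)
print(summary)
#+end_src

Every value is a fact either party can recheck from the same inputs. Nothing in
the result says who is right.
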
*** Self-auditing agents could transform AI safety discourse

If Passepartout can answer =/audit= for any action or fact (showing the full
provenance chain, every gate that approved it, every fact that supported it,
every alternative that was considered), then AI safety moves from "trust us, we
tested it" to "here is the audit trail, verify it yourself."

This is the transparency that every AI safety framework calls for and none
delivers. It is possible because the architecture records provenance as a
first-class operation, not as an after-the-fact log. The provenance is the
operating system, not a logging layer.

*** The memex + Agora combination could be a new kind of social network

Current social networks (Twitter, Facebook, Reddit) separate the person from
their knowledge. You are a profile with posts. Your posts are isolated units
without connection to your broader intellectual life.

A Passepartout-powered Agora Persona would publish Notes that are grounded in
the memex: "Here is my analysis of /Pale Fire/, drawn from diary entries across
three years, annotated with Wikidata context, and verified against my existing
literary framework." The Note is cryptographically signed, carrying provenance
back to the specific Org headings that informed it. Readers see not just the
conclusion but the intellectual scaffolding that produced it.

This is not a "post." It is a publication: a knowledge artifact with verifiable
provenance, auditable reasoning, and cryptographic identity. If this becomes the
norm, it raises the standard for public discourse from "this is my opinion" to
"this is my opinion, here is the evidence, here is how it evolved, here is who
verified it."

** Threats

*** The ontology problem may be harder than anticipated

The notes are honest about this: "Whitehead's Principia Mathematica took over
300 pages to define the logical foundations before it could prove that 1+1=2."
Passepartout's domain is narrower (coding + personal knowledge) but the
ontology problem is the same category of problem. Every entity class must be
defined. Every relation must have clear semantics. Every inference rule must be
justified.

The gate-to-fact bootstrap provides 50-70 entity classes, enough for a coding
agent. But the broader memex contains orders of magnitude more entity types:
people, places, works, concepts, events, emotions, aesthetic judgments,
professional skills, personal projects, temporal patterns. Defining these as
triples with clear semantics is genuine intellectual work that no amount of
engineering can shortcut.

The risk is not that it's impossible. It's that it's slow: slow enough that
the system never achieves the density of facts needed for the "flip" in the
broader memex. The coding domain may reach sufficiency in months. The literary
domain may take years. The daily-reflection domain may never cross the
threshold because the facts involved (mood, insight, aesthetic experience) are
not formalizable as triples.

*** Screamer may not scale to the fact store size

The constraint satisfaction approach to consistency checking is elegant for a
seed fact set of hundreds of triples. It is unproven for millions of triples
(after Wikidata loading + years of personal extraction). The domain-scoping
strategy (Screamer only checks facts from the candidate's =:domain=) bounds the
constraint space, but the most valuable consistency checks are cross-domain:

- "You classified this file as public in your project notes but the gate stack
  classifies it as secret." (project domain vs security domain)
- "You wrote that Nabokov influenced Kafka, but Wikidata says Kafka died before
  Nabokov published his first novel." (literature domain vs Wikidata domain)
- "You planned to use this dependency, but the dependency's license changed in
  a way that conflicts with your project's license." (project domain vs legal
  domain)

If cross-domain checks are disabled for performance, the most valuable
contradictions are never detected. If they are enabled, the constraint space
explodes. There is no obvious sweet spot.

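One rough way to size the tension is to count the fact pairs a checker might
have to compare. The sketch assumes that contradictions are pairwise and that
every pair is a candidate. Both are simplifications, the domain sizes are
invented, and nothing here reflects how Screamer actually searches:

#+begin_src python
from math import comb


def pair_counts(domain_sizes):
    """Return (within_domain_pairs, cross_domain_pairs) for the given sizes."""
    total_facts = sum(domain_sizes.values())
    within = sum(comb(n, 2) for n in domain_sizes.values())
    cross = comb(total_facts, 2) - within
    return within, cross


# Hypothetical store: 10 domains of 1,000 facts each.
sizes = {f"domain-{i}": 1000 for i in range(10)}
within, cross = pair_counts(sizes)
print(within, cross)
#+end_src

With these numbers, cross-domain pairs outnumber within-domain pairs roughly
nine to one, and the gap widens as the number of domains grows.
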
*** Wikidata quality may undermine trust in the symbolic index

If Wikidata facts are admitted with =:policy :plural= and the user sees
thousands of contradictions between Wikidata and their personal memex, the
symbolic index may feel less trustworthy, not more. "Wikidata says Mount Everest
is 8848m. DBpedia says 8849m. Your 2023 diary says 8848m. These three sources
do not all agree on the height." This is correct behavior (surfacing
disagreement with provenance) but it may be overwhelming. The user wanted a
knowledge base, not a disagreement engine.

The trust problem is compounded by Wikidata's editorial biases. Wikidata
reflects the biases of Wikipedia editors: English-language dominance, Western
epistemological frameworks, systemic underrepresentation of non-Western
knowledge. A memex in Arabic that references Islamic philosophy, Egyptian
history, or African literature will find Wikidata's coverage thin, biased, or
absent. The symbolic index would dutifully surface these gaps ("your memex
mentions 47 entities with no Wikidata counterpart") but it cannot fill them.

*** LLM cost and latency may prevent the archivist from keeping up

If the user writes a diary entry every day, the archivist must extract triples
from each new heading. If the extraction takes 1-3 seconds per heading, it's
background noise. But if the user imports 500 old diary entries, or the
archivist needs to re-extract after an ontology change, or Agora Notes arrive in
bulk from multiple follows, the extraction queue grows faster than it drains.

The notes describe extraction as a background task triggered by heartbeat, but
they don't specify the extraction rate limit. An unbounded queue with no rate
limit would consume the LLM budget. A bounded queue would fall behind. A lazy
extraction strategy (extract on first query) would make first queries slow.
A batch extraction on startup would make cold starts slow.

The archivist's throughput is gated by LLM API rate limits, token costs, and
inference latency. These are external constraints that the architecture cannot
eliminate. The symbolic engine can reduce LLM calls for reasoning; it cannot
reduce LLM calls for extraction from prose.

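The queue behaviour is simple arithmetic once a daily budget is fixed. A
back-of-the-envelope sketch; the backlog size, arrival rate, and budget are
made-up numbers, since the notes specify none of them:

#+begin_src python
from math import ceil


def days_to_drain(backlog, arrivals_per_day, budget_per_day):
    """Days until the extraction queue is empty, or None if it never drains."""
    if backlog == 0:
        return 0
    net = budget_per_day - arrivals_per_day
    if net <= 0:
        return None  # arrivals meet or exceed the budget: the queue only grows
    return ceil(backlog / net)


# Assumed: an import leaves 2,000 headings queued, 5 new headings arrive per
# day, and the LLM budget allows 50 extractions per day.
print(days_to_drain(2000, 5, 50))
# Same backlog, but bulk Agora Notes push arrivals above the budget.
print(days_to_drain(2000, 60, 50))
#+end_src

The second case is the one the paragraph above warns about: once arrivals
exceed the budget, no scheduling strategy drains the queue, and the only levers
left are a larger budget or extracting less.
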
*** Agora may never reach network effects

Agora faces the cold start problem that every decentralized social protocol
faces: users won't join without content, creators won't post without users. The
bootstrapping strategy (managed service → hybrid → full decentralization,
targeting niche communities first) is well-articulated but its success depends
on execution in a market where Mastodon, Bluesky, Nostr, and Farcaster are
already competing for the same users.

If Agora doesn't reach even Order 1 (1,000 users), the PDS integration is
academic. Passepartout's DID identity, DIDComm gateway, Note signing, and
contract verification are all infrastructure for a network that doesn't exist.
The symbolic engine still works locally: provenance tracking, contradiction
surfacing, and deduction are all valuable without Agora. But the network effects
that make Agora a transformative platform (reputation, contracts, marketplaces,
collective governance) require a living network.

The risk is asymmetric: Passepartout invests significant engineering in Agora
integration that provides zero value if Agora fails to launch.

*** Complexity may prevent adoption

Passepartout is already a complex system: a Lisp daemon, a terminal UI, a skill
engine, a gate stack, multiple LLM backends, a Merkle memory system, and an
event orchestrator. Adding a fact store, a constraint solver, a graph database,
a theorem prover, an archivist, a planner, and an Agora PDS makes it more
complex, not less.

The target user, someone who wants a personal AI assistant that works offline,
may not want or need any of this. They want the TUI to work, the LLM to be fast,
and the files to stay safe. The neurosymbolic engine is infrastructure for a use
case (lifelong personal knowledge management with verifiable provenance) that
most users do not yet know they have.

The risk is that Passepartout builds a cathedral for a congregation of one: a
system that is architecturally brilliant and practically unused because the
complexity-to-value ratio is too high for anyone except the author.

*** The self-repair criterion may not hold under adversarial conditions

The architecture assumes that skills can fail gracefully (fboundp guards, hash
table fallbacks, degraded mode). It does not account for a skill that is
adversarially corrupted to appear correct while producing wrong results. A
compromised archivist that extracts plausible but false triples, a compromised
Screamer that passes all consistency checks, a compromised VivaceGraph that
returns query results from a parallel graph: these are "living" skills that
would pass integrity checks and still poison the symbolic index.

The type-level gates prevent the LLM from modifying gate code. They do not
prevent a compromised skill (loaded by a trusted human, or corrupted on disk by
a separate process) from appearing to operate normally while being subtly
wrong. The integrity monitoring (Phase 0) catches disk-level corruption through
hash checks. It does not catch semantic corruption: for example, a skill that
is byte-for-byte identical to the known-good version but is fed a malicious
input that triggers a latent bug.

This is not a vulnerability unique to Passepartout. It is a vulnerability in
every system where components trust each other. But Passepartout's architecture
amplifies the risk because the symbolic engine is supposed to be the trustworthy
layer, the component that verifies the LLM's output. If the symbolic engine
itself is compromised, the system has no higher court of appeal.

*** The 10-80-10 ratio may create false confidence

If the sufficiency metric shows "71% non-lossy, threshold 70%, mode:
AUTO-EXTRACTION," the user may assume the system is trustworthy. But
sufficiency is global: it aggregates across all domains. The system may have
95% sufficiency in the security domain and 5% sufficiency in the literary
domain and still report 71% overall, provided the aggregate is weighted by
volume and the security domain is roughly three times larger. The
auto-extraction switch would bypass the LLM for all categories with sufficient
coverage, but the threshold is global, not per-domain. A literary query would
then hit a symbolic index that has "sufficient" coverage globally but
insufficient coverage for literature.

The notes describe domain-scoped Screamer checks but not domain-scoped
sufficiency. A global sufficiency metric that triggers a global extraction mode
change is the wrong granularity. Per-domain sufficiency, with per-domain
extraction mode, would be more complex but more honest. The architecture as
described has the simpler, more dangerous version.

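A worked example of the granularity problem. The 95%, 5% and 71% figures can
coexist only if the aggregate is weighted by volume, so the counts below are
hypothetical values chosen to reproduce them. The field names and the mode
labels are likewise assumptions:

#+begin_src python
THRESHOLD = 0.70

# Hypothetical coverage counts per domain.
domains = {
    "security": {"covered": 1045, "total": 1100},  # 95%
    "literary": {"covered": 20, "total": 400},     # 5%
}


def ratio(domain):
    return domain["covered"] / domain["total"]


def global_ratio(domains):
    """Volume-weighted sufficiency across every domain."""
    covered = sum(d["covered"] for d in domains.values())
    total = sum(d["total"] for d in domains.values())
    return covered / total


def global_mode(domains, threshold=THRESHOLD):
    """One switch for the whole store, as the architecture is described."""
    return "auto" if global_ratio(domains) >= threshold else "llm"


def per_domain_modes(domains, threshold=THRESHOLD):
    """One switch per domain, the finer-grained alternative."""
    return {
        name: ("auto" if ratio(d) >= threshold else "llm")
        for name, d in domains.items()
    }


print(round(global_ratio(domains), 2), global_mode(domains))
print(per_domain_modes(domains))
#+end_src

The global switch turns on auto-extraction at 1065 of 1500 covered, which is
71%. The per-domain view keeps the literary domain on the LLM path, which is
the behaviour a literary query actually needs.
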
** Summary Matrix

|          | Positive                                                                        | Negative                                                                      |
|----------+---------------------------------------------------------------------------------+-------------------------------------------------------------------------------|
| INTERNAL | S: Architectural inversion, unified Org format, provenance as product,          | W: Unproven fact language, Screamer scale unverified, extraction cost hidden, |
|          | cardinality model, gate-to-fact bootstrap, self-preservation, organic ontology, | flip underspecified, adversarial model absent, self-repair tension,           |
|          | Wikidata as accelerator, decoupled compute cost                                 | Agora integration scope undefined, per-domain sufficiency missing             |
|----------+---------------------------------------------------------------------------------+-------------------------------------------------------------------------------|
| EXTERNAL | O: Memory prosthesis, deterministic moat, sovereign agent network,              | T: Ontology may be harder than expected, Screamer may not scale,              |
|          | 10-80-10 for coding achievable, Wikidata cross-domain queries,                  | Wikidata quality/trust, LLM extraction bottleneck, Agora network effects,     |
|          | ontology versioning, contract pre-arbitration, self-auditing safety,            | complexity-to-adoption ratio, adversarial semantic corruption,                |
|          | knowledge-based social network                                                  | false confidence from global sufficiency metric                               |

* What This Unlocks

** Technologically

The neurosymbolic engine, if built, would be the first AI system where:

1. *Reasoning is auditable.* Every conclusion carries a provenance chain back to
   its premises. The =/audit= command renders the full inference tree (every
   fact, every deduction, every gate outcome) in human-readable form.

2. *Knowledge accumulates deterministically.* Screamer deductions and gate
   outcomes generate new facts without any LLM involvement. The knowledge base
   grows from the system's own operation, not from re-prompting the LLM.

3. *Memory is content-addressed.* Every fact is a Merkle node. Every version
   chain is tamper-evident. Rollback is atomic. The storage format is proven
   correct before it is committed to disk.

4. *Safety is provable, not empirical.* Type-level gates make self-modification
   structurally impossible. ACL2 proves that the rule set has no contradictions.
   The dispatcher doesn't "try" to be safe; it is safe by construction.

5. *The human and the machine share the same format.* Org files for both. No
   hidden database. No import/export step. The agent's memory IS the human's
   memory.

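Property 3 can be illustrated with a linear hash chain, the simplest
content-addressed structure. It stands in here for the Merkle memory, whose
real layout the notes do not spell out; the JSON encoding, the field names, and
the sample facts are assumptions:

#+begin_src python
import hashlib
import json


def fact_hash(fact, parent_hash):
    """Hash a fact together with its parent's hash, so history is bound in."""
    payload = json.dumps({"fact": fact, "parent": parent_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(facts):
    """Link each fact to the hash of the one before it."""
    chain, parent = [], ""
    for fact in facts:
        digest = fact_hash(fact, parent)
        chain.append({"fact": fact, "parent": parent, "hash": digest})
        parent = digest
    return chain


def verify_chain(chain):
    """Recompute every hash; any edit to an earlier fact breaks the chain."""
    parent = ""
    for node in chain:
        if node["parent"] != parent or fact_hash(node["fact"], parent) != node["hash"]:
            return False
        parent = node["hash"]
    return True


chain = build_chain([
    ["notes.org", "classified-as", "public"],
    ["notes.org", "classified-as", "secret"],
])
print(verify_chain(chain))
#+end_src

Editing any stored fact without recomputing every later hash makes
=verify_chain= return false. That detects tampering; it does not prevent it.
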
These five properties, together, define a new category of AI system: the
*sovereign reasoning agent*. Not sovereign in the blockchain sense (decentralized
by consensus), but sovereign in the personal sense: the agent runs on your
hardware, reasons with your knowledge, and proves its reasoning to you.

** Socially

If the technical vision succeeds and Agora reaches network effects, the
combination unlocks:

1. *Verifiable public discourse.* Every published claim carries provenance back
   to source material. "I read this, I thought this, I changed my mind on this
   date, here is the evidence." Public discourse shifts from "competing opinions"
   to "competing evidence chains." The quality floor rises because claims without
   provenance are visibly weaker than claims with provenance.

2. *Sovereign AI agents with legal and economic personhood.* A Passepartout
   agent with an Agora Persona can own assets, enter contracts, earn reputation,
   and face consequences for failure. This is not a chatbot. It is an autonomous
   entity with cryptographic identity, verified provenance, and economic agency,
   more like a corporation than a tool.

3. *Self-auditing AI safety.* Every action the agent takes is traceable. Every
   gate decision is recorded. Every fact that informed a decision is queryable.
   AI safety moves from "trust us" to "here is the audit trail." This is the
   transparency that every AI ethics framework calls for.

4. *A personal knowledge economy.* If your memex can publish Notes as Agora
   content, your intellectual work (your analyses, your syntheses, your
   discoveries) becomes a publishable, attributable, monetizable asset. Not
   through advertising or subscriptions, but through direct value exchange:
   Lightning payments for content access, contract work for your verified
   expertise, reputation that follows your Persona across platforms.

5. *Collective intelligence without centralized control.* If multiple
   Passepartout agents share facts through Agora Notes, the collective symbolic
   index represents the verified, provenanced knowledge of a community: not the
   averaged opinion of a crowd, but the auditable intersection of independently
   verified claims. This is Wikipedia without the editorial board, science
   without the journal gatekeepers, journalism without the corporate owners.

6. *A memory prosthesis that outlives the individual.* A memex with a decade of
   diary entries, linked to Wikidata's entity graph, with Screamer deductions
   surfacing patterns and contradictions, with ontology versioning preserving
   intellectual evolution: this is not a knowledge management tool. It is an
   externalized, queryable, auditable record of a life's thinking. It is what
   Vannevar Bush imagined in 1945: "an enlarged intimate supplement to his
   memory."

* Conclusion

The architecture described in these notes is genuinely novel. Not incrementally
novel: most agent architectures are variations on "LLM + tools + prompt-based
safety." Passepartout's neurosymbolic vision is categorically different: an
inversion where the deterministic layer judges the probabilistic layer, where
facts carry provenance chains, where contradiction is a feature rather than an
error, and where the user's Org files are the single source of truth for both
human and machine.

The largest risk is not that the architecture is wrong. It is that the ontology
problem (the genuine difficulty of defining what a "fact" is, what relations
are, what categories are useful, and how they evolve) is harder than the notes
anticipate, and that the system spends years in a partially-working state where
the symbolic index is too sparse to be useful but too entangled to be discarded.

The second-largest risk is that Agora never reaches the network effects needed
to make the PDS integration valuable beyond a local experiment, and that the
engineering investment in DIDComm gateways, Note signing, contract verification,
and Relay integration produces infrastructure for a network that doesn't exist.

The opportunity is equally large: a system that makes your own mind legible to
you, that proves its reasoning rather than asserting it, that accumulates
knowledge across sessions through deduction rather than re-prompting, and that
publishes verified, provenanced knowledge to a decentralized network. If this
works, even partially, even slowly, it is a category-level advance over every
existing agent architecture and every existing personal knowledge management
tool.

The notes are a map of territory that no one has walked. The territory is real.
The map is detailed enough to navigate by. Whether the journey completes depends
on whether the ontology problem yields to engineering, and whether the user,
the one human whose memex this serves, finds value in the partial system well
before the full vision materializes.