Compare commits
4 Commits
4e9431ec1d
...
0290feccc1
| Author | SHA1 | Date | |
|---|---|---|---|
| | 0290feccc1 | | |
| | f6094abb7b | | |
| | e719443ce7 | | |
| | 04944a62e2 | | |
81
AGENTS.md
@@ -1,77 +1,12 @@
|
||||
# AGENTS.md
|
||||
|
||||
## Development Cycle (every change)
|
||||
This is the memex monorepo. It contains multiple Common Lisp projects, each
|
||||
in `projects/`. See `projects/AGENTS.md` for the general development workflow
|
||||
(ROADMAP-driven, TDD in REPL, literate programming, branch policy).
|
||||
|
||||
1. **Think in org** — write your reasoning, goals, and approach in the .org file first
|
||||
2. **Write contract** — define a `** Contract` section listing each function's behavior:
|
||||
`(fn-name args)`: description. Returns/guarantees ...
|
||||
3. **TDD from contract** — each contract item becomes a `fiveam:test` in `* Test Suite`
|
||||
a. Write the test first → tangle → run → prove it FAILS (RED)
|
||||
b. Write the implementation → tangle → run → prove it PASSES (GREEN)
|
||||
c. Record both failure and success output
|
||||
4. **Reflect in org** — once tests pass, ensure the implementation is in the .org source
|
||||
5. **Update literate prose** — write/update the explanatory text around the code:
|
||||
what it does, why it exists, how it connects to the rest of the system
|
||||
6. **Commit** — only when asked. Ask first.
|
||||
## Project list
|
||||
|
||||
## Commands
|
||||
|
||||
Tangle a single file:
|
||||
emacs --batch --eval "(progn (require 'org) (find-file \"org/FILE.org\") (org-babel-tangle) (kill-buffer))"
|
||||
|
||||
Validate structural integrity:
|
||||
emacs --batch -Q --eval '(progn (find-file "org/FILE.org") (check-parens) (kill-buffer))'
|
||||
|
||||
Run tests:
|
||||
sbcl --noinform \
--eval '(load (merge-pathnames "quicklisp/setup.lisp" (user-homedir-pathname)))' \
--eval '(ql:quickload :passepartout :silent t)' \
--eval '(load "lisp/FILE.lisp")' \
--eval '(fiveam:run (intern "SUITE-NAME" :passepartout-TESTS))' --quit
|
||||
|
||||
For error details: bind fiveam:*on-failure* to :debug
|
||||
|
||||
## REPL (port 9105) — preferred when available
|
||||
|
||||
Start: `passepartout daemon`
|
||||
Send code:
|
||||
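import socket
msg = '(:type :event :payload (:sensor :repl-eval :code "(+ 1 2)"))'
with socket.create_connection(("127.0.0.1", 9105)) as s:  # host is an assumption: daemon on the local machine
    s.sendall(f'{len(msg):06x}'.encode() + msg.encode())  # six hex digits of length, then the message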
|
||||
|
||||
When REPL is up: TDD in-image first, then reflect to .org and tangle.
|
||||
When REPL is down: fall back to the SBCL cycle above.
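The send snippet above implies a simple framing: a six-digit hexadecimal length prefix followed by the message. A minimal sketch of encoding and decoding one frame, assuming the prefix counts bytes of the UTF-8 payload (the snippet counts characters, which is the same thing for ASCII). `frame` and `unframe` are hypothetical helper names, not part of passepartout:

```python
def frame(msg: str) -> bytes:
    """Prefix the UTF-8 payload with its length as six lowercase hex digits."""
    payload = msg.encode("utf-8")
    if len(payload) > 0xFFFFFF:
        raise ValueError("payload too large for a six-digit hex length")
    return f"{len(payload):06x}".encode("ascii") + payload


def unframe(data: bytes) -> str:
    """Inverse of frame(): read the six-digit length, then that many bytes."""
    length = int(data[:6].decode("ascii"), 16)
    payload = data[6:6 + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return payload.decode("utf-8")


msg = '(:type :event :payload (:sensor :repl-eval :code "(+ 1 2)"))'
roundtrip = unframe(frame(msg))
```

Whether the daemon replies in the same framing is not stated here, so `unframe` is only a guess at the response side.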
|
||||
|
||||
## Rules
|
||||
|
||||
- .org is source of truth; .lisp is generated — never edit .lisp directly
|
||||
- Every code change starts with a contract and a failing test
|
||||
- Prove RED before writing implementation
|
||||
- Validate before committing
|
||||
- If a tool fails, explain why and ask before trying alternatives
|
||||
- Before shipping a version, run the `** File Update Checklist` in `docs/ROADMAP.org`
|
||||
- **YOU MAY NOT** push a version tag (e.g., `v0.5.0`), create a GitHub release, or run `git push`
|
||||
that triggers CI/CD version workflows without explicit permission. Ask first.
|
||||
|
||||
## Core Boundary (HARD RULE)
|
||||
|
||||
- **YOU MAY NOT add files to `passepartout.asd` `:components` without asking for permission.**
|
||||
ASDF `:components` is the core harness. Files there load on every daemon boot,
|
||||
cannot be hot-reloaded, and a bug there kills the agent's brainstem.
|
||||
|
||||
- When you want to add a new module, **ask first**. Provide:
|
||||
1. Why it cannot be a skill (the self-repair criterion — can the agent fix it
|
||||
if corrupted without human help?) Demonstrate specifically how a broken
|
||||
version of this file prevents the agent from perceiving, reasoning,
|
||||
or acting — not just degrading performance or losing a feature.
|
||||
2. What it depends on and what depends on it
|
||||
3. Why it cannot use `fboundp` guards from core
|
||||
|
||||
- **Default: everything is a skill.** Skills load via `skill-initialize-all`,
|
||||
are hot-reloadable, self-repairable, and a bug in a skill degrades the agent
|
||||
but doesn't kill it. The harness stays thin.
|
||||
|
||||
- **The self-repair criterion**: a file belongs in core only if, when corrupted,
|
||||
the agent *cannot* fix it without human help. Corrupted core = dead brain,
|
||||
dead hands, or unreachable. Corrupted skill = degraded but self-repairable.
|
||||
This criterion is documented in `docs/ARCHITECTURE.org` and
|
||||
`docs/DESIGN_DECISIONS.org`.
|
||||
| Project | Description | Runtime |
|---------|-------------|---------|
| passepartout | Neurosymbolic agent | `passepartout daemon` |
| cl-tui | Reusable terminal UI framework | `sbcl` + `(ql:quickload :cl-tui)` |
|
||||
|
||||
868
notes/passepartout-SWOT.org
Normal file
@@ -0,0 +1,868 @@
|
||||
#+TITLE: Passepartout Neurosymbolic + Agora Integration — SWOT Analysis
|
||||
#+AUTHOR: Agent
|
||||
#+FILETAGS: :notes:analysis:swot:passepartout:agora:neurosymbolic:
|
||||
#+CREATED: [2026-05-09 Sat]
|
||||
|
||||
* Premise and Scope
|
||||
|
||||
This analysis assumes the engineering is possible — Screamer can be wrapped,
|
||||
VivaceGraph can persist facts, ACL2 can verify structural properties, the
|
||||
archivist can extract triples from prose with Screamer verification, and the
|
||||
note-publishing bridge to Agora can be implemented. The question is not "can it
|
||||
be built?" but "does the architecture cohere? What does it enable? What does it
|
||||
miss?"
|
||||
|
||||
* Will It Work Conceptually?
|
||||
|
||||
The short answer: yes, within a specific domain. The long answer: the boundary of
|
||||
that domain is the most important thing to get right.
|
||||
|
||||
** The architecture's core insight is correct and load-bearing
|
||||
|
||||
The central design decision — "the LLM proposes; the symbolic engine decides
|
||||
whether to accept" — is sound. It is the inverse of every existing agent
|
||||
architecture. Claude Code, OpenCode, Hermes — all of them put the LLM in the
|
||||
driver's seat and add safety as an afterthought (prompt-based guardrails that
|
||||
consume tokens and can be evaded). Passepartout inverts this: the LLM proposes
|
||||
actions and facts, but a deterministic layer of gates, constraint solvers, and
|
||||
formal verifiers decides what to admit and what to execute. This inversion is the
|
||||
correct response to the hallucination problem. You cannot eliminate hallucination
|
||||
by making the LLM better. You eliminate it by not asking the LLM to do things
|
||||
that require certainty.
|
||||
|
||||
The bootstrap mechanism — extracting 50-70 entity classes mechanically from the
|
||||
existing Dispatcher gate stack with zero new code — is genuinely elegant. It
|
||||
proves the pattern at minimal cost: code becomes facts, facts enable reasoning.
|
||||
Every new gate pattern adds to the ontology organically. This is the right way to
|
||||
start a knowledge base: not by designing a schema upfront, but by formalizing what
|
||||
the system already knows implicitly.
|
||||
|
||||
** The "one memex, two indices" architecture survives contact with reality
|
||||
|
||||
Option 4 (one memex with neural and symbolic indices over the same Org files) is
|
||||
the correct long-term architecture. The prose is the ground truth — always. The
|
||||
symbolic index is a derived view that can be thrown away and rebuilt. The neural
|
||||
index handles semantic search, associative leaps, and fuzzy matching. This
|
||||
division of labor is permanent, not transitional, because the domains they serve
|
||||
are fundamentally different kinds of knowledge.
|
||||
|
||||
The practical path — starting with Option 5 (ephemeral facts, no persistence)
|
||||
through Phases 1-4, then graduating to Option 4 with VivaceGraph persistence in
|
||||
Phase 5 — is the right sequence. It punts the serialization format problem until
|
||||
the fact language has been battle-tested. It keeps the cost of mistakes low. It
|
||||
treats the ontology as something discovered through use rather than designed
|
||||
upfront.
|
||||
|
||||
** Wikipedia's ontology WOULD give it a running start — with caveats
|
||||
|
||||
Wikidata contains approximately 100 million entities with a decade of human
|
||||
curation: type hierarchies, relations, dates, citations, disambiguation. For a
|
||||
personal memex that mentions Nabokov, /Pale Fire/, Kafka, postmodernism, and
|
||||
butterfly migration, the gate stack's 50-70 entity classes are far too few.
|
||||
Organic growth through prose extraction would take years to cover the entities in
|
||||
one person's engagement with a single novel.
|
||||
|
||||
Loading Wikidata's entity graph into the symbolic index transforms the
|
||||
archivist's job from "discover that Nabokov wrote /Pale Fire/" to "connect your
|
||||
heading to Wikidata entity Q36591." The second task is reference resolution, not
|
||||
knowledge extraction — simpler, more reliable, and in many cases doable without
|
||||
an LLM at all (string match against loaded entities). The notes claim this
|
||||
collapses the LLM's role to three thin boundaries: input translation, prose-to-
|
||||
candidate-triple for personal content, and result-to-prose formatting.
|
||||
|
||||
The caveats are real:
|
||||
|
||||
- Entity resolution (matching prose mentions to Wikidata entities) is genuinely
|
||||
hard. "Nabokov" in a diary might refer to Vladimir Nabokov (Q36591), his son
|
||||
Dmitri (Q566744), or someone else entirely. Disambiguation requires context
|
||||
that the symbolic engine doesn't have without LLM assistance.
|
||||
- Wikidata is biased toward English Wikipedia's coverage. A memex in Arabic,
|
||||
Farsi, or Amharic will find far fewer resolved entities. The "universal" in
|
||||
Wikidata is aspirational, not actual.
|
||||
- Wikidata's property graph is not an ontology in the formal sense; it's a
|
||||
collaboratively edited dataset with contradictions, gaps, and editorial wars
|
||||
frozen in time. Loading it directly into a symbolic index that expects
|
||||
consistency (Screamer checks, cardinality policies) will surface thousands of
|
||||
contradictions on ingest, many of which are Wikidata artifacts, not meaningful
|
||||
tensions.
|
||||
- N-hop expansion is unbounded. One hop from Nabokov hits hundreds of entities
|
||||
(his works, his family, his influences, his translators). Two hops hits
|
||||
thousands. Three hops hits tens of thousands. The notes say "3-4 hops" for a
|
||||
literary memex but don't estimate the entity count this implies. The claim that
|
||||
5 million entities = ~400MB is the best-case hash-table figure; a graph with
|
||||
query indices will be larger, and Prolog-like queries over millions of nodes
|
||||
are not free.
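The hop-growth concern can be made concrete with a bounded breadth-first expansion. The sketch below is illustrative only: =expand=, the toy graph, and both limits are assumptions, not part of the notes or of any Wikidata tooling. For scale, the quoted ~400MB for 5 million entities works out to roughly 80 bytes per entity, which leaves little room for query indices.

```python
from collections import deque


def expand(graph, seed, max_hops, max_entities):
    """Breadth-first expansion from seed, stopping at a hop or entity limit."""
    seen = {seed: 0}          # entity -> hop distance at which it was reached
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue          # do not expand past the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                if len(seen) >= max_entities:
                    return seen   # entity budget exhausted
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen


# Toy graph; the entity names are illustrative, not real Wikidata data.
toy = {
    "nabokov": ["pale-fire", "lolita", "vera"],
    "pale-fire": ["john-shade", "kinbote"],
    "lolita": ["humbert"],
    "kinbote": ["zembla"],
}
one_hop = expand(toy, "nabokov", max_hops=1, max_entities=100)
two_hops = expand(toy, "nabokov", max_hops=2, max_entities=100)
```

Capping entities as well as hops keeps a single densely connected node from dominating the load.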
|
||||
|
||||
Still: even a partial Wikidata load with conservative hop limits would provide
|
||||
more ontology than the system could accumulate through years of organic growth.
|
||||
It is the right accelerator, and the architecture handles it correctly — Wikidata
|
||||
facts are admitted with =:provenance :wikidata= and =:policy :plural=, meaning
|
||||
they sit alongside personal facts without overriding them. Disagreements are
|
||||
surfaced, not resolved. The architecture treats Wikidata as evidence from an
|
||||
external source, not as ground truth. That's the correct posture.
|
||||
|
||||
** Cardinality policies are the right abstraction for contradiction
|
||||
|
||||
The =:singular= / =:dual= / =:plural= cardinality model is one of the most
|
||||
important ideas in these notes. Classical logic requires consistency — a
|
||||
contradiction implies everything (ex contradictione quodlibet). A constraint
|
||||
solver like Screamer also requires consistency — a contradictory constraint set
|
||||
has no solutions. But a personal memex operates across domains where the meaning
|
||||
of contradiction is fundamentally different:
|
||||
|
||||
- "rm -rf / is catastrophic" is =:singular= — there is one truth that evolves
|
||||
over time.
|
||||
- "I loved this person AND I resented them" is =:dual= — the tension IS the
|
||||
fact.
|
||||
- "Wikidata says Everest is 8848m; DBpedia says 8849m; my 2023 diary says
|
||||
8848m" is =:plural= — multiple sources disagree, and surfacing the disagreement
|
||||
with provenance is the product.
|
||||
|
||||
This is a genuinely novel contribution to knowledge representation. Most
|
||||
knowledge graphs (Wikidata, Freebase, DBpedia) don't treat contradiction as
meaningful: at best they rank competing values and present one as preferred. Most constraint solvers reject
|
||||
contradiction as error. Passepartout's cardinality model makes contradiction a
|
||||
first-class citizen: you can query the fact that "I used to believe X until
|
||||
Tuesday, then Y," or "these three sources disagree on height," or "I hold these
|
||||
two positions in tension." The symbolic engine's job is not to decide which is
|
||||
right. It is to surface the tension with provenance.
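A minimal sketch of how the three policies might behave on assertion. The =FactStore= class, its method names, and the reading of =:dual= as "at most two values held in tension" are assumptions made for illustration; the notes do not specify an API.

```python
from collections import defaultdict


class FactStore:
    """Toy store: each (entity, relation) key has a cardinality policy."""

    def __init__(self):
        self.policy = {}                  # (entity, relation) -> policy name
        self.facts = defaultdict(list)    # (entity, relation) -> [(value, source)]

    def assert_fact(self, entity, relation, value, source, policy="plural"):
        key = (entity, relation)
        policy = self.policy.setdefault(key, policy)
        if policy == "singular":
            # One current truth: the newest assertion supersedes the old one.
            superseded = self.facts[key]
            self.facts[key] = [(value, source)]
            return superseded
        if policy == "dual" and len(self.facts[key]) >= 2:
            raise ValueError("a dual fact holds at most two values in tension")
        self.facts[key].append((value, source))
        return []

    def disagreements(self, entity, relation):
        """Return the stored values grouped by the sources asserting them."""
        by_value = defaultdict(list)
        for value, source in self.facts[(entity, relation)]:
            by_value[value].append(source)
        return dict(by_value)


store = FactStore()
store.assert_fact("everest", "height-m", 8848, "wikidata")
store.assert_fact("everest", "height-m", 8849, "dbpedia")
store.assert_fact("everest", "height-m", 8848, "diary-2023")
report = store.disagreements("everest", "height-m")
```

Here =report= groups the sources by the value they assert, so the disagreement is returned rather than resolved.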
|
||||
|
||||
This alone, if implemented correctly, would be a category-level advance over
|
||||
every existing personal knowledge management tool.
|
||||
|
||||
** Ontology versioning is the right approach to the migration problem
|
||||
|
||||
Every knowledge base eventually faces schema migration — you split =:secret-file=
|
||||
into =:crypto-secret= and =:plaintext-secret=, and now every deduction that
|
||||
crossed the old category boundary is suspect. The standard approach is batch
|
||||
UPDATE operations that overwrite the past. Passepartout's approach — the category
|
||||
hierarchy itself is a Merkle tree, every fact stores the =:ontology-version= at
|
||||
assertion time, category changes trigger re-verification rather than remapping —
|
||||
preserves all worldviews. You can query "what did I believe about secrets before
|
||||
I refined my security model?" This is not querying a fact. It is querying the
|
||||
history of your own thinking.
|
||||
|
||||
This is the kind of capability that no existing tool provides, and it flows
|
||||
directly from the architecture. If the Merkle DAG infrastructure exists (it does,
|
||||
from v0.2.0), ontology versioning is ~40 lines on top of it. The conceptual
|
||||
design is sound. The engineering appears tractable.
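The versioning idea can be sketched as follows. This is a simplification under stated assumptions: a single SHA-256 over the serialized hierarchy stands in for the Merkle root, and =ontology_version= and =stale_facts= are hypothetical names rather than functions from the codebase.

```python
import hashlib
import json


def ontology_version(hierarchy):
    """Content hash of a category hierarchy given as {category: [children]}."""
    canonical = json.dumps(hierarchy, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def stale_facts(facts, current_version):
    """Facts asserted under an older hierarchy; candidates for re-verification."""
    return [f for f in facts if f["ontology_version"] != current_version]


old = {"secret-file": []}
new = {"secret-file": ["crypto-secret", "plaintext-secret"]}
v_old, v_new = ontology_version(old), ontology_version(new)

facts = [
    {"triple": ("id_rsa", "is-a", "secret-file"), "ontology_version": v_old},
    {"triple": ("notes.txt.gpg", "is-a", "crypto-secret"), "ontology_version": v_new},
]
to_recheck = stale_facts(facts, v_new)
```

Nothing is rewritten: the fact asserted under the old hierarchy keeps its original version tag and is only queued for re-verification.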
|
||||
|
||||
* SWOT Analysis
|
||||
|
||||
** Strengths
|
||||
|
||||
*** Architectural inversion — proposer vs decider
|
||||
|
||||
The LLM proposes. The symbolic engine decides. This is the inverse of every
|
||||
existing agent architecture, and it solves the hallucination problem at the
|
||||
architectural level rather than the prompt-engineering level. No amount of
|
||||
prompt refinement can make a probabilistic system deterministic. But a
|
||||
deterministic admission gate can make a probabilistic proposer safe.
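As a sketch of what "deterministic admission" means in practice: the gate is a pure function of the proposal, so the same input always gets the same verdict regardless of how the LLM phrased its reasoning. The patterns and the =admit= function below are hypothetical; the real gate stack is not shown in these notes.

```python
import re

# Hypothetical deny patterns; the real gate stack is not shown in this document.
DENY_PATTERNS = [
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"), "destructive-command"),
    (re.compile(r"\.ssh/id_"), "secret-file"),
]


def admit(proposed_command):
    """Return (admitted, reason). Same input always yields the same verdict."""
    for pattern, category in DENY_PATTERNS:
        if pattern.search(proposed_command):
            return False, category
    return True, None


verdicts = [admit(cmd) for cmd in ("ls -la", "rm -rf /", "cat ~/.ssh/id_rsa")]
```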
|
||||
|
||||
*** Unified container format (Org files)
|
||||
|
||||
Org files serve as the container for human prose, Lisp source code, symbolic
|
||||
facts, and Agora Notes. One format, one toolchain, one Merkle tree, one version
|
||||
control system. If Passepartout stops existing, the data survives in plain text.
|
||||
This is the hardest commitment in the design and the most undervalued. Most agent
|
||||
architectures store memory in JSONL transcripts, vector databases, or proprietary
|
||||
formats — opaque to the human and dependent on the tool. Passepartout's memory
|
||||
IS the human's memory, in the human's format.
|
||||
|
||||
*** Provenance as product
|
||||
|
||||
Every fact carries =:grounding= (the specific Org heading), =:provenance= (who
|
||||
or what produced it), =:timestamp=, =:referenced-by=, =:contradicted-by=,
|
||||
=:superseded-by=. The =/audit= command renders the full provenance chain. In the
|
||||
broader memex, the value is not the verified fact ("this command is safe"). It
|
||||
is the provenance itself: "this claim originated in that diary entry, has been
|
||||
referenced 7 times across 4 projects, was contradicted 6 months later, and was
|
||||
revised 3 weeks after that." This is a memory prosthesis that makes your own mind
|
||||
legible to you.
|
||||
|
||||
*** Gate-to-fact bootstrap — ontology from existing code
|
||||
|
||||
The existing Dispatcher gate stack encodes an implicit ontology (categories of
|
||||
secrets, destructive commands, trusted domains, core files). The bootstrap
|
||||
extracts this mechanically — zero LLM tokens, zero human authoring, ~30 lines of
|
||||
Lisp. This proves the pattern and provides the seed ontology without any new
|
||||
infrastructure. Every new gate pattern added by the human (HITL approvals that
|
||||
become rules) extends the ontology automatically.
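The bootstrap can be pictured as a mechanical walk over the gate table. The notes describe it as ~30 lines of Lisp; the Python below is only a shape sketch, and the =GATES= table, the =is-a= relation, and the =gate-bootstrap= provenance tag are all assumptions.

```python
# Hypothetical gate table: category -> patterns the gate already matches on.
GATES = {
    "destructive-command": ["rm -rf /", "mkfs"],
    "secret-file": ["~/.ssh/id_rsa", ".env"],
}


def bootstrap_facts(gates):
    """Turn each gate pattern into an (entity, relation, value) triple."""
    facts = []
    for category, patterns in sorted(gates.items()):
        for pattern in patterns:
            facts.append({
                "triple": (pattern, "is-a", category),
                "provenance": "gate-bootstrap",
            })
    return facts


seed = bootstrap_facts(GATES)
```

Adding a pattern to the table and re-running the walk is all it takes to extend the seed ontology in this sketch.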
|
||||
|
||||
*** Self-preservation architecture
|
||||
|
||||
The Third Law implementation — quarantine on skill failure, degraded-mode
|
||||
signaling, resource monitoring, external watchdog, refusal to self-terminate —
|
||||
is individually small (~20-50 lines each) and collectively transforms
|
||||
self-preservation from a passive architectural property into an active behavior.
|
||||
The key insight: the biggest gap is not that these mechanisms are hard. It is
|
||||
that degradation is currently silent. Making it visible is cheap and high-impact.
|
||||
|
||||
*** Cardinality policies as a solution to contradiction
|
||||
|
||||
The =:singular= / =:dual= / =:plural= model is novel in knowledge representation
|
||||
and directly addresses the hardest problem in a personal memex: that
|
||||
contradiction is the product, not the error. Bayesian knowledge bases, graph
|
||||
databases, and triple stores all struggle with contradiction. Passepartout's
|
||||
model makes it a feature.
|
||||
|
||||
*** Organic ontology growth
|
||||
|
||||
Categories emerge from the system's own operation: gate patterns → gate outcomes
|
||||
→ Screamer generalizations → archivist proposals → cross-domain overlap
|
||||
detection. The ontology is a garden, not a building. This avoids the Principia
|
||||
Mathematica problem — the need to define everything upfront — by replacing
|
||||
axiomatic design with evolutionary growth. Categories that aren't used fade.
|
||||
Categories that are contradictory are pruned. Categories that emerge from
|
||||
overlapping domains are promoted. The system converges on useful granularity
|
||||
through use.
|
||||
|
||||
*** Agora as provenance layer for networked knowledge
|
||||
|
||||
A BFT-timestamped triple store is one approach, but the Merkle DAG + DID
|
||||
signatures provide a lighter-weight alternative: every fact's provenance is
|
||||
content-addressed, every author's identity is cryptographically verifiable, and
|
||||
the DAG structure enables partial replication without consensus. This is more
|
||||
tractable than full BFT and sufficient for a personal memex that needs to share
|
||||
facts across a network.
|
||||
|
||||
*** Decoupling of compute cost from knowledge base size
|
||||
|
||||
LLM tokens are minimized by design — deterministic gates cost 0 tokens, sparse-
|
||||
tree rendering keeps context at 2,000-4,000 tokens, Screamer deductions cost 0
|
||||
tokens. Adding 5 million Wikidata entities does not add a single token to any LLM
|
||||
call. The variables that actually degrade performance — context window size, LLM
|
||||
call frequency, Screamer deduction budget — are all bounded independently of
|
||||
knowledge base size. This is a structural property: the education is local, only
|
||||
the brain costs.
|
||||
|
||||
** Weaknesses
|
||||
|
||||
*** The fact language is unproven and may be insufficient
|
||||
|
||||
The triple, =(:entity :relation :value)= with provenance and grounding, is the
|
||||
current hypothesis. It is simple enough to be parseable, expressive enough to
|
||||
capture the gate stack's implicit claims, and extensible enough that Screamer can
|
||||
operate on it. But:
|
||||
|
||||
- Triples cannot naturally express temporal relations. "Was X before Y?" requires
|
||||
reification (making the relation itself an entity), which adds extra joins to
every query that touches the reified relation.
|
||||
- Triples cannot express modal claims. "Should not do X unless Y" has no natural
|
||||
triple representation. Neither does "could have done X but chose Y."
|
||||
- Triples cannot express counterfactuals. "If X had happened, Y would have
|
||||
followed." These are essential for the "what if" reasoning that a personal
|
||||
memex should support.
|
||||
- Triples struggle with n-ary relations. "Nabokov wrote Pale Fire in 1962 while
|
||||
living in Montreux" is a 4-ary relation (author, work, date, location), not a
|
||||
set of independent binary relations. Breaking it into triples loses the
|
||||
connection that binds them.
|
||||
- Triples cannot express negation cleanly. "Nabokov did NOT write Doctor Zhivago"
|
||||
requires a negative fact, which in a triple store with an open-world assumption
|
||||
means "not known" and "known not" are conflated.
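The n-ary case shows what reification costs and buys. In this sketch the four roles hang off a shared event node so they stay bound together, at the price of an extra lookup to read them back. =reify=, =roles_of=, and the =event-1= identifier are illustrative names, not the notes' fact language.

```python
def reify(event_id, relation, **roles):
    """Express one n-ary statement as triples hanging off a shared event node."""
    triples = [(event_id, "type", relation)]
    triples += [(event_id, role, value) for role, value in roles.items()]
    return triples


def roles_of(triples, event_id):
    """Collect the roles of one event back into a single record."""
    return {r: v for e, r, v in triples if e == event_id and r != "type"}


writing = reify("event-1", "wrote", author="Nabokov", work="Pale Fire",
                date=1962, location="Montreux")
record = roles_of(writing, "event-1")
```

One statement becomes five triples, and any query about the date or location must first find the event node.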
|
||||
|
||||
The notes acknowledge this limitation but defer it. The right granularity
|
||||
"depends on what queries the planner actually needs to make, and that cannot be
|
||||
known in advance." This is honest but unsatisfying. If triples prove insufficient,
|
||||
the entire fact store, the Screamer integration, the VivaceGraph persistence, and
|
||||
the archivist's extraction format must be redesigned. The architecture has no
|
||||
intermediate fallback between "triples" and "something more expressive."
|
||||
|
||||
*** Screamer as admission gate is untested at this scale
|
||||
|
||||
Screamer is a constraint solver with non-deterministic backtracking. Using it
|
||||
to check a candidate triple against an existing fact store is conceptually
|
||||
elegant: express the fact store as constraint variables, assert the candidate,
|
||||
check solvability. But:
|
||||
|
||||
- Screamer was designed for constraint satisfaction problems with tens to
|
||||
hundreds of variables. A fact store with millions of triples (after Wikidata
|
||||
loading) is a constraint space orders of magnitude larger than Screamer's
|
||||
design envelope.
|
||||
- The consistency check is domain-scoped (only rules from the candidate's
|
||||
=:domain= apply), but cross-domain contradictions are the most valuable kind.
|
||||
"Nabokov was born in 1899" (literature domain) should be consistent with
|
||||
"Nabokov died in 1977" (history domain). If these are separate domains, the
|
||||
check misses contradictions; if they are unified, the constraint space
|
||||
explodes.
|
||||
- Screamer's non-deterministic backtracking is worst-case exponential. The notes
|
||||
bound this via deduction budget (=SCREAMER_DEDUCTION_BUDGET_MS=) but don't
|
||||
address the admission check itself, which runs on every assertion.
|
||||
|
||||
There is a risk that Screamer works beautifully for the gate-bootstrapped seed
|
||||
(50-70 entity classes, ~200 facts) and becomes unusably slow after Wikidata
|
||||
loading (millions of facts). The transition from "works" to "doesn't" may be
|
||||
gradual and hard to detect — the system gets slower but doesn't crash,
|
||||
degrading user experience without a clear diagnostic.
|
||||
|
||||
*** The "flip" from lossy to deterministic is underspecified
|
||||
|
||||
The architecture's central narrative arc is the "flip": at some point, the non-
|
||||
lossy facts constitute a sufficient foundation that the symbolic engine can
|
||||
reverse the flow — instead of LLM extraction, the symbolic engine reads prose
|
||||
through its own lens and deduces facts directly. The sufficiency metric
|
||||
(non-lossy / total > 0.7) makes this "computable and visible to the user."
|
||||
|
||||
But:
|
||||
|
||||
- The threshold (0.7) is arbitrary. It is not derived from empirical measurement,
|
||||
information theory, or constraint satisfaction theory. It is a guess.
|
||||
- Sufficiency is domain-specific, not global. The gate stack may have 0.95
|
||||
coverage of security classifications but 0.05 coverage of literary analysis.
|
||||
A global threshold of 0.7 hides the domains where the symbolic engine is still
|
||||
effectively blind.
|
||||
- The "flip" operation itself is not defined. "Screamer reads prose through its
|
||||
own lens" — Screamer does not read prose. It operates on structured facts.
|
||||
Either the archivist still extracts triples (which is LLM work), or some new
|
||||
mechanism parses prose into triples deterministically (which is NLP at a level
|
||||
that does not exist in open-source Lisp).
|
||||
- Even after the flip, facts from the pre-flip period carry =:provenance
|
||||
:llm-proposed= and are therefore suspect. The pre-flip facts were admitted
|
||||
against fewer non-lossy facts, meaning Screamer's consistency checks were
|
||||
weaker. A fact admitted during the seed phase may be wrong but undetected
|
||||
because there were no contradicting facts at the time. Re-verifying all pre-
|
||||
flip facts against the current fact store is described as a heartbeat task but
|
||||
the cost (millions of Screamer checks) is not estimated.
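The second objection is easy to demonstrate numerically. The sketch below computes the ratio globally and per domain on synthetic counts chosen to mirror the 0.95 / 0.05 example; the =sufficiency= function and the fact layout are assumptions.

```python
from collections import Counter


def sufficiency(facts):
    """Return (global_ratio, {domain: ratio}) of non-lossy facts to all facts."""
    total, non_lossy = Counter(), Counter()
    for fact in facts:
        total[fact["domain"]] += 1
        if fact["non_lossy"]:
            non_lossy[fact["domain"]] += 1
    per_domain = {d: non_lossy[d] / total[d] for d in total}
    overall = sum(non_lossy.values()) / sum(total.values())
    return overall, per_domain


# Synthetic counts chosen to mirror the 0.95 / 0.05 example above.
facts = (
    [{"domain": "security", "non_lossy": True}] * 95
    + [{"domain": "security", "non_lossy": False}] * 5
    + [{"domain": "literature", "non_lossy": True}] * 1
    + [{"domain": "literature", "non_lossy": False}] * 19
)
overall, per_domain = sufficiency(facts)
```

With these counts the global ratio is 96/120 = 0.8, which clears the 0.7 threshold even though the literature domain sits at 0.05.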
|
||||
|
||||
The flip is a beautiful narrative. It may also be a mirage — the system may
|
||||
achieve high sufficiency in narrow domains (security, filesystem, coding) and
|
||||
never approach it in the broader memex (literature, personal reflection, daily
|
||||
life). If the broader memex is the use case, the flip may never happen.
|
||||
|
||||
*** The archivist's extraction cost is unaccounted
|
||||
|
||||
The archivist calls the LLM to extract triples from prose, with "a minimal prompt
|
||||
(~200 tokens)." Over a personal memex with thousands of entries — a decade of
|
||||
diary entries, hundreds of literature notes, dozens of project logs — the
|
||||
extraction cost is substantial.
|
||||
|
||||
Assume 5,000 headings, 200 tokens per heading prompt, and an LLM that returns
|
||||
~100 tokens of structured triples per heading. That's 1.5 million tokens for the
|
||||
initial extraction, plus verification tokens (Screamer checks cost 0 LLM tokens,
|
||||
but incorrect proposals generate feedback that may trigger re-extraction). At
|
||||
current API prices (~$0.15 per million input tokens for GPT-4o-mini), the cost
is modest (~$0.25 if every token is billed at the input rate; output tokens
are priced higher, so the real figure is somewhat more). But at scale
(re-extraction after ontology changes, continuous extraction as new content is
added, extraction for all incoming Agora Notes) the cost accumulates.
|
||||
|
||||
More importantly, the extraction latency is human-noticeable. 5,000 headings at
|
||||
1 second per LLM call is ~1.4 hours of extraction time. The system needs to
|
||||
either batch-extract on startup (making cold starts slow) or extract lazily on
|
||||
first query (making first queries slow). Neither is ideal.
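The arithmetic behind these estimates, written out so the assumptions can be varied. All constants come from the text above; applying the input-token price to every token is a simplification, and =extraction_estimate= is a hypothetical helper name.

```python
HEADINGS = 5_000
PROMPT_TOKENS = 200        # per heading, from the text
RESPONSE_TOKENS = 100      # per heading, from the text
SECONDS_PER_CALL = 1.0     # from the text
PRICE_PER_MILLION = 0.15   # USD; the text's input-token rate, applied to all tokens


def extraction_estimate(headings=HEADINGS):
    """Return (total tokens, cost in USD, sequential wall-clock hours)."""
    tokens = headings * (PROMPT_TOKENS + RESPONSE_TOKENS)
    cost_usd = tokens / 1_000_000 * PRICE_PER_MILLION
    hours = headings * SECONDS_PER_CALL / 3600
    return tokens, cost_usd, hours


tokens, cost_usd, hours = extraction_estimate()
```

This yields 1.5 million tokens, about $0.23, and about 1.4 hours if the calls run one after another; parallel calls would shorten the wall-clock time but not the token count.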
|
||||
|
||||
The notes trumpet the token savings from deterministic gates and Screamer
|
||||
deductions (valid — those cost 0 tokens) but the archivist's extraction cost is
|
||||
the system's single largest recurring LLM expense, and it is mentioned only in
|
||||
passing.
|
||||
|
||||
*** The Agora integration is clean in theory, undefined in practice
|
||||
|
||||
The "Passepartout IS the PDS" claim is elegant: the =memory-object= struct IS
|
||||
the Note format, the Merkle DAG IS the Key Event Log, the fact store IS the
|
||||
reputation system. But:
|
||||
|
||||
- An Agora PDS needs to serve HTTP APIs for thin clients. The daemon speaks a
|
||||
framed TCP protocol over a local port. Extending it to serve HTTPS with
|
||||
DIDComm endpoints, subscription management, and Relay push/pull is a
|
||||
substantial engineering effort.
|
||||
- The PDS needs to manage encrypted storage — client-side encrypted content that
|
||||
the PDS itself cannot read. Passepartout's vault stores credentials with
|
||||
integrity hashes but does not currently manage per-Note encryption with
|
||||
audience-specific keys.
|
||||
- The Relay Network is described as an intelligent communication backbone with
|
||||
pub/sub routing. Passepartout has no Relay implementation, no Relay-facing API,
|
||||
and no subscription management beyond its own event orchestrator.
|
||||
- Agora's contract system (SCAL contracts, HODL invoices, arbitration tiers)
|
||||
requires state machines and Lightning Network integration that Passepartout
|
||||
has no primitives for.
|
||||
- The "Passepartout IS the PDS" vision conflates two things: the data model
|
||||
(Org files = Notes) and the infrastructure (a process that serves a network
|
||||
protocol). The data model unification is clean and right. The infrastructure
|
||||
unification implies Passepartout grows from a local agent to a network server
|
||||
— a significant architectural expansion that the notes treat as a ~40-line
|
||||
utility.
|
||||
|
||||
*** No adversarial model
|
||||
|
||||
The notes describe layered authentication (crypto, sensory, deterministic,
|
||||
probabilistic) and type-level gates as structural safety. They do not describe
|
||||
an adversarial model:
|
||||
|
||||
- What stops a malicious Agora Note from containing 100,000 triples that flood
|
||||
the fact store?
|
||||
- What stops a DID from publishing Notes that deliberately inject contradictions
|
||||
to force Screamer into exponential backtracking?
|
||||
- What stops a compromised sensor key from signing valid sensor data that is
|
||||
adversarially crafted (e.g., video frames designed to trigger specific vision
|
||||
model false positives)?
|
||||
- What stops a spam DID from creating millions of Personas and flooding the
|
||||
user's incoming Notes directory?
|
||||
|
||||
The resource monitor (Phase 1a) handles storage pressure generically. The
|
||||
quarantine system handles individual DIDs flagged for spam. But neither of these
is adversary-aware: they react to symptoms (disk full, error rate high) rather
|
||||
than anticipating attack patterns. An adversarial model would identify these
|
||||
vectors and design mitigations specifically. The notes describe a system that
|
||||
works in a cooperative environment, not an adversarial one.
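One possible shape for an adversary-aware mitigation of the first vector (triple flooding) is an ingest quota checked before any triple reaches the fact store. This is not something the notes propose; =IngestQuota= and both limits are hypothetical and would need tuning.

```python
from collections import defaultdict


class IngestQuota:
    """Reject oversized Notes and cap the triples admitted per DID."""

    def __init__(self, max_per_note, max_per_did):
        self.max_per_note = max_per_note
        self.max_per_did = max_per_did
        self.admitted = defaultdict(int)   # DID -> triples admitted so far

    def check(self, did, triple_count):
        """Return (admitted, reason); only admitted triples count against the DID."""
        if triple_count > self.max_per_note:
            return False, "note-too-large"
        if self.admitted[did] + triple_count > self.max_per_did:
            return False, "did-quota-exceeded"
        self.admitted[did] += triple_count
        return True, None


quota = IngestQuota(max_per_note=100, max_per_did=250)
flood = quota.check("did:example:mallory", 100_000)
```

A per-DID cap does nothing against the fourth vector (many cheap DIDs), which would need a cost on identity creation rather than on ingestion.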
|
||||
|
||||
*** The self-repair criterion creates a two-tier architecture
|
||||
|
||||
The AGENTS.md rule — "default: everything is a skill" — means the symbolic
|
||||
engine (Screamer, VivaceGraph, fact store, archivist, ACL2, planner) is all
|
||||
skills, not core. This is correct for the self-repair criterion: a corrupted
|
||||
skill degrades the agent but doesn't kill it. A corrupted core file kills the
|
||||
brainstem.
|
||||
|
||||
But it creates a tension: the symbolic engine IS the reasoning layer that would
|
||||
diagnose and repair a corrupted skill. If the fact store itself is corrupted
|
||||
(impossible facts, inconsistent cardinality, broken Merkle chains), the engine
|
||||
that detects corruption is the engine that is corrupted. The system needs a
|
||||
"repair from below" path — a minimal core that can purge and rebuild the symbolic
|
||||
index without depending on the symbolic index. This path exists (the fact store
|
||||
is ephemeral in Phase 1-4 and rebuildable from prose in Phase 5+) but is not
|
||||
exercised automatically. A corruption in the symbolic engine requires human
|
||||
detection and manual rebuild — the exact problem the self-repair criterion was
|
||||
designed to avoid.
|
||||
|
||||
** Opportunities
|
||||
|
||||
*** A memory prosthesis that makes your own mind legible
|
||||
|
||||
The symbolic index, when populated and queried, answers questions that no
|
||||
existing tool can:
|
||||
|
||||
- "What did I believe about monorepos in 2023, and how has that changed?"
|
||||
- "Which of my diary entries contradict each other?"
|
||||
- "What entities in my memex have no connection to any other entity?"
|
||||
- "Show me everything I've written about Nabokov, organized by when I wrote it,
|
||||
what I was reading at the time, and what I concluded."
|
||||
- "Which of my project plans reference security assumptions that I later changed?"
|
||||
- "What did I think about this topic, and why did I change my mind?"
|
||||
|
||||
These are not information retrieval queries. They are self-knowledge queries.
|
||||
They require provenance chains, temporal versioning, contradiction surfacing, and
|
||||
cross-domain linkage — all of which the architecture provides as first-class
|
||||
capabilities. If this works, it transforms the memex from a searchable archive
|
||||
into a thinking partner that knows the history of your thoughts.
|
||||
|
||||
*** Deterministic reasoning as a moat
|
||||
|
||||
Every competitor agent system (Claude Code, OpenCode, OpenClaw, Hermes, Cognee,
|
||||
Mem0) uses neural-only reasoning. They are all vulnerable to the same failure
|
||||
mode: the LLM hallucinates a fact or an action, and there is no second system to
|
||||
catch it. Their safety is heuristic. Their memory is flat. Their reasoning is
|
||||
unprovable.
|
||||
|
||||
Passepartout's architectural bet — a symbolic engine that verifies, deduces, and
|
||||
audits — creates a category difference, not a performance difference. If the bet
|
||||
pays off, Passepartout is not "a better AI agent." It is a different kind of
|
||||
system — one whose reasoning is provable, whose memory is content-addressed, and
|
||||
whose knowledge accumulates through deduction rather than re-prompting.
|
||||
|
||||
This is a genuine moat. It cannot be replicated by adding a better system prompt
|
||||
or a larger context window. It requires building the ontology, the constraint
|
||||
solver, the fact store, and the provenance tracker — work that takes years and
|
||||
cannot be shortcut by spending more on inference.
|
||||
|
||||
*** Agora as the first sovereign agent network
|
||||
|
||||
If Passepartout serves as the PDS and an Agora Persona, then AI agents can:
|
||||
|
||||
- Publish verified outputs as signed Notes with cryptographic provenance.
|
||||
Readers know the agent produced the output, not a human impersonating the
|
||||
agent.
|
||||
- Accept invocation Notes from other persona owners. "Please analyze this
|
||||
contract and publish your findings." The agent receives the request as an
|
||||
Agora Note, processes it, signs the response, and publishes it.
|
||||
- Build reputation through auditable chains of signed work products, not through
|
||||
self-reported claims.
|
||||
- Participate in the compute marketplace as both consumer and provider.
|
||||
- Maintain sovereign identity — the agent's DID is independent of any platform,
|
||||
any provider, any human account.
|
||||
|
||||
This is not a chatbot on a messaging platform. It is an autonomous entity on a
|
||||
decentralized network, with cryptographic identity, verifiable provenance, and
|
||||
economic agency. If Agora reaches even Order 1 (the first 1,000 users),
|
||||
Passepartout agents become some of the most capable participants on the network.
|
||||
|
||||
*** The 10-80-10 ratio for coding is genuinely achievable
|
||||
|
||||
For a coding agent — the domain that Passepartout currently operates in — the
|
||||
10-80-10 ratio is plausible. The existing Dispatcher already verifies every
|
||||
action deterministically. Adding Screamer for consistency checking, VivaceGraph
|
||||
for dependency queries, and ACL2 for structural verification would shift the
|
||||
ratio from the current ~95-5-0 (neural-gate-symbolic) toward 50-40-10 in the
|
||||
near term and potentially 10-80-10 in the long term.
|
||||
|
||||
The bootstrapped gate facts already cover file classifications, command safety,
|
||||
path protections, and tool permissions — the core categories for a coding agent.
|
||||
The archivist's extraction from project files would add dependency information,
|
||||
test coverage, and code structure facts. The planner could reason about
|
||||
refactoring order, dependency chains, and safety constraints deterministically.
|
||||
This is the domain where the symbolic engine provides the most immediate value,
|
||||
and it is the domain Passepartout already operates in.
|
||||
|
||||
*** Wikidata as an entity backbone unlocks cross-domain reasoning
|
||||
|
||||
Without Wikidata, the symbolic index for a general-knowledge memex is a sparse
|
||||
set of personal facts with no connecting structure. With Wikidata, the entity
|
||||
graph is pre-structured. The system can answer:
|
||||
|
||||
- "What does my memex say about Nabokov that Wikidata doesn't?"
|
||||
- "Where does my memex disagree with Wikidata?"
|
||||
- "What entities in my memex have no Wikidata counterpart?" (These are the
|
||||
personal, novel, or subjective entities that are the most valuable.)
|
||||
- "Show me the intersection of my literary interests (from diary) with Wikidata's
|
||||
influence graph — which authors I read influenced each other in ways I haven't
|
||||
written about?"
|
||||
|
||||
These are cross-domain queries that require both the personal memex (for what
|
||||
the user knows) and Wikidata (for what the world knows). Neither alone can
|
||||
answer them. Together, they enable a kind of knowledge synthesis that no existing
|
||||
tool provides.
|
||||
|
||||
*** Ontology versioning enables "what-if" reasoning about one's own thinking
|
||||
|
||||
The ability to query across worldviews — "what did I believe before I changed my
|
||||
security model?" — is a capability that has no analog in any existing tool. It
|
||||
transforms the memex from a static archive into a dynamic record of intellectual
|
||||
evolution. Combined with the temporal awareness system (Phase 0c), the system
|
||||
could surface correlations: "You changed your mind about monorepos two weeks
|
||||
after reading this article, which you bookmarked on this date, and one week
|
||||
before starting this project that uses a monorepo structure." The provenance
|
||||
chain IS the narrative of your thinking.
|
||||
|
||||
*** Contract-level pre-arbitration reduces the cost of decentralized commerce
|
||||
|
||||
Agora's Tier 0 Arbitrator — a local AI that provides evidence summaries before
|
||||
human arbitration — is a genuinely useful role for a neurosymbolic system.
|
||||
|
||||
- "Contract CID X references arbitrator DID Y. DID Y is active. Verified."
|
||||
- "All parties have signed. The HODL invoice is locked. Verified."
|
||||
- "The buyer's claim of non-delivery is supported by 3 signed messages with
|
||||
timestamps after the delivery deadline."
|
||||
- "The seller's proof-of-delivery field is empty. No QR scan recorded."
|
||||
|
||||
Each check is a Screamer query against the contract-lifecycle domain. The results
|
||||
are a plist, not a ruling. Both parties see the same evidence summary before
|
||||
escalating. This makes Level 1 arbitration faster (arbitrators receive
|
||||
pre-processed evidence bundles), cheaper (no human time spent on trivial
|
||||
verification), and more transparent (both parties see the same machine-generated
|
||||
summary).
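
As a rough illustration of "a plist, not a ruling", the sketch below uses plain
predicates as stand-ins for the Screamer queries. The contract representation
and every key in it (=:arbitrator-active=, =:signatures=, =:invoice-locked=,
=:delivery-deadline=, =:buyer-messages=, =:proof-of-delivery=) are assumptions
made for the example, not Agora's actual contract schema.

#+begin_src lisp
;; Hypothetical contract plist; timestamps are plain integers for simplicity.
(defun evidence-summary (contract)
  "Return a plist of check results. Nothing here decides the dispute."
  (let ((deadline (getf contract :delivery-deadline))
        (messages (getf contract :buyer-messages))
        (signatures (getf contract :signatures)))
    (list :arbitrator-active (and (getf contract :arbitrator-active) t)
          :all-signed (and signatures (every #'identity signatures))
          :invoice-locked (and (getf contract :invoice-locked) t)
          :messages-after-deadline (count-if (lambda (ts) (> ts deadline))
                                             messages)
          :proof-of-delivery-present (and (getf contract :proof-of-delivery)
                                          t))))

;; Usage: three of the four buyer messages fall after the deadline,
;; and the proof-of-delivery field is empty.
(evidence-summary '(:arbitrator-active t
                    :signatures (t t)
                    :invoice-locked t
                    :delivery-deadline 100
                    :buyer-messages (90 110 120 130)
                    :proof-of-delivery nil))
#+end_src

The returned plist reports counts and booleans only. Weighing them remains the
arbitrator's job.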
|
||||
|
||||
This is not AI judging. This is AI preparing the docket. The distinction is
|
||||
important and defensible.
|
||||
|
||||
*** Self-auditing agents could transform AI safety discourse
|
||||
|
||||
If Passepartout can answer =/audit= for any action or fact — showing the full
|
||||
provenance chain, every gate that approved it, every fact that supported it,
|
||||
every alternative that was considered — then AI safety moves from "trust us, we
|
||||
tested it" to "here is the audit trail, verify it yourself."
|
||||
|
||||
This is the transparency that every AI safety framework calls for and none
|
||||
delivers. It is possible because the architecture records provenance as a
|
||||
first-class operation, not as an after-the-fact log. The provenance is the
|
||||
operating system, not a logging layer.
|
||||
|
||||
*** The memex + Agora combination could be a new kind of social network
|
||||
|
||||
Current social networks (Twitter, Facebook, Reddit) separate the person from
|
||||
their knowledge. You are a profile with posts. Your posts are isolated units
|
||||
without connection to your broader intellectual life.
|
||||
|
||||
A Passepartout-powered Agora Persona would publish Notes that are grounded in
|
||||
the memex: "Here is my analysis of /Pale Fire/, drawn from diary entries across
|
||||
three years, annotated with Wikidata context, and verified against my existing
|
||||
literary framework." The Note is cryptographically signed, carrying provenance
|
||||
back to the specific Org headings that informed it. Readers see not just the
|
||||
conclusion but the intellectual scaffolding that produced it.
|
||||
|
||||
This is not a "post." It is a publication — a knowledge artifact with verifiable
|
||||
provenance, auditable reasoning, and cryptographic identity. If this becomes the
|
||||
norm, it raises the standard for public discourse from "this is my opinion" to
|
||||
"this is my opinion, here is the evidence, here is how it evolved, here is who
|
||||
verified it."
|
||||
|
||||
** Threats
|
||||
|
||||
*** The ontology problem may be harder than anticipated
|
||||
|
||||
The notes are honest about this: "Whitehead's Principia Mathematica took over
|
||||
300 pages to define the logical foundations before it could prove that 1+1=2."
|
||||
Passepartout's domain is narrower (coding + personal knowledge) but the
|
||||
ontology problem is the same category of problem. Every entity class must be
|
||||
defined. Every relation must have clear semantics. Every inference rule must be
|
||||
justified.
|
||||
|
||||
The gate-to-fact bootstrap provides 50-70 entity classes — enough for a coding
|
||||
agent. But the broader memex contains orders of magnitude more entity types:
|
||||
people, places, works, concepts, events, emotions, aesthetic judgments,
|
||||
professional skills, personal projects, temporal patterns. Defining these as
|
||||
triples with clear semantics is genuine intellectual work that no amount of
|
||||
engineering can shortcut.
|
||||
|
||||
The risk is not that it's impossible. It's that it's slow — slow enough that
|
||||
the system never achieves the density of facts needed for the "flip" in the
|
||||
broader memex. The coding domain may reach sufficiency in months. The literary
|
||||
domain may take years. The daily-reflection domain may never cross the
|
||||
threshold because the facts involved (mood, insight, aesthetic experience) are
|
||||
not formalizable as triples.
|
||||
|
||||
*** Screamer may not scale to the fact store size
|
||||
|
||||
The constraint satisfaction approach to consistency checking is elegant for a
|
||||
seed fact set of hundreds of triples. It is unproven for millions of triples
|
||||
(after Wikidata loading + years of personal extraction). The domain-scoping
|
||||
strategy (Screamer only checks facts from the candidate's =:domain=) bounds the
|
||||
constraint space, but the most valuable consistency checks are cross-domain:
|
||||
|
||||
- "You classified this file as public in your project notes but the gate stack
|
||||
classifies it as secret." (project domain vs security domain)
|
||||
- "You wrote that Nabokov influenced Kafka, but Wikidata says Kafka died before
|
||||
Nabokov published his first novel." (literature domain vs Wikidata domain)
|
||||
- "You planned to use this dependency, but the dependency's license changed in
|
||||
a way that conflicts with your project's license." (project domain vs legal
|
||||
domain)
|
||||
|
||||
If cross-domain checks are disabled for performance, the most valuable
|
||||
contradictions are never detected. If they are enabled, the constraint space
|
||||
explodes. There is no obvious sweet spot.
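
A back-of-the-envelope sketch of the blow-up, under the simplifying assumption
that a consistency check compares pairs of facts. Real Screamer constraint
spaces need not grow this way, and the function names and domain sizes are
illustrative only.

#+begin_src lisp
(defun pair-count (n)
  "Number of unordered pairs among N facts."
  (/ (* n (1- n)) 2))

(defun scoped-pair-count (domain-sizes)
  "Pairs examined when each domain is checked in isolation."
  (reduce #'+ (mapcar #'pair-count domain-sizes)))

(defun global-pair-count (domain-sizes)
  "Pairs examined when any fact may conflict with any other fact."
  (pair-count (reduce #'+ domain-sizes)))

;; Three domains of 1,000 facts each:
(scoped-pair-count '(1000 1000 1000))   ; => 1498500
(global-pair-count '(1000 1000 1000))   ; => 4498500
#+end_src

With equally sized domains, the ratio between the two counts grows roughly with
the number of domains, which is why scoping helps more as the memex diversifies.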
|
||||
|
||||
*** Wikidata quality may undermine trust in the symbolic index
|
||||
|
||||
If Wikidata facts are admitted with =:policy :plural= and the user sees
|
||||
thousands of contradictions between Wikidata and their personal memex, the
|
||||
symbolic index may feel less trustworthy, not more. "Wikidata says Mount Everest
|
||||
is 8848m. DBpedia says 8849m. Your 2023 diary says 8848m. These sources
|
||||
disagree on height." This is correct behavior — surfacing disagreement with
|
||||
provenance — but it may be overwhelming. The user wanted a knowledge base, not
|
||||
a disagreement engine.
|
||||
|
||||
The trust problem is compounded by Wikidata's editorial biases. Wikidata
|
||||
reflects the biases of Wikipedia editors: English-language dominance, Western
|
||||
epistemological frameworks, systemic underrepresentation of non-Western
|
||||
knowledge. A memex in Arabic that references Islamic philosophy, Egyptian
|
||||
history, or African literature will find Wikidata's coverage thin, biased, or
|
||||
absent. The symbolic index would dutifully surface these gaps — "your memex
|
||||
mentions 47 entities with no Wikidata counterpart" — but it cannot fill them.
|
||||
|
||||
*** LLM cost and latency may prevent the archivist from keeping up
|
||||
|
||||
If the user writes a diary entry every day, the archivist must extract triples
|
||||
from each new heading. If the extraction takes 1-3 seconds per heading, it's
|
||||
background noise. But if the user imports 500 old diary entries, or the
|
||||
archivist needs to re-extract after an ontology change, or Agora Notes arrive in
|
||||
bulk from multiple follows, the extraction queue grows faster than it drains.
|
||||
|
||||
The notes describe extraction as a background task triggered by heartbeat, but
|
||||
they don't specify the extraction rate limit. An unbounded queue with no rate
|
||||
limit would consume the LLM budget. A bounded queue would fall behind. A lazy
|
||||
extraction strategy (extract on first query) would make first queries slow.
|
||||
A batch extraction on startup would make cold starts slow.
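
The trade-off can be made concrete with a small drain-time calculation. This is
a sketch only: =days-to-drain= is a hypothetical helper, and the daily capacity
figures are invented for illustration, since the notes do not specify a rate
limit.

#+begin_src lisp
(defun days-to-drain (backlog daily-arrivals daily-capacity)
  "Days until the extraction queue is empty, or NIL if it never drains.
All three arguments are counts of headings."
  (cond ((<= backlog 0) 0)
        ((<= daily-capacity daily-arrivals) nil)
        (t (values (ceiling backlog (- daily-capacity daily-arrivals))))))

;; 500 imported entries, one new entry per day, budget for 50 extractions per day:
(days-to-drain 500 1 50)   ; => 11

;; Same backlog, but the budget only covers the daily entry:
(days-to-drain 500 1 1)    ; => NIL
#+end_src

Any rate limit at or below the arrival rate leaves the backlog in place
indefinitely, so the limit has to be chosen against expected bulk imports, not
just daily writing.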
|
||||
|
||||
The archivist's throughput is gated by LLM API rate limits, token costs, and
|
||||
inference latency. These are external constraints that the architecture cannot
|
||||
eliminate. The symbolic engine can reduce LLM calls for reasoning; it cannot
|
||||
reduce LLM calls for extraction from prose.
|
||||
|
||||
*** Agora may never reach network effects
|
||||
|
||||
Agora faces the cold start problem that every decentralized social protocol
|
||||
faces: users won't join without content, creators won't post without users. The
|
||||
bootstrapping strategy (managed service → hybrid → full decentralization,
|
||||
targeting niche communities first) is well-articulated but its success depends
|
||||
on execution in a market where Mastodon, Bluesky, Nostr, and Farcaster are
|
||||
already competing for the same users.
|
||||
|
||||
If Agora doesn't reach even Order 1 (1,000 users), the PDS integration is
|
||||
academic. Passepartout's DID identity, DIDComm gateway, Note signing, and
|
||||
contract verification are all infrastructure for a network that doesn't exist.
|
||||
The symbolic engine still works locally — provenance tracking, contradiction
|
||||
surfacing, and deduction are all valuable without Agora. But the network effects
|
||||
that make Agora a transformative platform — reputation, contracts, marketplaces,
|
||||
collective governance — require a living network.
|
||||
|
||||
The risk is asymmetric: Passepartout invests significant engineering in Agora
|
||||
integration that provides zero value if Agora fails to launch.
|
||||
|
||||
*** Complexity may prevent adoption
|
||||
|
||||
Passepartout is already a complex system: a Lisp daemon, a terminal UI, a skill
|
||||
engine, a gate stack, multiple LLM backends, a Merkle memory system, and an
|
||||
event orchestrator. Adding a fact store, a constraint solver, a graph database,
|
||||
a theorem prover, an archivist, a planner, and an Agora PDS makes it more
|
||||
complex, not less.
|
||||
|
||||
The target user — someone who wants a personal AI assistant that works offline —
|
||||
may not want or need any of this. They want the TUI to work, the LLM to be fast,
|
||||
and the files to stay safe. The neurosymbolic engine is infrastructure for a use
|
||||
case (lifelong personal knowledge management with verifiable provenance) that
|
||||
most users do not yet know they have.
|
||||
|
||||
The risk is that Passepartout builds a cathedral for a congregation of one — a
|
||||
system that is architecturally brilliant and practically unused because the
|
||||
complexity-to-value ratio is too high for anyone except the author.
|
||||
|
||||
*** The self-repair criterion may not hold under adversarial conditions
|
||||
|
||||
The architecture assumes that skills can fail gracefully (fboundp guards, hash
|
||||
table fallbacks, degraded mode). It does not assume that a skill can be
|
||||
adversarially corrupted to appear correct while producing wrong results. A
|
||||
compromised archivist that extracts plausible but false triples, a compromised
|
||||
Screamer that passes all consistency checks, a compromised VivaceGraph that
|
||||
returns query results from a parallel graph — these are "living" skills that
|
||||
would pass integrity checks and still poison the symbolic index.
|
||||
|
||||
The type-level gates prevent the LLM from modifying gate code. They do not
|
||||
prevent a compromised skill (loaded by a trusted human, or corrupted on disk by
|
||||
a separate process) from appearing to operate normally while being subtly wrong. The integrity
|
||||
monitoring (Phase 0) catches disk-level corruption through hash checks. It does
|
||||
not catch semantic corruption — a skill that is byte-for-byte identical to the
|
||||
known-good version but loaded with a malicious input that triggers a latent bug.
|
||||
|
||||
This is not a vulnerability unique to Passepartout. It is a vulnerability in
|
||||
every system where components trust each other. But Passepartout's architecture
|
||||
amplifies the risk because the symbolic engine is supposed to be the trustworthy
|
||||
layer — the component that verifies the LLM's output. If the symbolic engine
|
||||
itself is compromised, the system has no higher court of appeal.
|
||||
|
||||
*** The 10-80-10 ratio may create false confidence
|
||||
|
||||
If the sufficiency metric shows "71% non-lossy, threshold 70%, mode: AUTO-
|
||||
EXTRACTION," the user may assume the system is trustworthy. But sufficiency is
|
||||
global — it aggregates across all domains. The system may have 95% sufficiency
|
||||
in the security domain and 5% sufficiency in the literary domain, averaging to
|
||||
71% once weighted by fact count. The auto-extraction switch would bypass the LLM for all categories with
|
||||
sufficient coverage, but the threshold is global, not per-domain. A literary
|
||||
query would hit the symbolic index that has "sufficient" coverage globally but
|
||||
insufficient coverage for literature.
|
||||
|
||||
The notes describe domain-scoped Screamer checks but not domain-scoped
|
||||
sufficiency. A global sufficiency metric that triggers a global extraction mode
|
||||
change is the wrong granularity. Per-domain sufficiency, with per-domain
|
||||
extraction mode, would be more complex but more honest. The architecture as
|
||||
described has the simpler, more dangerous version.
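
A worked example of how a global figure can hide a weak domain, assuming
sufficiency is aggregated as a fact-count-weighted mean. The aggregation rule,
the function names, and the fact counts are assumptions for illustration. The
notes do not say how the global metric is computed.

#+begin_src lisp
(defun global-sufficiency (domains)
  "DOMAINS is a list of (name fact-count sufficiency) entries.
Returns the fact-count-weighted mean sufficiency."
  (/ (reduce #'+ (mapcar (lambda (d) (* (second d) (third d))) domains))
     (reduce #'+ (mapcar #'second domains))))

(defun domains-below (domains threshold)
  "Names of the domains whose own sufficiency is under THRESHOLD."
  (mapcar #'first
          (remove-if-not (lambda (d) (< (third d) threshold)) domains)))

;; Security holds 11 facts for every 4 literary facts.
(defparameter *example-domains*
  '((:security 11 95/100)
    (:literature 4 5/100)))

(global-sufficiency *example-domains*)      ; => 71/100
(domains-below *example-domains* 70/100)    ; => (:LITERATURE)
#+end_src

The global figure clears a 70% threshold while the literary domain sits at 5%.
A per-domain gate would consult =domains-below= before switching modes for a
given query.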
|
||||
|
||||
** Summary Matrix
|
||||
|
||||
| | Positive | Negative |
|
||||
|-----------+---------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------|
|
||||
| INTERNAL | S: Architectural inversion, unified Org format, provenance as product, | W: Unproven fact language, Screamer scale unverified, extraction cost hidden, |
|
||||
| | cardinality model, gate-to-fact bootstrap, self-preservation, organic ontology, | flip underspecified, adversarial model absent, self-repair tension, |
|
||||
| | Wikidata as accelerator, decoupled compute cost | Agora integration scope undefined, per-domain sufficiency missing |
|
||||
|-----------+---------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------|
|
||||
| EXTERNAL | O: Memory prosthesis, deterministic moat, sovereign agent network, | T: Ontology may be harder than expected, Screamer may not scale, |
|
||||
| | 10-80-10 for coding achievable, Wikidata cross-domain queries, | Wikidata quality/trust, LLM extraction bottleneck, Agora network effects, |
|
||||
| | ontology versioning, contract pre-arbitration, self-auditing safety, | complexity-to-adoption ratio, adversarial semantic corruption, |
|
||||
| | knowledge-based social network | false confidence from global sufficiency metric |
|
||||
|
||||
* What This Unlocks
|
||||
|
||||
** Technologically
|
||||
|
||||
The neurosymbolic engine, if built, would be the first AI system where:
|
||||
|
||||
1. *Reasoning is auditable.* Every conclusion carries a provenance chain back to
|
||||
its premises. The =/audit= command renders the full inference tree — every
|
||||
fact, every deduction, every gate outcome — in human-readable form.
|
||||
|
||||
2. *Knowledge accumulates deterministically.* Screamer deductions and gate
|
||||
outcomes generate new facts without any LLM involvement. The knowledge base
|
||||
grows from the system's own operation, not from re-prompting the LLM.
|
||||
|
||||
3. *Memory is content-addressed.* Every fact is a Merkle node. Every version
|
||||
chain is tamper-evident. Rollback is atomic. The storage format is proven
|
||||
correct before it is committed to disk.
|
||||
|
||||
4. *Safety is provable, not empirical.* Type-level gates make self-modification
|
||||
structurally impossible. ACL2 proves that the rule set has no contradictions.
|
||||
The dispatcher doesn't "try" to be safe — it is safe by construction.
|
||||
|
||||
5. *The human and the machine share the same format.* Org files for both. No
|
||||
hidden database. No import/export step. The agent's memory IS the human's
|
||||
memory.
|
||||
|
||||
These five properties, together, define a new category of AI system: the
|
||||
*sovereign reasoning agent*. Not sovereign in the blockchain sense (decentralized
|
||||
by consensus), but sovereign in the personal sense: the agent runs on your
|
||||
hardware, reasons with your knowledge, and proves its reasoning to you.
|
||||
|
||||
** Socially
|
||||
|
||||
If the technical vision succeeds and Agora reaches network effects, the
|
||||
combination unlocks:
|
||||
|
||||
1. *Verifiable public discourse.* Every published claim carries provenance back
|
||||
to source material. "I read this, I thought this, I changed my mind on this
|
||||
date, here is the evidence." Public discourse shifts from "competing opinions"
|
||||
to "competing evidence chains." The quality floor rises because claims without
|
||||
provenance are visibly weaker than claims with provenance.
|
||||
|
||||
2. *Sovereign AI agents with legal and economic personhood.* A Passepartout
|
||||
agent with an Agora Persona can own assets, enter contracts, earn reputation,
|
||||
and face consequences for failure. This is not a chatbot. It is an autonomous
|
||||
entity with cryptographic identity, verified provenance, and economic agency
|
||||
— more like a corporation than a tool.
|
||||
|
||||
3. *Self-auditing AI safety.* Every action the agent takes is traceable. Every
|
||||
gate decision is recorded. Every fact that informed a decision is queryable.
|
||||
AI safety moves from "trust us" to "here is the audit trail." This is the
|
||||
transparency that every AI ethics framework calls for.
|
||||
|
||||
4. *A personal knowledge economy.* If your memex can publish Notes as Agora
|
||||
content, your intellectual work — your analyses, your syntheses, your
|
||||
discoveries — becomes a publishable, attributable, monetizable asset. Not
|
||||
through advertising or subscriptions, but through direct value exchange:
|
||||
Lightning payments for content access, contract work for your verified
|
||||
expertise, reputation that follows your Persona across platforms.
|
||||
|
||||
5. *Collective intelligence without centralized control.* If multiple
|
||||
Passepartout agents share facts through Agora Notes, the collective symbolic
|
||||
index represents the verified, provenanced knowledge of a community — not the
|
||||
averaged opinion of a crowd, but the auditable intersection of independently
|
||||
verified claims. This is Wikipedia without the editorial board, science
|
||||
without the journal gatekeepers, journalism without the corporate owners.
|
||||
|
||||
6. *A memory prosthesis that outlives the individual.* A memex with a decade of
|
||||
diary entries, linked to Wikidata's entity graph, with Screamer deductions
|
||||
surfacing patterns and contradictions, with ontology versioning preserving
|
||||
intellectual evolution — this is not a knowledge management tool. It is an
|
||||
externalized, queryable, auditable record of a life's thinking. It is what
|
||||
Vannevar Bush imagined in 1945: "an enlarged intimate supplement to his
|
||||
memory."
|
||||
|
||||
* Conclusion
|
||||
|
||||
The architecture described in these notes is genuinely novel. Not incrementally
|
||||
novel — most agent architectures are variations on "LLM + tools + prompt-based
|
||||
safety." Passepartout's neurosymbolic vision is categorically different: an
|
||||
inversion where the deterministic layer judges the probabilistic layer, where
|
||||
facts carry provenance chains, where contradiction is a feature rather than an
|
||||
error, and where the user's Org files are the single source of truth for both
|
||||
human and machine.
|
||||
|
||||
The largest risk is not that the architecture is wrong. It is that the ontology
|
||||
problem — the genuine difficulty of defining what a "fact" is, what relations
|
||||
are, what categories are useful, and how they evolve — is harder than the notes
|
||||
anticipate, and that the system spends years in a partially-working state where
|
||||
the symbolic index is too sparse to be useful but too entangled to be discarded.
|
||||
|
||||
The second-largest risk is that Agora never reaches the network effects needed
|
||||
to make the PDS integration valuable beyond a local experiment, and that the
|
||||
engineering investment in DIDComm gateways, Note signing, contract verification,
|
||||
and Relay integration produces infrastructure for a network that doesn't exist.
|
||||
|
||||
The opportunity is equally large: a system that makes your own mind legible to
|
||||
you, that proves its reasoning rather than asserting it, that accumulates
|
||||
knowledge across sessions through deduction rather than re-prompting, and that
|
||||
publishes verified, provenanced knowledge to a decentralized network. If this
|
||||
works — even partially, even slowly — it is a category-level advance over every
|
||||
existing agent architecture and every existing personal knowledge management
|
||||
tool.
|
||||
|
||||
The notes are a map of territory that no one has walked. The territory is real.
|
||||
The map is detailed enough to navigate by. Whether the journey completes depends
|
||||
on whether the ontology problem yields to engineering, and whether the user —
|
||||
the one human whose memex this serves — finds value in the partial system well
|
||||
before the full vision materializes.
|
||||
314
notes/passepartout-agora.org
Normal file
@@ -0,0 +1,314 @@
|
||||
#+TITLE: Passepartout-Agora Integration — Unified Container Format
|
||||
#+AUTHOR: Agent
|
||||
#+FILETAGS: :notes:integration:agora:passepartout:design:
|
||||
#+CREATED: [2026-05-08 Fri]
|
||||
|
||||
* Summary
|
||||
|
||||
Org files and Agora Notes are the same container. Both are text with headers,
|
||||
tags, properties, and prose body. Both contain zero or more symbolic facts
|
||||
extractable by Passepartout's archivist. The only difference is that an Agora
|
||||
Note carries a DID signature and a CID for cryptographic provenance on the
|
||||
network. An Org file without a signature is a local Note. A signed Org file
|
||||
pushed to the PDS is an Agora Note.
|
||||
|
||||
Passepartout's =memory-object= struct serves as the storage format for both.
|
||||
The archivist extracts facts from one unified store. Authorship is distinguished
|
||||
by provenance, not location.
|
||||
|
||||
* The Unification
|
||||
|
||||
** Org files and Notes are the same container
|
||||
|
||||
| Property | Org file (local) | Agora Note (network) |
|
||||
|------------------+------------------------------+-------------------------------------|
|
||||
| Format | Org-mode text | Org-mode text |
|
||||
| Identity | Merkle hash (=memory-object=) | CIDv1 (same hash) |
|
||||
| Contains facts | Yes (archivist extracts) | Yes (archivist extracts) |
|
||||
| Author identity | Implicit (file in =~/memex/=) | Explicit (DID signature in =proof=) |
|
||||
| Access control | Filesystem permissions | =access_control= flags |
|
||||
| Routing | N/A (local disk) | =notify= + =references= + Relay |
|
||||
| Ephemeral | No | =ephemeral_duration= |
|
||||
| Behavioral flag | Implicit (convention) | =is_feed= field |
|
||||
|
||||
The structure converges in a single plist:
|
||||
|
||||
#+begin_src lisp
|
||||
(:cid <merkle-hash> ;; Identity across local and network
|
||||
:title <string> ;; Org headline title
|
||||
:content <org-text> ;; Full Org body (headings, prose, source blocks)
|
||||
:owner <did-or-nil> ;; For Agora Notes: the signing Persona DID. nil for local
|
||||
:proof <plist-or-nil> ;; ( :editor <did> :signature <bytes> )
|
||||
;; Agora behavioral flags (nil for local files)
|
||||
:is-feed <boolean-or-nil>
|
||||
:access-control <did-list-or-nil>
|
||||
:notify <did-list-or-nil>
|
||||
:references <cid-list-or-nil>
|
||||
:reply-to <cid-or-nil>
|
||||
:thread-root <cid-or-nil>
|
||||
:ephemeral-duration <integer-or-nil>
|
||||
;; Passepartout metadata
|
||||
:created-at <timestamp>
|
||||
:tags <string-list> ;; Org tags
|
||||
:properties <plist> ;; Org property drawer
|
||||
:extracted-facts <fact-list>) ;; Populated by archivist after extraction
|
||||
#+end_src
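
Given this shape, telling a local Note from an Agora Note reduces to checking
for an owner and a signature. A minimal sketch, assuming the plist layout above.
The predicate name =agora-note-p= and the sample values are hypothetical.

#+begin_src lisp
(defun agora-note-p (note)
  "True when NOTE carries both an owner DID and a proof with a signature."
  (let ((proof (getf note :proof)))
    (and (getf note :owner)
         (getf proof :signature)
         t)))

;; A local Org heading: no owner, no proof.
(agora-note-p '(:cid "example-cid-1"
                :title "Pale Fire"
                :content "Notes on unreliable narration."))
;; => NIL

;; The same container once signed (signature bytes are a placeholder).
(agora-note-p '(:cid "example-cid-1"
                :title "Pale Fire"
                :content "Notes on unreliable narration."
                :owner "did:agora:heather"
                :proof (:editor "did:agora:heather" :signature #(1 2 3))))
;; => T
#+end_src

This only tests for the presence of the fields. Verifying the signature against
the DID is a separate step that the sketch leaves out.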
|
||||
|
||||
** Facts are extracted from both, identically
|
||||
|
||||
An Org file in =~/memex/literature/pale-fire.org= and an Agora Note from
|
||||
=did:agora:heather= with =:references <post-CID>= both contain prose. The
|
||||
archivist scans both, proposes triples via the LLM, verifies via Screamer,
|
||||
and admits facts to the symbolic index. The facts carry different provenance:
|
||||
|
||||
#+begin_src lisp
|
||||
;; Extracted from local Org file
|
||||
(:entity :pale-fire :relation :theme :value :unreliable-narration
|
||||
:provenance :local-prose :grounding "heading-42")
|
||||
|
||||
;; Extracted from Agora Note
|
||||
(:entity :kafka :relation :influence :value :nabokov
|
||||
:provenance :agora-note :grounding <incoming-note-cid> :author "did:agora:heather")
|
||||
#+end_src
|
||||
|
||||
No new extraction path. The archivist already walks containers and extracts
|
||||
facts. The container type determines the provenance tag and the grounding
|
||||
identifier (local heading ID vs. Note CID).
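
The tagging step could be as small as the following sketch. It assumes the
container is a plist with =:owner= and =:cid= as in the unified structure, plus
a =:heading-id= key for local files, which is a hypothetical addition for this
example. The function name =tag-fact= is also hypothetical.

#+begin_src lisp
(defun tag-fact (triple container)
  "Attach provenance to TRIPLE, a plist of :entity, :relation and :value.
A container with an :owner is treated as an Agora Note, otherwise as local."
  (if (getf container :owner)
      (append triple (list :provenance :agora-note
                           :grounding (getf container :cid)
                           :author (getf container :owner)))
      (append triple (list :provenance :local-prose
                           :grounding (getf container :heading-id)))))

;; Usage: the same triple shape, tagged from two different containers.
(tag-fact '(:entity :pale-fire :relation :theme :value :unreliable-narration)
          '(:heading-id "heading-42"))

(tag-fact '(:entity :kafka :relation :influence :value :nabokov)
          '(:cid "example-cid-2" :owner "did:agora:heather"))
#+end_src

Because =append= copies the triple, the raw extraction result is left
unmodified and can be tagged again if the container is later signed.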
|
||||
|
||||
** The memex distinguishes provenance by location, not format
|
||||
|
||||
Incoming Agora Notes arrive at =~/memex/social/notes/<did>/<cid>.org=.
|
||||
The directory structure encodes authorship:
|
||||
|
||||
| Path | Meaning |
|
||||
|---------------------------------------------------+------------------------------------|
|
||||
| ~/memex/daily/ | Local diary entries |
|
||||
| ~/memex/projects/ | Local project files |
|
||||
| ~/memex/literature/ | Local reading notes |
|
||||
| ~/memex/notes/ | Local design and thinking notes |
|
||||
| ~/memex/social/notes/<did>/<cid>.org | Incoming Notes from other DIDs |
|
||||
| ~/memex/social/outbox/<cid>.org | Outgoing Notes signed by the user |
|
||||
|
||||
The archivist scans all directories. Local files produce facts with
|
||||
=:provenance :local-prose=. Agora files produce facts with =:provenance
|
||||
:agora-note= + =:author <did>=. The symbolic index maps the provenance
|
||||
to the cardinality policy: local prose is =:plural= (the human's own notes —
|
||||
multiple interpretations coexist). Agora Notes are =:plural= by default (the
|
||||
author's claim, not authoritative over local facts). Agora Notes can be promoted
|
||||
to =:singular= or =:dual= if they carry cryptographic proofs of specific claims.
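
The path-to-provenance rule above can be sketched in a few lines. This is a minimal illustration, not the archivist's actual code: =split-path=, =provenance-for-path=, and the returned plist shape are hypothetical names chosen here, and treating everything outside =social/notes/= (including the outbox) as local prose is an assumption the text does not settle.

#+begin_src lisp
;; Sketch only: function names and the returned plist shape are assumptions.
(defun split-path (path)
  "Split PATH on #\\/ into a list of non-empty components."
  (loop with start = 0
        for pos = (position #\/ path :start start)
        for piece = (subseq path start pos)
        unless (string= piece "") collect piece
        while pos
        do (setf start (1+ pos))))

(defun provenance-for-path (path)
  "Return a plist of provenance keys for a memex PATH.
Files under social/notes/<did>/ are Agora Notes authored by <did>;
everything else is treated as local prose (an assumption for the outbox)."
  (let* ((parts (split-path path))
         (tail (member "social" parts :test #'string=)))
    (if (and tail (equal (second tail) "notes") (third tail))
        (list :provenance :agora-note
              :author (third tail)
              :cardinality :plural)
        (list :provenance :local-prose
              :cardinality :plural))))
#+end_src

Calling =(provenance-for-path "~/memex/social/notes/did:agora:heather/bafy.org")= yields a plist whose =:author= is the DID directory name, which is the sense in which the directory structure encodes authorship.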
|
||||
|
||||
** Publishing Org content as Agora Notes
|
||||
|
||||
When the user wants to publish a diary entry, project log, or literary note as
|
||||
an Agora Note, the operation is:
|
||||
|
||||
1. Select the Org heading or file.
|
||||
2. Compute the Merkle hash (=memory-object= hash → CIDv1).
|
||||
3. Sign with the user's Persona DID key (Phase 0b key registry).
|
||||
4. Set Agora flags: =:is-feed= t/nil, =:access-control= [], =:references= [previous-note-cid].
|
||||
5. Push to the PDS. The Note is an Org plist with a DID signature.
|
||||
6. The PDS stores and relays it. The Note remains in =~/memex/social/outbox/= with its CID.
|
||||
|
||||
All of this is a single function: =(note-publish heading-id &key is-feed access-control references)=.
|
||||
~40 lines, extending the vault (key signing), the fact store (CID generation),
|
||||
and the memex (output directory).
|
||||
|
||||
* Implications for Passepartout's Architecture
|
||||
|
||||
** The archivist's ingestion path now covers Agora Notes
|
||||
|
||||
Facts enter through three paths:
|
||||
1. Gate outcomes (bootstrap + runtime, =:provenance :gate-outcome=)
|
||||
2. Screamer deductions (=:provenance :deduced=)
|
||||
3. Archivist extraction (=:provenance :local-prose= or =:provenance :agora-note=)
|
||||
|
||||
The third path now covers both local Org files and incoming Agora Notes. No new
|
||||
path needed. The archivist gains no new code — only a new directory to walk
|
||||
(=~/memex/social/notes/=) and a new provenance tag to assign.
|
||||
|
||||
** Authentication Layer 1 now has Agora-native verification
|
||||
|
||||
Phase 0b's cryptographic gate (vector 0) verifies DID signatures. An incoming
|
||||
Agora Note carries =:owner <did>= and =:proof.signature <bytes>=. Gate vector 0
|
||||
verifies the signature against the DID's public key (from the key registry, which
|
||||
is now also an Agora DID registry). Verification is identical for local signals
|
||||
and Agora signals — the same gate, the same key lookup.
|
||||
|
||||
** Self-preservation gains an Agora dimension
|
||||
|
||||
The resource monitor (Phase 1a) tracks =~/memex/social/= as a source of storage
|
||||
growth. Incoming Notes from network sources are lower preservation priority than
|
||||
local prose — if disk pressure hits, incoming Agora Notes are evicted first
|
||||
(their source is the remote PDS; they can be re-fetched). Quarantine (Phase 1a)
|
||||
extends to Agora channels: if a DID is sending spam or malformed Notes, their
|
||||
incoming directory is quarantined and the DID is flagged for human review.
|
||||
|
||||
** Sufficiency tracks Agora as a provenance source
|
||||
|
||||
The sufficiency score (Phase 4) gains a new provenance category:
|
||||
|
||||
#+begin_example
|
||||
Symbolic Index
|
||||
Facts: 3,847
|
||||
Gate outcomes: 847 (22%)
|
||||
Deduced: 921 (24%)
|
||||
Human-authored: 72 (2%)
|
||||
Local prose: 1,247 (32%)
|
||||
Agora Notes: 760 (20%)
|
||||
─────────────────────────
|
||||
Non-lossy: 1,840 (48%)
|
||||
LLM-proposed: 2,007 (52%)
|
||||
#+end_example
|
||||
|
||||
Agora Notes are a provenance source, not a lossiness category. Facts from Agora
|
||||
Notes carry =:provenance :agora-note= — they are LLM-extracted (the archivist
|
||||
proposes them) but the source is cryptographically signed by a known DID. They
|
||||
are neither =:gate-outcome= (mechanical) nor =:llm-proposed= from local prose
|
||||
(uncertain source). They occupy a middle ground: verified source, uncertain
|
||||
extraction.
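
The breakdown in the example can be computed from the facts themselves. The sketch below assumes facts are plists with a =:provenance= key and that =:gate-outcome=, =:deduced=, and =:human-authored= are the non-lossy categories, as the example groups them; the function and variable names are hypothetical.

#+begin_src lisp
;; Sketch only: names are illustrative, not the Phase 4 implementation.
(defparameter *non-lossy-provenances*
  '(:gate-outcome :deduced :human-authored)
  "Provenance tags counted as non-lossy in the example above.")

(defun provenance-counts (facts)
  "Return an alist (provenance . count) for FACTS, a list of plists."
  (let ((counts '()))
    (dolist (fact facts (nreverse counts))
      (let* ((p (getf fact :provenance))
             (cell (assoc p counts)))
        (if cell
            (incf (cdr cell))
            (push (cons p 1) counts))))))

(defun non-lossy-share (counts)
  "Return the rounded percentage of facts with a non-lossy provenance."
  (let ((total (reduce #'+ counts :key #'cdr))
        (solid (loop for (p . n) in counts
                     when (member p *non-lossy-provenances*) sum n)))
    (if (zerop total)
        0
        (round (* 100 solid) total))))
#+end_src

With the counts from the example (847, 921, 72, 1,247, and 760), =non-lossy-share= returns 48, matching the "Non-lossy" line.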
|
||||
|
||||
* Implications for Agora
|
||||
|
||||
** Passepartout IS the PDS
|
||||
|
||||
The TODO.org in =projects/agora/= already captures this: "Passepartout IS the
|
||||
PDS — the agent runs a personal data store in-process." With Org files as the
|
||||
Note format, this is literal. The PDS stores Org files. The agent reads them.
|
||||
The network accesses them via the PDS API. There is no separate PDS process.
|
||||
|
||||
** Level 0 pre-arbitration via Screamer
|
||||
|
||||
Section 07 of the Agora requirements describes a "Tier 0 Arbitrator" — a local
|
||||
AI that provides a sanity check before human arbitration. Passepartout's
|
||||
Screamer + fact store provides this at zero LLM tokens when working from
|
||||
existing facts:
|
||||
|
||||
- "Contract CID X references arbitrator DID Y. DID Y is active. Verified."
|
||||
- "All parties have signed. The HODL invoice is locked. Verified."
|
||||
- "The buyer's claim of non-delivery is supported by 3 signed messages with
|
||||
timestamps after the delivery deadline."
|
||||
- "The seller's proof-of-delivery field is empty. No QR scan recorded."
|
||||
|
||||
Each check is a Screamer query against the contract-lifecycle domain. Results
|
||||
are a plist, not a ruling. Both parties see the same evidence summary before
|
||||
escalating to Level 1.
|
||||
|
||||
** Reputation as deduced facts
|
||||
|
||||
Screamer deduces reputation from signed contract chains, not asserted claims:
|
||||
|
||||
#+begin_src lisp
|
||||
(:entity "did:agora:heather" :relation :contract-reputation
|
||||
:value (:completed 47 :defaulted 0 :disputes 3 :won 3 :escalated 0)
|
||||
:provenance :deduced :derived-from (<list of 47 contract CIDs>))
|
||||
#+end_src
|
||||
|
||||
This is the strong version of Agora's Trust Score. It's a fact deduced from
|
||||
cryptographic evidence, not a claim by the persona (self-reporting could be
|
||||
false) and not a claim by a centralized reputation service (could be bought).
|
||||
The deduction is auditable: =/audit did:agora:heather= shows every contract,
|
||||
every outcome, every ruling.
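
A reputation fact of this shape can be folded from a list of contract records. The sketch below uses plain list operations rather than Screamer, and it assumes each contract is a plist with =:cid=, =:outcome= (=:completed= or =:defaulted=), and an optional =:dispute= (=:won=, =:lost=, or =:escalated=); those keys and the name =deduce-reputation= are hypothetical.

#+begin_src lisp
;; Sketch only: the contract record keys are assumptions for illustration.
(defun deduce-reputation (did contracts)
  "Fold CONTRACTS into a :contract-reputation fact for DID."
  (flet ((tally (key value)
           ;; Count contracts whose KEY slot is VALUE.
           (count value contracts :key (lambda (c) (getf c key)))))
    (list :entity did
          :relation :contract-reputation
          :value (list :completed (tally :outcome :completed)
                       :defaulted (tally :outcome :defaulted)
                       :disputes  (count-if (lambda (c) (getf c :dispute))
                                            contracts)
                       :won       (tally :dispute :won)
                       :escalated (tally :dispute :escalated))
          :provenance :deduced
          ;; Every contributing contract CID is kept so the result is auditable.
          :derived-from (mapcar (lambda (c) (getf c :cid)) contracts))))
#+end_src

Because =:derived-from= lists every input CID, the same records that produced the score can be replayed to check it.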
|
||||
|
||||
** Agent Behavioral Contracts — formal enforcement for the ABC of Agora
|
||||
|
||||
Bhardwaj (2026) introduces a formal framework that brings Design-by-Contract
|
||||
principles to autonomous AI agents. An ABC contract =C = (P, I, G, R)=
|
||||
specifies /Preconditions/, /Invariants/ (hard and soft), /Governance/ policies
|
||||
(hard and soft), and /Recovery/ mechanisms as first-class runtime-enforceable
|
||||
components.
|
||||
|
||||
This maps directly onto Agora's contract lifecycle:
|
||||
|
||||
| ABC component | Agora mapping |
|
||||
|------------------------+--------------------------------------------------------------|
|
||||
| =P= (Preconditions) | Contract Note validity checks: all signers' DIDs active, |
|
||||
| | contract CID correctly referenced, HODL invoice locked |
|
||||
| =I= (Invariants) | Hard: payment amount unchanged, arbitrator DID unchanged. |
|
||||
| | Soft: delivery within estimated window |
|
||||
| =G= (Governance) | Hard: no party modifies contract terms unilaterally. |
|
||||
| | Soft: parties communicate through designated channels |
|
||||
| =R= (Recovery) | Arbitration escalation, HODL invoice release, reputation |
|
||||
| | deduction |
|
||||
|
||||
The framework's key mathematical results have direct implications for Agora:
|
||||
|
||||
- /Drift Bounds Theorem/: contracts with recovery rate γ > α (natural drift rate
|
||||
from LLM non-determinism in agent behavior) bound behavioral drift to D* = α/γ.
|
||||
For Agora, this means contract enforcement can be /predictive/ — detecting drift
|
||||
before violation — rather than just /corrective/ after breach.
|
||||
|
||||
- /Compositionality Theorem/: sufficient conditions (interface compatibility,
|
||||
assumption discharge, governance consistency, recovery independence) under
|
||||
which individual contract guarantees compose end-to-end for multi-agent chains.
|
||||
This is essential for Agora's multi-party contracts, where a buyer, seller,
|
||||
arbitrator, and escrow agent form a chain of interdependent behavioral
|
||||
expectations.
|
||||
|
||||
- /(p, δ, k)-satisfaction/: probabilistic compliance accounting for LLM
|
||||
non-determinism — contracts hold with probability p, deviations stay within
|
||||
tolerance δ, recovery within k steps. This formalizes what Screamer's
|
||||
contract-lifecycle domain queries verify: whether the current state of a
|
||||
contract satisfies its agreed-upon conditions, given the inherent uncertainty
|
||||
in any agent's behavior.
|
||||
|
||||
The empirical results are significant: across 1,980 sessions on 7 models,
|
||||
contracted agents (with ABC enforcement) detected 5.2-6.8 soft violations per
|
||||
session that uncontracted agents missed entirely, with <10ms per-action overhead.
|
||||
Overhead is critical for Passepartout as the PDS — contract enforcement must not
|
||||
add latency to Note processing.
|
||||
|
||||
ABC does not replace Screamer. ABC specifies /what/ must hold; Screamer verifies
|
||||
/whether/ it holds against the fact store. The contract-lifecycle domain already
|
||||
planned for Phase 0b (signal chain) can be implemented as an ABC-like structure:
|
||||
a tuple of preconditions, invariants, governance rules, and recovery mechanisms,
|
||||
each expressed as Screamer-verifiable facts with Merkle provenance.
|
||||
|
||||
See also:
|
||||
- Bhardwaj, V.P. (2026). Agent Behavioral Contracts: Formal Specification and
|
||||
Runtime Enforcement for Reliable Autonomous AI Agents. arXiv:2602.22302.
|
||||
|
||||
** The merkle DAG IS the Key Event Log
|
||||
|
||||
Agora's KEL specification (Section 02) describes an append-only log of key
|
||||
events — inception, rotation, revocation, follow events. Passepartout's Merkle
|
||||
DAG (Phase 5, built on v0.2.0 memory-object infrastructure) is this log. Each
|
||||
key event is a fact in the =:key-lifecycle= domain. Each event has a
|
||||
=:parent-id= chaining to the previous event. The DAG is content-addressed —
|
||||
every event is a CID. The full KEL is queryable: =/audit did:agora:heather=
|
||||
renders every key event, every follow event, every contract signature, with
|
||||
provenance chains.
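
Reading the log back amounts to following =:parent-id= links from the newest event to the inception event. The sketch below assumes events are plists stored in a hash table keyed by CID string; =key-event-log= and the =:event= key are hypothetical names, and the real DAG is expected to hold =memory-object= structs rather than bare plists.

#+begin_src lisp
;; Sketch only: events are modelled as plists in an EQUAL hash table.
(defun key-event-log (store head-cid)
  "Return the events from inception up to HEAD-CID, oldest first.
Follows :parent-id links in STORE until an event has no known parent."
  (loop with log = '()
        for cid = head-cid then (getf event :parent-id)
        for event = (and cid (gethash cid store))
        while event
        do (push event log)
        finally (return log)))
#+end_src

No cycle check is included: in a content-addressed chain an event cannot name a CID that depends on its own content, so the walk terminates at the inception event.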
|
||||
|
||||
* Relation to the Neurosymbolic Roadmap
|
||||
|
||||
The Agora integration is not a new phase. It is a consequence of decisions
|
||||
already made:
|
||||
|
||||
| Roadmap item | Agora consequence |
|
||||
|-------------------------+----------------------------------------------------------------|
|
||||
| Phase 0b (key registry) | Key registry uses Agora DIDs. DID store is =:key-lifecycle= domain |
|
||||
| Phase 1 (fact store) | Fact store is also Note store. Same API, same hash table |
|
||||
| Phase 1a (self-pres.) | Incoming Notes tracked. Spam DIDs quarantined. Disk eviction |
|
||||
| Phase 3 (archivist) | Archivist walks =~/memex/social/notes/= alongside local dirs |
|
||||
| Phase 4 (sufficiency) | Agora Notes are a provenance category in the sufficiency score |
|
||||
| Phase 5 (Merkle DAG) | DAG = KEL. DAG = contract audit trail |
|
||||
| Phase 0b (signal chain) | Signal chain = contract lifecycle chain. Same Merkle linking |
|
||||
|
||||
No new lines in the roadmap. The Note publishing function (~40 lines) is a
|
||||
utility, not a phase.
|
||||
|
||||
* What Is NOT Built
|
||||
|
||||
1. *A separate Note parser.* Agora Notes ARE Org files. The existing Org parser
|
||||
reads both.
|
||||
2. *A separate Note store.* The =memory-object= struct stores both. The
|
||||
=*memory-store*= hash table holds both.
|
||||
3. *A separate extraction path for Agora content.* The archivist extracts facts
|
||||
from prose regardless of origin. The provenance tag distinguishes source.
|
||||
4. *A new authentication mechanism for Agora signals.* Gate vector 0 verifies
|
||||
DID signatures. The key registry is the DID registry.
|
||||
|
||||
See also:
|
||||
- =projects/agora/docs/= — Agora requirements (overview, identity, primitive, social, contracts, governance)
|
||||
- =projects/agora/TODO.org= — Passepartout integration track
|
||||
- =passepartout-neurosymbolic-design-decisions-and-options.org= — the full design rationale
|
||||
- =passepartout-neurosymbolic-roadmap.org= — the phased implementation plan
|
||||
@@ -1,719 +1,24 @@
|
||||
#+TITLE: Passepartout Neurosymbolic Engine — Design Decisions and Architecture Options
|
||||
#+TITLE: Passepartout Neurosymbolic Engine — Design Decisions — SUPERSEDED
|
||||
#+AUTHOR: Agent
|
||||
#+FILETAGS: :notes:design-decisions:neurosymbolic:architecture:v3.0.0:
|
||||
#+FILETAGS: :notes:design-decisions:neurosymbolic:superseded:
|
||||
#+CREATED: [2026-05-08 Fri]
|
||||
|
||||
* The Hallucination Problem — Why Neurosymbolic
|
||||
|
||||
An LLM is a statistical engine trained on token sequences. It generates the most
|
||||
probable continuation of a prompt. Given sufficient context, that continuation is
|
||||
correct. Given novel context, it is often wrong in confident-sounding ways.
|
||||
|
||||
This is not a training deficiency. Hallucination is a fundamental property of
|
||||
probabilistic inference. You can reduce it with better models, longer contexts,
|
||||
and clever prompting, but you cannot eliminate it by making the LLM better. You
|
||||
eliminate it by not asking the LLM to do things that require certainty.
|
||||
|
||||
This is the architectural bet at the heart of Passepartout's neurosymbolic design.
|
||||
The LLM should not be the reasoning engine. It should be the *creative* engine —
|
||||
proposing possibilities, surfacing connections, translating between natural
|
||||
language and formal representation. The *reasoning* engine should be symbolic:
|
||||
deterministic, verification-grounded, provenance-tracked, and incapable of
|
||||
hallucination by construction.
|
||||
|
||||
This is not a rejection of neural methods. It is a division of labor. The neuro
|
||||
is the brain — generative, associative, creative, comfortable with ambiguity. It
|
||||
produces hypotheses. The symbolic engine is the education — accumulated, verified,
|
||||
provenance-tracked knowledge that the brain draws on and is disciplined by. It
|
||||
doesn't think. It remembers, checks, and constrains.
|
||||
|
||||
The brain is always smarter than the education, but the education prevents the
|
||||
brain from being confidently wrong.
|
||||
|
||||
** See also:
|
||||
|
||||
- =passepartout/docs/DESIGN_DECISIONS.org=: "The Probabilistic-Deterministic Split"
|
||||
for the gate-level version of this argument.
|
||||
- =notes/passepartout-whitehead.org=: Whitehead's ramified theory of types as
|
||||
the structural guarantee against self-referential contradictions.
|
||||
- =notes/passepartout-symbolic-engine-exploration.org=: the full design space and
|
||||
the lossiness problem at the neural-symbolic boundary.
|
||||
|
||||
* The Five Architecture Options
|
||||
|
||||
The symbolic engine must relate to the human memex. The relationship is not
|
||||
obvious because knowledge lives in two incompatible forms: natural language
|
||||
prose (what the human reads and writes) and formal facts (what the symbolic
|
||||
engine reasons about). The translation between them is lossy by nature. The
|
||||
architecture is defined by how it handles that lossiness.
|
||||
|
||||
=notes/passepartout-symbolic-engine-exploration.org= explores five options. They are
|
||||
summarized here to make subsequent decisions legible.
|
||||
|
||||
** Option 1: The Auto-Formalizer
|
||||
|
||||
A separate knowledge graph stores symbolic facts. The LLM populates it by
|
||||
extracting triples from unstructured data — documentation, manuals, logs,
|
||||
session histories. The KG becomes co-authoritative with the human prose.
|
||||
|
||||
This is the simplest to implement but inherits the dual-representation problem
|
||||
in its most acute form. The KG and the prose can disagree, and the architecture
|
||||
provides no mechanism for resolving disagreements. It also stores knowledge
|
||||
twice — once in the user's Org files, once in the KG — with no guarantee that
|
||||
they stay synchronized.
|
||||
|
||||
** Option 2: Two Intentionally Separate Memexes
|
||||
|
||||
The human memex contains prose: thoughts, diaries, decisions, documentation.
|
||||
The symbolic memex contains formal facts: constraints, rules, relationships,
|
||||
deductions. The archivist bridges between them but does not try to keep them
|
||||
synchronized. They are allowed to diverge because they serve different purposes.
|
||||
The prose captures what the human intended. The symbolic memex captures what
|
||||
the symbolic engine has proven.
|
||||
|
||||
This is philosophically honest — it admits that no lossless translation between
|
||||
natural language and formal logic is possible. But it forces the user to reason
|
||||
about two separate knowledge stores and understand when to trust each.
|
||||
|
||||
** Option 3: Tangled Fact Blocks in Org Files
|
||||
|
||||
The tangle mechanism already handles the dual-representation problem for code.
|
||||
Lisp code lives in literate blocks within Org files (=#+begin_src lisp=). The
|
||||
tangle mechanism extracts these blocks and generates =.lisp= files. A new block
|
||||
type — =#+begin_src knowledge= — would contain symbolic facts in a formal
|
||||
language. The tangle mechanism would load these facts into the symbolic engine's
|
||||
in-memory store, just as it loads Lisp code into the SBCL image.
|
||||
|
||||
This is aesthetically appealing because it unifies the format. One toolchain,
|
||||
one version control system, one Merkle tree. But the block language itself IS
|
||||
the knowledge representation language, and that language is the ontology we
|
||||
have not yet defined. The format is unified but the content is unspecified.
|
||||
|
||||
** Option 4: One Memex, Two Indices
|
||||
|
||||
The prose remains in human language in Org files. The prose is always the ground
|
||||
truth. Two indices sit on top of the prose as derived views:
|
||||
|
||||
- The *neural index* uses vector embeddings to enable semantic search. The LLM
|
||||
navigates the prose through embedding space, retrieving relevant headings.
|
||||
- The *symbolic index* stores formal assertions about what the prose says —
|
||||
predicates, relations, constraints — each grounded to a specific heading or
|
||||
block in the Org file.
|
||||
|
||||
Each index serves its own side of the machine. They do not need to understand
|
||||
each other's representations. They only need to agree on which heading or block
|
||||
they are referring to. Because the prose is always the ground truth, the symbolic
|
||||
index can be thrown away and rebuilt from scratch if it becomes corrupted or
|
||||
stale. No information is lost — only the extracted assertions.
|
||||
|
||||
** Option 5: Ephemeral Symbolic Facts
|
||||
|
||||
No persistence, no serialization format, no knowledge graph stored on disk.
|
||||
VivaceGraph exists in memory during the session. Screamer derives facts from the
|
||||
prose as needed. When the session ends, the facts are discarded and re-derived
|
||||
from the prose on the next start.
|
||||
|
||||
This punts the ontological design problem entirely. You never have to decide on
|
||||
a serialization format because you never serialize. The cost is compute
|
||||
(re-derivation on every restart) and the inability to accumulate facts across
|
||||
sessions. But it is the correct first step — a way to learn what kinds of facts
|
||||
are actually useful before committing to a storage format.
|
||||
|
||||
* The Chosen Path: Option 4, Starting with Option 5
|
||||
|
||||
The one-memex-two-indices architecture (Option 4) is the correct long-term
|
||||
architecture. The prose is the ground truth. The symbolic index is a derived
|
||||
view that can be rebuilt. The neural index handles what the symbolic index
|
||||
cannot — semantic search, fuzzy matching, associative leaps.
|
||||
|
||||
But committing to a persistence format before knowing what facts are useful
|
||||
is premature. The practical path starts with Option 5 (ephemeral facts) as the
|
||||
Phase 1-4 implementation, then graduates to Option 4 with VivaceGraph
|
||||
persistence in Phase 5 when the fact language has been battle-tested (see
|
||||
=passepartout-neurosymbolic-roadmap.org=).
|
||||
|
||||
** Why the dual index is permanent, not transitional
|
||||
|
||||
In the coding domain, there is an aspiration that the symbolic index could
|
||||
eventually capture enough of the prose's propositional content to become a
|
||||
complete representation — the "flip" described in the architecture note. But
|
||||
for the broader memex (literature, poetry, personal reflection, daily logs),
|
||||
completeness is neither possible nor desirable. You cannot formalize what makes
|
||||
a poem beautiful. You cannot extract a triple that captures the emotional weight
|
||||
of a diary entry. The neural index will always be the gateway to the full
|
||||
richness of the prose. The symbolic index handles what can be mechanically
|
||||
verified: citations, entities, temporal order, contradictions, provenance.
|
||||
The division of labor between the two indices is permanent because the domains
|
||||
they serve are fundamentally different kinds of knowledge.
|
||||
|
||||
* The Neuro as Brain, the Symbolic as Education
|
||||
|
||||
The original 10-80-10 architecture (10% neural, 80% symbolic, 10% neural)
|
||||
describes the target ratios for a *coding* agent — a domain where most reasoning
|
||||
is formalizable. For the broader memex, the ratios are different and less
|
||||
important than the metaphor itself.
|
||||
|
||||
The neuro is the *brain* — generative, associative, creative, comfortable with
|
||||
ambiguity. It produces insights that are provisional, connections that are
|
||||
speculative, hypotheses that may be wrong. It is the driver.
|
||||
|
||||
The symbolic engine is the *education* — accumulated, verified,
|
||||
provenance-tracked knowledge that the brain draws on and is disciplined by. It
|
||||
doesn't think creatively. It remembers, checks, and constrains. It prevents the
|
||||
brain from being confidently wrong.
|
||||
|
||||
This framing resolves a tension in the original architecture. The 10-80-10
|
||||
implies the symbolic engine /replaces/ the neuro for reasoning. But a symbolic
|
||||
engine is terrible at creativity, ambiguity, and associative leaps across
|
||||
unrelated domains — exactly what you need for a memex that contains /Pale Fire/,
|
||||
a shopping list, and a project plan. The brain proposes that your sudden interest
|
||||
in unreliable narrators coincides with a week where your project retrospective
|
||||
used the word "deception." The education verifies: "those two diary entries are
|
||||
4 days apart; the word 'deception' appears in both; here are the headings." The
|
||||
brain makes the leap. The education makes it trustworthy.
|
||||
|
||||
This means the symbolic engine never needs to be "complete." Education isn't
|
||||
complete knowledge — it's structured knowledge. You don't need a fact for every
|
||||
sentence in your diary. You need facts for what can be mechanically verified:
|
||||
dates, citations, entities, contradictions, temporal order. The brain handles
|
||||
the rest.
|
||||
|
||||
* The Gate-to-Fact Bootstrap — Extracting the First Ontology from Existing Code
|
||||
|
||||
The Dispatcher gate stack already encodes an implicit ontology. Every gate
|
||||
vector asserts the existence of a category of things:
|
||||
|
||||
- Gate vector 2 asserts there exists a class of files called /secrets/.
|
||||
- Gate vector 7 asserts there exists a class of commands called /destructive/.
|
||||
- Gate vector 8 asserts there exists a class of domains called /trusted/.
|
||||
- The self-build boundary asserts there exists a class of files called
|
||||
/core-harness/ and a class called /skills/.
|
||||
|
||||
These claims are currently expressed as code — Lisp functions that pattern-match
|
||||
against file paths, shell commands, and URLs. They are not facts the symbolic
|
||||
engine can query, derive from, or check for consistency. But they can be made
|
||||
explicit.
|
||||
|
||||
The bootstrap makes every gate a set of initial symbolic facts:
|
||||
=(:file ".env" :member-of-class :secret-files :source gate-vector-2)=,
|
||||
=(:command "rm -rf /" :classified-as :catastrophic :source gate-vector-7)=,
|
||||
=(:domain "api.telegram.org" :classified-as :trusted :source gate-vector-8)=.
|
||||
|
||||
This produces 50-70 entity classes directly from the existing gate stack,
|
||||
without any new infrastructure:
|
||||
|
||||
| Source | Count | Example categories |
|
||||
|----------------------------------------+-------+----------------------------------------------------|
|
||||
| ~*dispatcher-protected-paths*~ | 11 | :secret-config-file, :ssh-key-file, :gpg-key-file |
|
||||
| ~*dispatcher-shell-blocked*~ | 8 | :catastrophic-command, :injection-pattern |
|
||||
| ~*dispatcher-network-whitelist*~ | 2 | :trusted-domain, :untrusted-domain |
|
||||
| Self-build boundary | 2 | :core-harness-file, :skill-file |
|
||||
| Privacy tags | 3 | :private-content, :financial-content |
|
||||
| Permission table | 3 | :read-only-tool, :write-tool, :eval-tool |
|
||||
| Cognitive tools | 6 | :code-search-tool, :file-io-tool, :shell-tool |
|
||||
| Relations (all gates) | ~15 | :member-of-class, :classified-as, :depends-on |
|
||||
| Qualities | ~8 | :catastrophic, :dangerous, :moderate, :harmless |
|
||||
| Provenance sources | 4 | :gate-outcome, :human-authored, :deduced, :llm-proposed |
|
||||
|----------------------------------------+-------+----------------------------------------------------|
|
||||
|
||||
This is the seed. It gives Screamer a domain to reason about immediately, without
|
||||
any LLM involvement. It proves the pattern — code becomes facts, facts enable
|
||||
reasoning — at the cost of approximately 30 lines of Lisp.
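
The bootstrap can be sketched as a single mapping from gate entries to fact plists. The example list below is a stand-in: the real contents and shape of the Dispatcher's variables are not shown in this document, so the alist format, =*example-protected-paths*=, and =bootstrap-gate-facts= are all assumptions.

#+begin_src lisp
;; Sketch only: a stand-in for the Dispatcher's protected-path list.
(defparameter *example-protected-paths*
  '((".env"   . :secret-config-file)
    ("id_rsa" . :ssh-key-file))
  "Hypothetical (pattern . class) pairs; the real variable may differ.")

(defun bootstrap-gate-facts (entries subject-key source)
  "Turn each (pattern . class) pair in ENTRIES into a fact plist.
SUBJECT-KEY is the leading key (:file, :command, :domain);
SOURCE names the gate vector the entry came from."
  (mapcar (lambda (entry)
            (list subject-key (car entry)
                  :member-of-class (cdr entry)
                  :source source
                  :provenance :gate-outcome))
          entries))
#+end_src

For instance, =(bootstrap-gate-facts *example-protected-paths* :file :gate-vector-2)= produces one fact per protected path, each tagged with the gate vector it was read from.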
|
||||
|
||||
* The LLM as Proposer — Verified Extraction
|
||||
|
||||
The LLM cannot be trusted to populate the symbolic index directly. Its outputs are
|
||||
sampled, not proven. A probabilistic extraction feeding a deterministic engine
|
||||
defeats the purpose of being deterministic.
|
||||
|
||||
But the LLM is still useful. It can surface facts that are obvious to a human
|
||||
reader of prose but would take the symbolic engine many deduction steps to reach
|
||||
independently. The solution is to demote the LLM from /extractor/ to /proposer/:
|
||||
|
||||
1. The archivist reads a prose heading.
|
||||
2. The LLM proposes candidate triples.
|
||||
3. Screamer checks each triple for consistency against the existing fact store.
|
||||
4. Only consistent triples are admitted to the symbolic index, flagged with
|
||||
=:provenance :llm-proposed= and grounded to the source heading.
|
||||
|
||||
The LLM might hallucinate facts that don't correspond to the prose. It might
|
||||
extract facts that contradict existing knowledge. It might produce syntactically
|
||||
malformed triples. None of these failures contaminate the symbolic index because
|
||||
proposals are not admitted automatically. The admission gate (Screamer) is
|
||||
deterministic.
|
||||
|
||||
This is the core architecture pattern. Everything else — the entity classes, the
|
||||
deduction engine, the persistence layer — follows from this single design decision:
|
||||
*the LLM proposes; the symbolic engine decides whether to accept.*
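
The propose-then-admit loop can be sketched without Screamer by passing the consistency check in as a function. In the sketch below, =admit-proposals=, =well-formed-triple-p=, and =no-conflict-p= are hypothetical names, the store is a plain list of plists, and =no-conflict-p= is a deliberately simple stand-in for the real constraint check.

#+begin_src lisp
;; Sketch only: CONSISTENT-P stands in for the Screamer admission gate.
(defun well-formed-triple-p (triple)
  "True when TRIPLE is a proper plist with :entity, :relation and :value."
  (and (listp triple)
       (evenp (length triple))
       (getf triple :entity)
       (getf triple :relation)
       (getf triple :value)
       t))

(defun no-conflict-p (triple store)
  "Stand-in check: no stored fact gives the same entity and relation
a different value."
  (notany (lambda (fact)
            (and (equal (getf fact :entity) (getf triple :entity))
                 (equal (getf fact :relation) (getf triple :relation))
                 (not (equal (getf fact :value) (getf triple :value)))))
          store))

(defun admit-proposals (proposals store heading-id consistent-p)
  "Return STORE extended with the PROPOSALS that pass CONSISTENT-P.
Admitted facts are tagged :llm-proposed and grounded to HEADING-ID."
  (dolist (triple proposals store)
    (when (and (well-formed-triple-p triple)
               (funcall consistent-p triple store))
      (push (append triple
                    (list :provenance :llm-proposed :grounding heading-id))
            store))))
#+end_src

Malformed or conflicting proposals are simply skipped, so a bad LLM output costs one wasted proposal rather than a corrupted index.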
|
||||
|
||||
* Three Contradiction Policies — Domain-Dependent Consistency
|
||||
|
||||
Classical logic requires consistency. A contradiction implies everything
|
||||
(=ex contradictione quodlibet=). Screamer, as a constraint solver, also requires
|
||||
consistency — a contradictory constraint set has no solutions. But the symbolic
|
||||
engine operates across domains where the meaning of contradiction is fundamentally
|
||||
different.
|
||||
|
||||
A single architecture serves all domains by applying different contradiction
|
||||
policies, scoped to the entity class:
|
||||
|
||||
** Policy :exclusive — Contradiction Rejected at Admission
|
||||
|
||||
For domains where the world is physically singular — a file either exists or it
|
||||
doesn't, a command either was blocked or it wasn't, a gate rule either applies or
|
||||
it doesn't. When a new fact contradicts an existing one in an :exclusive domain,
|
||||
the new fact is rejected. The existing fact is authoritative unless a human
|
||||
explicitly retracts it.
|
||||
|
||||
Use for: security classifications, file system state, gate rules, code
|
||||
correctness, deterministic safety constraints.
|
||||
|
||||
** Policy :coexistent — Contradiction Flagged, Both Retained
|
||||
|
||||
For domains where multiple truths coexist — literary interpretations, historical
|
||||
accounts, personal beliefs held at different times, multi-source factual
|
||||
disagreement (Wikidata vs. DBpedia vs. your memex). When a new fact contradicts
|
||||
an existing one in a :coexistent domain, the contradiction is recorded with a
|
||||
cross-reference flag. Both facts are stored. Queries return all facts with
|
||||
provenance display.
|
||||
|
||||
Use for: literature, history, personal knowledge evolution, scientific consensus
|
||||
shift, multi-author knowledge bases.
|
||||
|
||||
** Policy :temporal — Contradiction Accepted as Version Change
|
||||
|
||||
For domains where truth changes over time. When a new fact contradicts an old one
|
||||
in a :temporal domain, the old fact is marked =:superseded= but retained. The
|
||||
timeline is queryable: "You believed X on Tuesday, Y on Friday, Z on Sunday."
|
||||
|
||||
Use for: personal belief evolution, project plan revisions, scientific
|
||||
consensus shift over time, any knowledge where the change itself is information.
|
||||
|
||||
** Policy Assignment
|
||||
|
||||
The policy is assigned when a category is defined. New categories default to
|
||||
=:coexistent= (never loses information). Core security categories are explicitly
|
||||
=:exclusive=. The gate stack's bootstrapped facts are =:exclusive= because they
|
||||
describe the actual filesystem, not perspectives.
|
||||
|
||||
The Screamer admission gate does not reject all contradictions. It rejects
|
||||
contradictions in =:exclusive= domains and flags them in =:coexistent= and
|
||||
=:temporal= domains. The constraint solver still works because queries scope
|
||||
their constraint set to a single provenance domain. "Is X true according to my
|
||||
memex?" is a different query than "Is X true according to Wikidata?" Each has
|
||||
a self-consistent internal logic. The contradiction is between domains, not
|
||||
within them.
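
The three policies differ only in what happens when a new fact has rivals in the store. The sketch below makes that dispatch explicit. It is an illustration under stated assumptions: =admit-under-policy= and =contradicts-p= are hypothetical names, "contradiction" is reduced to "same entity and relation, different value", and the =:superseded t= and =:contradicts= markers are one possible encoding of the flags described above.

#+begin_src lisp
;; Sketch only: a list-based store and a simplified notion of contradiction.
(defun contradicts-p (a b)
  "True when facts A and B share entity and relation but differ in value."
  (and (equal (getf a :entity) (getf b :entity))
       (equal (getf a :relation) (getf b :relation))
       (not (equal (getf a :value) (getf b :value)))))

(defun admit-under-policy (fact store policy)
  "Return (values new-store status) after applying POLICY to FACT.
POLICY is :exclusive, :temporal, or anything else for :coexistent."
  (let ((rivals (remove-if-not
                 (lambda (old)
                   (and (not (getf old :superseded))
                        (contradicts-p fact old)))
                 store)))
    (cond ((null rivals)
           (values (cons fact store) :admitted))
          ((eq policy :exclusive)
           ;; The existing fact stays authoritative.
           (values store :rejected))
          ((eq policy :temporal)
           ;; Older rivals are kept but marked as superseded.
           (values (cons fact
                         (mapcar (lambda (old)
                                   (if (member old rivals :test #'eq)
                                       (append old (list :superseded t))
                                       old))
                                 store))
                   :superseded-older))
          (t
           ;; :coexistent: keep both and cross-reference the rivals.
           (values (cons (append fact (list :contradicts rivals)) store)
                   :flagged)))))
#+end_src

Returning the status as a second value lets the caller log rejections and flags without inspecting the store.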
|
||||
|
||||
** Why This Matters for the Broader Memex
|
||||
|
||||
In the coding domain, contradiction is rare and must be resolved — a gate can't
|
||||
both allow and block the same path. In the broader memex, contradiction is the
|
||||
product, not the error. Your poetry analysis contradicts your last diary entry
|
||||
on the same topic. Your reading of /Pale Fire/ changed between 2023 and 2025.
|
||||
Wikidata says Mount Everest is 8844m (China: rock height); DBpedia says 8849m
|
||||
(Nepal: snow height). The symbolic engine's job is not to decide which is right.
|
||||
It is to surface the tension with provenance — "these three sources disagree.
|
||||
Here is the chain for each."
|
||||
|
||||
* How Categories Grow — The Organic Ontology
|
||||
|
||||
Whitehead and Russell's /Principia Mathematica/ took over 300 pages to define the logical
|
||||
foundations before it could prove that one plus one equals two. Every category
|
||||
introduced carried a burden of justification. Every inference rule had to be
|
||||
demonstrated sound. This is the classical approach to ontology: define everything
|
||||
upfront, exhaustively, formally.
|
||||
|
||||
Passepartout cannot afford this and does not need it. Its domain is bounded
|
||||
(software engineering, personal knowledge, literary engagement, daily life) and
|
||||
its ontology grows from the system's own operation:
|
||||
|
||||
1. *The gate stack seeds the ontology.* Every gate vector is an implicit claim
|
||||
about a category of things. The bootstrap makes these claims explicit. The
|
||||
seed is 50-70 entity classes with no human authoring required — they are
|
||||
mechanically extracted from the existing code.
|
||||
|
||||
2. *New gate vectors add categories directly.* As the Dispatcher grows (new
|
||||
shell patterns, new path protections, new tool classifications), the ontology
|
||||
grows with it. Every new pattern in the gate stack becomes a fact on skill
|
||||
load. No human effort. The gate stack grows, the ontology grows.
|
||||
|
||||
3. *Screamer generalizes from gate outcomes.* After 37 shell commands are blocked
|
||||
as destructive, Screamer extracts structural commonalities: "commands writing
|
||||
to block devices," "commands recursively deleting outside the workspace."
|
||||
These become new subcategories (=:block-device-command=,
|
||||
=:workspace-external-delete=) that didn't exist in the original gate patterns.
|
||||
The ontology deepens through observation.
|
||||
|
||||
4. *The archivist proposes from prose.* The archivist reads a diary entry about
|
||||
a book: "Nabokov's lectures on Kafka." The LLM proposes =(:entity :nabokov
|
||||
:relation :lectures-on :value :kafka)=. Screamer checks consistency. Admitted.
|
||||
The categories =:author=, =:lectures-on=, and =:subject= didn't exist before —
|
||||
they are created on first use. This is the primary growth mechanism for the
|
||||
broader memex.
|
||||
|
||||
5. *The human declares explicitly.* The human writes a declarative fact directly
|
||||
into the symbolic index. No extraction step. No LLM involvement. The fact is
|
||||
admitted with =:provenance :human-authored= — the highest trust level.
|
||||
|
||||
6. *Temporal patterns crystallize into categories.* Every Sunday the memex gets a
|
||||
retrospective heading. Every Monday a planning heading. The time-awareness
|
||||
system observes the periodicity and proposes =:weekly-retrospective= and
|
||||
=:weekly-planning= as fact types. Screamer verifies they don't contradict
|
||||
existing categorizations. Admitted.
|
||||
|
||||
7. *Cross-domain overlap produces parent categories.* Screamer notices that
|
||||
=:secret-files= (from the gate stack) and =:private-content= (from privacy
|
||||
tags) share members — =.env= is both a secret file and private content. It
|
||||
proposes =:sensitive-material= as a parent with both as children. Taxonomy
|
||||
building happens automatically through overlap detection.
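
Mechanism 7 reduces to a set intersection. The sketch below assumes categories are given as plain member lists and that the caller supplies the parent name; the function name =propose-parent-category= and the =:subclass-of= relation are illustrative choices, not existing identifiers.

#+begin_src lisp
;; Sketch only: names and the :subclass-of relation are assumptions.
(defun propose-parent-category (parent a members-a b members-b)
  "If categories A and B share at least one member, return a plist with
the shared members and two proposed triples that place A and B under
PARENT. Return NIL when the categories do not overlap."
  (let ((shared (intersection members-a members-b :test #'equal)))
    (when shared
      (list :shared shared
            :proposals
            (list (list :entity a :relation :subclass-of :value parent)
                  (list :entity b :relation :subclass-of :value parent))))))

;; Usage: .env is both a secret file and private content.
(propose-parent-category :sensitive-material
                         :secret-files '(".env" "id_rsa")
                         :private-content '(".env" "diary.org"))
#+end_src

The proposals would still have to pass the consistency check before admission; this sketch only produces candidates.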
|
||||
|
||||
** Growth is self-limiting by design
|
||||
|
||||
Not every conceivable category is added. The system prunes through use:
|
||||
|
||||
- New categories are admitted only through Screamer's consistency check. A
|
||||
category that contradicts an existing classification is rejected.
|
||||
- A category that never gets queried costs almost nothing (a hash table entry) but
|
||||
produces no value. It fades from use naturally.
|
||||
- Overly fine-grained categories (=.env.foo.bar.baz= as its own class) are
|
||||
rejected because they are redundant with the wildcard pattern that already
|
||||
covers them.
|
||||
- Overly broad categories that subsume meaningful distinctions ("everything is
|
||||
a =:file=") produce contradictions when Screamer tries to apply existing rules.
|
||||
Rejected.
|
||||
|
||||
The system converges on a useful granularity through use, not through upfront
|
||||
design. The gate stack provides the seed. Gate outcomes, prose extraction,
|
||||
deduction, and human authoring grow the shoots. Screamer prunes contradictions.
|
||||
The ontology is a garden, not a building.
|
||||
|
||||
* Semantic Wikipedia as Entity Backbone
|
||||
|
||||
The gate stack provides 50-70 entity classes — adequate for a coding agent where
|
||||
the domain is bounded to files, commands, and code symbols. For a general-knowledge
|
||||
memex, 50-70 is starvation. Your memex mentions Nabokov, /Pale Fire/, Kinbote,
|
||||
Zembla, paranoid reading, unreliable narrators, postmodernism, butterfly
|
||||
migration, chess problems, and the Russian exile experience. The gate stack knows
|
||||
none of these. Organic growth through prose extraction would take years just to
|
||||
cover the entities in one person's engagement with a single novel.
|
||||
|
||||
Wikidata has already done this work: approximately 2 million entity classes, over
|
||||
100 million entities, a decade of human curation. By loading the neighborhood of
|
||||
your memex into the symbolic index (entities referenced in your prose, plus their
|
||||
N-hop property net from Wikidata), the entity recognition problem vanishes. The
|
||||
archivist doesn't need to discover Nabokov from your diary. It needs to connect
|
||||
your heading to the existing Wikidata entity. That is a simpler task — reference
|
||||
resolution, not knowledge extraction.
|
||||
|
||||
The LLM's role shrinks to three thin boundaries:
|
||||
|
||||
1. *Input translation* — natural language question to structured query. "What do
|
||||
I think about monorepos?" → =(fact-query :entity :monorepo :relation :opinion
|
||||
:source :memex)=. Formulaic, ~100 tokens, any model sufficient.
|
||||
|
||||
2. *Prose to candidate triple* — for personal memex entries that have no Wikidata
|
||||
counterpart: your opinions, your day's events, your project plans. Proposals
|
||||
are verified by Screamer before admission. This is the only extraction path
|
||||
that still requires an LLM, and its scope is limited to what Wikidata cannot
|
||||
provide — your subjective, personal, or novel content.
|
||||
|
||||
3. *Result to prose* — structured answer to readable sentence. "Your 2023 diary
|
||||
says 8848m. Wikidata (last edited Feb 2024) says 8849m. They disagree on
|
||||
height." The reasoning is done; the LLM wraps the plist in grammar. ~100
|
||||
tokens, any model sufficient, purely cosmetic. Users who prefer no LLM at all
|
||||
can navigate through command-driven interaction (=/query=, =/contradictions=,
|
||||
=/audit=, =/context why=).
|
||||
|
||||
Everything else — the gate stack, the fact store, the constraint solver, the type
|
||||
hierarchy, the provenance tracking, the contradiction surfacing, the cross-domain
|
||||
comparison — is pure deterministic Lisp with zero LLM tokens.
|
||||
|
||||
** The decisive simplification
|
||||
|
||||
Without Semantic Wikipedia, the archivist must /discover/ entities from prose:
|
||||
extract a triple for every person, place, work, concept, and event mentioned in
|
||||
the memex. This is unbounded LLM work and the quality depends on extraction
|
||||
accuracy.
|
||||
|
||||
With Wikidata loaded, the entity graph is pre-structured. The archivist's job
|
||||
changes from "discover that Nabokov wrote /Pale Fire/ and lectured on Kafka" to
|
||||
"verify that the Nabokov referenced in heading #47 is the same entity as Wikidata
|
||||
item Q36591." The second task is simpler, more reliable, and in many cases can
|
||||
be done without an LLM at all — a simple entity name match against the loaded
|
||||
Wikidata graph may suffice for unambiguous names.
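
The name-match path can be sketched as a label index that resolves only when exactly one entity carries the label. The function names are hypothetical, the loaded graph is reduced to =(qid . label)= pairs, and every identifier other than Q36591 is a placeholder rather than a real Wikidata item.

#+begin_src lisp
;; Sketch only: BUILD-LABEL-INDEX and RESOLVE-BY-NAME are hypothetical names.
(defun build-label-index (entities)
  "ENTITIES is a list of (qid . label) pairs. Return an EQUALP hash table
mapping each label, case-insensitively, to the list of QIDs carrying it."
  (let ((index (make-hash-table :test 'equalp)))
    (loop for (qid . label) in entities
          do (push qid (gethash label index)))
    index))

(defun resolve-by-name (name index)
  "Return (values qid :resolved) for a unique match, (values qids :ambiguous)
for several candidates, and (values nil :unknown) when nothing matches."
  (let ((candidates (gethash name index)))
    (cond ((null candidates) (values nil :unknown))
          ((null (rest candidates)) (values (first candidates) :resolved))
          (t (values candidates :ambiguous)))))

;; Usage: placeholder QIDs stand in for two entities that share one label.
(defparameter *example-label-index*
  (build-label-index '(("Q36591" . "Vladimir Nabokov")
                       ("Q-PLACEHOLDER-1" . "Pale Fire")
                       ("Q-PLACEHOLDER-2" . "Pale Fire"))))
#+end_src

Only the =:ambiguous= and =:unknown= outcomes would need to fall through to an LLM or to the user.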
|
||||
|
||||
* The "Flip" — From Lossy Extraction to Deterministic Derivation
|
||||
|
||||
The symbolic index begins its life as a lossy construct. The initial extraction
|
||||
from the prose — the first population of facts from LLM proposals verified by
|
||||
Screamer — is built from an uncertain foundation. Some facts are correct. Some
|
||||
are missing. Some are wrong.
|
||||
|
||||
But the symbolic engine accumulates non-lossy facts through three independent
|
||||
mechanisms:
|
||||
|
||||
1. *Gate outcomes* — every gate rejection is a fact. No LLM involved. These
|
||||
accumulate at the rate of user interactions.
|
||||
2. *Screamer deductions* — new facts derived from existing facts. No LLM
|
||||
involved. These accumulate whenever the fact store crosses a density threshold
|
||||
where structural patterns emerge.
|
||||
3. *Human authoring* — the human explicitly declares facts. No LLM involved.
|
||||
|
||||
At some point, the non-lossy facts constitute a sufficient foundation that the
|
||||
symbolic engine can reverse the flow: instead of the LLM extracting facts from
|
||||
prose, the symbolic engine reads prose through its own lens — its now-substantial
|
||||
ontology of categories, rules, and constraints — and asserts facts in its own
|
||||
language. The extraction mechanism ceases to be probabilistic and becomes
|
||||
deterministic.
|
||||
|
||||
** The sufficiency criterion
|
||||
|
||||
The architecture note (=notes/passepartout-symbolic-engine-exploration.org=) describes
|
||||
this "flip" as aspirational: "at some point, the non-lossy facts constitute a
|
||||
sufficient foundation." This design decision makes it operational:
|
||||
|
||||
=(/ (count-provenance :gate-outcome :human-authored :deduced) total-facts)=
|
||||
|
||||
When this ratio exceeds a configurable threshold (=SUFFICIENCY_THRESHOLD=,
|
||||
default 0.7), the system considers its foundation sufficient. The archivist
|
||||
switches from "LLM proposes, Screamer verifies" to "Screamer queries existing
|
||||
facts, applies to the new prose, and deduces new facts directly."
|
||||
|
||||
The flip is visible to the user through the TUI sidebar or =/status= command:
|
||||
"Symbolic index: 847 facts (73% non-lossy, 12% LLM-proposed, 15% Wikidata).
|
||||
Sufficient foundation: YES."
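
A minimal sketch of the criterion, assuming facts are plists with a =:provenance= key. =*sufficiency-threshold*= is a Lisp-style stand-in for =SUFFICIENCY_THRESHOLD=, and the function names and the shortened status line are illustrative.

#+begin_src lisp
;; Sketch only: names and the status format are assumptions.
(defparameter *sufficiency-threshold* 0.7
  "Stand-in for the SUFFICIENCY_THRESHOLD setting.")

(defparameter *non-lossy-provenances* '(:gate-outcome :human-authored :deduced)
  "Provenances that involve no LLM extraction.")

(defun non-lossy-ratio (facts)
  "Return the share of FACTS whose :provenance is non-lossy, as a rational
between 0 and 1. An empty fact list yields 0."
  (if (null facts)
      0
      (/ (count-if (lambda (fact)
                     (member (getf fact :provenance) *non-lossy-provenances*))
                   facts)
         (length facts))))

(defun foundation-sufficient-p (facts &optional (threshold *sufficiency-threshold*))
  "True when the non-lossy ratio strictly exceeds THRESHOLD."
  (> (non-lossy-ratio facts) threshold))

(defun status-line (facts)
  "Render a one-line summary in the spirit of the /status output."
  (format nil "Symbolic index: ~D facts (~D% non-lossy). Sufficient foundation: ~:[NO~;YES~]."
          (length facts)
          (round (* 100 (non-lossy-ratio facts)))
          (foundation-sufficient-p facts)))
#+end_src

Keeping the ratio as an exact rational avoids rounding drift while facts are being counted; the only floating-point value involved is the threshold itself.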
|
||||
|
||||
** The flip does not mean "complete"
|
||||
|
||||
In the broader memex, completeness is neither possible nor desirable. The flip
|
||||
means "deterministic enough to be trustworthy," not "comprehensive enough to be
|
||||
self-sufficient." The neural index remains the gateway to the full richness of
|
||||
prose. The symbolic index handles what can be mechanically verified. The boundary
|
||||
is permanent.
|
||||
|
||||
* Ephemeral First, Persistent Later
|
||||
|
||||
The architecture note's Option 5 (ephemeral facts, no disk persistence) is the
|
||||
correct first implementation. Three reasons:
|
||||
|
||||
1. *The fact language is unproven.* Triples with provenance and grounding is a
|
||||
hypothesis. It may be too simple for some domains, too complex for others.
|
||||
Committing to a serialization format before knowing what's useful is premature.
|
||||
|
||||
2. *The ontology is emergent.* Categories are created on first use. What proves
|
||||
useful stays; what doesn't fades. A persistent format would need a migration
|
||||
story every time the category structure changes. Ephemeral avoids this entirely
|
||||
— the facts are re-derived on each session start using the current (evolved)
|
||||
ontology.
|
||||
|
||||
3. *Rebuildability is the safety net.* Because all facts have a =:grounding= to
|
||||
an Org heading, and gate-outcome facts are regenerated from the gate stack on
|
||||
every load, the entire symbolic index can be thrown away and rebuilt from
|
||||
scratch. The cost is compute, not data. This is the practical realization of
|
||||
"the prose is always the ground truth."
|
||||
|
||||
The transition to persistence (Phase 5: VivaceGraph) happens when two conditions
|
||||
are met: the fact language has stabilized through use, and the accumulated
|
||||
deductions across sessions provide value that justifies the serialization cost.
|
||||
|
||||
* Whitehead's Concrete Contributions — Four Operational Contributions
|
||||
|
||||
=notes/passepartout-whitehead.org= extracts four concrete, engineerable ideas
|
||||
from /Principia Mathematica/ (Whitehead and Russell) and Whitehead's /Process and Reality/. They are
|
||||
summarized here because each informs the neurosymbolic design.
|
||||
|
||||
** Contribution 1: PM-Type-Level Gates
|
||||
|
||||
PM's ramified theory of types solved Russell's paradox by assigning every
|
||||
propositional function a type level, making self-application syntactically
|
||||
invalid. Passepartout applies the same principle to prevent a request from
|
||||
modifying the rules that validate it. Every cognitive tool and gate vector
|
||||
carries a =:type-level= integer. Before any gate predicate runs, the dispatcher
|
||||
checks: if the signal's type level equals or exceeds the gate's type level, the
|
||||
signal is rejected. A request to modify dispatcher rules (type-level 5) cannot
|
||||
pass a gate of type-level 4 or lower. This is a structural prohibition, not a
|
||||
heuristic — self-modification of the safety layer is impossible by construction.
|
||||
|
||||
Implementation: approximately 30 lines in the existing dispatcher. No new
|
||||
dependencies. Backward compatible. This is Phase 0 of the symbolic engine
|
||||
roadmap.
|
||||
|
||||
** Contribution 2: Theory of Descriptions → Reference Resolution
|
||||
|
||||
PM's theory of descriptions addressed the problem of referring to nonexistent
|
||||
entities: "the current king of France is bald" is false, not meaningless, when
|
||||
there is no unique referent. Passepartout applies this to reference resolution:
|
||||
when the user says "the function that validates secrets," a cognitive tool checks
|
||||
uniqueness before resolving. Ambiguous references trigger a clarification prompt
|
||||
rather than a blind guess.
|
||||
|
||||
Implementation: approximately 40 lines as a cognitive tool. When the knowledge
|
||||
graph ships, descriptions become native Prolog queries with uniqueness constraints.
|
||||
|
||||
** Contribution 3: Process and Reality → Architectural Vocabulary
|
||||
|
||||
Whitehead's process ontology maps with surprising precision to Passepartout's
|
||||
pipeline architecture. Prehension = a gate grasping a signal. Positive prehension
|
||||
= a gate passing. Negative prehension = a gate rejecting. Concrescence = the
|
||||
pipeline process from input to output. Satisfaction = the final agent response.
|
||||
This vocabulary is precise, standard, and already mapped to the architecture. It
|
||||
provides the language for the =/why= command, the gate trace, and the ARCHITECTURE
|
||||
documentation. It is descriptive, not operational — the design would be correct
|
||||
without it, but it would lack the vocabulary to describe /why/ it is correct.
|
||||
|
||||
** Contribution 4: VivaceGraph + PM Types → KG Type Hierarchy
|
||||
|
||||
When the knowledge graph ships, every entity inherits PM's type hierarchy.
|
||||
Entities carry =:pm-type-level= metadata. Queries cannot return entities at or
|
||||
above the level of the querying function. Self-referential knowledge becomes
|
||||
structurally impossible — no "this entity defines its own type level." This is
|
||||
Contribution 1 applied to the knowledge layer rather than the execution layer.
|
||||
The dispatcher prevents self-referential /actions/; the KG prevents
|
||||
self-referential /facts/.
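
At the knowledge layer the same idea amounts to a filter on query results. This sketch assumes entities are lists of the form =(name :pm-type-level n)= and that a query may only see entities strictly below the caller's level; =visible-entities= is a hypothetical name, and the example levels are invented for illustration.

#+begin_src lisp
;; Sketch only: the entity shape and VISIBLE-ENTITIES are assumptions.
(defun visible-entities (entities querier-level)
  "Keep only entities whose :pm-type-level is strictly below QUERIER-LEVEL."
  (remove-if-not (lambda (entity)
                   (< (getf (rest entity) :pm-type-level) querier-level))
                 entities))

;; Usage: a level-3 query sees the level-1 and level-2 entities only.
(visible-entities '((:read-file :pm-type-level 1)
                    (:shell :pm-type-level 2)
                    (:write-file :pm-type-level 3)
                    (:self-build-core :pm-type-level 5))
                  3)
#+end_src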
|
||||
|
||||
* The Provenance Chain as Product
|
||||
|
||||
In the coding domain, the value of the symbolic engine is the verified fact:
|
||||
"this command is safe." In the broader memex, the value is the provenance itself:
|
||||
"this claim originated in that diary entry on that date, has been referenced 7
|
||||
times across 4 different projects, was contradicted in a retrospective 6 months
|
||||
later, and was revised in a note 3 weeks after that."
|
||||
|
||||
The symbolic engine doesn't tell you what is true. It tells you what you wrote,
|
||||
when, where, and how it connects to everything else you wrote — with a verifiable
|
||||
audit trail. It is a memory prosthesis that makes your own mind legible to you.
|
||||
|
||||
Every fact carries:
|
||||
|
||||
- =:grounding= — the specific Org heading from which it was extracted
|
||||
- =:provenance= — who or what produced it (gate-outcome, human-authored, deduced,
|
||||
LLM-proposed)
|
||||
- =:timestamp= — when it was admitted to the symbolic index
|
||||
- =:referenced-by= — other facts that depend on or reference this one
|
||||
- =:contradicted-by= — other facts that disagree with this one (if any)
|
||||
- =:superseded-by= — if this fact was replaced by a newer version
|
||||
|
||||
These fields make every fact auditable. The =/audit <node-id>= command renders
|
||||
the full provenance chain as an Org headline tree. The provenance is not a
|
||||
logging feature. It is the product.
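
One way the audit rendering could look is sketched below. It assumes a fact is a plist with =:entity=, =:relation=, and =:value= alongside the fields listed above, and that =:referenced-by= holds nested fact plists. A real store would more likely hold identifiers, so that shape and the name =render-audit= are simplifications.

#+begin_src lisp
;; Sketch only: RENDER-AUDIT and the nested :referenced-by shape are assumptions.
(defun render-audit (fact &optional (depth 1) (stream *standard-output*))
  "Print FACT as an Org headline at DEPTH, its provenance fields as an Org
description list, and each referencing fact as a child headline."
  (format stream "~A ~S ~S ~S~%"
          (make-string depth :initial-element #\*)
          (getf fact :entity) (getf fact :relation) (getf fact :value))
  (dolist (field '(:grounding :provenance :timestamp))
    ;; ~( ~) downcases the field name; the value is printed as-is.
    (format stream "- ~(~A~) :: ~A~%" field (getf fact field)))
  (dolist (child (getf fact :referenced-by))
    (render-audit child (1+ depth) stream)))

;; Usage: one fact referenced by one later fact.
(render-audit
 (list :entity :nabokov :relation :lectures-on :value :kafka
       :grounding "diary-2023" :provenance :llm-proposed :timestamp 0
       :referenced-by
       (list (list :entity :kafka-note :relation :cites :value :nabokov-lectures
                   :grounding "retro-2024" :provenance :deduced :timestamp 0))))
#+end_src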
|
||||
|
||||
* The Competitive Argument
|
||||
|
||||
No competitor has this problem because no competitor has a symbolic engine. The
|
||||
55 systems surveyed in =notes/competitive-landscape.org= range from pure chat
|
||||
agents (Claude, ChatGPT) to agent harnesses (Claude Code, OpenCode, Hermes) to
|
||||
platform agents (OpenClaw). None of them encode knowledge as formal facts with
|
||||
provenance. None of them verify extractions against an existing knowledge base.
|
||||
None of them can prove properties about their own rulesets.
|
||||
|
||||
Their safety is heuristic (prompt-based guardrails that consume LLM tokens and
|
||||
can be evaded with clever phrasing). Their memory is flat (JSONL transcripts
|
||||
without content-addressed identity or provenance chains). Their reasoning is
|
||||
entirely neural — when you ask "why did you decide that?", the answer is a
|
||||
regenerated LLM explanation, not a retrieved inference chain.
|
||||
|
||||
Passepartout's architectural bet is that this problem is worth solving — that a
|
||||
system which can surface contradictions with provenance, derive new facts from
|
||||
observations, and verify claims against a provenanced knowledge graph is
|
||||
fundamentally different from a system that can only call an LLM and hope the
|
||||
response is correct.
|
||||
|
||||
The cost is ontological work, which is genuinely difficult. The reward is a
|
||||
system that cannot hallucinate at the reasoning level, whose memory is provable
|
||||
rather than empirical, and whose knowledge accumulates across sessions through
|
||||
deduction rather than through LLM re-prompting. For a life's knowledge stored in
|
||||
a personal memex, this is not a performance advantage. It is a category difference.
|
||||
|
||||
* Open Questions
|
||||
|
||||
Several design questions are unresolved and should remain unresolved at this
|
||||
stage. They represent research decisions that require experience running the
|
||||
system.
|
||||
|
||||
** What is the minimum viable fact language?
|
||||
|
||||
The triple format, =(:entity :relation :value)= with provenance and grounding, is the
|
||||
current hypothesis. It is simple enough to be parseable, expressive enough to
|
||||
capture the gate stack's implicit claims, and extensible enough that Screamer
|
||||
can operate on it. But it may be too simple. Triples do not naturally express
|
||||
temporal relations ("was X before Y?"), modal claims ("should not do X unless
|
||||
Y"), or counterfactuals — all of which may be essential for a symbolically-aided
|
||||
memex. The right granularity depends on what queries actually need to be made,
|
||||
and that cannot be known in advance.
|
||||
|
||||
** How does ontology refactoring work?
|
||||
|
||||
If the seed produces 50 categories from gate extraction and later experience
|
||||
shows they are wrong — wrong granularity, missing cross-cutting concerns, conflated
|
||||
categories — how are they migrated without invalidating all existing deductions
|
||||
that cross the old category boundaries? The ephemeral-first approach (no
|
||||
persistence, rebuild from scratch) is a temporary answer. Once persistence is
|
||||
committed (VivaceGraph), refactoring the category hierarchy is a schema migration
|
||||
problem that deduction provenance makes harder — every deduced fact's chain may
|
||||
cross the old category boundary. This is not addressed in the current architecture.
|
||||
|
||||
** What is the appropriate role of the human?
|
||||
|
||||
The human can explicitly declare facts, write constraints, and correct wrong
|
||||
extractions. But how much of the ontology should the human need to maintain? If
|
||||
the human must write a definition for every new category the symbolic engine
|
||||
encounters, the overhead is prohibitive. If the symbolic engine can generalize
|
||||
from instances, the human role becomes supervision rather than authorship — review
|
||||
and approve proposed generalizations. The balance cannot be set without experience.
|
||||
|
||||
** How much Wikidata is the right amount?
|
||||
|
||||
Loading Wikidata entities referenced in the memex is the minimum. Loading all
|
||||
Wikidata entities within N hops of those references grows the graph roughly
|
||||
exponentially in N. The right N depends on the memex's breadth: a memex focused on
|
||||
software engineering needs fewer hops than a memex spanning literature, history,
|
||||
philosophy, and science. The query performance and memory costs of a large
|
||||
Wikidata load are unknown.
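
The N-hop expansion itself is a bounded breadth-first search, sketched here so the growth can be measured on a toy graph before committing to a value of N. The function name is hypothetical, and =neighbors-fn= stands in for whatever lookup the loaded Wikidata subset would provide.

#+begin_src lisp
;; Sketch only: N-HOP-NEIGHBORHOOD and NEIGHBORS-FN are assumptions.
(defun n-hop-neighborhood (seeds neighbors-fn n)
  "Return every node reachable from SEEDS in at most N hops.
NEIGHBORS-FN maps a node to a list of adjacent nodes."
  (let ((seen (make-hash-table :test 'equal))
        (frontier seeds))
    (dolist (seed seeds)
      (setf (gethash seed seen) t))
    ;; Expand one hop per iteration; stop early if the frontier empties.
    (loop repeat n
          while frontier
          do (let ((next '()))
               (dolist (node frontier)
                 (dolist (neighbor (funcall neighbors-fn node))
                   (unless (gethash neighbor seen)
                     (setf (gethash neighbor seen) t)
                     (push neighbor next))))
               (setf frontier next)))
    (loop for node being the hash-keys of seen collect node)))

;; Usage on a toy adjacency list (the edges are illustrative).
(defparameter *toy-graph*
  '((:nabokov :pale-fire :kafka) (:pale-fire :kinbote) (:kinbote :zembla)))

(n-hop-neighborhood '(:nabokov)
                    (lambda (node) (rest (assoc node *toy-graph*)))
                    2)
#+end_src

Counting the result for N = 1, 2, 3 on a real memex would give a first empirical estimate of the memory cost raised above.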
|
||||
|
||||
** Can the symbolic engine satisfy queries from the user without LLM involvement?
|
||||
|
||||
The design aims for zero-LLM query answering: the user issues a structured
|
||||
command (=/query=, =/contradictions=, =/audit=), and the symbolic engine responds
|
||||
directly. But natural language questions ("what do I think about monorepos?")
|
||||
still require the LLM as a thin translation layer. Whether the structured command
|
||||
interface is sufficient for daily use, or whether users will demand natural
|
||||
language interaction, determines how much LLM involvement remains in the mature
|
||||
system.
|
||||
|
||||
** Is the triplestore physically bounded or does it explode?
|
||||
|
||||
A personal memex with years of diary entries, project notes, reading logs, and
|
||||
literary analyses could produce millions of triples. A naive hash table grows
|
||||
linearly in memory with constant-time lookups, but VivaceGraph's Prolog-like queries may not scale as well. The performance
|
||||
characteristics of graph queries over a million-triple knowledge base have not
|
||||
been estimated.
|
||||
|
||||
* Relation to Passepartout's Existing Architecture
|
||||
|
||||
The neurosymbolic engine is an extension of the existing probabilistic-deterministic
|
||||
split, not a replacement for it. The current architecture divides cognition into
|
||||
LLM-driven proposals and Lisp-driven verification. The symbolic engine deepens the
|
||||
verification side from "is this action safe?" to "is this claim supported?" — the
|
||||
same architectural pattern applied to a broader domain.
|
||||
|
||||
The self-repair criterion (a file belongs in core only if, when corrupted, the
|
||||
agent cannot fix it without human help) applies to every component of the symbolic
|
||||
engine. Screamer, VivaceGraph, the fact store, the archivist — all are skills,
|
||||
loaded at runtime, hot-reloadable, and recoverable from corruption. A corrupted
|
||||
symbolic engine degrades reasoning capability but does not kill the agent. The
|
||||
eight existing core ASDF files are unchanged.
|
||||
|
||||
The symbolic engine is not v3.0.0 alone. It is the layer that sits between the
|
||||
existing gate stack (which it makes explicit as facts) and the existing skill
|
||||
system (which it extends with deduction, contradiction detection, and provenance
|
||||
tracking). It grows within the current architecture without replacing any existing
|
||||
component.
|
||||
|
||||
See also:
|
||||
|
||||
- =passepartout-neurosymbolic-roadmap.org= — the concrete phased implementation plan
|
||||
- =notes/passepartout-symbolic-engine-exploration.org= — the original architecture note
|
||||
- =notes/passepartout-whitehead.org= — the four Whitehead contributions
|
||||
- =passepartout/docs/DESIGN_DECISIONS.org= — the existing design decisions
|
||||
- =passepartout/docs/ARCHITECTURE.org= — the current pipeline architecture
|
||||
- =passepartout/docs/ROADMAP.org= — the feature roadmap through v0.13.0
|
||||
#+SUPERSEDED: [2026-05-10 Sun]
|
||||
|
||||
This document has been consolidated into ~passepartout/docs/DESIGN_DECISIONS.org~. The unified document interleaves the neurosymbolic design rationale into nine thematic parts with a single narrative arc:
|
||||
|
||||
| Part | Topic | Key New Sections |
|
||||
|------|-------|-----------------|
|
||||
| I | Foundation | Historical Lineage (McCarthy) |
|
||||
| II | The Two Brains | Hallucination Problem, 10-80-10, Brain/Education metaphor |
|
||||
| III | Safety & Self-Preservation | Active Third Law, Layered Signal Authentication |
|
||||
| IV | The Symbolic Engine | Five Options, Chosen Path, Gate-to-Fact Bootstrap, LLM as Proposer, Cardinality Policies, Organic Ontology, Ontology Versioning, Sufficiency Criterion, Merkle DAG, Abstract Fact Store Interface |
|
||||
| V | Knowledge Sources | Semantic Wikipedia, MOMo Empirical Validation |
|
||||
| VI | Implementation Properties | Performance Scaling, Provenance as Product |
|
||||
| VII | Engineering Infrastructure | REPL, Cybernetic Loop, Observability, Literate Programming, Eval Harness, MCP, Local-First, Token Economics, Time Awareness (carried over from existing) |
|
||||
| VIII | Validation | Marcus, CREST, KiL philosophical validation; Competitive Argument |
|
||||
| IX | Open Questions | Fact language, human role, Wikidata scope, natural language interface, graph query performance |
|
||||
|
||||
Cross-references are preserved in:
|
||||
- ~notes/passepartout-symbolic-engine-exploration.org~
|
||||
- ~notes/passepartout-whitehead.org~
|
||||
- ~notes/competitive-landscape.org~
|
||||
|
||||
@@ -1,920 +1,28 @@
|
||||
#+TITLE: Passepartout Neurosymbolic Engine — Implementation Roadmap
|
||||
#+TITLE: Passepartout Neurosymbolic Engine — SUPERSEDED
|
||||
#+AUTHOR: Agent
|
||||
#+FILETAGS: :notes:roadmap:neurosymbolic:v3.0.0:
|
||||
#+FILETAGS: :notes:roadmap:neurosymbolic:superseded:
|
||||
#+CREATED: [2026-05-08 Fri]
|
||||
|
||||
* Evolutionary Roadmap
|
||||
|
||||
This roadmap describes a phased implementation of the symbolic engine. It is
|
||||
independent of the feature roadmap in =passepartout/docs/ROADMAP.org= — Phase 0
|
||||
can ship immediately alongside any v0.7.x patch. The symbolic engine grows in
|
||||
parallel with feature work, not after it.
|
||||
|
||||
Every phase is loaded as a skill, not a core ASDF component. A corrupted symbolic
|
||||
engine degrades reasoning capability but does not kill the agent. This satisfies
|
||||
the self-repair criterion documented in =passepartout/docs/ARCHITECTURE.org= and
|
||||
=passepartout/AGENTS.md=.
|
||||
|
||||
The design rationale for each decision is in
|
||||
=notes/passepartout-neurosymbolic-design-decisions-and-options.org=. The original
|
||||
architecture exploration is in
|
||||
=notes/passepartout-symbolic-engine-exploration.org=. Whitehead's contributions are
|
||||
enumerated in =notes/passepartout-whitehead.org=.
|
||||
|
||||
* Phase 0: PM-Type-Level Gates (~30 lines — builds on existing Dispatcher)
|
||||
|
||||
** What
|
||||
|
||||
Add =:type-level= metadata to the existing =defgate= and =def-cognitive-tool=
|
||||
macros. Before any gate predicate evaluates, the dispatcher checks structural
|
||||
type compatibility: a signal at type-level 5 cannot pass a gate at type-level 4
|
||||
or lower. Self-modification of the safety layer becomes impossible by
|
||||
construction.
|
||||
|
||||
** Rationale
|
||||
|
||||
The Dispatcher gate stack currently prevents self-modification through pattern
|
||||
matching — gate vector 2b catches writes to =core-*= files as a heuristic. But
|
||||
there is no /structural/ guarantee preventing a request from modifying the rules
|
||||
that validate it. Pattern-based protection can be bypassed through indirection
|
||||
(an =eval= that constructs a write, a skill that redefines a gate function at
|
||||
runtime). A type-level check is not heuristic — it is a category error rejected
|
||||
before any predicate runs, just as PM's theory of types made self-membership
|
||||
syntactically invalid before any logical evaluation.
|
||||
|
||||
** Implementation
|
||||
|
||||
1. Add =:type-level= keyword argument to =defgate= (default 0) and
|
||||
=def-cognitive-tool= (default 0) in =core-skills.org=.
|
||||
2. Add =gate-type-check= to the dispatcher's =run-gates= function in
|
||||
=security-dispatcher.org=, executed before any gate predicate.
|
||||
3. Assign type levels to existing cognitive tools: self-build-core at 5,
|
||||
write-file at 3, read-file at 1, shell at 2, eval at 4.
|
||||
4. Assign type levels to existing gate vectors: self-build boundary at 5,
|
||||
shell safety at 3, path protection at 2, network exfil at 2, secret content at 1.
|
||||
|
||||
** Verification
|
||||
|
||||
Existing FiveAM gate tests continue to pass. New test: signal at type-level 5
|
||||
targeting a gate at type-level 4 returns =:reject-type-violation= without
|
||||
evaluating the gate predicate. New test: signal at type-level 1 passing through
|
||||
a gate at type-level 3 proceeds to predicate evaluation.
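
The check and the two test cases above can be sketched as follows. The struct =typed-gate= and the wrapper =run-typed-gate= are hypothetical stand-ins for whatever =defgate= expands to, and the strict comparison is an assumption taken from the two test cases; whether equal levels should also be rejected is left open here.

#+begin_src lisp
;; Sketch only: TYPED-GATE and RUN-TYPED-GATE are hypothetical stand-ins.
(defstruct (typed-gate (:constructor make-typed-gate (name type-level predicate)))
  name type-level predicate)

(defun gate-type-check (signal-level gate)
  "Structural check run before the gate predicate. Assumption: a signal is
rejected when its level is strictly greater than the gate's level."
  (if (> signal-level (typed-gate-type-level gate))
      :reject-type-violation
      :type-ok))

(defun run-typed-gate (signal-level payload gate)
  "Return :reject-type-violation without calling the predicate on a type
violation; otherwise return :pass or :reject from the predicate."
  (if (eq (gate-type-check signal-level gate) :reject-type-violation)
      :reject-type-violation
      (if (funcall (typed-gate-predicate gate) payload)
          :pass
          :reject)))

;; Usage: a level-5 signal never reaches the predicate of a level-4 gate.
(run-typed-gate 5 "rewrite dispatcher rules"
                (make-typed-gate :eval-gate 4 (lambda (payload) (stringp payload))))
#+end_src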
|
||||
|
||||
** Relation to Other Work
|
||||
|
||||
This is Contribution 1 from =notes/passepartout-whitehead.org=. It is also the
|
||||
gate-to-fact bootstrap mechanism — every type-level rejection emits a structured
|
||||
event that Phase 1 ingests as a fact. The ~30 lines implement the seed of the
|
||||
ontology without any new dependencies.
|
||||
|
||||
* Phase 1: Minimum Viable Fact Language (~150 lines — new skill)
|
||||
|
||||
** What
|
||||
|
||||
An ephemeral, in-memory triple store with provenance tracking and contradiction
|
||||
detection. No disk persistence. All facts live in a hash table and are discarded
|
||||
on session end. Gate outcomes are ingested as facts. The gate stack's implicit
|
||||
ontology is materialized as the seed fact set.
|
||||
|
||||
** Rationale
|
||||
|
||||
The architecture note's Option 5 (ephemeral facts, no persistence) is the correct
|
||||
first step. Three reasons:
|
||||
|
||||
1. *The fact language is unproven.* Triples with provenance and grounding is a
|
||||
hypothesis that must be tested against real memex content before being committed
|
||||
to a serialization format.
|
||||
2. *The ontology is emergent.* Categories are created on first use. A persistent
|
||||
format would require a migration story for every category change. Ephemeral
|
||||
avoids this — facts are re-derived on each session start using the evolved
|
||||
ontology.
|
||||
3. *Rebuildability is the safety net.* Because all facts have a =:grounding= to
|
||||
an Org heading, and gate-outcome facts are regenerated from the gate stack on
|
||||
load, the entire symbolic index can be thrown away and rebuilt from scratch.
|
||||
The cost is compute, not data.
|
||||
|
||||
** Implementation — =org/symbolic-facts.org= → =lisp/symbolic-facts.lisp= (skill)
|
||||
|
||||
*** Triple store
|
||||
|
||||
A hash table keyed by =(entity relation)=. Values are plists:
|
||||
|
||||
#+begin_example
|
||||
(:value <string-or-symbol>
|
||||
:grounding <heading-id-or-nil>
|
||||
:provenance <:gate-outcome | :human-authored | :deduced | :llm-proposed>
|
||||
:timestamp <universal-time>
|
||||
:contradiction <:awaiting-resolution-or-nil>
|
||||
:superseded-by <entity-string-or-nil>)
|
||||
#+end_example
|
||||
|
||||
The =:provenance= field tracks how the fact entered the store. The
|
||||
=:contradiction= field is nil on standard facts. The =:superseded-by= field is
|
||||
set when a =:temporal= domain fact is replaced by a newer version.
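
A minimal sketch of the store, assuming one fact plist per key. Because the key is a freshly consed list, the hash table needs an =equal= test; the default =eql= test would never find it again. =make-fact-store= and =store-fact= are illustrative names, and =store-fact= does no contradiction checking.

#+begin_src lisp
;; Sketch only: MAKE-FACT-STORE and STORE-FACT are hypothetical names.
(defun make-fact-store ()
  "Create the store. List keys such as (entity relation) require EQUAL."
  (make-hash-table :test 'equal))

(defun store-fact (store entity relation value
                   &key grounding (provenance :llm-proposed))
  "Store one fact plist under the (ENTITY RELATION) key and return it.
No contradiction check is performed here."
  (setf (gethash (list entity relation) store)
        (list :value value
              :grounding grounding
              :provenance provenance
              :timestamp (get-universal-time)
              :contradiction nil
              :superseded-by nil)))

;; Usage: a bootstrap-style fact, found again through a fresh key list.
(let ((store (make-fact-store)))
  (store-fact store ".env" :member-of-class :secret-config-file
              :provenance :gate-outcome)
  (getf (gethash (list ".env" :member-of-class) store) :value))
#+end_src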
|
||||
|
||||
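A minimal sketch of the store described above, assuming an =equal= hash table (list keys are fresh conses, so the default =eql= test would never match). The names =*sketch-facts*=, =make-fact=, and =store-fact= are illustrative assumptions, not taken from =lisp/symbolic-facts.lisp=.

#+begin_src lisp
;; Hypothetical sketch: the names below are illustrative, not the skill's API.
(defvar *sketch-facts* (make-hash-table :test #'equal)
  "Maps (entity relation) lists to fact plists.")

(defun make-fact (value &key grounding (provenance :human-authored))
  "Build a fact plist in the shape shown above."
  (list :value value
        :grounding grounding
        :provenance provenance
        :timestamp (get-universal-time)
        :contradiction nil
        :superseded-by nil))

(defun store-fact (entity relation fact)
  "Store FACT under the (ENTITY RELATION) key and return it."
  (setf (gethash (list entity relation) *sketch-facts*) fact))

(store-fact ".env" :member-of-class
            (make-fact :secret-config-file :provenance :gate-outcome))

(getf (gethash (list ".env" :member-of-class) *sketch-facts*) :value)
;; => :SECRET-CONFIG-FILE
#+end_src
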
*** Bootstrap from gates
|
||||
|
||||
On skill load, scan the Dispatcher's existing data structures and produce triples:
|
||||
|
||||
#+begin_example
|
||||
;; From *dispatcher-protected-paths*
|
||||
(:entity ".env" :relation :member-of-class :value :secret-config-file :provenance :gate-outcome)
|
||||
(:entity "*id_rsa*" :relation :member-of-class :value :ssh-key-file :provenance :gate-outcome)
|
||||
|
||||
;; From *dispatcher-shell-blocked*
|
||||
(:entity "rm -rf /" :relation :classified-as :value :catastrophic-command :provenance :gate-outcome)
|
||||
(:entity "dd if=" :relation :classified-as :value :catastrophic-command :provenance :gate-outcome)
|
||||
|
||||
;; From *dispatcher-network-whitelist*
|
||||
(:entity "api.telegram.org" :relation :classified-as :value :trusted-domain :provenance :gate-outcome)
|
||||
#+end_example
|
||||
|
||||
This produces 50-70 entity classes immediately. No LLM involvement. No human
|
||||
authoring. Mechanically extracted from existing code.
|
||||
|
||||
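The extraction step can be sketched as a plain mapping from gate data to triples. The real shape of =*dispatcher-protected-paths*= is not shown in this plan, so the sketch assumes an alist of =(pattern . class)= pairs held in a hypothetical =*example-protected-paths*=; =bootstrap-path-facts= is likewise an illustrative name.

#+begin_src lisp
;; Hypothetical sketch. An alist of (pattern . class) pairs is ASSUMED here;
;; the Dispatcher's actual data structure may differ.
(defparameter *example-protected-paths*
  '((".env" . :secret-config-file)
    ("*id_rsa*" . :ssh-key-file)))

(defun bootstrap-path-facts (paths)
  "Turn each (pattern . class) pair into a triple plist with gate provenance."
  (loop for (pattern . class) in paths
        collect (list :entity pattern
                      :relation :member-of-class
                      :value class
                      :provenance :gate-outcome)))

(bootstrap-path-facts *example-protected-paths*)
#+end_src
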
*** Ingest gate outcomes
|
||||
|
||||
Register a post-gate hook on the Dispatcher's rejection path. Every gate rejection
|
||||
produces a triple with =:provenance :gate-outcome=:
|
||||
|
||||
#+begin_example
|
||||
(:entity "/tmp/secrets.env" :relation :blocked-by :value :dispatcher-path-protection
|
||||
:provenance :gate-outcome :grounding "signal-47")
|
||||
#+end_example
|
||||
|
||||
*** Query
|
||||
|
||||
=(fact-query &key entity relation value source-provenance)= — pure hash-table
|
||||
lookup. Returns the matching triple or nil. ~30 lines.
|
||||
|
||||
=(fact-query-all &key relation value source-provenance)= — returns all triples
|
||||
matching the filter criteria. Enables "find all files classified as secrets."
|
||||
|
||||
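One possible shape for the two query functions, assuming the =equal= hash table from the triple store section. The =store= keyword and the =*query-sketch-store*= variable are additions for illustration, and a =nil= filter is treated as "no filter".

#+begin_src lisp
;; Hypothetical sketch of the query functions over an EQUAL hash table.
(defvar *query-sketch-store* (make-hash-table :test #'equal))

(defun fact-query (&key entity relation value source-provenance
                        (store *query-sketch-store*))
  "Return the fact plist stored under (ENTITY RELATION), or NIL when it is
absent or fails the optional VALUE / SOURCE-PROVENANCE filters."
  (let ((fact (gethash (list entity relation) store)))
    (when (and fact
               (or (null value) (equal value (getf fact :value)))
               (or (null source-provenance)
                   (eq source-provenance (getf fact :provenance))))
      fact)))

(defun fact-query-all (&key relation value source-provenance
                            (store *query-sketch-store*))
  "Return every stored triple, as a full plist, that matches all supplied filters."
  (let ((matches '()))
    (maphash (lambda (key fact)
               (when (and (or (null relation) (eq relation (second key)))
                          (or (null value) (equal value (getf fact :value)))
                          (or (null source-provenance)
                              (eq source-provenance (getf fact :provenance))))
                 (push (list* :entity (first key) :relation (second key) fact)
                       matches)))
             store)
    matches))

;; Usage: "find all files classified as secrets".
(setf (gethash (list ".env" :member-of-class) *query-sketch-store*)
      (list :value :secret-config-file :provenance :gate-outcome))
(setf (gethash (list "notes.org" :member-of-class) *query-sketch-store*)
      (list :value :note :provenance :llm-proposed))

(fact-query-all :value :secret-config-file)
#+end_src
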
*** Contradiction detection
|
||||
|
||||
On every =fact-assert=, check if the new triple contradicts an existing one
|
||||
(same entity, same relation, different value, same provenance domain). If the
|
||||
entity's class has =:contradiction-policy :exclusive=, the new fact is rejected
|
||||
with a signal. If the policy is =:coexistent=, both facts are stored with a
|
||||
=:contradiction= flag cross-referencing each other. If the policy is =:temporal=,
|
||||
the old fact is marked =:superseded-by= the new one but retained.
|
||||
|
||||
The policy table is a hash table mapping entity classes to one of =:exclusive=,
|
||||
=:coexistent=, or =:temporal=. Gate-bootstrapped facts default to =:exclusive=
|
||||
(the filesystem is singular). New categories default to =:coexistent= (safe,
|
||||
never loses information).
|
||||
|
||||
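The three policies can be sketched as follows. Several details are assumptions, because the plan does not fix them: each key holds a list of fact plists so that coexistent facts can share a key, the caller passes the entity class explicitly, the "same provenance domain" condition is ignored, and =:superseded-by= records the new value. All names (=*ca-facts*=, =*ca-policies*=, =policy-for=, =mark=) are illustrative.

#+begin_src lisp
;; Hypothetical sketch of policy-driven assertion; see the assumptions above.
(defvar *ca-facts* (make-hash-table :test #'equal))
(defvar *ca-policies* (make-hash-table :test #'eq)
  "Entity class -> :exclusive, :coexistent, or :temporal.")

(defun policy-for (class)
  "Unknown classes default to :coexistent, which never loses information."
  (gethash class *ca-policies* :coexistent))

(defun mark (fact indicator new-value)
  "Return a copy of FACT with INDICATOR set to NEW-VALUE."
  (let ((copy (copy-list fact)))
    (setf (getf copy indicator) new-value)
    copy))

(defun fact-assert (entity relation value &key class (provenance :llm-proposed))
  "Returns :stored, :duplicate, :rejected, :flagged, or :superseded."
  (let* ((key (list entity relation))
         (existing (gethash key *ca-facts*))
         (live (remove-if (lambda (f) (getf f :superseded-by)) existing))
         (new (list :value value :provenance provenance
                    :timestamp (get-universal-time)
                    :contradiction nil :superseded-by nil)))
    (flet ((store (new-fact old-facts)
             (setf (gethash key *ca-facts*) (cons new-fact old-facts))))
      (cond
        ;; Same value already live: idempotent, nothing to do.
        ((find value live :key (lambda (f) (getf f :value)) :test #'equal)
         :duplicate)
        ;; Nothing live under this key: no contradiction is possible.
        ((null live) (store new existing) :stored)
        (t
         (ecase (policy-for class)
           (:exclusive :rejected)
           (:coexistent
            (store (mark new :contradiction :awaiting-resolution)
                   (mapcar (lambda (f) (mark f :contradiction :awaiting-resolution))
                           existing))
            :flagged)
           (:temporal
            (store new
                   (mapcar (lambda (f)
                             (if (getf f :superseded-by)
                                 f
                                 (mark f :superseded-by value)))
                           existing))
            :superseded)))))))

;; Usage: an :exclusive class rejects a conflicting value.
(setf (gethash :secret-config-file *ca-policies*) :exclusive)
(fact-assert ".env" :member-of-class :secret-config-file :class :secret-config-file)
(fact-assert ".env" :member-of-class :public-file :class :secret-config-file)
;; => :REJECTED
#+end_src
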
** Verification — ~8 FiveAM tests
|
||||
|
||||
1. =test-bootstrap-creates-facts= — bootstrap produces correct triples from
|
||||
=*dispatcher-protected-paths*=.
|
||||
2. =test-bootstrap-creates-shell-facts= — bootstrap produces correct triples from
|
||||
=*dispatcher-shell-blocked*=.
|
||||
3. =test-gate-outcome-produces-fact= — a simulated gate rejection produces a
|
||||
triple with =:provenance :gate-outcome=.
|
||||
4. =test-fact-query-returns-correct-value= — querying by entity and relation
|
||||
returns the expected value plist.
|
||||
5. =test-duplicate-ingestion-idempotent= — asserting the same fact twice does
|
||||
not produce a duplicate or a contradiction.
|
||||
6. =test-exclusive-contradiction-rejected= — asserting a contradictory fact in
|
||||
an =:exclusive= domain returns a rejection.
|
||||
7. =test-coexistent-contradiction-flagged= — asserting a contradictory fact in a
|
||||
=:coexistent= domain stores both with cross-referencing flags.
|
||||
8. =test-temporal-supersedes= — asserting a newer fact in a =:temporal= domain
|
||||
marks the old fact as superseded but retains it.
|
||||
|
||||
** Relation to Other Work
|
||||
|
||||
This is Phase 1 of =notes/passepartout-v3.0.0-roadmap.org=. It implements Options 4 and 5
|
||||
from the architecture note. The contradiction policies are from
|
||||
=passepartout-neurosymbolic-design-decisions-and-options.org=.
|
||||
|
||||
* Phase 2: Screamer as Admission Gate (~200 lines — new skill)
|
||||
|
||||
** What
|
||||
|
||||
Wrap Screamer (a constraint solver with non-deterministic backtracking) as a
|
||||
skill. Use it for consistency checking against the triple store and for deduction
|
||||
of new facts from existing ones. Screamer is the *verification* layer; VivaceGraph
|
||||
(introduced in Phase 5) is the *storage* layer.
|
||||
|
||||
** Rationale
|
||||
|
||||
The architecture note's "verified extraction" pattern requires a deterministic
|
||||
admission gate. Screamer's non-deterministic backtracking finds contradictions
|
||||
that simple string comparison misses. For example, if existing facts say "all
|
||||
config files with extension =.env= are classified as secrets," and the LLM
|
||||
proposes "=app.env= is not secret," Screamer finds the contradiction by
|
||||
substituting =app.env= into the existing rule. A naive string-keyed hash table
|
||||
comparison would miss this because ="app.env"= and =".env"= are different strings.
|
||||
|
||||
Screamer also enables deduction — new facts from existing ones without any LLM
|
||||
involvement. If all files matching =*.env= are secrets, and =prod.env= matches
|
||||
=*.env=, then =prod.env= is a secret. Deduced facts carry =:provenance :deduced=
|
||||
and a =:derived-from= chain pointing to the facts they were derived from.
|
||||
|
||||
** Implementation — =org/symbolic-screamer.org= → =lisp/symbolic-screamer.lisp= (skill)
|
||||
|
||||
*** Wrap Screamer
|
||||
|
||||
Screamer is available via Quicklisp. Load at runtime via =ql:quickload :screamer=.
|
||||
Not an ASDF dependency — if Screamer is not installed, the skill degrades
|
||||
gracefully (no consistency checking, no deduction — the fact store still
|
||||
functions as a hash table with provenance tracking).
|
||||
|
||||
*** Consistency check
|
||||
|
||||
=(screamer-consistent-p candidate-fact existing-facts)= — expresses the fact
|
||||
store as Screamer constraint variables. The candidate fact is asserted. Screamer
|
||||
checks solvability. Returns =:consistent=, =:contradiction <details>=, or
|
||||
=:redundant= (the fact is already implied by existing facts).
|
||||
|
||||
Early-stage: the consistency check works on simple triples. As the fact store
|
||||
grows, rules of the form "all X are Y" (representing protected paths, shell
|
||||
patterns, class memberships) become Screamer constraints that new facts must
|
||||
satisfy.
|
||||
|
||||
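To make the three return values concrete, here is a stand-in written in plain Common Lisp so that it runs without Screamer. It is not the Screamer encoding. It assumes rules are =(suffix . class)= pairs meaning "every entity whose name ends in the suffix belongs to the class", and that a negative claim uses a hypothetical =:not-member-of-class= relation.

#+begin_src lisp
;; Hypothetical stand-in for the consistency check; no Screamer calls are made.
(defun ends-with-p (suffix string)
  "True when STRING ends with SUFFIX."
  (let ((start (- (length string) (length suffix))))
    (and (>= start 0) (string= suffix string :start2 start))))

(defun rule-consistent-p (candidate rules)
  "CANDIDATE is a triple plist. Returns :consistent, :redundant, or
(values :contradiction rule) where RULE is the violated (suffix . class) pair."
  (let* ((entity (getf candidate :entity))
         (relation (getf candidate :relation))
         (class (getf candidate :value))
         (rule (find-if (lambda (r)
                          (and (ends-with-p (car r) entity)
                               (eq (cdr r) class)))
                        rules)))
    (cond ((null rule) :consistent)
          ((eq relation :member-of-class) :redundant)
          ((eq relation :not-member-of-class) (values :contradiction rule))
          (t :consistent))))

;; "app.env is not secret" against "all .env files are secrets".
(rule-consistent-p '(:entity "app.env" :relation :not-member-of-class
                     :value :secret-config-file)
                   '((".env" . :secret-config-file)))
;; => :CONTRADICTION, (".env" . :SECRET-CONFIG-FILE)
#+end_src
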
*** Deduction
|
||||
|
||||
=(screamer-deduce existing-facts)= — Screamer finds implications of the existing
|
||||
fact set that are not already in the store. New facts are asserted with
|
||||
=:provenance :deduced= and a =:derived-from= list of source fact keys.
|
||||
|
||||
Deduction is not run on every assertion — it is a background task triggered by
|
||||
heartbeat or manually. The cost is compute (Screamer exploration), not tokens.
|
||||
|
||||
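A single deduction pass can be illustrated without Screamer. The sketch below is a plain Common Lisp stand-in, not the skill's implementation. It assumes facts are =(entity relation value)= lists and hard-codes one rule: =(E :matches P)= together with =(P :member-of-class C)= implies =(E :member-of-class C)=. It runs one pass only and does not iterate to a fixpoint.

#+begin_src lisp
;; Hypothetical stand-in for one deduction pass; the rule and the fact shape
;; are assumptions made for illustration.
(defun deduce-memberships (facts)
  "Return triple plists implied by FACTS that are not already in FACTS."
  (let ((seen '()) (new '()))
    (dolist (m facts (nreverse new))
      (when (eq (second m) :matches)
        (dolist (c facts)
          (when (and (eq (second c) :member-of-class)
                     (equal (first c) (third m)))
            (let ((triple (list (first m) :member-of-class (third c))))
              (unless (or (member triple facts :test #'equal)
                          (member triple seen :test #'equal))
                (push triple seen)
                (push (list :entity (first m)
                            :relation :member-of-class
                            :value (third c)
                            :provenance :deduced
                            :derived-from (list m c))
                      new)))))))))

(deduce-memberships '(("*.env" :member-of-class :secret)
                      ("prod.env" :matches "*.env")))
#+end_src

The =:derived-from= list keeps both source facts, so a deduced fact can be retracted when either source is withdrawn.
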
*** Admission gate
|
||||
|
||||
=(screamer-admit candidate-fact existing-facts)= — wraps consistency check with
|
||||
the contradiction policy lookup. If the candidate fact's entity class has policy
|
||||
=:exclusive=, contradictions reject. If =:coexistent=, flag. If =:temporal=,
|
||||
supersede.
|
||||
|
||||
This is the function the archivist calls before any LLM-proposed fact enters the
|
||||
store. It is also called on human-authored facts (which override the policy —
|
||||
the human can assert contradictory facts in any domain). It is not called on
|
||||
gate-outcome facts (gates are the ground truth for security domains).
|
||||
|
||||
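The decision logic of the gate, separated from the solver, might look like the sketch below. It takes the consistency verdict as an input and returns an action keyword. The function name =admission-decision= and the action keywords are assumptions, as is the choice to return =:admit= for a contradictory human-authored fact. Gate-outcome facts never reach this function, as described above.

#+begin_src lisp
;; Hypothetical sketch of the admission decision table.
(defun admission-decision (verdict policy provenance)
  "VERDICT is :consistent, :redundant, or :contradiction.
Returns :admit, :skip, :reject, :flag, or :supersede."
  (cond ((eq verdict :redundant) :skip)           ; already implied, nothing to add
        ((eq verdict :consistent) :admit)
        ((eq provenance :human-authored) :admit)  ; the human overrides the policy
        (t (ecase policy
             (:exclusive :reject)
             (:coexistent :flag)
             (:temporal :supersede)))))

(admission-decision :contradiction :exclusive :llm-proposed)
;; => :REJECT
#+end_src
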
** Verification — ~6 FiveAM tests
|
||||
|
||||
1. =test-screamer-consistency-passes= — a fact consistent with existing triples
|
||||
returns =:consistent=.
|
||||
2. =test-screamer-contradiction-detected= — "app.env is not secret" contradicts
|
||||
"all *.env files are secrets" and returns =:contradiction=.
|
||||
3. =test-screamer-redundant-detected= — asserting a fact already implied by
|
||||
existing facts returns =:redundant=.
|
||||
4. =test-screamer-deduction-produces-new-fact= — given "all *.env files are
|
||||
secrets" and "prod.env matches *.env", Screamer deduces "prod.env is secret."
|
||||
5. =test-admission-gate-rejects-contradiction= — the archivist's proposal that
|
||||
contradicts an =:exclusive= domain fact is rejected.
|
||||
6. =test-admission-gate-flags-coexistent-contradiction= — the archivist's proposal
|
||||
that contradicts a =:coexistent= domain fact is stored with a cross-reference.
|
||||
|
||||
** Relation to Other Work
|
||||
|
||||
This is Phase 2 of =notes/passepartout-v3.0.0-roadmap.org=. It implements the "LLM as proposer"
|
||||
pattern from the architecture note. Screamer's role is defined in
|
||||
=passepartout-neurosymbolic-design-decisions-and-options.org=.
|
||||
|
||||
* Phase 3: Archivist as Fact Proposer (~100 lines — extends existing archivist)
|
||||
|
||||
** What
|
||||
|
||||
Extend the existing archivist skill (=org/symbolic-archivist.org=) with a fact
|
||||
extraction mode. The LLM reads prose, proposes triples, and Screamer verifies
|
||||
them before admission. The archivist's existing Scribe (log distillation) and
|
||||
Gardener (link scanning) functions are unchanged.
|
||||
|
||||
** Rationale
|
||||
|
||||
The archivist already walks the entire memex (the Gardener scans for broken links
|
||||
and orphans). Adding fact extraction reuses the same traversal infrastructure
|
||||
rather than duplicating it. The extraction is gated by Screamer — the LLM is a
|
||||
proposer, not an extractor. Facts that fail consistency checking are discarded.
|
||||
Facts that pass are admitted with =:provenance :llm-proposed= and =:grounding=
|
||||
to the source heading.
|
||||
|
||||
** Implementation — extends =org/symbolic-archivist.org=
|
||||
|
||||
*** Propose from prose
|
||||
|
||||
Given an Org heading, call the LLM with a minimal prompt (~200 tokens):
|
||||
|
||||
#+begin_example
|
||||
Extract triples from this text as (:entity <name> :relation <keyword> :value <value>).
|
||||
Ground each triple to the heading. Return a list of triples.
|
||||
#+end_example
|
||||
|
||||
The LLM returns structured triples via the existing JSON→plist structured output
|
||||
path from v0.4.2. The prompt is environment-aware: if the heading's file is in
|
||||
=literature/= or has =:literature:= tags, the prompt includes literature-specific
|
||||
relations (=:wrote=, =:published-in=, =:influenced=). If the heading is in
|
||||
=projects/=, the prompt includes coding-specific relations (=:depends-on=,
|
||||
=:tested-by=).
|
||||
|
||||
*** Verify through Screamer
|
||||
|
||||
Each proposed triple runs through =(screamer-admit candidate existing-facts)=
|
||||
from Phase 2. Consistent and coexistent-flagged triples are admitted. Contradictory
|
||||
triples in =:exclusive= domains are discarded with a log entry.
|
||||
|
||||
*** Provenance tracking
|
||||
|
||||
After each extraction run, update provenance counts:
|
||||
|
||||
#+begin_example
|
||||
(:total-facts 847
|
||||
:gate-outcome 312
|
||||
:human-authored 12
|
||||
:deduced 89
|
||||
:llm-proposed 434)
|
||||
#+end_example
|
||||
|
||||
This is the data structure that Phase 4's sufficiency criterion reads. It is
|
||||
also surfaced in the TUI sidebar or =/status= command: "Symbolic index: 847
|
||||
facts (37% from gates, 51% LLM-proposed, 11% deduced, 1% human)."
|
||||
|
||||
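A sketch of how the counts plist and the percentages in the status line could be derived, assuming facts are available as a list of plists. =provenance-counts= and =percent= are illustrative names. Note that =round= rounds exact halves to the nearest even integer.

#+begin_src lisp
;; Hypothetical sketch: derive the provenance breakdown from a list of facts.
(defun provenance-counts (facts)
  "Return a plist of the form (:total-facts N :gate-outcome N ...)."
  (let ((kinds '(:gate-outcome :human-authored :deduced :llm-proposed)))
    (list* :total-facts (length facts)
           (loop for kind in kinds
                 collect kind
                 collect (count kind facts
                                :key (lambda (f) (getf f :provenance)))))))

(defun percent (part total)
  "Whole-number percentage of PART in TOTAL; 0 for an empty total."
  (if (zerop total) 0 (round (* 100 part) total)))

(provenance-counts '((:value :a :provenance :gate-outcome)
                     (:value :b :provenance :gate-outcome)
                     (:value :c :provenance :llm-proposed)))
;; => (:TOTAL-FACTS 3 :GATE-OUTCOME 2 :HUMAN-AUTHORED 0 :DEDUCED 0 :LLM-PROPOSED 1)
#+end_src
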
*** Rebuildable
|
||||
|
||||
Because every fact has a =:grounding= to an Org heading, the entire LLM-extracted
|
||||
subset can be discarded and re-extracted without losing gate-outcome or deduced
|
||||
facts. The =(fact-purge :provenance :llm-proposed)= function removes all
|
||||
LLM-proposed facts. A subsequent =(archivist-extract-all)= re-extracts from
|
||||
scratch.
|
||||
|
||||
This is the safety net: if the LLM produces a bad extraction that passes
|
||||
Screamer's consistency check (possible in the early stages when the fact store
|
||||
has few existing facts to check against), the extraction can be redone after the
|
||||
fact store has grown. The cost is compute, not data.
|
||||
|
||||
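The purge itself is a small filter over the store. A minimal sketch, assuming one fact plist per key; the =store= keyword and =*purge-sketch-store*= are additions for illustration.

#+begin_src lisp
;; Hypothetical sketch of the purge step.
(defvar *purge-sketch-store* (make-hash-table :test #'equal))

(defun fact-purge (&key provenance (store *purge-sketch-store*))
  "Remove every fact whose :provenance is PROVENANCE. Returns the number removed."
  (let ((doomed '()))
    ;; Collect keys first, then delete, so the traversal never sees a
    ;; table that is changing underneath it.
    (maphash (lambda (key fact)
               (when (eq (getf fact :provenance) provenance)
                 (push key doomed)))
             store)
    (dolist (key doomed (length doomed))
      (remhash key store))))

(setf (gethash (list ".env" :member-of-class) *purge-sketch-store*)
      (list :value :secret-config-file :provenance :gate-outcome))
(setf (gethash (list "Nabokov" :wrote) *purge-sketch-store*)
      (list :value "Pale Fire" :provenance :llm-proposed))

(fact-purge :provenance :llm-proposed)
;; => 1
#+end_src
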
** Verification — ~5 FiveAM tests
|
||||
|
||||
1. =test-archivist-extracts-triples= — given a known Org heading with explicit
|
||||
triples in the prose, the archivist produces the correct triples via LLM.
|
||||
2. =test-archivist-verified-extraction= — a hallucinated triple is rejected by
|
||||
the Screamer admission gate.
|
||||
3. =test-provenance-counts-update= — after extraction, the provenance breakdown
|
||||
is correct.
|
||||
4. =test-purge-llm-facts= — does not delete gate-outcome or deduced facts.
|
||||
5. =test-re-extraction-idempotent= — re-extracting from the same prose after
|
||||
purging produces the same facts (Screamer verification is deterministic
|
||||
given the same starting set).
|
||||
|
||||
** Relation to Other Work
|
||||
|
||||
This is Phase 3 of =notes/passepartout-v3.0.0-roadmap.org=. The archivist's role as proposer
|
||||
is described in =passepartout-neurosymbolic-design-decisions-and-options.org=
|
||||
under "The LLM as Proposer."
|
||||
|
||||
* Phase 4: The "Flip" — Sufficiency Criterion (~50 lines — extends Phase 3)
|
||||
|
||||
** What
|
||||
|
||||
Make the architecture note's central narrative arc operational: a measurable
|
||||
threshold for when the symbolic engine has enough non-lossy facts to bypass the
|
||||
LLM for extraction.
|
||||
|
||||
** Rationale
|
||||
|
||||
The architecture note describes "at some point, the non-lossy facts constitute a
|
||||
sufficient foundation that the symbolic engine can reverse the flow" but provides
|
||||
no criterion for "some point." The sufficiency score makes the flip computable
|
||||
and visible to the user.
|
||||
|
||||
** Implementation — extends =org/symbolic-facts.org=
|
||||
|
||||
*** Sufficiency score
|
||||
|
||||
=(fact-sufficiency-ratio)= — returns the ratio of non-lossy facts to total facts:
|
||||
|
||||
#+begin_src lisp
|
||||
(/ (+ (count-provenance :gate-outcome)
|
||||
(count-provenance :human-authored)
|
||||
(count-provenance :deduced))
|
||||
(fact-total-count))
|
||||
#+end_src
|
||||
|
||||
When this ratio exceeds =SUFFICIENCY_THRESHOLD= (configurable env var, default
|
||||
0.7), the system considers its foundation sufficient. The threshold defaults to
|
||||
0.7 because below this, more than 30% of facts are LLM-proposed and therefore
|
||||
uncertain. Above 0.7, the proven foundation provides enough constraint that
|
||||
Screamer can reliably detect incorrect LLM proposals.
|
||||
|
||||
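The bare division above signals an error on an empty store. A guarded sketch, assuming the counts arrive as the provenance plist from Phase 3; =sufficiency-ratio= and =sufficient-p= are illustrative names, and the threshold is passed as an exact rational so the comparison involves no floating-point rounding.

#+begin_src lisp
;; Hypothetical sketch: sufficiency from a provenance-counts plist.
(defun sufficiency-ratio (counts)
  "Return non-lossy facts / total facts as a rational; 0 for an empty store."
  (let ((total (getf counts :total-facts 0)))
    (if (zerop total)
        0
        (/ (+ (getf counts :gate-outcome 0)
              (getf counts :human-authored 0)
              (getf counts :deduced 0))
           total))))

(defun sufficient-p (counts &optional (threshold 7/10))
  "True when the ratio strictly exceeds THRESHOLD."
  (> (sufficiency-ratio counts) threshold))

(sufficient-p '(:total-facts 847 :gate-outcome 312 :human-authored 12
                :deduced 89 :llm-proposed 434))
;; => NIL   (413/847 is below 7/10)
#+end_src
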
*** Auto-extraction toggle
|
||||
|
||||
When sufficiency is reached, the archivist switches from "LLM proposes, Screamer
|
||||
verifies" to "Screamer queries existing facts, applies category rules to the new
|
||||
prose, and deduces new facts directly." The LLM is bypassed for categories that
|
||||
have sufficient non-lossy coverage. The LLM is still used for novel categories
|
||||
that have no existing facts.
|
||||
|
||||
The switch is configurable: =AUTO_EXTRACTION_ENABLED=true/false=. When disabled,
|
||||
the system continues with LLM proposals regardless of sufficiency — useful for
|
||||
domains where extraction quality is prioritized over extraction determinism.
|
||||
|
||||
*** Monitor
|
||||
|
||||
The TUI sidebar (v0.8.0) or =/status= command displays:
|
||||
|
||||
#+begin_example
|
||||
Symbolic Index
|
||||
Total facts: 1,247
|
||||
Proven:
|
||||
Gate outcomes: 312 (25%)
|
||||
Human-authored: 47 (4%)
|
||||
Deduced: 521 (42%)
|
||||
─────────────────────────
|
||||
Non-lossy: 880 (71%)
|
||||
LLM-proposed: 367 (29%)
|
||||
─────────────────────────
|
||||
Sufficiency: 71% ✓ (threshold: 70%)
|
||||
Mode: AUTO-EXTRACTION (LLM bypassed for known categories)
|
||||
#+end_example
|
||||
|
||||
** Verification — ~3 FiveAM tests
|
||||
|
||||
1. =test-sufficiency-below-threshold= — with 30% non-lossy facts, auto-extraction
|
||||
is not enabled.
|
||||
2. =test-sufficiency-above-threshold= — with 75% non-lossy facts, auto-extraction
|
||||
is enabled.
|
||||
3. =test-auto-extraction-produces-same-facts-as-llm-extraction= — for a category
|
||||
with sufficient non-lossy coverage, auto-extraction produces facts that a
|
||||
subsequent LLM extraction also produces (the deterministic path is consistent
|
||||
with the probabilistic path).
|
||||
|
||||
** Relation to Other Work
|
||||
|
||||
This is Phase 4 of =notes/passepartout-v3.0.0-roadmap.org=. The flip concept originates in
|
||||
=notes/passepartout-symbolic-engine-exploration.org= (lines 68-76) and is refined in
|
||||
=passepartout-neurosymbolic-design-decisions-and-options.org= under "The Flip."
|
||||
|
||||
* Phase 5: VivaceGraph as Persistent Store (~300 lines — new skill)
|
||||
|
||||
** What
|
||||
|
||||
Replace the ephemeral hash-table triple store with VivaceGraph, a Lisp-native
|
||||
graph database with Prolog-like queries. Add the KG type hierarchy (PM type
|
||||
levels applied to the knowledge layer). Define the persistence format from the
|
||||
fact language that survived Phases 1-4.
|
||||
|
||||
** Rationale
|
||||
|
||||
By this point, the triple fact language has been battle-tested through four
|
||||
phases of gate outcomes, Screamer deductions, LLM proposals, and cross-domain
|
||||
comparisons. The facts that proved useful define the persistent schema. The ones
|
||||
that weren't are left behind. The serialization format is not designed upfront;
|
||||
it emerges from use.
|
||||
|
||||
The transition from ephemeral to persistent is justified when two conditions are
|
||||
met: (1) the fact language has stabilized (categories are being queried, not
|
||||
constantly refactored), and (2) accumulated deductions across sessions provide
|
||||
value that justifies the serialization cost.
|
||||
|
||||
** Implementation — =org/symbolic-vivacegraph.org= → =lisp/symbolic-vivacegraph.lisp= (skill)
|
||||
|
||||
*** Wrap VivaceGraph
|
||||
|
||||
VivaceGraph is available via Quicklisp. Load at runtime. Not an ASDF dependency.
|
||||
If not installed, the fact store continues as a hash table (Phase 1-4 behavior)
|
||||
with a log warning: "VivaceGraph not available — persistence disabled."
|
||||
|
||||
*** Prolog-like queries
|
||||
|
||||
Replace =fact-query= with graph traversals:
|
||||
|
||||
#+begin_src lisp
|
||||
;; Find all files classified as secrets
|
||||
(vivace-query '(:and (:entity ?e)
|
||||
(:member-of-class ?e :secret-file)))
|
||||
|
||||
;; Find all files classified as secrets that were modified today
|
||||
(vivace-query `(:and (:entity ?e)
|
||||
(:member-of-class ?e :secret-file)
|
||||
(:modified-since ?e ,(today-timestamp))))
|
||||
|
||||
;; Find contradictions between Wikidata and the memex
|
||||
(vivace-query '(:and (:entity ?e)
|
||||
(:has-value ?e ?v1 :source :wikidata)
|
||||
(:has-value ?e ?v2 :source :memex)
|
||||
(:not-equal ?v1 ?v2)))
|
||||
#+end_src
|
||||
|
||||
*** KG type hierarchy (Contribution 4 from Whitehead)
|
||||
|
||||
Every entity in the graph carries =:pm-type-level= metadata. Queries cannot
|
||||
return entities whose type level equals or exceeds the querying function's type
|
||||
level. A fact-finding query at type-level 2 cannot return facts at type-level
|
||||
3 or higher. Self-referential knowledge — "this fact defines its own type" —
|
||||
becomes structurally impossible because the type level is assigned at creation
|
||||
and cannot be modified by a fact of the same or higher level.
|
||||
|
||||
This is Contribution 1 (type-level gates) applied to the knowledge layer rather
|
||||
than the execution layer. The dispatcher prevents self-referential /actions/; the
|
||||
KG prevents self-referential /facts/.
|
||||
|
||||
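The visibility rule can be sketched as a filter applied to query results. This is a plain Common Lisp illustration, not a VivaceGraph call. It follows the "equals or exceeds" wording above, so only entities strictly below the querying level are returned, and it assumes that entities without a =:pm-type-level= are hidden. =visible-to-level= is a hypothetical name.

#+begin_src lisp
;; Hypothetical sketch of the type-level filter on query results.
(defun visible-to-level (entities query-level)
  "Keep only entity plists whose :pm-type-level is strictly below QUERY-LEVEL."
  (remove-if-not (lambda (e)
                   (let ((level (getf e :pm-type-level)))
                     (and level (< level query-level))))
                 entities))

(visible-to-level '((:entity ".env" :pm-type-level 1)
                    (:entity "secret-file-class" :pm-type-level 2)
                    (:entity "class-of-classes" :pm-type-level 3))
                  2)
;; => ((:ENTITY ".env" :PM-TYPE-LEVEL 1))
#+end_src
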
*** Persistence format
|
||||
|
||||
The fact language that survived Phases 1-4 defines the format. Each entity is a
|
||||
node; each triple is an edge with properties (=:grounding=, =:provenance=,
|
||||
=:timestamp=). The format is not a new design — it is the triple schema evolved
|
||||
through use, serialized by VivaceGraph's native persistence.
|
||||
|
||||
If the fact language later evolves to n-ary relations, VivaceGraph's graph model
|
||||
accommodates this natively — edges can carry arbitrary property plists. The
|
||||
triple form is a special case of the general graph model.
|
||||
|
||||
*** Load on startup, save on interval
|
||||
|
||||
On daemon start, =(vivacegraph-load)= reads the last saved graph. On heartbeat,
|
||||
=(vivacegraph-save)= persists the graph in its native format to
|
||||
=~/.cache/passepartout/facts.vg=. The interval matches the existing
|
||||
=*memory-auto-save-interval*=. The save is atomic: write to a temp file, rename
|
||||
on success. Corruption-safe.
|
||||
|
||||
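The write-then-rename step can be illustrated independently of VivaceGraph, whose own persistence calls are not shown in this plan. The sketch below serializes a list of fact plists as one s-expression. =save-facts-atomically= and =load-facts= are hypothetical names, and the target is assumed to carry a file type such as =vg=. What =rename-file= does when the target already exists is implementation-dependent; on POSIX systems SBCL uses =rename(2)=, which replaces the target in one step.

#+begin_src lisp
;; Hypothetical stand-in for an atomic save; not VivaceGraph's native format.
(defun save-facts-atomically (facts target)
  "Write FACTS to a temp file next to TARGET, then rename it over TARGET."
  (let ((temp (make-pathname :type "tmp" :defaults target)))
    (with-open-file (out temp :direction :output
                              :if-exists :supersede
                              :if-does-not-exist :create)
      (with-standard-io-syntax
        (prin1 facts out)))
    ;; Readers see either the old file or the new one, never a partial write.
    (rename-file temp target)
    target))

(defun load-facts (target)
  "Read the saved fact list, or return NIL when TARGET does not exist."
  (with-open-file (in target :direction :input :if-does-not-exist nil)
    (when in
      (with-standard-io-syntax
        (let ((*read-eval* nil))      ; never evaluate #. forms from disk
          (read in nil nil))))))
#+end_src

If the process dies during the write, only the temp file is damaged and the previous save remains loadable.
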
** Verification — ~5 FiveAM tests
|
||||
|
||||
1. =test-vivacegraph-roundtrip= — save and load preserves all facts with
|
||||
provenance metadata.
|
||||
2. =test-prolog-query-returns-results= — a query for all secret files returns
|
||||
the bootstrapped gate facts.
|
||||
3. =test-prolog-query-cross-domain= — a query for contradictions between Wikidata
|
||||
and memex provenance returns correct results.
|
||||
4. =test-type-level-prevents-self-reference= — a query from a type-level-2
|
||||
function cannot return type-level-3 facts.
|
||||
5. =test-fact-store-fallback-without-vivacegraph= — when VivaceGraph is not
|
||||
loaded, the hash-table fallback functions identically to Phase 1-4 behavior.
|
||||
|
||||
** Relation to Other Work
|
||||
|
||||
This is Phase 5 of =notes/passepartout-v3.0.0-roadmap.org= and Contribution 4 from
|
||||
=notes/passepartout-whitehead.org=. The architecture note's Option 1
|
||||
(auto-formalizer KG) converges with Option 4 (one memex, two indices) here —
|
||||
VivaceGraph is the persistence layer for the symbolic index within the
|
||||
one-memex-two-indices architecture.
|
||||
|
||||
* Phase 6: ACL2 for Structural Verification (~200 lines — new skill)
|
||||
|
||||
** What
|
||||
|
||||
Wrap ACL2 as a skill. Prove structural properties of the KG type hierarchy and
|
||||
rule sets. Not for empirical claims.
|
||||
|
||||
** Rationale
|
||||
|
||||
The architecture note positions ACL2 as verifying LLM-proposed facts. But many
|
||||
facts are empirical ("this command is destructive on Linux"), not logical. The
|
||||
Whitehead note clarifies the right role: structural verification. ACL2 proves
|
||||
that the type hierarchy has no cycles, that the rule set is non-contradictory,
|
||||
and that the gate-to-fact bootstrap preserves the Dispatcher's intent. These are
|
||||
structural properties that can be formally verified, not empirical claims that
|
||||
depend on external reality.
|
||||
|
||||
** Implementation — =org/symbolic-acl2.org= → =lisp/symbolic-acl2.lisp= (skill)
|
||||
|
||||
*** Type consistency proofs
|
||||
|
||||
=(acl2-verify-type-hierarchy facts)= — prove that the KG type hierarchy has no
|
||||
cycles: no entity of type-level 3 depends on an entity of type-level 5, no parent
|
||||
category has a child that subsumes it, no category is its own ancestor via the
|
||||
child-of relation. These are structural properties of the graph, independent of
|
||||
what the facts /say/.
|
||||
|
||||
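The "no category is its own ancestor" property can be stated as an executable check. The sketch below is plain Common Lisp, not an ACL2 proof: it tests one concrete graph rather than proving the property for all graphs, so it is better suited as a test oracle. It assumes the child-of relation is given as =(child . parent)= pairs of symbols; =ancestor-cycle-p= is a hypothetical name.

#+begin_src lisp
;; Hypothetical runtime check for cycles in the child-of relation.
(defun ancestor-cycle-p (edges)
  "EDGES is a list of (child . parent) pairs. True when some node is its own ancestor."
  (labels ((parents (node)
             (loop for (child . parent) in edges
                   when (eq child node) collect parent))
           (reaches-p (from target seen)
             ;; SEEN holds the nodes on the current path, which bounds the recursion.
             (some (lambda (p)
                     (or (eq p target)
                         (and (not (member p seen))
                              (reaches-p p target (cons p seen)))))
                   (parents from))))
    (some (lambda (edge)
            (reaches-p (car edge) (car edge) (list (car edge))))
          edges)))

(ancestor-cycle-p '((:env-file . :config-file)
                    (:config-file . :file)
                    (:file . :env-file)))
;; => T
#+end_src
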
*** Rule set consistency
|
||||
|
||||
=(acl2-verify-rule-consistency rules)= — prove that the accumulated Dispatcher
|
||||
rules (from HITL approvals) are non-contradictory: no rule allows a command that
|
||||
another rule blocks, no rule permits a path access that another denies. If the
|
||||
rule set is contradictory, ACL2 identifies the contradictory subset with the
|
||||
provenance of each rule. The human resolves the contradiction.
|
||||
|
||||
*** Extraction verification
|
||||
|
||||
=(acl2-verify-bootstrap-preservation)= — prove that the gate-to-fact bootstrap
|
||||
(Phase 0-1) preserves the Dispatcher's intent: every blocked pattern in the gate
|
||||
stack maps to a fact in the store; every fact with =:provenance :gate-outcome= is
|
||||
grounded in a specific gate vector; no gate-bootstrapped fact contradicts another
|
||||
gate-bootstrapped fact.
|
||||
|
||||
** Not in scope
|
||||
|
||||
ACL2 does not verify that =rm -rf /= is destructive. That is an empirical claim
|
||||
about Linux. Screamer handles empirical consistency (does this new claim
|
||||
contradict existing observations?). ACL2 handles structural consistency (does
|
||||
this reasoning structure have formal flaws?). The boundary is: empirical claims
|
||||
go to Screamer; structural claims go to ACL2.
|
||||
|
||||
** Verification — ~4 FiveAM tests
|
||||
|
||||
1. =test-acl2-type-hierarchy-no-cycles= — a synthetic KG with a type-level cycle
|
||||
is detected and reported.
|
||||
2. =test-acl2-rule-set-contradiction-detected= — two Dispatcher rules that
|
||||
contradict each other produce a contradiction report with provenance.
|
||||
3. =test-acl2-bootstrap-preservation= — the bootstrap extraction from the gate
|
||||
stack is verified to have no missing or extra facts.
|
||||
4. =test-acl2-not-loaded-graceful-degradation= — when ACL2 is not installed, the
|
||||
skill loads but returns "ACL2 not available — structural verification
|
||||
disabled" without crashing.
|
||||
|
||||
** Relation to Other Work
|
||||
|
||||
This is Phase 6 of =notes/passepartout-v3.0.0-roadmap.org=. ACL2's role is refined in
|
||||
=passepartout-neurosymbolic-design-decisions-and-options.org= from the
|
||||
architecture note's broader claim to the structural verification scope.
|
||||
|
||||
* Phase 7: The 10-80-10 Planner (~500 lines — new skills, last phase)
|
||||
|
||||
** What
|
||||
|
||||
A planning engine built on the mature symbolic index. Screamer expresses task
|
||||
planning as a constraint satisfaction problem. ACL2 verifies plans for structural
|
||||
soundness. The LLM handles the I/O boundaries (natural language → structured goal
|
||||
← natural language response). The symbolic engine handles the reasoning.
|
||||
|
||||
** Rationale
|
||||
|
||||
This is v3.0.0 as described in the architecture note and the ROADMAP. It is the
|
||||
final phase because it requires a populated, queried, and trusted symbolic index.
|
||||
The full planner is useless without a mature ontology and a proven deducer. By
|
||||
the time Phase 7 begins, Phases 0-6 have accumulated months of gate outcomes,
|
||||
Screamer deductions, verified LLM proposals, and human-authored facts. The
|
||||
symbolic index has achieved sufficiency. The ontology has stabilized through use.
|
||||
The planner is built on a foundation, not a speculation.
|
||||
|
||||
** Implementation — =org/symbolic-planner.org= → =lisp/symbolic-planner.lisp= (skill)
|
||||
|
||||
*** Task decomposition as constraint satisfaction
|
||||
|
||||
The user specifies a goal: "refactor the authentication module to support OAuth2."
|
||||
The LLM translates this to a structured goal plist. Screamer expresses the planning
|
||||
problem:
|
||||
|
||||
- /Variables/: subtasks (write OAuth2 client, add token store, update auth
|
||||
middleware, write tests, update documentation)
|
||||
- /Constraints/: dependency ordering (tests depend on implementation), resource
|
||||
limits (one file write at a time), safety invariants (no modification of
|
||||
=core-*= files)
|
||||
- /Objective/: find an ordering that satisfies all constraints
|
||||
|
||||
Screamer returns a viable plan or reports unsolvability with the conflicting
|
||||
constraints.
|
||||
|
||||
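For the dependency-ordering constraint alone, the problem reduces to a topological sort. The sketch below is a plain Common Lisp stand-in, not a Screamer program, and it ignores the safety invariants. Producing a linear order also satisfies the "one file write at a time" limit. The task keywords and the =order-subtasks= name are illustrative.

#+begin_src lisp
;; Hypothetical stand-in: order subtasks so every prerequisite comes first.
(defun order-subtasks (tasks deps)
  "TASKS is a list of keywords; DEPS is a list of (task . prerequisite) pairs.
Returns (values ordering t) when an order exists, else (values blocked-tasks nil)."
  (let ((done '())
        (pending (copy-list tasks)))
    (loop
      (let ((ready (find-if (lambda (task)
                              (every (lambda (d)
                                       (or (not (eq (car d) task))
                                           (member (cdr d) done)))
                                     deps))
                            pending)))
        (cond ((null pending) (return (values (nreverse done) t)))
              ;; Nothing is ready but work remains: the constraints conflict.
              ((null ready) (return (values pending nil)))
              (t (push ready done)
                 (setf pending (remove ready pending))))))))

(order-subtasks '(:write-tests :oauth2-client :token-store
                  :auth-middleware :update-docs)
                '((:write-tests . :oauth2-client)
                  (:write-tests . :token-store)
                  (:write-tests . :auth-middleware)))
;; => (:OAUTH2-CLIENT :TOKEN-STORE :AUTH-MIDDLEWARE :WRITE-TESTS :UPDATE-DOCS), T
#+end_src

When no order exists, the first return value lists the subtasks that could not be scheduled, which is the starting point for reporting the conflicting constraints.
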
*** Plan verification
|
||||
|
||||
ACL2 proves that the plan contains no deadlocks (two subtasks waiting on each
|
||||
other), no dependency cycles (A depends on B depends on C depends on A), and
|
||||
no safety violations (no plan step requires a gate-blocked operation).
|
||||
|
||||
If verification fails, ACL2 identifies the failing subtask and the violated
|
||||
constraint. The planner re-decomposes the problematic branch (the existing
|
||||
ROADMAP's branch pruning, v0.11.0, but symbolically rather than neurally).
|
||||
|
||||
*** Neuro-symbolic boundary
|
||||
|
||||
The LLM handles the I/O boundaries:
|
||||
|
||||
- *Input* (10%): natural language → structured goal plist. "Refactor auth for
|
||||
OAuth2" → =(:goal :refactor-component :target :auth-module :add-feature :oauth2)=.
|
||||
Small prompt, formulaic translation, ~100 tokens.
|
||||
- *Reasoning* (80%): Screamer plans. ACL2 verifies. VivaceGraph provides the
|
||||
facts about file structure, dependencies, and gate constraints. Zero LLM
|
||||
tokens.
|
||||
- *Output* (10%): structured plan → natural language response. The verified plan
|
||||
plist is formatted as "I'll refactor the authentication module in 5 steps:
|
||||
1) Create the OAuth2 client (depends on: nothing, modifies: auth/client.lisp)
|
||||
2) Add the token store..." Small prompt, formulaic translation, ~150 tokens.
|
||||
|
||||
*** TUI visualization
|
||||
|
||||
The plan is rendered as an Org headline tree in the TUI, with each subtask as a
|
||||
node showing its terminal state (=todo=, =next-action=, =in-progress=, =done=,
|
||||
=blocked=, =stuck=), its constraints, and its verified properties. This is the
|
||||
same task tree visualization planned for v0.11.0 in the feature roadmap, but
|
||||
with the addition of Screamer constraint annotations and ACL2 verification
|
||||
badges.
|
||||
|
||||
** Verification — ~6 FiveAM tests
|
||||
|
||||
1. =test-goal-plist-from-natural-language= — natural language input produces
|
||||
correct structured goal plist (LLM-dependent but formulaic; tested with
|
||||
deterministic mock).
|
||||
2. =test-screamer-plan-satisfies-constraints= — Screamer produces a plan that
|
||||
satisfies all specified dependencies and safety constraints.
|
||||
3. =test-screamer-report-unsolvable= — Screamer reports unsolvability when
|
||||
constraints are contradictory.
|
||||
4. =test-acl2-verifies-plan-no-cycles= — ACL2 verifies a valid plan has no
|
||||
dependency cycles.
|
||||
5. =test-acl2-rejects-cyclic-plan= — ACL2 detects a dependency cycle in an
|
||||
invalid plan.
|
||||
6. =test-plan-to-natural-language= — structured plan plist produces readable
|
||||
natural language output.
|
||||
|
||||
** Relation to Other Work
|
||||
|
||||
This is Phase 7 of =notes/passepartout-v3.0.0-roadmap.org=. It corresponds to the ROADMAP's
|
||||
v0.9.0 (task planning) and v3.0.0 (full 10-80-10 architecture). It is the last
|
||||
component because it depends on a mature symbolic index from Phases 0-6.
|
||||
|
||||
* Phase 8+: Semantic Wikipedia Integration (TBD lines — optional acceleration)
|
||||
|
||||
** What
|
||||
|
||||
Load Wikidata entities referenced in the memex into the symbolic index. Every
|
||||
entity the user's prose mentions gets its Wikidata property graph — type hierarchy,
|
||||
relations, dates, citations — as triples with =:provenance :wikidata=.
|
||||
|
||||
** Rationale
|
||||
|
||||
The gate stack provides 50-70 entity classes — adequate for a coding agent.
|
||||
For a general-knowledge memex containing literature, philosophy, history,
|
||||
science, and daily life, 50-70 is starvation. Organic growth through prose
|
||||
extraction (Phase 3) would take years to cover the entities mentioned in a single
|
||||
reading of /Pale Fire/. Wikidata has already done this work at scale.
|
||||
|
||||
The LLM's role in extraction shrinks dramatically. Without Wikidata, the archivist
|
||||
must /discover/ that Nabokov wrote /Pale Fire/, lectured on Kafka, and emigrated
|
||||
from Russia — extracting each triple from prose. With Wikidata, the Nabokov entity
|
||||
is pre-structured. The archivist's job changes from "discover entities" to
|
||||
"connect your heading to the existing entity."

** Implementation sketch

1. *Index referenced entities.* Scan memex prose for entity names (capitalized
   noun phrases, names in Org links, headings in =literature/= directories). For
   each, attempt Wikidata entity resolution (string match, disambiguation via
   context).

2. *Load N-hop property net.* For each resolved entity, load its Wikidata
   properties: instance-of, subclass-of, authored, published-in, influenced-by,
   birth-date, death-date, etc. Load the same for entities directly connected
   to it (1-hop neighbors). Optionally expand to 2-hop for deeply connected
   domains.

3. *Admit with coexistent policy.* Wikidata facts are admitted with
   =:provenance :wikidata= and contradiction policy =:coexistent=. They do not
   override your memex's facts. They sit alongside them. Contradictions are
   surfaced, not resolved.

4. *Cross-domain query.* "What does my memex say about Nabokov that Wikidata
   doesn't?" "Where does my memex disagree with Wikidata?" "What entities in my
   memex have no Wikidata counterpart?" These queries are pure VivaceGraph
   traversals that consume zero LLM tokens.
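
Steps 3 and 4 can be sketched over a plain list of fact plists. This is only an illustration of the coexistent policy and the three queries: the list stands in for the graph store (no VivaceGraph API is used), and every name here (=make-fact=, =admit-coexistent=, =memex-only=, =unmatched-entities=) is hypothetical. It also assumes that only single-valued predicates such as =:birth-date= can contradict each other.

#+begin_src lisp
(defun make-fact (subject predicate object provenance)
  (list :subject subject :predicate predicate :object object
        :provenance provenance))

(defun from-p (fact provenance)
  (eq (getf fact :provenance) provenance))

(defun same-claim-p (a b)
  "True when A and B assert the same subject, predicate, and object."
  (and (equal (getf a :subject) (getf b :subject))
       (eq (getf a :predicate) (getf b :predicate))
       (equal (getf a :object) (getf b :object))))

(defun conflict-p (a b functional-predicates)
  "True when A and B give different objects for one single-valued predicate."
  (and (equal (getf a :subject) (getf b :subject))
       (eq (getf a :predicate) (getf b :predicate))
       (member (getf a :predicate) functional-predicates)
       (not (equal (getf a :object) (getf b :object)))))

(defun admit-coexistent (fact store
                         &key (functional-predicates '(:birth-date :death-date)))
  "Add FACT to STORE without overriding anything.
Returns the new store and the existing facts that FACT contradicts."
  (values (cons fact store)
          (remove-if-not (lambda (old)
                           (conflict-p fact old functional-predicates))
                         store)))

(defun memex-only (store)
  "Memex facts that have no identical Wikidata claim."
  (remove-if-not
   (lambda (f)
     (and (from-p f :memex)
          (notany (lambda (w) (and (from-p w :wikidata) (same-claim-p f w)))
                  store)))
   store))

(defun unmatched-entities (store)
  "Subjects mentioned by the memex that no Wikidata fact describes."
  (let ((known (mapcar (lambda (f) (getf f :subject))
                       (remove-if-not (lambda (f) (from-p f :wikidata))
                                      store))))
    (remove-duplicates
     (loop for f in store
           when (and (from-p f :memex)
                     (not (member (getf f :subject) known :test #'equal)))
             collect (getf f :subject))
     :test #'equal)))
#+end_src

The second return value of =admit-coexistent= answers "where does my memex disagree with Wikidata?" at admission time: the conflicting fact is still stored, and the disagreement is reported rather than resolved.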

** Not a Phase 0 prerequisite

Semantic Wikipedia integration is an accelerator, not a prerequisite. Phases
0-7 work without it: the ontology grows through gate outcomes, Screamer
deductions, LLM proposals, and human authoring. Wikidata compresses the timeline
for the broad domain but does not change the architecture. The admission gate
(Screamer), contradiction policies, provenance tracking, and neuro-symbolic
boundary are identical with or without it.

** Open question

How much Wikidata is the right amount? Loading entities referenced in the memex
is the minimum. Loading all entities within N hops of those references grows
the graph roughly exponentially in N. The right N depends on the memex's breadth
and the user's query patterns. A memex focused entirely on software engineering
may need only 1 hop. A memex spanning literature, history, philosophy, and
science may need 3-4 hops. The query performance and memory costs of a large
Wikidata load have not been estimated.
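
One way to put numbers on this question before committing to a load is to count how many entities fall within N hops of the seed set. The sketch below is a breadth-first expansion over an arbitrary neighbor function; =entities-within= and its arguments are hypothetical names, and the neighbor function would have to be backed by whatever Wikidata access the project ends up using.

#+begin_src lisp
(defun entities-within (seeds n neighbors-fn)
  "Return all entities reachable from SEEDS in at most N hops.
NEIGHBORS-FN maps an entity to a list of directly connected entities."
  (let ((seen (make-hash-table :test #'equal))
        (frontier seeds))
    (dolist (seed seeds)
      (setf (gethash seed seen) t))
    (dotimes (hop n)
      (declare (ignorable hop))
      (let ((next '()))
        (dolist (entity frontier)
          (dolist (neighbor (funcall neighbors-fn entity))
            (unless (gethash neighbor seen)
              (setf (gethash neighbor seen) t)
              (push neighbor next))))
        (setf frontier next)))          ; only new entities expand further
    (loop for entity being the hash-keys of seen collect entity)))

;; Toy graph for illustration: each entry is (entity neighbor ...).
(defparameter *toy-graph*
  '(("a" "b" "c") ("b" "d") ("c" "d" "e") ("d" "f")))

(defun toy-neighbors (entity)
  (cdr (assoc entity *toy-graph* :test #'equal)))

(loop for n from 0 to 3
      collect (length (entities-within '("a") n #'toy-neighbors)))
;; => (1 3 5 6)
#+end_src

If each entity has on the order of d neighbors, the count grows on the order of d^N until the graph saturates, so running this count for N = 1, 2, 3 on a sample of seeds gives an early estimate of load size.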

* Summary: Lines and Dependencies

| Phase | Component            | Lines | New Skill? | Depends On | Earliest Release |
|-------+----------------------+-------+------------+------------+------------------|
| 0     | PM-type-level gates  | ~30   | No         | Dispatcher | Immediately      |
| 1     | Triple fact store    | ~150  | Yes        | Phase 0    | v0.7.2+          |
| 2     | Screamer admission   | ~200  | Yes        | Phase 1    | v0.7.2+          |
| 3     | Archivist extraction | ~100  | Extends    | Phase 2    | v0.8.0+          |
| 4     | Flip: sufficiency    | ~50   | Extends    | Phase 3    | v0.8.0+          |
| 5     | VivaceGraph store    | ~300  | Yes        | Phase 4    | v0.10.0+         |
| 6     | ACL2 verification    | ~200  | Yes        | Phase 5    | v0.12.0+         |
| 7     | 10-80-10 planner     | ~500  | Yes        | Phase 6    | v3.0.0           |
| 8+    | Semantic Wikipedia   | TBD   | Yes        | Phase 5    | TBD              |
|-------+----------------------+-------+------------+------------+------------------|
| Total |                      | ~1530 |            |            |                  |

This roadmap is independent of the feature roadmap in
=passepartout/docs/ROADMAP.org=. Phase 0 ships alongside any v0.7.x patch. The
symbolic engine grows in parallel with feature work (TUI improvements, MCP tools,
gateway expansion, etc.), not after it. The feature roadmap describes /what/ the
agent can do; this roadmap describes /how/ it knows what it knows.

The total new code across Phases 0-7 is approximately 1,530 lines (Phase 8+ has
no estimate yet). Relative to the existing codebase (8,000+ lines across 40+ Org
source files and 30+ skills), the symbolic engine is a ~20% addition. Relative to
the ROADMAP's planned feature work through v0.13.0 (thousands of lines of TUI
rendering, MCP protocol implementation, skin engine, planning, etc.), the
symbolic engine is a small, orthogonal thread that grows the architecture's
reasoning depth while the feature work grows its interaction breadth.
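
The total and the percentage can be re-derived from the table. The snippet below is just that arithmetic; =line-budget-share= is a throwaway name, and 8,000 is the lower bound quoted above, so the true share is at most the value computed here.

#+begin_src lisp
(defun line-budget-share (phase-lines existing-lines)
  "Return the summed budget and its size as a percentage of EXISTING-LINES."
  (let ((total (reduce #'+ phase-lines)))
    (values total (* 100.0 (/ total existing-lines)))))

;; Phases 0-7 from the table; Phase 8+ is TBD and left out.
(line-budget-share '(30 150 200 100 50 300 200 500) 8000)
;; first value 1530, second value about 19.1
#+end_src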

* Competitive Advantage Analysis

** Phase 0-1: Deterministic safety, now with type-level guarantees

The existing Dispatcher gate stack already provides zero-LLM-token safety verification.
Phase 0 adds structural guarantees: no heuristic bypassing of the type hierarchy.
A request to modify the dispatcher's own rules is impossible by construction, not
just caught by pattern matching. No competitor has this; their equivalent of
"core file protection" is a prompt instruction, not a type system.

** Phase 2-3: Verified extraction (the symbolic index grows without corruption)

No competitor verifies extracted facts against an existing knowledge base. Their
memory systems (Claude Code's ~extractMemories~, Hermes's MemoryProvider, OpenClaw's
session transcripts) record what the LLM /said/ happened, not what the system
/proved/ happened. Passepartout's Screamer-gated admission makes the symbolic index
a monotonic, verified structure. Facts are admitted because they are consistent,
not because the LLM generated them.

** Phase 4-5: Self-accelerating knowledge (the downward cost curve)

The sufficiency criterion makes Passepartout's "cheaper over time" thesis
measurable. As the ratio of non-lossy facts grows, LLM calls for extraction
decrease. At sufficiency, extraction of known categories becomes deterministic.
The downward cost curve is not a marketing claim; it is a structural property
of the architecture, visible through the sufficiency score.

** Phase 6-7: Provable plan soundness

No competitor verifies task plans against formal constraints. Claude Code plans
in a single LLM call with no post-hoc verification. Hermes decomposes tasks into
subtasks but does not prove them non-contradictory. Passepartout's ACL2-verified
plans are structurally guaranteed to have no deadlocks, no dependency cycles,
and no safety violations. The verification is a proof, not a prompt.

** Semantic Wikipedia: Entity coverage at zero marginal cost

No competitor has a general-knowledge entity graph because no competitor has a
symbolic engine to populate. Claude Code knows codebases; it doesn't know that
Nabokov wrote /Pale Fire/ and lectured on Kafka. Passepartout with Wikidata
loaded knows both, and the entity knowledge costs zero LLM tokens: it is loaded
once as structured data and queried via VivaceGraph traversals.

** The permanent competitive advantage

The competitive advantage is not any single feature. It is the architecture's
ability to accumulate verified knowledge from four independent sources (gates,
deduction, verified LLM proposals, human authoring) and to make that knowledge
queryable with provenance. Competitors accumulate chat transcripts. Passepartout
accumulates a provenanced, self-verifying knowledge graph. Transcripts become
stale and unreliable. The knowledge graph becomes richer and more trustworthy
with every session.

* What Is NOT Built

1. *A separate knowledge graph serialization format before the ephemeral phase
   proves what facts are useful.* Premature format commitment is the ontology
   problem writ small. Let use determine the format.

2. *ACL2 verification of empirical claims.* "An apple is red" and
   "=rm -rf /= is destructive" are observations, not theorems. Screamer handles
   empirical consistency. ACL2 handles structural verification.

3. *VivaceGraph before Screamer.* The admission gate is the critical path. The
   persistence layer is an optimization of a working system.

4. *A per-fact ontology designed upfront.* Extract from the gate stack, extend
   from deductions and observations, prune through contradiction detection. The
   ontology is a garden, not a building.

5. *New core ASDF components.* Every phase is a skill. A corrupted symbolic
   engine degrades reasoning but does not kill the agent. This satisfies the
   self-repair criterion.

6. *A "complete" symbolic index for the broad domain.* The neural index is the
   permanent gateway to the richness of prose. The symbolic index handles what
   can be mechanically verified. The boundary is permanent, not transitional.
   The neuro is the brain. The symbolic is the education.

* Relation to the Feature Roadmap

The feature roadmap (=passepartout/docs/ROADMAP.org=) describes Passepartout's
evolution through v0.13.0: TUI improvements, MCP-native tools, task planning,
skill creation, evaluation harnesses, voice gateways, themes, and channels.
These are /interaction surface/ features: they expand what the agent can do.

This roadmap describes the /reasoning substrate/: it deepens how the agent
knows what it knows. It is independent of the feature sequence. Phase 0 can ship
alongside any v0.7.x patch. Phases 1-4 ship during the v0.8.x-v0.10.x feature
cycle. Phases 5-7 ship during the v0.11.x-v0.13.x cycle.

The two roadmaps converge at v3.0.0: the feature roadmap provides the interaction
surface (a polished TUI, a rich tool ecosystem, a multi-gateway communication
layer); this roadmap provides the reasoning depth (a provenanced knowledge graph,
a deterministic constraint solver, a verified planning engine). The surface
without the substrate is a chat agent with good UX. The substrate without the
surface is a theorem prover without a user. Together, they are the v3.0.0
architecture.

See also:

- =notes/passepartout-neurosymbolic-design-decisions-and-options.org=: the
  design rationale for every decision in this roadmap
- =notes/passepartout-symbolic-engine-exploration.org=: the original architecture
  exploration and five architecture options
- =notes/passepartout-whitehead.org=: Whitehead's four concrete contributions
- =passepartout/docs/ROADMAP.org=: the feature roadmap through v0.13.0
- =passepartout/docs/ARCHITECTURE.org=: the current pipeline architecture
- =notes/passepartout-v3.0.0-roadmap.org=: the original concrete plan (superseded by this
  document)

#+SUPERSEDED: [2026-05-10 Sun]

This document has been consolidated into ~passepartout/docs/ROADMAP.org~. Each neurosymbolic phase now has its full implementation spec (rationale, code sketches, test catalog, line budget) inline in the roadmap's version sections:

| Phase | Version  |
|-------+----------|
| 0     | v0.10.0  |
| 0b    | v0.12.0  |
| 1     | v0.14.0  |
| 1a    | v0.16.0  |
| 2     | v0.18.0  |
| 3     | v0.20.0  |
| 4     | v0.22.0  |
| 5     | v0.25.0  |
| 6     | v0.27.0  |
| 7     | v0.36.0  |
| 8+    | v0.36.1+ |

The "What Is NOT Built" rationale and "Competitive Advantage Analysis" sections are also now in ROADMAP.org.

Cross-references are preserved in the original files:
- ~notes/passepartout-neurosymbolic-design-decisions-and-options.org~
- ~notes/passepartout-symbolic-engine-exploration.org~
- ~notes/passepartout-whitehead.org~
180
projects/AGENTS.md
Normal file
@@ -0,0 +1,180 @@
# AGENTS.md

## Development Cycle (every change)

0. **Start the runtime**: boot the Lisp image that loads your project.
   For passepartout: `passepartout daemon` (loads the entire project into one SBCL image).
   For standalone CL projects: SBCL with `(ql:quickload :your-project)`.
   The running image IS the development environment. The REPL is mandatory.
   The SBCL fallback below exists only for bootstrapping (when the runtime cannot
   start) and CI.

1. **Read the next TODO**: find the next open `*** TODO` item in
   `docs/ROADMAP.org` (search `*** TODO`). Read its prose, `:PROPERTIES:`,
   and estimated line budget. That item is the target for this change cycle.

2. **Create a branch**: `git checkout -b feature/<version>-<slug>` from main.
   Every feature develops in its own branch. Branches are cheap, disposable,
   and keep abandoned work off main. Name the branch after the version and
   a short slug: `feature/v0.1.0-layout-engine`, `feature/v0.9.0-eval-harness`.
   Complex features that span multiple phases may use a single branch with
   multiple commits rather than one branch per phase.

3. **Think in org**: write your reasoning, goals, and approach in the .org file first.

4. **Write contract**: define a `** Contract` section listing each function's behavior:
   `(fn-name args)`: description. Returns/guarantees ...

5. **TDD in REPL**: the inner loop runs entirely in the running image.

   a. **Write tests in org**: add `fiveam:test` forms to the `* Test Suite` section
      of the .org source file. Tests are definitions, not explorations, so write
      them in the file first.

   b. **Send tests to REPL → RED**: evaluate the test forms in the running image.
      Run the suite. It must FAIL because the implementation doesn't exist yet.
      Record the failure output in the .org file under the test.

   c. **Develop implementation in REPL**: redefine functions directly in the
      running image. Explore. Discover the real argument shapes, edge cases, and
      helper functions through interaction, not speculation. Each `defun` in the
      REPL is immediate: no tangle, no reload, sub-second feedback.

   d. **Run tests → GREEN**: after each change, re-run the suite from the REPL.
      When all tests pass, the implementation is complete. If still RED, return to
      step c. Record the passing output in the .org file under the test.

   e. **Copy code to org**: copy each finished function from the REPL into its
      own `#+begin_src lisp` block in the .org file. The code is already working;
      the file is now its permanent home. One function per block. Never write a
      function in a file that hasn't been proven in the image.

6. **Update literate prose**: write/update the explanatory text around the code:
   what it does, why it exists, how it connects to the rest of the system.

7. **Tangle**: generate the .lisp file from the .org source:
   ```
   emacs --batch --eval "(progn (require 'org) (find-file \"org/FILE.org\") (org-babel-tangle) (kill-buffer))"
   ```
   Tangling is a finalization step, not part of the inner loop. The inner loop
   (steps 5a–5e) happens entirely in the REPL. Tangle once, when the file is
   ready to commit.

8. **Run full test suite**: from the REPL, run every test suite in the project:
   ```
   (fiveam:run-all-tests)
   ```
   This catches regressions across the entire system. A function that passes its
   own tests but breaks another module is not done.

9. **Validate block balance**: check that every `#+begin_src lisp` block in the
   modified .org files has balanced parentheses. Use your project's equivalent
   function or the `check-parens` command listed under Commands.

10. **Commit on the branch**: include the RED and GREEN test output recorded
    in the .org file in the commit message as evidence:
    ```
    git add org/ lisp/ docs/
    git commit -m "v0.9.0: eval harness - 10 tasks, regression detection

    RED: 0/10 pass (tasks not yet defined)
    GREEN: 10/10 pass"
    ```

11. **Mark the origin TODO DONE**: in `docs/ROADMAP.org`, change the
    `*** TODO` item to `*** DONE` and add a `:LOGBOOK:` entry with the
    completion date. This is a separate commit on the branch:
    ```org
    :LOGBOOK:
    - State "DONE" from "TODO" [YYYY-MM-DD Day]
    :END:
    ```

12. **Merge to main**: the merge IS the release. Rebase onto main first
    to keep history linear, then fast-forward merge:
    ```
    git checkout feature/v0.9.0-eval-harness
    git rebase main
    git checkout main
    git merge --ff-only feature/v0.9.0-eval-harness
    ```

13. **Bump the submodule**: if the project is a submodule in the parent
    `memex` repo (e.g., passepartout), stage the submodule pointer and commit:
    ```
    git add projects/passepartout
    git commit -m "bump passepartout → v0.9.0"
    ```
    Standalone projects skip this step.

14. **Delete the branch**: `git branch -d feature/v0.9.0-eval-harness`.
    Abandoned branches can be deleted before merge with no cleanup needed
    (use `git branch -D` for those, since `-d` refuses to delete an unmerged branch).
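
The block-balance check in step 9 can be approximated in plain Common Lisp when no project helper is at hand. The following is a minimal sketch, not the project's checker: `balanced-p`, `prefix-p`, and `unbalanced-blocks` are hypothetical names, the Org file is assumed to be available as a string, and `#|...|#` block comments and `|symbol|` syntax are not handled.

```lisp
(defun balanced-p (code)
  "True if parentheses in CODE balance, ignoring strings, ; comments, and #\\ chars."
  (let ((depth 0) (i 0) (n (length code)))
    (loop while (< i n)
          do (let ((ch (char code i)))
               (cond
                 ((char= ch #\\) (incf i))        ; skip the escaped character
                 ((char= ch #\;)                   ; skip to end of line
                  (setf i (or (position #\Newline code :start i) n)))
                 ((char= ch #\")                   ; skip the string body
                  (incf i)
                  (loop while (and (< i n) (char/= (char code i) #\"))
                        do (when (char= (char code i) #\\) (incf i))
                           (incf i)))
                 ((char= ch #\() (incf depth))
                 ((char= ch #\))
                  (when (minusp (decf depth))
                    (return-from balanced-p nil))))
               (incf i)))
    (zerop depth)))

(defun prefix-p (prefix string)
  "Case-insensitive test that STRING starts with PREFIX."
  (let ((end (length prefix)))
    (and (>= (length string) end)
         (string-equal prefix string :end2 end))))

(defun unbalanced-blocks (org-text)
  "Return 1-based indexes of #+begin_src lisp blocks in ORG-TEXT that fail BALANCED-P."
  (let ((index 0) (inside nil) (lines '()) (bad '()))
    (with-input-from-string (in org-text)
      (loop for line = (read-line in nil)
            while line
            do (let ((trimmed (string-left-trim " " line)))
                 (cond
                   ((and (not inside) (prefix-p "#+begin_src lisp" trimmed))
                    (setf inside t lines '())
                    (incf index))
                   ((and inside (prefix-p "#+end_src" trimmed))
                    (unless (balanced-p (format nil "~{~A~%~}" (reverse lines)))
                      (push index bad))
                    (setf inside nil))
                   (inside (push line lines))))))
    (nreverse bad)))
```

An empty result from `unbalanced-blocks` means every Lisp block passed. Checking per block, rather than the whole file, points at the offending block instead of reporting a mismatch somewhere later in the file.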

## Branch Policy

- Every feature starts on a branch from main. Branch names: `feature/<version>-<slug>`.
- ROADMAP.org changes (DONE markers, LOGBOOK entries) happen on the branch, not
  on main directly. They merge to main with the feature.
- If a feature fails or is abandoned, delete the branch. No revert commits, no
  dead code on main, no `;; OBSOLETE` comments. Git history preserves the
  experiment if you need to reference it later.
- Rebase onto main before merging. Keep history linear. No merge commits.
- Complex features that span multiple roadmap versions may live on one branch
  with multiple commits, merging to main when the entire chain is stable.
- **Bug fixes, typos, docs-only edits, and single-session jobs do not get a
  branch.** Commit them directly to main. The heuristic: if it can be finished
  in one session and has no plausible alternative that could replace it, it
  goes to main. If it spans sessions or might be abandoned for a better
  approach, it gets a branch.

## Commands

Tangle a single file:

    emacs --batch --eval "(progn (require 'org) (find-file \"org/FILE.org\") (org-babel-tangle) (kill-buffer))"

Validate structural integrity (org/ source files only):

    emacs --batch -Q --eval '(progn (find-file "org/FILE.org") (check-parens) (kill-buffer))'

Run tests (from REPL):

    (fiveam:run (intern "SUITE-NAME" :project-TESTS))
    (fiveam:run-all-tests)

Run tests (SBCL fallback, only when the runtime cannot start):

    sbcl --noinform \
      --eval '(load (merge-pathnames "quicklisp/setup.lisp" (user-homedir-pathname)))' \
      --eval '(ql:quickload :your-project :silent t)' \
      --eval '(load "lisp/FILE.lisp")' \
      --eval '(fiveam:run (intern "SUITE-NAME" :project-TESTS))' --quit

For error details: bind `fiveam:*on-failure*` to `:debug`.

## REPL (mandatory)

All development happens in a running Lisp image. Start your runtime:
- Passepartout: `passepartout daemon` boots the entire project and listens on port 9105
- Standalone CL projects: `sbcl` with `(ql:quickload :your-project)`

Send code from opencode using the `lisp` tool (any SBCL project) or the `repl`
tool (passepartout daemon on port 9105). The inner loop (steps 5a–5e) never leaves
the REPL:

1. Send test forms from .org to REPL → RED
2. Redefine functions in REPL → test → iterate
3. Send tests → GREEN
4. Copy working code back to .org

Tangle only when the file is complete and ready to commit. Never batch-compile
outside the image when the runtime is available. Use the SBCL fallback above only
when the runtime itself cannot start.

## Rules

- .org is source of truth; .lisp is generated. Never edit .lisp directly
- Every code change starts with a contract and a failing test
- Prove RED before writing implementation
- Implementation is developed in the REPL, then copied to .org. Never write
  code in a file that hasn't been proven in the image
- Validate before committing
- If a tool fails, explain why and ask before trying alternatives
- Before shipping a version, run the `** File Update Checklist` in `docs/ROADMAP.org`
- **YOU MAY NOT** push a version tag (e.g., `v0.5.0`), create a GitHub release,
  or run `git push` that triggers CI/CD version workflows without explicit
  permission. Ask first.
Submodule projects/passepartout updated: 138f909a33...96628d00e9
Submodule projects/passepartout-contrib updated: ce17336acd...825ef832ba