memex: update AGENTS.md, add passepartout design-decisions notes, SWOT + agora notes, bump submodules → v0.8.1

2026-05-10 07:11:08 -04:00
parent 04944a62e2
commit e719443ce7
6 changed files with 1566 additions and 3 deletions

@@ -442,6 +442,371 @@ design. The gate stack provides the seed. Gate outcomes, prose extraction,
deduction, and human authoring grow the shoots. Screamer prunes contradictions.
The ontology is a garden, not a building.
* Empirical Validation — Modular Ontology Engineering with LLMs
Shimizu and Hitzler (2025, /Journal of Web Semantics/) argue that LLMs can
significantly accelerate knowledge graph and ontology engineering — modeling,
extension, population, alignment, and entity disambiguation — but /only/ if
ontologies are modular. Their paper provides empirical evidence that validates
the modular architecture described in this document and exposes concrete patterns
the archivist should adopt.
** The central finding: modularity is the key variable
In a complex ontology alignment task (mapping between two oceanography ontologies
with hundreds of classes and properties), an LLM without module information
detected correct mappings for 5 of 109 alignment rules — effectively useless. When
the same LLM was given the module structure of the target ontology (20 named
conceptual modules such as "Organization," "Cruise," "Physical Sample"), it
detected correct mappings for 104 of 109 rules — 95% accuracy. The variable was
modularity.
For ontology population (extracting triples from text), their best results came
from prompts that included a schematic representation of a /single module/ plus
one extraction example. Against ground truth, this achieved approximately 90%
extraction accuracy. Without module-scoped prompting, quality degraded
substantially.
The mechanism: conceptual modules scope the LLM's attention to something
human-sized. The paper's central claim — "by somehow limiting the scope, we
achieve a more human-like approach — and one more capable of being expressed
succinctly in language, and thus more appropriate for LLM-based assistance" — is
an independent discovery of the same principle underlying Passepartout's
domain-scoped Screamer checks and per-domain cardinality policies.
** MOMo: a mature modular ontology methodology
The authors' approach, MOMo (Modular Ontology Modeling), has been developed over a
decade and includes:
- A /step-by-step methodology/ that breaks ontology design into clearly delineated
pieces, each "easier to automate than going one-shot from base data to an
ontology."
- A /pattern description language/ (OPLa, expressed in OWL) for annotating modules
so they can be identified programmatically.
- A /design library/ (MODL) containing hundreds of commonsense micropatterns
organized for programmatic access, including via RAG.
- A /Protégé plugin/ (CoModIDE) for graphical modular ontology development.
Critically, their modules are not formal sub-ontologies with logical boundaries.
They are /conceptual/ partitions — groupings of classes, properties, and axioms
around "key notions" identified by domain experts. Modules can overlap and nest.
There are "no precise rules" for what belongs in a module. The modules provide
"conceptual bridges between human expert conceptualization and data reality."
** What Passepartout should adopt
*** The modular prompt pattern for the archivist
The extraction prompt structure that achieved roughly 90% accuracy is concrete and
replicable: a schematic representation of a domain module plus a single extraction
example. The archivist should use this pattern when extracting facts from prose.
Instead of a generic "extract triples from this text" prompt (200 tokens), the
prompt should reference the relevant module(s) and include an example triple for
each relation in that module. The module provides /context/; the example provides
/format/. Both improve LLM extraction quality without increasing Screamer's
verification burden.
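The prompt pattern above can be sketched as follows. This is a minimal illustration in Python (Passepartout itself runs in Lisp); the =Module= and =Relation= types, the =build_extraction_prompt= function, and the prompt wording are all assumptions made for the example, not the archivist's actual interface or the paper's exact prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One schema edge: domain class -> relation name -> range class."""
    domain: str
    name: str
    range: str
    example: tuple  # one illustrative (subject, relation, object) triple


@dataclass(frozen=True)
class Module:
    """A conceptual module: a named group of relations (hypothetical type)."""
    name: str
    relations: tuple


def build_extraction_prompt(module, passage):
    """Assemble a module-scoped prompt: schema gives context, examples give format."""
    schema = "\n".join(
        f"  {r.domain} --{r.name}--> {r.range}" for r in module.relations
    )
    examples = "\n".join(
        "  ({}, {}, {})".format(*r.example) for r in module.relations
    )
    return (
        f"Module: {module.name}\n"
        f"Schema:\n{schema}\n"
        f"Example triples:\n{examples}\n"
        "Extract only triples that use the relations above.\n"
        f"Text:\n{passage}"
    )


literature = Module(
    name="Literature",
    relations=(
        Relation("Person", "wrote", "Work", ("Nabokov", "wrote", "Pale Fire")),
        Relation("Person", "lectured-on", "Person", ("Nabokov", "lectured-on", "Kafka")),
    ),
)
```

Calling =build_extraction_prompt(literature, heading_text)= yields one prompt per module, so a heading that touches two modules would get two scoped prompts rather than one broad one.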
*** MOMo modules as ontology scaffold
The Passepartout notes describe an organic growth model: gate-bootstrapped facts
seed the ontology; gate outcomes, Screamer deductions, and archivist proposals
grow the shoots. This is correct for the /security and filesystem/ domains where
the gate stack already encodes expertise. For the broader memex (literature,
daily reflection, project planning), the 50-70 gate-bootstrapped entity classes
are a starvation diet.
MOMo's micropattern library provides a ready-made scaffold for these domains.
Hundreds of commonsense patterns already exist for temporal relations, spatial
relations, agent-action, organizational structure, provenance, and event
participation. Loading these as initial modules, with =:policy :plural= and
=:provenance :external-ontology=, would give the symbolic index a structured
vocabulary for domains where the gate stack has nothing to offer. The organic
growth model then /extends and refines/ these modules rather than inventing them
from scratch. This is the Wikidata strategy applied at the schema level: adopt
existing structured knowledge, connect personal facts to it, and surface
disagreements rather than resolve them.
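A rough sketch of the scaffold-then-extend idea, in Python for illustration only. The =ModuleRecord= type, =load_scaffold=, =extend_module=, and the relation names are hypothetical; only the =plural= policy and =external-ontology= provenance values come from the text above, and nothing here reflects MODL's real file format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModuleRecord:
    """A scaffold module as it might sit in the symbolic index (hypothetical)."""
    name: str
    relations: tuple
    policy: str       # mirrors :policy
    provenance: str   # mirrors :provenance


def load_scaffold(patterns):
    """Turn {pattern name: [relation names]} into externally sourced modules."""
    return {
        name: ModuleRecord(name, tuple(rels), "plural", "external-ontology")
        for name, rels in patterns.items()
    }


def extend_module(record, relation):
    """Organic growth refines a module: add a relation, keep policy and provenance."""
    if relation in record.relations:
        return record
    return replace(record, relations=record.relations + (relation,))


# Hypothetical micropatterns; the relation names are placeholders.
index = load_scaffold({
    "agent-action": ["performs"],
    "provenance": ["derived-from", "attributed-to"],
})
index["agent-action"] = extend_module(index["agent-action"], "reviews")
```

Keeping the record immutable and returning a new one on extension is one way to leave room for ontology versioning; whether Passepartout does it this way is an open design choice.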
*** OPLa annotation for module identification
MOMo modules annotated in OPLa can "easily be identified programmatically." If
Passepartout annotates its ontology modules in a compatible format (even a
simplified plist-based equivalent), the archivist can automatically select the
right module(s) when extracting facts from prose. A heading in =literature/=
triggers the literature module; a heading in =projects/= triggers the software
engineering module; a heading tagged =:personal:= triggers the diary module. The
module scopes the prompt. The prompt improves extraction. Screamer gates the
result. This is the full pipeline, validated at each step.
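Module selection of this kind could look roughly like the sketch below. It is written in Python for illustration, and the lookup tables stand in for whatever a real OPLa-style annotation would provide; the function name =select_modules= and the module identifiers are assumptions.

```python
from collections.abc import Iterable
from pathlib import PurePosixPath

# Hypothetical lookup tables standing in for real module annotations.
PATH_MODULES = {"literature": "literature", "projects": "software-engineering"}
TAG_MODULES = {"personal": "diary"}


def select_modules(file_path: str, tags: Iterable[str] = ()) -> list:
    """Pick the module(s) that should scope the extraction prompt for a heading."""
    selected = []
    # Directory components select modules first, in path order.
    for part in PurePosixPath(file_path).parts:
        module = PATH_MODULES.get(part)
        if module and module not in selected:
            selected.append(module)
    # Org-style tags such as ":personal:" can add further modules.
    for tag in tags:
        module = TAG_MODULES.get(tag.strip(":"))
        if module and module not in selected:
            selected.append(module)
    return selected
```

A heading that matches no table entry returns an empty list, which leaves the caller to decide between a generic prompt and skipping extraction.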
** What this means for the Passepartout architecture
The paper validates three design decisions already made:
1. /Modularity is non-negotiable./ The paper found that modularity is the
difference between 5% and 95% accuracy on alignment. Passepartout's per-domain
cardinality policies and domain-scoped Screamer checks are the same insight
implemented in a different context. The paper provides evidence that the
approach works; Passepartout applies it to verification rather than extraction.
2. /The extraction pipeline is feasible./ Roughly 90% population accuracy with
module-scoped prompts means the archivist /can/ extract useful facts from prose.
The remaining ~10% of extractions, which include hallucinated triples, is what
Screamer is meant to catch. The paper validates the LLM-as-proposer role;
Passepartout adds the Screamer-as-verifier role.
3. /KGs are positioned as anti-hallucination infrastructure./ The paper explicitly
frames knowledge graphs as "ground truth to escape from LLM hallucinations" and
as "components of other neurosymbolic approaches." This is the Passepartout
thesis — the symbolic index as ground truth against which LLM proposals are
checked — stated in the academic literature by the editors of the neurosymbolic
AI handbooks.
And it exposes one gap in the current design:
1. /Emergent modularity may be slower than designed modularity./ Passepartout's
modules are supposed to emerge organically from gate patterns, Screamer
generalizations, and cross-domain overlap detection. MOMo's modules are
designed by domain experts who identify key notions upfront. The emergent
approach is philosophically cleaner — the system learns its own categories —
but practically slower. The paper's results suggest that adopting designed
modules as a scaffold, and letting emergent growth /refine/ rather than
/invent/ them, could substantially shorten the time needed to reach a
sufficient ontology.
** Relation to Wikidata loading
The MOMo micropattern approach and the Wikidata loading strategy are complementary:
| Layer | MOMo provides | Wikidata provides |
|----------------+--------------------------------+--------------------------|
| Schema | Modular ontology of relations | — (Wikidata's schema is |
| | and entity classes | implicit in its data) |
| Instances | — (patterns, not entities) | 100M+ entities with |
| | | property-value pairs |
MOMo gives Passepartout the /relations/ (wrote, lectured-on, influenced,
published-in). Wikidata gives Passepartout the /instances/ (Nabokov, Pale Fire,
Kafka). Both are needed. Neither alone is sufficient. The MOMo scaffold tells the
archivist /what kinds of facts to look for/. The Wikidata graph tells the
archivist /which entities those facts are about/. Together they transform the
extraction task from "discover entities and their relations from prose" to
"connect this prose heading to known entities using known relations" — a
dramatically simpler prompt with dramatically higher expected accuracy.
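The shift from open discovery to closed linking can be illustrated with a small Python sketch. The entity identifiers below are placeholders rather than real Wikidata IDs, and =link_triple= is a hypothetical helper: a candidate triple survives only if both entities and the relation are already known.

```python
# Relations would come from the MOMo-style scaffold (names taken from the text).
KNOWN_RELATIONS = {"wrote", "lectured-on", "influenced", "published-in"}

# Entities would come from the Wikidata load; these ids are placeholders.
KNOWN_ENTITIES = {
    "nabokov": "ent:nabokov",
    "pale fire": "ent:pale-fire",
    "kafka": "ent:kafka",
}


def link_triple(subject, relation, obj):
    """Resolve a candidate triple to known ids, or return None if any part is unknown."""
    s = KNOWN_ENTITIES.get(subject.strip().lower())
    o = KNOWN_ENTITIES.get(obj.strip().lower())
    if s is None or o is None or relation not in KNOWN_RELATIONS:
        return None
    return (s, relation, o)
```

Under this sketch, =link_triple("Nabokov", "wrote", "Pale Fire")= resolves, while a triple with an unknown relation or entity is dropped before it ever reaches verification.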
** Reference
- Shimizu, C., & Hitzler, P. (2025). Accelerating knowledge graph and ontology
engineering with large language models. /Journal of Web Semantics, 85/,
100862. https://doi.org/10.1016/j.websem.2025.100862
** See also
- =passepartout-neurosymbolic-roadmap.org=: Phase 3 (Archivist) — the modular
prompt pattern should be incorporated into the extraction pipeline.
- =passepartout-agora.org=: the KEL / contract audit trail as instances of
MOMo-style key-lifecycle and contract-lifecycle modules.
- =notes/passepartout-SWOT.org=: the SWOT analysis which identifies the ontology
problem as the key bottleneck — MOMo partially addresses this.
** Supporting References
*** MOMo: the canonical methodology
Shimizu, Hammar & Hitzler (2023, /Semantic Web Journal/) present the full MOMo
methodology — 31 pages covering the step-by-step design process, schema diagrams
as knowledge elicitation tools, ODP libraries, OPLa annotation language, and
CoModIDE, a Protégé plugin for graphical modular ontology development. The paper
was evaluated with usability studies and demonstrates that modular development
significantly improves approachability for domain experts who are not ontology
engineers.
Key architectural commitments from MOMo that Passepartout should adopt:
- /Schema diagrams/ as the primary communication format between ontologist and
domain expert. Passepartout's equivalent: the archivist's module-scoped prompt
includes a simplified schema diagram of the module being populated.
- /Template-based instantiation/ of ontology design patterns into concrete
modules. Passepartout's equivalent: micropatterns loaded from MODL are
instantiated with entities from the user's memex, producing concrete facts.
- /Systematic axiomatization/ — 17 frequently used axiom patterns for each
node-edge-node construction in a schema diagram. Passepartout's equivalent:
Screamer constraint rules derived from module structure.
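As a rough illustration of deriving constraint rules from module structure, the Python sketch below reads domain and range restrictions off schema edges and checks a triple against them. It covers only domain/range checks, which is an assumption made for brevity and not a rendering of MOMo's 17 axiom patterns; the function names and entity types are hypothetical, and the real checks would run in Screamer.

```python
def derive_constraints(edges):
    """Map each relation to its (domain, range) pair, read off the schema edges."""
    return {relation: (domain, range_) for domain, relation, range_ in edges}


def check_triple(constraints, entity_types, triple):
    """Return human-readable violations; an empty list means the triple passes."""
    subject, relation, obj = triple
    if relation not in constraints:
        return [f"unknown relation {relation}"]
    domain, range_ = constraints[relation]
    problems = []
    if entity_types.get(subject) != domain:
        problems.append(
            f"subject {subject} is {entity_types.get(subject)}, expected {domain}"
        )
    if entity_types.get(obj) != range_:
        problems.append(
            f"object {obj} is {entity_types.get(obj)}, expected {range_}"
        )
    return problems


# One node-edge-node construction from a hypothetical literature module.
constraints = derive_constraints([("Person", "wrote", "Work")])
entity_types = {"Nabokov": "Person", "Pale Fire": "Work", "Kafka": "Person"}
```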
Reference:
- Shimizu, C., Hammar, K., & Hitzler, P. (2023). Modular ontology modeling.
/Semantic Web, 14/(3), 459–489. https://doi.org/10.3233/SW-222886
*** Ontology Population — the empirical methodology
Norouzi et al. (2024) provide the full experimental methodology behind the ~90%
extraction accuracy claim. Using the Enslaved.org Hub Ontology as ground truth
and Wikipedia articles as source text, they tested five LLMs across a three-stage
pipeline: preprocessing, text retrieval, and KG population. The critical finding:
prompts that included a /schema diagram/ of the target ontology module (using
MOMo's visual conventions with colored boxes for classes, arrows for relations)
plus a single extraction example achieved the highest accuracy. Without
module-scoped prompts, quality degraded substantially.
Three findings are directly applicable to the archivist:
1. /Role chain simplification./ The Enslaved Ontology has complex role chains
(e.g., Person → hasRole → Role → inEvent → Event). These were collapsed into
shortcut relations (e.g., Person → participatedIn → Event) for LLM extraction.
The archivist should maintain two layers: the /logical/ schema with full role
chains for Screamer verification, and the /extraction/ schema with simplified
relations for LLM prompting.
2. /Variance across models./ Five LLMs were tested. Performance varied
significantly. The archivist should benchmark extraction accuracy per provider
and per module, and route extraction tasks to the best-performing model for
each module — extending the existing model-tier routing (v0.3.0) from
complexity-based to accuracy-based routing.
3. /Cross-source validation./ The paper used both Wikipedia text and Wikidata
as overlapping sources for the same entities, enabling cross-verification.
The archivist can do the same: extract facts from the user's prose, extract
facts from Wikidata for the same entities, and present disagreements with
provenance. This is the =:plural= cardinality policy applied at extraction time.
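The two-layer schema in the first finding can be sketched as a mapping from shortcut relations back to their full role chains. The Python below is illustrative only: =ROLE_CHAINS=, =make_node_factory=, and =expand_shortcut= are hypothetical names, and the blank-node naming scheme is an assumption.

```python
from itertools import count

# Extraction-layer shortcut -> logical-layer chain of predicates.
# Mirrors Person -> hasRole -> Role -> inEvent -> Event from the text.
ROLE_CHAINS = {"participatedIn": ("hasRole", "inEvent")}


def make_node_factory(prefix="_:role"):
    """Return a callable that mints fresh intermediate node ids."""
    counter = count(1)
    return lambda: f"{prefix}{next(counter)}"


def expand_shortcut(triple, new_node):
    """Expand an LLM-extracted shortcut triple into the full chain for verification."""
    subject, relation, obj = triple
    chain = ROLE_CHAINS.get(relation)
    if chain is None:
        return [triple]  # not a shortcut: pass through unchanged
    # A chain of n predicates needs n - 1 intermediate nodes.
    nodes = [subject] + [new_node() for _ in chain[:-1]] + [obj]
    return [(nodes[i], pred, nodes[i + 1]) for i, pred in enumerate(chain)]
```

The LLM only ever sees =participatedIn=; the expanded triples are what a verifier would check against the logical schema.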
Reference:
- Norouzi, S.S., Barua, A., Christou, A., Gautam, N., Eells, A., Hitzler, P.,
& Shimizu, C. (2024). Ontology Population using LLMs. arXiv:2411.01612.
* Historical Lineage — McCarthy's Advice Taker
McCarthy's "Programs with Common Sense" (1959) is the direct intellectual ancestor
of the Passepartout architecture. The paper proposed an "advice taker" — a program
that "will draw immediate conclusions from a list of premises" expressed in
"a suitable formal language (most likely a part of the predicate calculus)." The
program would:
1. Accept declarative statements about the world as input.
2. Store them as logical formulas.
3. Reason from them to produce new conclusions.
4. Accept new facts and revise its conclusions.
This is precisely the Passepartout pipeline: the archivist extracts declarative
facts from prose → Screamer checks them for consistency → VivaceGraph stores them
→ the planner reasons from them → new facts from gate outcomes and deductions
revise the store. McCarthy proposed it in 1959. Passepartout is building it in
2026.
The gap between McCarthy's proposal and Passepartout's implementation is the
/hallucination problem/. McCarthy assumed facts would be entered by a human
programmer in formal logic. Passepartout's facts are extracted from natural
language prose by an LLM — a probabilistic process that requires deterministic
verification. Screamer is the component McCarthy didn't need: a constraint solver
that gates LLM-proposed facts against the existing fact store.
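The gating step can be illustrated with a deliberately small Python sketch. It is a stand-in for the idea, not for Screamer itself: the only constraint checked is that a relation declared single-valued cannot take two different objects for the same subject, and the =FUNCTIONAL= set and =admit= function are hypothetical.

```python
# Relations assumed to be single-valued for this example (hypothetical).
FUNCTIONAL = {"born-in-year"}


def admit(store, proposal):
    """Admit an LLM-proposed triple only if it is consistent with the store."""
    subject, relation, obj = proposal
    if relation in FUNCTIONAL:
        for s, r, o in store:
            if s == subject and r == relation and o != obj:
                return False  # contradicts an existing fact: reject
    store.add(proposal)
    return True


store = {("Nabokov", "born-in-year", "1899")}
```

With this store, a proposal of =("Nabokov", "born-in-year", "1900")= is rejected, while =("Nabokov", "wrote", "Pale Fire")= is admitted and becomes part of the store for later checks.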
The connection is not metaphorical. McCarthy cited Principia Mathematica as an
influence on Lisp. Passepartout's Whitehead note traces the same PM → Lisp
lineage. The advice taker → Passepartout lineage completes the arc: PM's formal
logic → Lisp → McCarthy's advice taker → Passepartout's neurosymbolic engine.
Reference:
- McCarthy, J. (1959). Programs with Common Sense. /Proceedings of the
Teddington Conference on the Mechanization of Thought Processes./
* Philosophical Validation — The Neurosymbolic Consensus
Three papers from the neurosymbolic AI research community validate the
architectural thesis from complementary angles.
** Marcus (2020): The Case Against Pure Deep Learning
Gary Marcus's "The Next Decade in AI" argues that deep learning alone is "data
hungry, shallow, brittle, and limited in its ability to generalize." The paper
demonstrates GPT-2 failing at basic commonsense reasoning:
- "Yesterday I dropped my clothes off at the dry cleaners and have yet to pick
them up. Where are my clothes?" → GPT-2: "at my mom's house."
- "There are six frogs on a log. Two leave, but three join. The number of frogs
on the log is now" → GPT-2: "seventeen."
Marcus proposes four steps toward robust AI: hybrid architecture (combining
neural and symbolic), large-scale knowledge (abstract and causal, not just
statistical), reasoning (formal inference over structured representations), and
cognitive models (frameworks for how entities relate). Passepartout implements all
four: the perceive-reason-act pipeline is hybrid, the symbolic index is causal
knowledge, Screamer + ACL2 provide reasoning, and the gate-bootstrapped ontology
plus MOMo modules provide cognitive models.
Marcus's core claim — "we have no hope of achieving robust intelligence without
first developing systems with deep understanding" — is the justification for
Passepartout's entire neurosymbolic investment. The alternative is a system that
works "on a good day" and fails unpredictably. The deterministic gate stack and
Screamer admission gate are the engineering realization of Marcus's call for
robustness.
Reference:
- Marcus, G. (2020). The Next Decade in AI: Four Steps Towards Robust
Artificial Intelligence. arXiv:2002.06177.
** Gaur & Sheth (2023): CREST — Trustworthy Neurosymbolic AI
Gaur and Sheth present the CREST framework: Consistency, Reliability, user-level
Explainability, and Safety build Trust — and they argue these require
neurosymbolic methods. Their empirical finding: GPT-3.5 breached safety
constraints 30% of the time when asked identical questions repeatedly. Claude's
16 safety rules and Sparrow's 23 rules provide no /inherent/ safety — they are
heuristic guardrails that can be breached through prompt variation.
These findings validate three Passepartout design commitments:
1. /Prompt-level safety is insufficient./ Claude and Sparrow use rules that
consume LLM tokens and can be evaded. Passepartout's deterministic gates run
in pure Lisp, cost 0 tokens, and cannot be evaded by prompt engineering.
2. /Inconsistency is the norm, not the exception./ Gaur & Sheth show that even
identical queries produce inconsistent responses ~30% of the time. This
validates the cardinality model: a system that expects contradiction and
surfaces it with provenance is architecturally more honest than one that
assumes consistency and silently resolves it.
3. /Knowledge infusion is required for trust./ The CREST framework embeds
domain knowledge (clinical guidelines, procedural knowledge) into LLM
pipelines. Passepartout's symbolic index /is/ the knowledge infusion layer:
facts extracted from prose, verified by Screamer, and available for any LLM
call through the context assembly pipeline.
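A store that expects contradiction and surfaces it with provenance might look roughly like this. The Python sketch is an assumption-laden illustration of the cardinality idea: =PluralStore=, its method names, and the source labels are hypothetical.

```python
from collections import defaultdict


class PluralStore:
    """Keeps every asserted value for (subject, relation) together with its sources."""

    def __init__(self):
        # (subject, relation) -> {value: [sources that asserted it]}
        self._values = defaultdict(dict)

    def assert_fact(self, subject, relation, value, source):
        """Record a value without overwriting or resolving competing values."""
        self._values[(subject, relation)].setdefault(value, []).append(source)

    def disagreements(self):
        """Return only the keys that carry more than one distinct value."""
        return {
            key: dict(values)
            for key, values in self._values.items()
            if len(values) > 1
        }


facts = PluralStore()
facts.assert_fact("Pale Fire", "published-in", "1962", "wikidata")
facts.assert_fact("Pale Fire", "published-in", "1963", "notes/reading.org")
facts.assert_fact("Nabokov", "wrote", "Pale Fire", "wikidata")
```

Here =facts.disagreements()= reports the two publication years side by side with their sources and leaves the uncontested fact out, so the conflict is shown to the human rather than silently resolved.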
Reference:
- Gaur, M., & Sheth, A. (2023). Building Trustworthy NeuroSymbolic AI Systems:
Consistency, Reliability, Explainability, and Safety. arXiv:2312.06798.
** Gaur et al. (2022): Knowledge-Infused Learning
Gaur, Gunaratna, Bhatt, and Sheth define Knowledge-infused Learning (KiL) as
"combining various types of explicit knowledge with data-driven deep learning
techniques." They identify three infusion levels (shallow, semi-deep, deep) and
position KiL as "a sweet spot in neuro-symbolic AI."
The paper makes two observations relevant to Passepartout:
1. /Data alone is not enough./ The opening cites Pedro Domingos ("Data Alone is
Not Enough"), Andrew Ng ("the importance of Big Data is overhyped"), and
Gary Marcus ("AI that captures how humans think"). These are the intellectual
warrant for the symbolic index: a knowledge layer that is independent of any
specific LLM call, accumulated across sessions, and verified against existing
facts.
2. /Expert knowledge is external to the model./ Domain experts use "their past
experience, web or domain-specific knowledge sources, and annotation
guidelines" to create ground truth — resources the LLM cannot access during
training. The symbolic index makes these resources queryable: facts from the
gate stack (security expertise), from the human (declarative authoring), from
Wikidata (world knowledge), and from Screamer deductions (derived expertise).
Passepartout's architecture is a specific implementation of KiL at the deepest
infusion level: knowledge is not appended to prompts (shallow) or embedded in
fine-tuning (semi-deep). It is a first-class data structure — the symbolic index
— that the LLM queries through the archivist and the planner. The knowledge is
living: it accumulates, is verified, carries provenance, and evolves through
ontology versioning.
Reference:
- Gaur, M., Gunaratna, K., Bhatt, S., & Sheth, A. (2022). Knowledge-Infused
Learning: A Sweet Spot in Neuro-Symbolic AI. /IEEE Internet Computing, 26/(4),
5–11. https://doi.org/10.1109/MIC.2022.3179759
* Semantic Wikipedia as Entity Backbone
The gate stack provides 50-70 entity classes — adequate for a coding agent where
@@ -1412,3 +1777,19 @@ See also:
- =passepartout/docs/DESIGN_DECISIONS.org= — the existing design decisions
- =passepartout/docs/ARCHITECTURE.org= — the current pipeline architecture
- =passepartout/docs/ROADMAP.org= — the feature roadmap through v0.13.0
- =notes/passepartout-SWOT.org= — SWOT analysis of the neurosymbolic architecture
- =passepartout-agora.org= — Passepartout-Agora integration design
- Shimizu, C. & Hitzler, P. (2025). Accelerating knowledge graph and ontology
engineering with large language models. /Journal of Web Semantics, 85/, 100862.
https://doi.org/10.1016/j.websem.2025.100862
- Shimizu, C., Hammar, K., & Hitzler, P. (2023). Modular ontology modeling.
/Semantic Web, 14/(3), 459–489. https://doi.org/10.3233/SW-222886
- Norouzi, S.S. et al. (2024). Ontology Population using LLMs. arXiv:2411.01612.
- McCarthy, J. (1959). Programs with Common Sense. /Proc. Teddington Conf. on
the Mechanization of Thought Processes./
- Marcus, G. (2020). The Next Decade in AI. arXiv:2002.06177.
- Gaur, M. & Sheth, A. (2023). Building Trustworthy NeuroSymbolic AI Systems.
arXiv:2312.06798.
- Gaur, M., Gunaratna, K., Bhatt, S., & Sheth, A. (2022). Knowledge-Infused
Learning. /IEEE Internet Computing, 26/(4), 5–11.
- Bhardwaj, V.P. (2026). Agent Behavioral Contracts. arXiv:2602.22302.