memex: update AGENTS.md, add passepartout design-decisions notes, SWOT + agora notes, bump submodules → v0.8.1
@@ -442,6 +442,371 @@ design. The gate stack provides the seed. Gate outcomes, prose extraction,
deduction, and human authoring grow the shoots. Screamer prunes contradictions.
The ontology is a garden, not a building.

* Empirical Validation — Modular Ontology Engineering with LLMs

Shimizu and Hitzler (2025, /Journal of Web Semantics/) argue that LLMs can
significantly accelerate knowledge graph and ontology engineering — modeling,
extension, population, alignment, and entity disambiguation — but /only/ if
ontologies are modular. Their paper provides empirical evidence that validates
the modular architecture described in this document and exposes concrete patterns
the archivist should adopt.

** The central finding: modularity is the key variable

In a complex ontology alignment task (mapping between two oceanography ontologies
with hundreds of classes and properties), an LLM without module information
detected correct mappings for 5 of 109 alignment rules — effectively useless. When
the same LLM was given the module structure of the target ontology (20 named
conceptual modules such as "Organization," "Cruise," "Physical Sample"), it
detected correct mappings for 104 of 109 rules, about 95%. The only thing that
changed between the two runs was the module information.

For ontology population (extracting triples from text), their best results came
from prompts that included a schematic representation of a /single module/ plus
one extraction example. Against ground truth, this achieved approximately 90%
extraction accuracy. Without module-scoped prompting, quality degraded
substantially.

The mechanism: conceptual modules scope the LLM's attention to something
human-sized. The paper's central claim — "by somehow limiting the scope, we
achieve a more human-like approach — and one more capable of being expressed
succinctly in language, and thus more appropriate for LLM-based assistance" — is
an independent discovery of the same principle underlying Passepartout's
domain-scoped Screamer checks and per-domain cardinality policies.

** MOMo: a mature modular ontology methodology

The authors' approach, MOMo (Modular Ontology Modeling), has been developed over a
decade and includes:

- A /step-by-step methodology/ that breaks ontology design into clearly delineated
  pieces, each "easier to automate than going one-shot from base data to an
  ontology."
- A /pattern description language/ (OPLa, expressed in OWL) for annotating modules
  so they can be identified programmatically.
- A /design library/ (MODL) containing hundreds of commonsense micropatterns
  organized for programmatic access, including via RAG.
- A /Protégé plugin/ (CoModIDE) for graphical modular ontology development.

Critically, their modules are not formal sub-ontologies with logical boundaries.
They are /conceptual/ partitions — groupings of classes, properties, and axioms
around "key notions" identified by domain experts. Modules can overlap and nest.
There are "no precise rules" for what belongs in a module. The modules provide
"conceptual bridges between human expert conceptualization and data reality."

** What Passepartout should adopt

*** The modular prompt pattern for the archivist

The extraction prompt structure that achieved approximately 90% accuracy is
concrete and replicable: a schematic representation of a domain module plus a
single extraction example. The archivist should use this pattern when extracting
facts from prose. Instead of a generic "extract triples from this text" prompt
(roughly 200 tokens), the prompt should reference the relevant module(s) and
include an example triple for each relation in that module. The module provides
/context/; the example provides /format/. Both improve LLM extraction quality
without increasing Screamer's verification burden.

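The prompt pattern described above can be sketched in a few lines. This is a
minimal illustration, not the archivist's actual code: the =Module= and
=Relation= types, the plain-text schematic format, and the prompt wording are
all assumptions made for the example. Python is used for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """One relation in a module, carrying a single example triple."""
    name: str
    domain: str
    range_: str
    example: tuple[str, str, str]  # (subject, relation, object)


@dataclass
class Module:
    """A conceptual module: a named group of classes and relations."""
    name: str
    classes: list[str]
    relations: list[Relation] = field(default_factory=list)


def schematic(module: Module) -> str:
    """Render the module as a plain-text schematic (format is an assumption)."""
    lines = [f"Module: {module.name}", "Classes: " + ", ".join(module.classes)]
    lines += [f"  {r.domain} --{r.name}--> {r.range_}" for r in module.relations]
    return "\n".join(lines)


def build_prompt(module: Module, text: str) -> str:
    """Combine the schematic (context), the examples (format), and the prose."""
    examples = "\n".join("  (%s, %s, %s)" % r.example for r in module.relations)
    return (
        "Extract triples that fit this module only.\n\n"
        f"{schematic(module)}\n\n"
        f"Example triples:\n{examples}\n\n"
        f"Text:\n{text}\n"
    )


# Hypothetical literature module with one relation and one example triple.
literature = Module(
    name="literature",
    classes=["Author", "Work"],
    relations=[Relation("wrote", "Author", "Work", ("Nabokov", "wrote", "Pale Fire"))],
)
prompt = build_prompt(literature, "Kafka wrote The Trial.")
print(prompt)
```

The schematic carries the context and the example triples carry the output
format, so the two concerns stay separate and a module can be swapped without
touching the prompt template.
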
*** MOMo modules as ontology scaffold

The Passepartout notes describe an organic growth model: gate-bootstrapped facts
seed the ontology; gate outcomes, Screamer deductions, and archivist proposals
grow the shoots. This is correct for the /security and filesystem/ domains where
the gate stack already encodes expertise. For the broader memex (literature,
daily reflection, project planning), the 50-70 gate-bootstrapped entity classes
are starvation rations: far too few to describe those domains.

MOMo's micropattern library provides a ready-made scaffold for these domains.
Hundreds of commonsense patterns already exist for temporal relations, spatial
relations, agent-action, organizational structure, provenance, and event
participation. Loading these as initial modules — with =:policy :plural= and
=:provenance :external-ontology= — would give the symbolic index a structured
vocabulary for domains where the gate stack has nothing to offer. The organic
growth model then /extends and refines/ these modules rather than inventing them
from scratch. This is the Wikidata strategy applied at the schema level: adopt
existing structured knowledge, connect personal facts to it, and surface
disagreements rather than resolve them.

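A rough sketch of what "load as initial modules, then extend and refine" could
look like follows. The dictionary layout stands in for a plist, and the
function names, the module name, the relation names, and the
="local-additions"= key are hypothetical; only the =plural= policy and the
=external-ontology= provenance tag come from the text above.

```python
def load_pattern(index: dict, name: str, relations: list[str]) -> None:
    """Register an external micropattern as an initial module."""
    index[name] = {
        "relations": set(relations),
        "policy": "plural",                 # conflicting values are kept side by side
        "provenance": "external-ontology",  # marks the module as imported
        "local-additions": [],              # hypothetical log of local refinements
    }


def refine(index: dict, name: str, relation: str, source: str) -> None:
    """Extend an imported module and record where the addition came from."""
    module = index[name]
    if relation not in module["relations"]:
        module["relations"].add(relation)
        module["local-additions"].append((relation, source))


index: dict = {}
# Hypothetical pattern and relation names, chosen for illustration only.
load_pattern(index, "event-participation", ["participatedIn", "occurredAt"])
refine(index, "event-participation", "attendedWith", "gate-outcome")
print(sorted(index["event-participation"]["relations"]))
```

Keeping local additions in a separate log means the imported scaffold and the
organically grown parts stay distinguishable later.
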
*** OPLa annotation for module identification

MOMo modules annotated in OPLa can "easily be identified programmatically." If
Passepartout annotates its ontology modules in a compatible format (even a
simplified plist-based equivalent), the archivist can automatically select the
right module(s) when extracting facts from prose. A heading in =literature/=
triggers the literature module; a heading in =projects/= triggers the software
engineering module; a heading tagged =:personal:= triggers the diary module. The
module scopes the prompt. The prompt improves extraction. Screamer gates the
result. This is the full pipeline: the paper supports the module-scoping and
prompting steps, and Screamer gating is Passepartout's own addition.

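The selection step above amounts to a small lookup. A minimal sketch, assuming
headings arrive with a file path and a set of org tags; the rule tables and the
=select_modules= function are illustrative names, not an existing API.

```python
from pathlib import PurePosixPath

# Hypothetical rule tables mapping directories and org tags to module names.
PATH_RULES = {"literature": "literature", "projects": "software-engineering"}
TAG_RULES = {"personal": "diary"}


def select_modules(path: str, tags: set[str]) -> list[str]:
    """Return the modules triggered by a heading's location and tags."""
    modules: list[str] = []
    for part in PurePosixPath(path).parts:
        module = PATH_RULES.get(part)
        if module and module not in modules:
            modules.append(module)
    for tag in sorted(tags):  # sorted for a deterministic result
        module = TAG_RULES.get(tag)
        if module and module not in modules:
            modules.append(module)
    return modules


print(select_modules("literature/nabokov.org", set()))
print(select_modules("projects/passepartout.org", {"personal"}))
```

Because modules can overlap, the function returns a list, so a heading can be
extracted against more than one module.
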
** What this means for the Passepartout architecture

The paper validates three design decisions already made:

1. /Modularity is non-negotiable./ The paper found that modularity is the
   difference between roughly 5% (5 of 109) and 95% (104 of 109) accuracy on
   alignment. Passepartout's per-domain cardinality policies and domain-scoped
   Screamer checks are the same insight implemented in a different context. The
   paper provides evidence that the approach works for alignment and
   extraction; Passepartout applies it to verification.

2. /The extraction pipeline is feasible./ Roughly 90% population accuracy with
   module-scoped prompts means the archivist /can/ extract useful facts from
   prose. The remaining 10% or so, the extraction errors, is what Screamer is
   there to catch. The paper validates the LLM-as-proposer role; Passepartout
   adds the Screamer-as-verifier role.

3. /KGs are positioned as anti-hallucination infrastructure./ The paper explicitly
   frames knowledge graphs as "ground truth to escape from LLM hallucinations" and
   as "components of other neurosymbolic approaches." This is the Passepartout
   thesis — the symbolic index as ground truth against which LLM proposals are
   checked — stated in the academic literature by the editors of the neurosymbolic
   AI handbooks.

And it exposes one gap in the current design:

1. /Emergent modularity may be slower than designed modularity./ Passepartout's
   modules are supposed to emerge organically from gate patterns, Screamer
   generalizations, and cross-domain overlap detection. MOMo's modules are
   designed by domain experts who identify key notions upfront. The emergent
   approach is philosophically cleaner — the system learns its own categories —
   but practically slower. The paper's results suggest that adopting designed
   modules as a scaffold, and letting emergent growth /refine/ rather than
   /invent/ them, could substantially shorten the time needed to reach a
   sufficient ontology.

** Relation to Wikidata loading

The MOMo micropattern approach and the Wikidata loading strategy are complementary:

| Layer     | MOMo provides                                    | Wikidata provides                             |
|-----------+--------------------------------------------------+-----------------------------------------------|
| Schema    | Modular ontology of relations and entity classes | — (Wikidata's schema is implicit in its data) |
| Instances | — (patterns, not entities)                       | 100M+ entities with property-value pairs      |

MOMo gives Passepartout the /relations/ (wrote, lectured-on, influenced,
published-in). Wikidata gives Passepartout the /instances/ (Nabokov, Pale Fire,
Kafka). Both are needed. Neither alone is sufficient. The MOMo scaffold tells the
archivist /what kinds of facts to look for/. The Wikidata graph tells the
archivist /which entities those facts are about/. Together they transform the
extraction task from "discover entities and their relations from prose" to
"connect this prose heading to known entities using known relations". That is a
much simpler prompt, and one that should yield noticeably higher accuracy.

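The reframed task, connecting prose to known entities through known relations,
can be sketched as a filter over LLM proposals. Everything named here is
illustrative: the =link= function is hypothetical, and the =entity:...=
identifiers are placeholders rather than real Wikidata IDs.

```python
# Relations supplied by the schema layer (names taken from the text above).
KNOWN_RELATIONS = {"wrote", "lectured-on", "influenced", "published-in"}

# Entities supplied by the instance layer; identifiers are placeholders.
KNOWN_ENTITIES = {
    "Nabokov": "entity:nabokov",
    "Pale Fire": "entity:pale-fire",
    "Kafka": "entity:kafka",
}


def link(candidates, entities, relations):
    """Split proposed triples into linked facts and leftovers for review."""
    accepted, rejected = [], []
    for subj, rel, obj in candidates:
        if rel in relations and subj in entities and obj in entities:
            accepted.append((entities[subj], rel, entities[obj]))
        else:
            rejected.append((subj, rel, obj))
    return accepted, rejected


proposals = [
    ("Nabokov", "wrote", "Pale Fire"),
    ("Nabokov", "lectured-on", "Kafka"),
    ("Nabokov", "admired", "Kafka"),  # relation not in the module
]
accepted, rejected = link(proposals, KNOWN_ENTITIES, KNOWN_RELATIONS)
print(accepted)
print(rejected)
```

Rejected triples are kept rather than dropped, so a human or a later pass can
decide whether they point to a missing relation or entity.
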
** Reference

- Shimizu, C., & Hitzler, P. (2025). Accelerating knowledge graph and ontology
  engineering with large language models. /Journal of Web Semantics, 85/,
  100862. https://doi.org/10.1016/j.websem.2025.100862

** See also

- =passepartout-neurosymbolic-roadmap.org=: Phase 3 (Archivist) — the modular
  prompt pattern should be incorporated into the extraction pipeline.
- =passepartout-agora.org=: the KEL / contract audit trail as instances of
  MOMo-style key-lifecycle and contract-lifecycle modules.
- =notes/passepartout-SWOT.org=: the SWOT analysis which identifies the ontology
  problem as the key bottleneck — MOMo partially addresses this.

** Supporting References

*** MOMo: the canonical methodology

Shimizu, Hammar & Hitzler (2023, /Semantic Web Journal/) present the full MOMo
methodology — 31 pages covering the step-by-step design process, schema diagrams
as knowledge elicitation tools, ODP libraries, OPLa annotation language, and
CoModIDE, a Protégé plugin for graphical modular ontology development. The paper
was evaluated with usability studies and demonstrates that modular development
significantly improves approachability for domain experts who are not ontology
engineers.

Key architectural commitments from MOMo that Passepartout should adopt:

- /Schema diagrams/ as the primary communication format between ontologist and
  domain expert. Passepartout's equivalent: the archivist's module-scoped prompt
  includes a simplified schema diagram of the module being populated.
- /Template-based instantiation/ of ontology design patterns into concrete
  modules. Passepartout's equivalent: micropatterns loaded from MODL are
  instantiated with entities from the user's memex, producing concrete facts.
- /Systematic axiomatization/ — 17 frequently used axiom patterns for each
  node-edge-node construction in a schema diagram. Passepartout's equivalent:
  Screamer constraint rules derived from module structure.

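To make "constraint rules derived from module structure" concrete, here is a
small sketch that turns each node-edge-node construction into a domain and
range check. Only these two generic checks are shown, as an illustrative
stand-in; this is not the paper's list of axiom patterns, and the edge table,
type table, and =violations= function are hypothetical.

```python
# Each edge is one node-edge-node construction: (domain, relation, range).
EDGES = [
    ("Author", "wrote", "Work"),
    ("Work", "published-in", "Venue"),
]

# Hypothetical type assignments for a few entities.
TYPES = {"Nabokov": "Author", "Pale Fire": "Work"}


def violations(edges, types, triple):
    """Return the domain/range problems a proposed triple would introduce."""
    subj, rel, obj = triple
    for domain, name, range_ in edges:
        if name == rel:
            problems = []
            if types.get(subj) != domain:
                problems.append(f"{subj}: expected {domain}")
            if types.get(obj) != range_:
                problems.append(f"{obj}: expected {range_}")
            return problems
    return [f"unknown relation: {rel}"]


print(violations(EDGES, TYPES, ("Nabokov", "wrote", "Pale Fire")))
print(violations(EDGES, TYPES, ("Pale Fire", "wrote", "Nabokov")))
```

An empty list means the triple passes these two checks; a real rule set would
add further constraints per module.
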
Reference:
- Shimizu, C., Hammar, K., & Hitzler, P. (2023). Modular ontology modeling.
  /Semantic Web, 14/(3), 459–489. https://doi.org/10.3233/SW-222886

*** Ontology Population — the empirical methodology

Norouzi et al. (2024) provide the full experimental methodology behind the ~90%
extraction accuracy claim. Using the Enslaved.org Hub Ontology as ground truth
and Wikipedia articles as source text, they tested five LLMs across a three-stage
pipeline: preprocessing, text retrieval, and KG population. The critical finding:
prompts that included a /schema diagram/ of the target ontology module (using
MOMo's visual conventions with colored boxes for classes, arrows for relations)
plus a single extraction example achieved the highest accuracy. Without
module-scoped prompts, quality degraded substantially.

Three findings are directly applicable to the archivist:

1. /Role chain simplification./ The Enslaved Ontology has complex role chains
   (e.g., Person → hasRole → Role → inEvent → Event). These were collapsed into
   shortcut relations (e.g., Person → participatedIn → Event) for LLM extraction.
   The archivist should maintain two layers: the /logical/ schema with full role
   chains for Screamer verification, and the /extraction/ schema with simplified
   relations for LLM prompting.

2. /Variance across models./ Five LLMs were tested. Performance varied
   significantly. The archivist should benchmark extraction accuracy per provider
   and per module, and route extraction tasks to the best-performing model for
   each module — extending the existing model-tier routing (v0.3.0) from
   complexity-based to accuracy-based routing.

3. /Cross-source validation./ The paper used both Wikipedia text and Wikidata
   as overlapping sources for the same entities, enabling cross-verification.
   The archivist can do the same: extract facts from the user's prose, extract
   facts from Wikidata for the same entities, and present disagreements with
   provenance. This is the =:plural= cardinality policy applied at extraction time.

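The two-layer idea in the first finding can be sketched as an expansion step
that rewrites a shortcut triple from the extraction schema into the full role
chain of the logical schema. The relation names follow the example above; the
=SHORTCUTS= table, the =expand= function, the ="type"= predicate, and the
blank-node naming are assumptions for illustration.

```python
# Shortcut relation -> full chain of (relation, class of the node it reaches).
SHORTCUTS = {
    "participatedIn": [("hasRole", "Role"), ("inEvent", "Event")],
}


def expand(triple, fresh_id):
    """Rewrite a shortcut triple into the triples of its full role chain."""
    subj, rel, obj = triple
    chain = SHORTCUTS.get(rel)
    if chain is None:
        return [triple]  # not a shortcut, pass through unchanged
    triples = []
    current = subj
    for i, (step, cls) in enumerate(chain):
        is_last = i == len(chain) - 1
        # Intermediate nodes are minted; the last hop lands on the real object.
        node = obj if is_last else fresh_id(cls)
        if not is_last:
            triples.append((node, "type", cls))
        triples.append((current, step, node))
        current = node
    return triples


expanded = expand(
    ("alice", "participatedIn", "voyage-1"),
    fresh_id=lambda cls: f"_:{cls.lower()}1",  # hypothetical blank-node naming
)
for t in expanded:
    print(t)
```

The LLM only ever sees the shortcut relation, while the verifier can check the
expanded form against the full schema.
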
Reference:
- Norouzi, S.S., Barua, A., Christou, A., Gautam, N., Eells, A., Hitzler, P.,
  & Shimizu, C. (2024). Ontology Population using LLMs. arXiv:2411.01612.

* Historical Lineage — McCarthy's Advice Taker

McCarthy's "Programs with Common Sense" (1959) is the direct intellectual ancestor
of the Passepartout architecture. The paper proposed an "advice taker" — a program
that "will draw immediate conclusions from a list of premises" expressed in
"a suitable formal language (most likely a part of the predicate calculus)." The
program would:

1. Accept declarative statements about the world as input.
2. Store them as logical formulas.
3. Reason from them to produce new conclusions.
4. Accept new facts and revise its conclusions.

This is precisely the Passepartout pipeline: the archivist extracts declarative
facts from prose → Screamer checks them for consistency → VivaceGraph stores them
→ the planner reasons from them → new facts from gate outcomes and deductions
revise the store. McCarthy proposed it in 1959. Passepartout is building it in
2026.

The gap between McCarthy's proposal and Passepartout's implementation is the
/hallucination problem/. McCarthy assumed facts would be entered by a human
programmer in formal logic. Passepartout's facts are extracted from natural
language prose by an LLM — a probabilistic process that requires deterministic
verification. Screamer is the component McCarthy didn't need: a constraint solver
that gates LLM-proposed facts against the existing fact store.

The connection is not metaphorical. McCarthy cited Principia Mathematica as an
influence on Lisp. Passepartout's Whitehead note traces the same PM → Lisp
lineage. The advice taker → Passepartout lineage completes the arc: PM's formal
logic → Lisp → McCarthy's advice taker → Passepartout's neurosymbolic engine.

Reference:
- McCarthy, J. (1959). Programs with Common Sense. /Proceedings of the
  Teddington Conference on the Mechanization of Thought Processes./

* Philosophical Validation — The Neurosymbolic Consensus

Three papers from the neurosymbolic AI research community validate the
architectural thesis from complementary angles.

** Marcus (2020): The Case Against Pure Deep Learning

Gary Marcus's "The Next Decade in AI" argues that deep learning alone is "data
hungry, shallow, brittle, and limited in its ability to generalize." The paper
demonstrates GPT-2 failing at basic commonsense reasoning:

- "Yesterday I dropped my clothes off at the dry cleaners and have yet to pick
  them up. Where are my clothes?" → GPT-2: "at my mom's house."
- "There are six frogs on a log. Two leave, but three join. The number of frogs
  on the log is now" → GPT-2: "seventeen."

Marcus proposes four steps toward robust AI: hybrid architecture (combining
neural and symbolic), large-scale knowledge (abstract and causal, not just
statistical), reasoning (formal inference over structured representations), and
cognitive models (frameworks for how entities relate). Passepartout implements all
four: the perceive-reason-act pipeline is hybrid, the symbolic index is causal
knowledge, Screamer + ACL2 provide reasoning, and the gate-bootstrapped ontology
plus MOMo modules provide cognitive models.

Marcus's core claim — "we have no hope of achieving robust intelligence without
first developing systems with deep understanding" — is the justification for
Passepartout's entire neurosymbolic investment. The alternative is a system that
works "on a good day" and fails unpredictably. The deterministic gate stack and
Screamer admission gate are the engineering realization of Marcus's call for
robustness.

Reference:
- Marcus, G. (2020). The Next Decade in AI: Four Steps Towards Robust
  Artificial Intelligence. arXiv:2002.06177.

** Gaur & Sheth (2023): CREST — Trustworthy Neurosymbolic AI

Gaur and Sheth present the CREST framework: Consistency, Reliability, user-level
Explainability, and Safety build Trust — and they argue these require
neurosymbolic methods. Their empirical finding: GPT-3.5 breached safety
constraints 30% of the time when asked identical questions repeatedly. Claude's
16 safety rules and Sparrow's 23 rules provide no /inherent/ safety — they are
heuristic guardrails that can be breached through prompt variation.

These findings validate three Passepartout design commitments:

1. /Prompt-level safety is insufficient./ Claude and Sparrow use rules that
   consume LLM tokens and can be evaded. Passepartout's deterministic gates run
   in pure Lisp, cost 0 tokens, and cannot be evaded by prompt engineering.

2. /Inconsistency is the norm, not the exception./ Gaur & Sheth show that even
   identical queries produce inconsistent responses ~30% of the time. This
   validates the cardinality model: a system that expects contradiction and
   surfaces it with provenance is architecturally more honest than one that
   assumes consistency and silently resolves it.

3. /Knowledge infusion is required for trust./ The CREST framework embeds
   domain knowledge (clinical guidelines, procedural knowledge) into LLM
   pipelines. Passepartout's symbolic index /is/ the knowledge infusion layer:
   facts extracted from prose, verified by Screamer, and available for any LLM
   call through the context assembly pipeline.

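The cardinality model in the second point, expecting contradiction and
surfacing it with provenance, can be illustrated with a tiny store that keeps
every asserted value alongside its source. The =PluralStore= class and its
method names are hypothetical, and the conflicting note in the usage example
is invented for the demonstration.

```python
from collections import defaultdict


class PluralStore:
    """Keeps all values for a (subject, predicate) pair, each with its source."""

    def __init__(self):
        # (subject, predicate) -> list of (value, source)
        self._values = defaultdict(list)

    def assert_fact(self, subject, predicate, value, source):
        """Record a value without overwriting earlier, possibly different, ones."""
        entry = (value, source)
        if entry not in self._values[(subject, predicate)]:
            self._values[(subject, predicate)].append(entry)

    def disagreements(self):
        """Return every pair whose sources report more than one distinct value."""
        return {
            key: list(entries)
            for key, entries in self._values.items()
            if len({value for value, _ in entries}) > 1
        }


store = PluralStore()
store.assert_fact("pale-fire", "published-in", 1962, "wikidata")
store.assert_fact("pale-fire", "published-in", 1961, "notes/reading.org")  # invented conflict
store.assert_fact("pale-fire", "written-by", "nabokov", "wikidata")
print(store.disagreements())
```

Nothing is resolved automatically: the conflicting values stay in the store,
and the caller sees both along with where each one came from.
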
Reference:
- Gaur, M., & Sheth, A. (2023). Building Trustworthy NeuroSymbolic AI Systems:
  Consistency, Reliability, Explainability, and Safety. arXiv:2312.06798.

** Gaur et al. (2022): Knowledge-Infused Learning

Gaur, Gunaratna, Bhatt, and Sheth define Knowledge-infused Learning (KiL) as
"combining various types of explicit knowledge with data-driven deep learning
techniques." They identify three infusion levels (shallow, semi-deep, deep) and
position KiL as "a sweet spot in neuro-symbolic AI."

The paper makes two observations relevant to Passepartout:

1. /Data alone is not enough./ The opening cites Pedro Domingos ("Data Alone is
   Not Enough"), Andrew Ng ("the importance of Big Data is overhyped"), and
   Gary Marcus ("AI that captures how humans think"). These are the intellectual
   warrant for the symbolic index: a knowledge layer that is independent of any
   specific LLM call, accumulated across sessions, and verified against existing
   facts.

2. /Expert knowledge is external to the model./ Domain experts use "their past
   experience, web or domain-specific knowledge sources, and annotation
   guidelines" to create ground truth — resources the LLM cannot access during
   training. The symbolic index makes these resources queryable: facts from the
   gate stack (security expertise), from the human (declarative authoring), from
   Wikidata (world knowledge), and from Screamer deductions (derived expertise).

Passepartout's architecture is a specific implementation of KiL at the deepest
infusion level: knowledge is not appended to prompts (shallow) or embedded in
fine-tuning (semi-deep). It is a first-class data structure — the symbolic index
— that the LLM queries through the archivist and the planner. The knowledge is
living: it accumulates, is verified, carries provenance, and evolves through
ontology versioning.

Reference:
- Gaur, M., Gunaratna, K., Bhatt, S., & Sheth, A. (2022). Knowledge-Infused
  Learning: A Sweet Spot in Neuro-Symbolic AI. /IEEE Internet Computing, 26/(4),
  5–11. https://doi.org/10.1109/MIC.2022.3179759

* Semantic Wikipedia as Entity Backbone

The gate stack provides 50-70 entity classes — adequate for a coding agent where
@@ -1412,3 +1777,19 @@ See also:
- =passepartout/docs/DESIGN_DECISIONS.org= — the existing design decisions
- =passepartout/docs/ARCHITECTURE.org= — the current pipeline architecture
- =passepartout/docs/ROADMAP.org= — the feature roadmap through v0.13.0
- =notes/passepartout-SWOT.org= — SWOT analysis of the neurosymbolic architecture
- =passepartout-agora.org= — Passepartout-Agora integration design
- Shimizu, C. & Hitzler, P. (2025). Accelerating knowledge graph and ontology
  engineering with large language models. /Journal of Web Semantics, 85/, 100862.
  https://doi.org/10.1016/j.websem.2025.100862
- Shimizu, C., Hammar, K., & Hitzler, P. (2023). Modular ontology modeling.
  /Semantic Web, 14/(3), 459–489. https://doi.org/10.3233/SW-222886
- Norouzi, S.S. et al. (2024). Ontology Population using LLMs. arXiv:2411.01612.
- McCarthy, J. (1959). Programs with Common Sense. /Proc. Teddington Conf. on
  the Mechanization of Thought Processes./
- Marcus, G. (2020). The Next Decade in AI. arXiv:2002.06177.
- Gaur, M. & Sheth, A. (2023). Building Trustworthy NeuroSymbolic AI Systems.
  arXiv:2312.06798.
- Gaur, M., Gunaratna, K., Bhatt, S., & Sheth, A. (2022). Knowledge-Infused
  Learning. /IEEE Internet Computing, 26/(4), 5–11.
- Bhardwaj, V.P. (2026). Agent Behavioral Contracts. arXiv:2602.22302.