memex: update AGENTS.md, add passepartout design-decisions notes, SWOT + agora notes, bump submodules → v0.8.1

2026-05-10 07:11:08 -04:00
parent 04944a62e2
commit e719443ce7
6 changed files with 1566 additions and 3 deletions
--- a/notes/passepartout-neurosymbolic-design-decisions-and-options.org
+++ b/notes/passepartout-neurosymbolic-design-decisions-and-options.org
@@ -442,6 +442,371 @@ design. The gate stack provides the seed. Gate outcomes, prose extraction,
 deduction, and human authoring grow the shoots. Screamer prunes contradictions.
 The ontology is a garden, not a building.

+* Empirical Validation — Modular Ontology Engineering with LLMs
+
+Shimizu and Hitzler (2025, /Journal of Web Semantics/) argue that LLMs can
+significantly accelerate knowledge graph and ontology engineering — modeling,
+extension, population, alignment, and entity disambiguation — but /only/ if
+ontologies are modular. Their paper provides empirical evidence that validates
+the modular architecture described in this document and exposes concrete patterns
+the archivist should adopt.
+
+** The central finding: modularity is the key variable
+
+In a complex ontology alignment task (mapping between two oceanography ontologies
+with hundreds of classes and properties), an LLM without module information
+detected correct mappings for 5 of 109 alignment rules — effectively useless. When
+the same LLM was given the module structure of the target ontology (20 named
+conceptual modules such as "Organization," "Cruise," "Physical Sample"), it
+detected correct mappings for 104 of 109 rules, about 95% accuracy. The only
+variable that changed was the module information.
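+
+As a quick check on the figures above, the percentages follow directly from
+the reported rule counts. A minimal sketch (the helper name =accuracy= is
+illustrative and does not come from the paper):
+
```python
# Alignment results quoted above: correct mappings out of 109 alignment rules.
TOTAL_RULES = 109


def accuracy(correct: int, total: int = TOTAL_RULES) -> float:
    """Return accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


without_modules = accuracy(5)    # prompt had no module information
with_modules = accuracy(104)     # prompt included the module structure
print(without_modules, with_modules)
```
+
+This gives roughly 4.6% without module information and 95.4% with it.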
+
+For ontology population (extracting triples from text), their best results came
+from prompts that included a schematic representation of a /single module/ plus
+one extraction example. Against ground truth, this achieved approximately 90%
+extraction accuracy. Without module-scoped prompting, quality degraded
+substantially.
+
+The mechanism: conceptual modules scope the LLM's attention to something
+human-sized. The paper's central claim — "by somehow limiting the scope, we
+achieve a more human-like approach — and one more capable of being expressed
+succinctly in language, and thus more appropriate for LLM-based assistance" — is
+an independent discovery of the same principle underlying Passepartout's
+domain-scoped Screamer checks and per-domain cardinality policies.
+
+** MOMo: a mature modular ontology methodology
+
+The authors' approach, MOMo (Modular Ontology Modeling), has been developed over a
+decade and includes:
+
+- A /step-by-step methodology/ that breaks ontology design into clearly delineated
+  pieces, each "easier to automate than going one-shot from base data to an
+  ontology."
+- A /pattern description language/ (OPLa, expressed in OWL) for annotating modules
+  so they can be identified programmatically.
+- A /design library/ (MODL) containing hundreds of commonsense micropatterns
+  organized for programmatic access, including via RAG.
+- A /Protégé plugin/ (CoModIDE) for graphical modular ontology development.
+
+Critically, their modules are not formal sub-ontologies with logical boundaries.
+They are /conceptual/ partitions — groupings of classes, properties, and axioms
+around "key notions" identified by domain experts. Modules can overlap and nest.
+There are "no precise rules" for what belongs in a module. The modules provide
+"conceptual bridges between human expert conceptualization and data reality."
+
+** What Passepartout should adopt
+
+*** The modular prompt pattern for the archivist
+
+The extraction prompt structure that achieved 90% accuracy is concrete and
+replicable: a schematic representation of a domain module plus a single extraction
+example. The archivist should use this pattern when extracting facts from prose.
+Instead of a generic "extract triples from this text" prompt (200 tokens), the
+prompt should reference the relevant module(s) and include an example triple for
+each relation in that module. The module provides /context/; the example provides
+/format/. Both improve LLM extraction quality without increasing Screamer's
+verification burden.
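+
+A minimal sketch of the module-scoped prompt, assuming a module can be reduced
+to a list of relations with one example triple each. The names =Relation=,
+=Module=, and =build_extraction_prompt= are hypothetical and exist neither in
+Passepartout nor in the paper; the prompt wording is likewise only illustrative.
+
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    subject_class: str
    predicate: str
    object_class: str
    example: tuple[str, str, str]  # one worked triple that shows the output format


@dataclass(frozen=True)
class Module:
    name: str
    relations: tuple[Relation, ...]


def build_extraction_prompt(module: Module, prose: str) -> str:
    """Assemble a prompt from the module schema, its examples, and the text."""
    schema = "\n".join(
        f"  {r.subject_class} --{r.predicate}--> {r.object_class}"
        for r in module.relations
    )
    examples = "\n".join(
        f"  ({s}, {p}, {o})" for s, p, o in (r.example for r in module.relations)
    )
    return (
        f"Module: {module.name}\n"
        f"Schema:\n{schema}\n"
        f"Example triples:\n{examples}\n"
        "Extract only triples that fit the schema above.\n"
        f"Text:\n{prose}"
    )


# Hypothetical literature module with a single relation.
literature = Module(
    name="Literature",
    relations=(
        Relation("Author", "wrote", "Work", ("Nabokov", "wrote", "Pale Fire")),
    ),
)
prompt = build_extraction_prompt(literature, "Kafka wrote The Trial.")
print(prompt)
```
+
+The schema lines carry the context and the example triples carry the format,
+so the two concerns stay separate and can be tuned per module.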
+
+*** MOMo modules as ontology scaffold
+
+The Passepartout notes describe an organic growth model: gate-bootstrapped facts
+seed the ontology; gate outcomes, Screamer deductions, and archivist proposals
+grow the shoots. This is correct for the /security and filesystem/ domains where
+the gate stack already encodes expertise. For the broader memex (literature,
+daily reflection, project planning), the 50-70 gate-bootstrapped entity classes
+are a starvation diet.
+
+MOMo's micropattern library provides a ready-made scaffold for these domains.
+Hundreds of commonsense patterns already exist for temporal relations, spatial
+relations, agent-action, organizational structure, provenance, and event
+participation. Loading these as initial modules, with =:policy :plural= and
+=:provenance :external-ontology=, would give the symbolic index a structured
+vocabulary for domains where the gate stack has nothing to offer. The organic
+growth model then /extends and refines/ these modules rather than inventing them
+from scratch. This is the Wikidata strategy applied at the schema level: adopt
+existing structured knowledge, connect personal facts to it, and surface
+disagreements rather than resolve them.
+
+*** OPLa annotation for module identification
+
+MOMo modules annotated in OPLa can "easily be identified programmatically." If
+Passepartout annotates its ontology modules in a compatible format (even a
+simplified plist-based equivalent), the archivist can automatically select the
+right module(s) when extracting facts from prose. A heading in =literature/=
+triggers the literature module; a heading in =projects/= triggers the software
+engineering module; a heading tagged =:personal:= triggers the diary module. The
+module scopes the prompt. The prompt improves extraction. Screamer gates the
+result. This is the full pipeline, validated at each step.
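+
+The selection step could look roughly like the sketch below. It assumes
+modules are keyed by top-level directory and by heading tag; the two lookup
+tables, the module names, and =select_modules= are all hypothetical stand-ins
+for whatever annotation format Passepartout ends up using.
+
```python
from pathlib import PurePosixPath

# Hypothetical routing tables; directory, tag, and module names are illustrative.
DIRECTORY_MODULES = {
    "literature": "literature",
    "projects": "software-engineering",
}
TAG_MODULES = {"personal": "diary"}


def select_modules(heading_path: str, tags: frozenset[str] = frozenset()) -> list[str]:
    """Return the modules that should scope the prompt for one heading."""
    selected: list[str] = []
    parts = PurePosixPath(heading_path).parts
    # The first path component decides the directory-based module, if any.
    if parts and parts[0] in DIRECTORY_MODULES:
        selected.append(DIRECTORY_MODULES[parts[0]])
    # Tags can add further modules; sorting keeps the result deterministic.
    for tag in sorted(tags):
        module = TAG_MODULES.get(tag)
        if module and module not in selected:
            selected.append(module)
    return selected


print(select_modules("literature/nabokov.org"))
print(select_modules("projects/memex.org", frozenset({"personal"})))
```
+
+A heading that matches nothing returns an empty list, which leaves the caller
+free to fall back to a generic prompt or to skip extraction.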
+
+** What this means for the Passepartout architecture
+
+The paper validates three design decisions already made:
+
+1. /Modularity is non-negotiable./ The paper found that modularity is the
+   difference between 5% and 95% accuracy on alignment. Passepartout's per-domain
+   cardinality policies and domain-scoped Screamer checks are the same insight
+   implemented in a different context. The paper provides evidence that the
+   approach works; Passepartout applies it to verification rather than
+   extraction.
+
+2. /The extraction pipeline is feasible./ Roughly 90% population accuracy with
+   module-scoped prompts means the archivist /can/ extract useful facts from
+   prose. The remaining ~10% of erroneous extractions is what Screamer is there
+   to catch, to the extent those errors contradict the existing fact store. The
+   paper validates the LLM-as-proposer role; Passepartout adds the
+   Screamer-as-verifier role.
+
+3. /KGs are positioned as anti-hallucination infrastructure./ The paper explicitly
+   frames knowledge graphs as "ground truth to escape from LLM hallucinations" and
+   as "components of other neurosymbolic approaches." This is the Passepartout
+   thesis — the symbolic index as ground truth against which LLM proposals are
+   checked — stated in the academic literature by the editors of the neurosymbolic
+   AI handbooks.
+
+And it exposes one gap in the current design:
+
+1. /Emergent modularity may be slower than designed modularity./ Passepartout's
+   modules are supposed to emerge organically from gate patterns, Screamer
+   generalizations, and cross-domain overlap detection. MOMo's modules are
+   designed by domain experts who identify key notions upfront. The emergent
+   approach is philosophically cleaner — the system learns its own categories —
+   but practically slower. The paper's results suggest that adopting designed
+   modules as a scaffold, and letting emergent growth /refine/ rather than
+   /invent/ them, could shorten the path to a sufficient ontology considerably.
+
+** Relation to Wikidata loading
+
+The MOMo micropattern approach and the Wikidata loading strategy are complementary:
+
+| Layer     | MOMo provides                                    | Wikidata provides                                |
+|-----------+--------------------------------------------------+--------------------------------------------------|
+| Schema    | Modular ontology of relations and entity classes | None (Wikidata's schema is implicit in its data) |
+| Instances | None (patterns, not entities)                    | 100M+ entities with property-value pairs         |
+
+MOMo gives Passepartout the /relations/ (wrote, lectured-on, influenced,
+published-in). Wikidata gives Passepartout the /instances/ (Nabokov, Pale Fire,
+Kafka). Both are needed. Neither alone is sufficient. The MOMo scaffold tells the
+archivist /what kinds of facts to look for/. The Wikidata graph tells the
+archivist /which entities those facts are about/. Together they transform the
+extraction task from "discover entities and their relations from prose" to
+"connect this prose heading to known entities using known relations" — a
+dramatically simpler prompt with dramatically higher expected accuracy.
+
+** Reference
+
+- Shimizu, C., & Hitzler, P. (2025). Accelerating knowledge graph and ontology
+  engineering with large language models. /Journal of Web Semantics, 85/,
+  100862. https://doi.org/10.1016/j.websem.2025.100862
+
+** See also
+
+- =passepartout-neurosymbolic-roadmap.org=: Phase 3 (Archivist) — the modular
+  prompt pattern should be incorporated into the extraction pipeline.
+- =passepartout-agora.org=: the KEL / contract audit trail as instances of
+  MOMo-style key-lifecycle and contract-lifecycle modules.
+- =notes/passepartout-SWOT.org=: the SWOT analysis which identifies the ontology
+  problem as the key bottleneck — MOMo partially addresses this.
+
+** Supporting References
+
+*** MOMo: the canonical methodology
+
+Shimizu, Hammar & Hitzler (2023, /Semantic Web Journal/) present the full MOMo
+methodology — 31 pages covering the step-by-step design process, schema diagrams
+as knowledge elicitation tools, ODP libraries, OPLa annotation language, and
+CoModIDE, a Protégé plugin for graphical modular ontology development. The paper
+was evaluated with usability studies and demonstrates that modular development
+significantly improves approachability for domain experts who are not ontology
+engineers.
+
+Key architectural commitments from MOMo that Passepartout should adopt:
+
+- /Schema diagrams/ as the primary communication format between ontologist and
+  domain expert. Passepartout's equivalent: the archivist's module-scoped prompt
+  includes a simplified schema diagram of the module being populated.
+- /Template-based instantiation/ of ontology design patterns into concrete
+  modules. Passepartout's equivalent: micropatterns loaded from MODL are
+  instantiated with entities from the user's memex, producing concrete facts.
+- /Systematic axiomatization/ — 17 frequently used axiom patterns for each
+  node-edge-node construction in a schema diagram. Passepartout's equivalent:
+  Screamer constraint rules derived from module structure.
+
+Reference:
+- Shimizu, C., Hammar, K., & Hitzler, P. (2023). Modular ontology modeling.
+  /Semantic Web, 14/(3), 459–489. https://doi.org/10.3233/SW-222886
+
+*** Ontology Population — the empirical methodology
+
+Norouzi et al. (2024) provide the full experimental methodology behind the ~90%
+extraction accuracy claim. Using the Enslaved.org Hub Ontology as ground truth
+and Wikipedia articles as source text, they tested five LLMs across a three-stage
+pipeline: preprocessing, text retrieval, and KG population. The critical finding:
+prompts that included a /schema diagram/ of the target ontology module (using
+MOMo's visual conventions with colored boxes for classes, arrows for relations)
+plus a single extraction example achieved the highest accuracy. Without
+module-scoped prompts, quality degraded substantially.
+
+Three findings are directly applicable to the archivist:
+
+1. /Role chain simplification./ The Enslaved Ontology has complex role chains
+   (e.g., Person → hasRole → Role → inEvent → Event). These were collapsed into
+   shortcut relations (e.g., Person → participatedIn → Event) for LLM extraction.
+   The archivist should maintain two layers: the /logical/ schema with full role
+   chains for Screamer verification, and the /extraction/ schema with simplified
+   relations for LLM prompting.
+
+2. /Variance across models./ Five LLMs were tested. Performance varied
+   significantly. The archivist should benchmark extraction accuracy per provider
+   and per module, and route extraction tasks to the best-performing model for
+   each module — extending the existing model-tier routing (v0.3.0) from
+   complexity-based to accuracy-based routing.
+
+3. /Cross-source validation./ The paper used both Wikipedia text and Wikidata
+   as overlapping sources for the same entities, enabling cross-verification.
+   The archivist can do the same: extract facts from the user's prose, extract
+   facts from Wikidata for the same entities, and present disagreements with
+   provenance. This is the =:plural= cardinality policy applied at extraction time.
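+
+The two-layer schema from item 1 can be sketched as a lookup that expands each
+extracted shortcut triple back into its full role chain before verification.
+This is only an illustration: =ROLE_CHAINS=, =expand_shortcut=, and the
+identifier scheme for intermediate nodes are assumptions, not part of the
+Enslaved Ontology tooling or of Passepartout.
+
```python
# Hypothetical shortcut table: extraction-schema relation -> logical role chain.
# Each step is (predicate, class of the intermediate node); the final step
# points at the original object, so its class slot is None.
ROLE_CHAINS = {
    "participatedIn": (("hasRole", "Role"), ("inEvent", None)),
}


def expand_shortcut(subject: str, predicate: str, obj: str) -> list[tuple[str, str, str]]:
    """Expand one extracted shortcut triple into its full role-chain triples."""
    chain = ROLE_CHAINS.get(predicate)
    if chain is None:
        return [(subject, predicate, obj)]  # not a shortcut; pass through unchanged
    triples = []
    current = subject
    for index, (step_predicate, node_class) in enumerate(chain):
        is_last = index == len(chain) - 1
        # Intermediate nodes get a made-up blank-node style identifier.
        target = obj if is_last else f"_:{node_class}-{subject}-{obj}"
        triples.append((current, step_predicate, target))
        current = target
    return triples


print(expand_shortcut("person1", "participatedIn", "event1"))
```
+
+The LLM only ever sees =participatedIn=; the expanded triples are what a
+verifier working on the logical schema would receive.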
+
+Reference:
+- Norouzi, S.S., Barua, A., Christou, A., Gautam, N., Eells, A., Hitzler, P.,
+  & Shimizu, C. (2024). Ontology Population using LLMs. arXiv:2411.01612.
+
+* Historical Lineage — McCarthy's Advice Taker
+
+McCarthy's "Programs with Common Sense" (1959) is the direct intellectual ancestor
+of the Passepartout architecture. The paper proposed an "advice taker" — a program
+that "will draw immediate conclusions from a list of premises" expressed in
+"a suitable formal language (most likely a part of the predicate calculus)." The
+program would:
+
+1. Accept declarative statements about the world as input.
+2. Store them as logical formulas.
+3. Reason from them to produce new conclusions.
+4. Accept new facts and revise its conclusions.
+
+This is precisely the Passepartout pipeline: the archivist extracts declarative
+facts from prose → Screamer checks them for consistency → VivaceGraph stores them
+→ the planner reasons from them → new facts from gate outcomes and deductions
+revise the store. McCarthy proposed it in 1959. Passepartout is building it in
+2026.
+
+The gap between McCarthy's proposal and Passepartout's implementation is the
+/hallucination problem/. McCarthy assumed facts would be entered by a human
+programmer in formal logic. Passepartout's facts are extracted from natural
+language prose by an LLM — a probabilistic process that requires deterministic
+verification. Screamer is the component McCarthy didn't need: a constraint solver
+that gates LLM-proposed facts against the existing fact store.
+
+The connection is not metaphorical. McCarthy cited Principia Mathematica as an
+influence on Lisp. Passepartout's Whitehead note traces the same PM → Lisp
+lineage. The advice taker → Passepartout lineage completes the arc: PM's formal
+logic → Lisp → McCarthy's advice taker → Passepartout's neurosymbolic engine.
+
+Reference:
+- McCarthy, J. (1959). Programs with Common Sense. /Proceedings of the
+  Teddington Conference on the Mechanization of Thought Processes./
+
+* Philosophical Validation — The Neurosymbolic Consensus
+
+Three papers from the neurosymbolic AI research community validate the
+architectural thesis from complementary angles.
+
+** Marcus (2020): The Case Against Pure Deep Learning
+
+Gary Marcus's "The Next Decade in AI" argues that deep learning alone is "data
+hungry, shallow, brittle, and limited in its ability to generalize." The paper
+demonstrates GPT-2 failing at basic commonsense reasoning:
+
+- "Yesterday I dropped my clothes off at the dry cleaners and have yet to pick
+  them up. Where are my clothes?" → GPT-2: "at my mom's house."
+- "There are six frogs on a log. Two leave, but three join. The number of frogs
+  on the log is now" → GPT-2: "seventeen."
+
+Marcus proposes four steps toward robust AI: hybrid architecture (combining
+neural and symbolic), large-scale knowledge (abstract and causal, not just
+statistical), reasoning (formal inference over structured representations), and
+cognitive models (frameworks for how entities relate). Passepartout implements all
+four: the perceive-reason-act pipeline is hybrid, the symbolic index is causal
+knowledge, Screamer + ACL2 provide reasoning, and the gate-bootstrapped ontology
+plus MOMo modules provide cognitive models.
+
+Marcus's core claim — "we have no hope of achieving robust intelligence without
+first developing systems with deep understanding" — is the justification for
+Passepartout's entire neurosymbolic investment. The alternative is a system that
+works "on a good day" and fails unpredictably. The deterministic gate stack and
+Screamer admission gate are the engineering realization of Marcus's call for
+robustness.
+
+Reference:
+- Marcus, G. (2020). The Next Decade in AI: Four Steps Towards Robust
+  Artificial Intelligence. arXiv:2002.06177.
+
+** Gaur & Sheth (2023): CREST — Trustworthy Neurosymbolic AI
+
+Gaur and Sheth present the CREST framework: Consistency, Reliability, user-level
+Explainability, and Safety build Trust — and they argue these require
+neurosymbolic methods. Their empirical finding: GPT-3.5 breached safety
+constraints 30% of the time when asked identical questions repeatedly. Claude's
+16 safety rules and Sparrow's 23 rules provide no /inherent/ safety — they are
+heuristic guardrails that can be breached through prompt variation.
+
+These findings validate three Passepartout design commitments:
+
+1. /Prompt-level safety is insufficient./ Claude and Sparrow use rules that
+   consume LLM tokens and can be evaded. Passepartout's deterministic gates run
+   in pure Lisp, cost 0 tokens, and cannot be evaded by prompt engineering.
+
+2. /Inconsistency is the norm, not the exception./ Gaur & Sheth show that even
+   identical queries produce inconsistent responses ~30% of the time. This
+   validates the cardinality model: a system that expects contradiction and
+   surfaces it with provenance is architecturally more honest than one that
+   assumes consistency and silently resolves it.
+
+3. /Knowledge infusion is required for trust./ The CREST framework embeds
+   domain knowledge (clinical guidelines, procedural knowledge) into LLM
+   pipelines. Passepartout's symbolic index /is/ the knowledge infusion layer:
+   facts extracted from prose, verified by Screamer, and available for any LLM
+   call through the context assembly pipeline.
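+
+The cardinality model in item 2 can be sketched as a store that keeps
+conflicting values side by side and reports them with their sources instead of
+overwriting. =PluralFactStore= and its methods are hypothetical names, and the
+two publication years below are made-up sample data chosen only to produce a
+disagreement.
+
```python
from collections import defaultdict


class PluralFactStore:
    """Keeps every asserted value for a (subject, predicate) pair with its sources."""

    def __init__(self) -> None:
        # (subject, predicate) -> {value: set of sources that asserted it}
        self._facts: dict[tuple[str, str], dict[str, set[str]]] = defaultdict(dict)

    def assert_fact(self, subject: str, predicate: str, value: str, source: str) -> None:
        """Record a value without discarding any earlier, different value."""
        values = self._facts[(subject, predicate)]
        values.setdefault(value, set()).add(source)

    def disagreements(self) -> dict[tuple[str, str], dict[str, set[str]]]:
        """Return only the pairs that currently hold more than one distinct value."""
        return {key: vals for key, vals in self._facts.items() if len(vals) > 1}


store = PluralFactStore()
store.assert_fact("Pale Fire", "published-in", "1962", "wikidata")
store.assert_fact("Pale Fire", "published-in", "1963", "prose")   # sample conflict
store.assert_fact("Pale Fire", "written-by", "Nabokov", "prose")  # no conflict
print(store.disagreements())
```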
+
+Reference:
+- Gaur, M., & Sheth, A. (2023). Building Trustworthy NeuroSymbolic AI Systems:
+  Consistency, Reliability, Explainability, and Safety. arXiv:2312.06798.
+
+** Gaur et al. (2022): Knowledge-Infused Learning
+
+Gaur, Gunaratna, Bhatt, and Sheth define Knowledge-infused Learning (KiL) as
+"combining various types of explicit knowledge with data-driven deep learning
+techniques." They identify three infusion levels (shallow, semi-deep, deep) and
+position KiL as "a sweet spot in neuro-symbolic AI."
+
+The paper makes two observations relevant to Passepartout:
+
+1. /Data alone is not enough./ The opening cites Pedro Domingos ("Data Alone is
+   Not Enough"), Andrew Ng ("the importance of Big Data is overhyped"), and
+   Gary Marcus ("AI that captures how humans think"). These are the intellectual
+   warrant for the symbolic index: a knowledge layer that is independent of any
+   specific LLM call, accumulated across sessions, and verified against existing
+   facts.
+
+2. /Expert knowledge is external to the model./ Domain experts use "their past
+   experience, web or domain-specific knowledge sources, and annotation
+   guidelines" to create ground truth — resources the LLM cannot access during
+   training. The symbolic index makes these resources queryable: facts from the
+   gate stack (security expertise), from the human (declarative authoring), from
+   Wikidata (world knowledge), and from Screamer deductions (derived expertise).
+
+Passepartout's architecture is a specific implementation of KiL at the deepest
+infusion level: knowledge is not appended to prompts (shallow) or embedded in
+fine-tuning (semi-deep). It is a first-class data structure — the symbolic index
+— that the LLM queries through the archivist and the planner. The knowledge is
+living: it accumulates, is verified, carries provenance, and evolves through
+ontology versioning.
+
+Reference:
+- Gaur, M., Gunaratna, K., Bhatt, S., & Sheth, A. (2022). Knowledge-Infused
+  Learning: A Sweet Spot in Neuro-Symbolic AI. /IEEE Internet Computing, 26/(4),
+  5–11. https://doi.org/10.1109/MIC.2022.3179759
+
 * Semantic Wikipedia as Entity Backbone

 The gate stack provides 50-70 entity classes — adequate for a coding agent where
@@ -1412,3 +1777,19 @@ See also:
 - =passepartout/docs/DESIGN_DECISIONS.org= — the existing design decisions
 - =passepartout/docs/ARCHITECTURE.org= — the current pipeline architecture
 - =passepartout/docs/ROADMAP.org= — the feature roadmap through v0.13.0
+- =notes/passepartout-SWOT.org= — SWOT analysis of the neurosymbolic architecture
+- =passepartout-agora.org= — Passepartout-Agora integration design
+- Shimizu, C., & Hitzler, P. (2025). Accelerating knowledge graph and ontology
+  engineering with large language models. /Journal of Web Semantics, 85/, 100862.
+  https://doi.org/10.1016/j.websem.2025.100862
+- Shimizu, C., Hammar, K., & Hitzler, P. (2023). Modular ontology modeling.
+  /Semantic Web, 14/(3), 459–489. https://doi.org/10.3233/SW-222886
+- Norouzi, S.S. et al. (2024). Ontology Population using LLMs. arXiv:2411.01612.
+- McCarthy, J. (1959). Programs with Common Sense. /Proc. Teddington Conf. on
+  the Mechanization of Thought Processes./
+- Marcus, G. (2020). The Next Decade in AI. arXiv:2002.06177.
+- Gaur, M., & Sheth, A. (2023). Building Trustworthy NeuroSymbolic AI Systems.
+  arXiv:2312.06798.
+- Gaur, M., Gunaratna, K., Bhatt, S., & Sheth, A. (2022). Knowledge-Infused
+  Learning. /IEEE Internet Computing, 26/(4), 5–11.
+- Bhardwaj, V.P. (2026). Agent Behavioral Contracts. arXiv:2602.22302.