---
title: Empirical Validation — MOMo and Modular Ontology Engineering
type: reference
tags: :passepartout:architecture:
---

* Empirical Validation — MOMo and Modular Ontology Engineering
:PROPERTIES:
:ID: b572b3a0-0238-470f-9bf8-63d356f68fe0
:ID: design-momo
:CREATED: [2026-05-08 Fri]
:WEIGHT: 40
:END:

Shimizu and Hitzler (2025, /Journal of Web Semantics/) argue that LLMs can significantly accelerate knowledge graph and ontology engineering — modeling, extension, population, alignment, and entity disambiguation — but /only/ if ontologies are modular.

*** The central finding: modularity is the key variable

In a complex ontology alignment task, an LLM given no module information detected correct mappings for only 5 of 109 alignment rules, which is effectively useless. When the same LLM was also given the module structure of the target ontology (20 named conceptual modules), it detected correct mappings for 104 of 109 rules, roughly 95%. The only thing that changed between the two runs was the module information.
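
The following is a quick check of the arithmetic behind these two figures. The helper =rate= and the variable names were introduced here for illustration only:

#+begin_src python
def rate(correct: int, total: int) -> float:
    """Fraction of alignment rules for which a correct mapping was detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


TOTAL_RULES = 109
without_modules = rate(5, TOTAL_RULES)  # no module information in the prompt
with_modules = rate(104, TOTAL_RULES)   # 20 named conceptual modules supplied

print(f"without modules: {without_modules:.1%}")
print(f"with modules:    {with_modules:.1%}")
#+end_src

Run as written, it prints 4.6% and 95.4%, so "5% versus 95%" is a fair rounding of the reported counts.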

For ontology population (extracting triples from text), their best results came from prompts that included a schematic representation of a /single module/ plus one extraction example. Against ground truth, this achieved approximately 90% extraction accuracy. Without module-scoped prompting, quality degraded substantially.
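
The accuracy metric used in the population experiments is not restated here, so the following rests on an assumption. It is a minimal sketch of scoring extracted triples against a ground-truth set by exact-match precision and recall. =Triple=, =score=, and the sample triples are hypothetical:

#+begin_src python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def score(extracted: set[Triple], gold: set[Triple]) -> tuple[float, float]:
    """Exact-match precision and recall of extracted triples against a gold set."""
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical ground truth and extraction output.
gold = {
    Triple("Apollo11", "hasParticipant", "Armstrong"),
    Triple("Apollo11", "hasParticipant", "Aldrin"),
    Triple("Apollo11", "startedAt", "1969-07-16"),
}
extracted = {
    Triple("Apollo11", "hasParticipant", "Armstrong"),
    Triple("Apollo11", "hasParticipant", "Aldrin"),
    Triple("Apollo11", "hasParticipant", "Kennedy"),  # spurious triple
}
p, r = score(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
#+end_src

The sample contains one spurious triple and one missed triple, so both precision and recall come out at 0.67. Real evaluations often need looser matching (normalized entity names, equivalent literals), which exact set intersection does not provide.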

The mechanism: conceptual modules narrow the LLM's attention to something human-sized. The paper's central claim is that "by somehow limiting the scope, we achieve a more human-like approach — and one more capable of being expressed succinctly in language." This independently arrives at the same principle that underlies Passepartout's domain-scoped Screamer checks and per-domain cardinality policies.
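
To make the scoping idea concrete, the model need not see the whole ontology. One option is to pick the module whose vocabulary best overlaps the input and put only that module's schema in the prompt. The sketch below assumes modules are plain vocabularies of class and relation terms. The module contents and the word-overlap heuristic are hypothetical and are not the paper's selection procedure:

#+begin_src python
import re

# Hypothetical modules: each name maps to the terms (classes, relations) it covers.
MODULES: dict[str, set[str]] = {
    "Event": {"event", "participant", "place", "time"},
    "Organization": {"organization", "member", "role", "founded"},
    "Provenance": {"source", "author", "derived", "cites"},
}


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens of the input text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_module(text: str, modules: dict[str, set[str]]) -> str:
    """Return the module whose vocabulary overlaps the text the most."""
    words = tokens(text)
    return max(modules, key=lambda name: len(modules[name] & words))


note = "Ada was a member of the organization and took the role of treasurer."
print(select_module(note, MODULES))  # Organization
#+end_src

Only the selected module's schema would then go into the extraction prompt. A real selector would likely need embeddings or an explicit user tag rather than raw word overlap.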

*** What Passepartout should adopt

*The modular prompt pattern.* The archivist should use module-scoped prompts: a schematic representation of a domain module plus a single extraction example. A generic "extract triples" prompt should be replaced by one that references the relevant module (or modules) and includes one worked example whose triples cover each relation in scope. The module supplies /context/, and the example supplies /format/. Both improve LLM extraction quality without increasing Screamer's verification burden.
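
The sketch below shows how such a prompt might be assembled. It assumes a module is stored as a list of (domain, relation, range) signatures and the single worked example as a short passage with its triples. =Module=, =build_prompt=, and the prompt wording are all hypothetical:

#+begin_src python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    # Each relation is (domain class, relation name, range class).
    relations: list[tuple[str, str, str]]
    # One worked example: a source passage plus the triples it should yield.
    example_text: str
    example_triples: list[tuple[str, str, str]] = field(default_factory=list)


def build_prompt(module: Module, passage: str) -> str:
    """Assemble a module-scoped extraction prompt: schema for context, example for format."""
    schema = "\n".join(f"  {d} --{r}--> {rng}" for d, r, rng in module.relations)
    example = "\n".join(f"  ({s}, {p}, {o})" for s, p, o in module.example_triples)
    return (
        f"Module: {module.name}\n"
        f"Schema:\n{schema}\n\n"
        f"Example text: {module.example_text}\n"
        f"Example triples:\n{example}\n\n"
        "Extract triples from the text below using only the relations in the schema.\n"
        f"Text: {passage}\n"
    )


event = Module(
    name="Event",
    relations=[("Event", "hasParticipant", "Agent"), ("Event", "atPlace", "Place")],
    example_text="The 2019 workshop in Kobe was attended by Ana.",
    example_triples=[("Workshop2019", "hasParticipant", "Ana"),
                     ("Workshop2019", "atPlace", "Kobe")],
)
print(build_prompt(event, "Bo spoke at the Lisbon meetup."))
#+end_src

The example covers every relation in the module, so it fixes the output format for each relation the model may use. The schema, meanwhile, bounds which relations it may use at all.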

*MOMo modules as ontology scaffold.* The 50–70 gate-bootstrapped entity classes are far too few to feed the broader memex. MOMo's micropattern library provides a ready-made scaffold: hundreds of commonsense patterns for temporal relations, spatial relations, agent-action, organizational structure, provenance, and event participation. Loading these as initial modules, with =:policy :plural= and =:provenance :external-ontology=, would give the symbolic index a structured vocabulary for domains where the gate stack has nothing to offer. Organic growth then /extends and refines/ these modules rather than inventing them from scratch.
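
The sketch below illustrates seeding the index with external modules and then growing them. It assumes a simple in-memory registry. =SeedModule=, =Registry=, the ="organic"= provenance value, and the pattern names are hypothetical stand-ins. The Lisp-style settings =:policy :plural= and =:provenance :external-ontology= are modeled as plain string fields:

#+begin_src python
from dataclasses import dataclass, field


@dataclass
class SeedModule:
    name: str
    relations: set[str]
    policy: str = "plural"                 # mirrors :policy :plural
    provenance: str = "external-ontology"  # mirrors :provenance :external-ontology
    added_locally: set[str] = field(default_factory=set)


class Registry:
    def __init__(self) -> None:
        self.modules: dict[str, SeedModule] = {}

    def seed(self, name: str, relations: set[str]) -> None:
        """Load an external pattern as an initial module."""
        self.modules[name] = SeedModule(name, set(relations))

    def grow(self, name: str, relation: str) -> SeedModule:
        """Extend a module with a relation learned from the user's notes,
        creating a new (organic) module only if none was seeded."""
        module = self.modules.get(name)
        if module is None:
            module = SeedModule(name, set(), provenance="organic")
            self.modules[name] = module
        if relation not in module.relations:
            module.relations.add(relation)
            module.added_locally.add(relation)
        return module


reg = Registry()
reg.seed("AgentRole", {"performedBy", "hasRole"})
reg.seed("TemporalExtent", {"startsAt", "endsAt"})
m = reg.grow("AgentRole", "onBehalfOf")  # refines a seeded module
print(m.provenance, sorted(m.relations), sorted(m.added_locally))
#+end_src

The design choice here is that =grow= looks for a seeded module first. A relation learned from the user's notes therefore lands in an existing module and is flagged as a local addition, while the seed keeps its external provenance.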

*Cross-source validation.* The archivist can extract facts about an entity from the user's prose, retrieve facts about the same entity from Wikidata, and present any disagreements together with their provenance. This is the =:plural= cardinality policy applied at extraction time: conflicting values are kept side by side rather than one overwriting the other.
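
The sketch below shows the comparison step. It assumes facts from both sources have already been normalized to (subject, predicate, value) triples under a source label. The Wikidata side is a hard-coded list rather than a live query, and =disagreements= and the sample facts are hypothetical:

#+begin_src python
from collections import defaultdict

Fact = tuple[str, str, str]  # (subject, predicate, value)


def disagreements(sources: dict[str, list[Fact]]) -> dict[tuple[str, str], dict[str, set[str]]]:
    """Group values by (subject, predicate) and keep only keys where sources disagree.
    Every value is retained with its source (plural policy); nothing is discarded."""
    by_key = defaultdict(lambda: defaultdict(set))
    for source, facts in sources.items():
        for subject, predicate, value in facts:
            by_key[(subject, predicate)][source].add(value)
    return {
        key: dict(per_source)
        for key, per_source in by_key.items()
        # At least two sources report the key, and their value sets differ.
        if len(per_source) > 1 and len({frozenset(v) for v in per_source.values()}) > 1
    }


notes = [("Lovelace", "birthYear", "1815"), ("Lovelace", "birthPlace", "Paris")]
wikidata = [("Lovelace", "birthYear", "1815"), ("Lovelace", "birthPlace", "London")]
for (s, p), per_source in disagreements({"user-notes": notes, "wikidata": wikidata}).items():
    print(s, p, per_source)
#+end_src

Only the birthplace is reported, because the two sources agree on the birth year. Both conflicting values stay in the result, each tagged with its source, so the user rather than the pipeline decides which one is right.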

The paper validates three design decisions already made:

1. Modularity is non-negotiable: it is the difference between roughly 5% and 95% alignment accuracy.
2. The extraction pipeline is feasible: roughly 90% population accuracy with module-scoped prompts means the archivist /can/ extract useful facts, and the remaining ~10% of errors and hallucinations is what Screamer verification is meant to catch.
3. The paper positions knowledge graphs as anti-hallucination infrastructure, which is the Passepartout thesis stated in the academic literature.
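
Screamer is a Common Lisp constraint library, so the following Python is only a stand-in for the idea and not Passepartout's checker. Extracted triples are tested against the module's domain/range signatures, and triples that violate them are rejected before entering the index. The signatures and entity types are hypothetical:

#+begin_src python
# Hypothetical module signatures: relation -> (domain class, range class).
SIGNATURES = {
    "hasParticipant": ("Event", "Agent"),
    "atPlace": ("Event", "Place"),
}
# Hypothetical typing of known entities.
TYPES = {"Workshop2019": "Event", "Ana": "Agent", "Kobe": "Place"}


def check(triple: tuple[str, str, str]) -> bool:
    """Accept a triple only if its relation is in the module and both ends are correctly typed."""
    subject, relation, obj = triple
    signature = SIGNATURES.get(relation)
    if signature is None:
        return False  # relation outside the module's scope
    domain, rng = signature
    return TYPES.get(subject) == domain and TYPES.get(obj) == rng


extracted = [
    ("Workshop2019", "hasParticipant", "Ana"),
    ("Workshop2019", "atPlace", "Ana"),  # wrong range: Ana is an Agent
    ("Ana", "foundedBy", "Kobe"),        # relation not in the module
]
accepted = [t for t in extracted if check(t)]
rejected = [t for t in extracted if not check(t)]
print("accepted:", accepted)
print("rejected:", rejected)
#+end_src

A check like this only catches errors that violate the schema. A well-typed but false triple (the right kind of place, the wrong one) passes, which is why cross-source validation is still needed.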

References:
- Shimizu, C., & Hitzler, P. (2025). Accelerating knowledge graph and ontology engineering with large language models. /Journal of Web Semantics, 85/, 100862.
- Shimizu, C., Hammar, K., & Hitzler, P. (2023). Modular ontology modeling. /Semantic Web, 14/(3), 459–489.
- Norouzi, S. S., et al. (2024). Ontology population using LLMs. arXiv:2411.01612.