Semantic Wikipedia as Entity Backbone
The gate stack provides 50-70 entity classes — adequate for a coding agent where the domain is bounded to files, commands, and code symbols. For a general-knowledge memex, 50-70 is starvation. Your memex mentions Nabokov, Pale Fire, Kinbote, Zembla, paranoid reading, unreliable narrators, postmodernism, butterfly migration, chess problems, and the Russian exile experience. The gate stack knows none of these. Organic growth through prose extraction would take years just to cover the entities in one person's engagement with a single novel.
Wikidata has already done this work: approximately 2 million entity classes, over 100 million entities, more than a decade of human curation. Loading the neighborhood of your memex into the symbolic index (the entities referenced in your prose, plus their N-hop property neighborhood from Wikidata) makes the entity recognition problem largely vanish. The archivist doesn't need to discover Nabokov from your diary. It needs to connect your heading to the existing Wikidata entity. That is a simpler task: reference resolution, not knowledge extraction.
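The neighborhood load described above can be sketched as a breadth-first expansion from the entities your prose mentions. This is a minimal sketch under stated assumptions: `*toy-edges*`, `neighbors`, and `n-hop-neighborhood` are hypothetical names, and the edge list is a hand-written stand-in for property links that would really come from Wikidata.

```lisp
;; Hypothetical toy graph: each entry is (entity neighbor ...).
;; These keyword edges are illustrative, not real Wikidata properties.
(defparameter *toy-edges*
  '((:nabokov     :pale-fire :butterflies :chess-problems)
    (:pale-fire   :kinbote :zembla)
    (:butterflies :migration)))

(defun neighbors (entity &optional (edges *toy-edges*))
  "Return the entities directly linked from ENTITY (NIL if none are known)."
  (cdr (assoc entity edges)))

(defun n-hop-neighborhood (seeds n &optional (edges *toy-edges*))
  "Return every entity within N hops of SEEDS, seeds included."
  (let ((seen (copy-list seeds))
        (frontier seeds))
    (dotimes (hop n seen)
      (declare (ignorable hop))
      (let ((next '()))
        ;; Expand the current frontier by one hop, skipping known entities.
        (dolist (entity frontier)
          (dolist (nb (neighbors entity edges))
            (unless (member nb seen)
              (push nb seen)
              (push nb next))))
        (setf frontier next)))))
```

With these toy edges, `(n-hop-neighborhood '(:nabokov) 2)` reaches `:zembla` by way of `:pale-fire`, while a one-hop load does not. The hop count N is the knob that trades index size against coverage.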
The LLM's role shrinks to three thin boundaries:
- Input translation: natural language question to structured query. "What do I think about monorepos?" → (fact-query :entity :monorepo :relation :opinion :source :memex). Formulaic, ~100 tokens, any model sufficient.
- Prose to candidate triple: for personal memex entries that have no Wikidata counterpart (your opinions, your day's events, your project plans). Proposals are verified by Screamer before admission. This is the only extraction path that still requires an LLM, and its scope is limited to what Wikidata cannot provide.
- Result to prose: structured answer to readable sentence. "Your 2023 diary says 8848m. Wikidata (last edited Feb 2024) says 8849m. They disagree on height." The reasoning is done; the LLM wraps the plist in grammar. ~100 tokens, any model sufficient, purely cosmetic.
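What sits between the first and third boundary can be sketched as a plain lookup. The query form is the one shown in the list; everything else here is an assumption for illustration: facts are modeled as plists, the `:source` key of the query is matched against a `:provenance` key on each fact, and `*facts*` holds placeholder data rather than real memex or Wikidata content.

```lisp
;; Hypothetical fact store: a list of plists with placeholder values.
(defparameter *facts*
  (list '(:entity :monorepo :relation :opinion
          :value "placeholder opinion text" :provenance :memex)
        '(:entity :monorepo :relation :placeholder-relation
          :value :placeholder-value :provenance :wikidata)))

(defun fact-query (&key entity relation source (facts *facts*))
  "Return every fact plist matching ENTITY, RELATION and SOURCE.
A NIL argument acts as a wildcard."
  (remove-if-not
   (lambda (fact)
     (and (or (null entity)   (eq entity   (getf fact :entity)))
          (or (null relation) (eq relation (getf fact :relation)))
          (or (null source)   (eq source   (getf fact :provenance)))))
   facts))

;; The structured query an LLM would emit for
;; "What do I think about monorepos?"
(fact-query :entity :monorepo :relation :opinion :source :memex)
```

The call returns a list of plists, which is what the third boundary would hand to the LLM for wording. No model is involved in producing it.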
Everything else — the gate stack, the fact store, the constraint solver, the type hierarchy, the provenance tracking, the contradiction surfacing, the cross-domain comparison — is pure deterministic Lisp with zero LLM tokens.
The decisive simplification: without Wikidata, the archivist must discover entities from prose. With Wikidata loaded, the entity graph is pre-structured. The archivist's job changes from "discover that Nabokov wrote Pale Fire and lectured on Kafka" to "verify that the Nabokov referenced in heading #47 is Wikidata item Q36591."
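Reference resolution of this kind might look like the following sketch, which matches a mention from a heading against the labels of already-loaded entities. The names `*loaded-entities*`, `candidates`, and `resolve-mention` are hypothetical, the label lists are illustrative, and "Q-PLACEHOLDER-1" is a stand-in rather than a real Wikidata identifier. Only Q36591 comes from the text above.

```lisp
;; Hypothetical slice of the loaded entity graph: identifier plus labels.
(defparameter *loaded-entities*
  '((:qid "Q36591"          :labels ("Vladimir Nabokov" "Nabokov"))
    (:qid "Q-PLACEHOLDER-1" :labels ("Pale Fire"))))

(defun candidates (mention &optional (entities *loaded-entities*))
  "Return the loaded entities with a label equal to MENTION, ignoring case."
  (remove-if-not
   (lambda (entity)
     (member mention (getf entity :labels) :test #'string-equal))
   entities))

(defun resolve-mention (mention &optional (entities *loaded-entities*))
  "Return two values: the identifier and :RESOLVED when exactly one entity
matches, otherwise NIL plus :UNKNOWN or :AMBIGUOUS."
  (let ((hits (candidates mention entities)))
    (cond ((null hits)        (values nil :unknown))
          ((null (rest hits)) (values (getf (first hits) :qid) :resolved))
          (t                  (values nil :ambiguous)))))
```

Returning `:ambiguous` instead of guessing keeps the step deterministic: a mention with several candidates is left for a later verification pass rather than silently linked.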
Wikidata facts are admitted with :provenance :wikidata and cardinality policy :plural. They do not override your memex's facts. They sit alongside them. Disagreements are surfaced, not resolved.
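A minimal sketch of that admission policy, assuming facts are stored as plists in a single list: `*fact-store*`, `admit-fact`, and `disagreements` are hypothetical names, and `:the-mountain` is a placeholder entity standing in for whatever the height example refers to.

```lisp
(defvar *fact-store* '()
  "All admitted facts, each a plist. Nothing is ever overwritten.")

(defun admit-fact (entity relation value provenance)
  "Add a fact alongside the existing ones (plural cardinality, no replacement)."
  (let ((fact (list :entity entity :relation relation
                    :value value :provenance provenance)))
    (pushnew fact *fact-store* :test #'equal)
    fact))

(defun disagreements (entity relation)
  "Return the facts for ENTITY and RELATION when they carry more than one
distinct value, otherwise NIL."
  (let* ((facts (remove-if-not
                 (lambda (f) (and (eq entity (getf f :entity))
                                  (eq relation (getf f :relation))))
                 *fact-store*))
         (distinct (remove-duplicates
                    (mapcar (lambda (f) (getf f :value)) facts)
                    :test #'equal)))
    (when (rest distinct) facts)))

;; Usage: both values are kept; the pair is reported, not reconciled.
(admit-fact :the-mountain :height 8848 :memex)
(admit-fact :the-mountain :height 8849 :wikidata)
(disagreements :the-mountain :height)
```

Because `admit-fact` never replaces an existing entry, the Wikidata value cannot displace the memex value. `disagreements` only reports that the two differ, and each fact keeps its provenance so the reader can see who said what.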