hermes-brain/ideas/native-org-knowledge-base.org

:PROPERTIES:
:ID:       7f4e6b9a-2c1d-5e8f-9a3b-6d7c4e5f2a1b
:CREATED:  [2026-05-23 Sat]
:END:
#+title: Passepartout Native Org-Mode Knowledge Base
#+filetags: :passepartout:roadmap:knowledge:org:gbrain:

** What

Passepartout should be able to use Org-mode files directly as its
knowledge base — no pandoc conversion, no markdown intermediary.

Currently gbrain provides vector search and entity linking over markdown,
so Org files reach it through a conversion layer (org → pandoc →
markdown → gbrain). The conversion loses Org-mode semantics: properties
drawers become flat YAML, tag inheritance is lost, file: links become
relative markdown links, TODO states vanish, and the tree structure
(headings with content subtrees) collapses into flat markdown headings.

** Why

Org-mode's data model is strictly richer than markdown's. A Passepartout
that can ingest, index, and query org files natively has:
- Property-based entity extraction (no separate links: frontmatter needed)
- Tag-inheritance for automatic categorization
- TODO/priority/timestamps for knowledge freshness signals
- ID-based stable cross-references (org-id) that survive file moves
- Heading-level chunking (one heading = one knowledge unit)
- The same file format for everything — no split between "authoring format"
  and "knowledge base format"
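
The org-id point can be illustrated with a small sketch: if references
resolve through an index of =:ID:= properties rather than through file
paths, moving a file only requires rebuilding the index. The names
=build_id_index= and =ID_RE= are hypothetical, and the sketch assumes
each ID sits on its own =:ID:= line inside a properties drawer.

```python
import re
from pathlib import Path

# Assumption: IDs appear as ":ID: <value>" on their own line.
# Org property names are case-insensitive, hence IGNORECASE.
ID_RE = re.compile(r"^\s*:ID:\s+(\S+)\s*$", re.MULTILINE | re.IGNORECASE)


def build_id_index(root):
    """Map every :ID: value found under `root` to the file that holds it."""
    index = {}
    for path in sorted(Path(root).rglob("*.org")):
        for org_id in ID_RE.findall(path.read_text(encoding="utf-8")):
            index[org_id] = path
    return index
```

An =id:= link is then resolved by looking up its target in the returned
dictionary, so the link text itself never has to change when the file
is renamed or moved.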

** What it replaces

The current pipeline: org file → pandoc → markdown file → gbrain import →
gbrain embed → gbrain query. This is four serial steps with a conversion
at each boundary that degrades the data model.

The target: org file → (Passepartout-native indexer) → query. Zero
conversion, zero data loss.

** Architecture sketch

A Passepartout-native knowledge module that directly ingests
ideas/*.org:

- Parser: extract each heading as a chunk. Preserve:
  - Heading path (H1 → H2 → H3) as a hierarchical path
  - Properties drawer as structured metadata
  - file: links as typed entity references
  - org-id as stable identifier
  - Tags (inherited from parent headings)
  - TODO state, priority, timestamps

- Embedder: vector-embed each heading chunk with metadata prefix

- Query: hybrid search over headings + full-text over content.
  Result includes the heading path + sibling headings for context.

- Cross-reference graph: build a typed entity graph from:
  - file: links → typed reference
  - org-id links → stable cross-doc reference
  - Tag co-occurrence → implicit relationship
  - Same-property values → attribute-based grouping

- Dream cycle: auto-discover entities from org properties and file:
  links. Enrich thin sections. Flag sections with stale timestamps.
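
A minimal sketch of the parser step and the embedder's input, in plain
Python with only the standard library. Everything named here (=Chunk=,
=parse_org=, =embedding_text=, the prefix format) is hypothetical and
not part of Passepartout or gbrain. It assumes Org's default TODO
keywords and colon-separated =#+filetags:=, and it ignores priority
cookies, timestamps, and file-level drawers.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

HEADING_RE = re.compile(r"^(\*+)\s+(.*)$")
TAGS_RE = re.compile(r"\s+(:[\w@#%:]+:)\s*$")
PROP_RE = re.compile(r"^\s*:([^:\s]+):\s*(.*)$")
LINK_RE = re.compile(r"\[\[(file|id):([^\]]+)\]")
TODO_KEYWORDS = {"TODO", "DONE"}  # assumption: Org's default keywords


@dataclass
class Chunk:
    path: list                 # heading titles, top-level ancestor first
    todo: Optional[str]
    tags: set                  # own tags plus inherited tags
    properties: dict = field(default_factory=dict)
    links: list = field(default_factory=list)   # (kind, target) pairs
    body: list = field(default_factory=list)


def parse_org(text):
    """Split Org text into one Chunk per heading."""
    chunks, stack = [], []     # stack holds (level, chunk) ancestors
    file_tags, in_drawer = set(), False
    for line in text.splitlines():
        m = HEADING_RE.match(line)
        if m:
            level, title = len(m.group(1)), m.group(2)
            own_tags = set()
            t = TAGS_RE.search(title)
            if t:
                own_tags = set(t.group(1).strip(":").split(":"))
                title = title[: t.start()]
            first, _, rest = title.partition(" ")
            todo = first if first in TODO_KEYWORDS else None
            if todo:
                title = rest
            # Drop siblings and deeper headings to find the parent.
            while stack and stack[-1][0] >= level:
                stack.pop()
            parent = stack[-1][1] if stack else None
            chunk = Chunk(
                path=(parent.path if parent else []) + [title.strip()],
                todo=todo,
                tags=own_tags | (parent.tags if parent else file_tags),
            )
            stack.append((level, chunk))
            chunks.append(chunk)
            in_drawer = False
            continue
        if not stack:
            # Preamble: only #+filetags is read in this sketch.
            if line.lower().startswith("#+filetags:"):
                raw = line.split(":", 1)[1].strip().strip(":")
                file_tags = set(raw.split(":"))
            continue
        chunk = stack[-1][1]
        stripped = line.strip()
        if stripped == ":PROPERTIES:":
            in_drawer = True
        elif stripped == ":END:":
            in_drawer = False
        elif in_drawer:
            p = PROP_RE.match(line)
            if p:
                chunk.properties[p.group(1)] = p.group(2).strip()
        else:
            chunk.body.append(line)
            chunk.links.extend(LINK_RE.findall(line))
    return chunks


def embedding_text(chunk):
    """Hypothetical metadata prefix plus body, as input for an embedder."""
    prefix = " > ".join(chunk.path)
    if chunk.tags:
        prefix += " | tags: " + ", ".join(sorted(chunk.tags))
    if chunk.todo:
        prefix += " | " + chunk.todo
    return prefix + "\n" + "\n".join(chunk.body).strip()
```

With file tags =:kb:=, a parent heading tagged =:work:=, and a child
heading tagged =:parser:=, the child chunk carries all three tags. The
=links= field holds the =file:= and =id:= targets that the
cross-reference graph would consume as typed edges.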

** Priority

Below the gate stack and ACL2 planner (v1.0.0 dependencies) but above
the Lisp Machine hardware. Target: the v0.8.0 to v0.9.0 range, once the
Screamer planner is stable enough to route queries through the knowledge
base.

The short-term bridge (current) is gbrain with nightly org→md sync.
This is adequate while the gate stack and planner are the priority.
The native org module replaces gbrain entirely once built.

** See also
- [[file:../../concepts/compliance-framework-mapping.org][Compliance framework mapping]]
- [[file:../../ideas/passepartout-economics.org][Passepartout economics]]