roadmap: Passepartout native Org-mode knowledge base

Adds roadmap item for Passepartout to ingest and query org files directly — no pandoc/gbrain bridge. Replaces the current org→md→gbrain pipeline with native org parsing, heading-level vector embeddings, property-based entity extraction, and org-id cross-references. Target v0.8.0-v0.9.0 after gate stack and Screamer planner.
gbrain: sync converted org-mode brain files
2026-05-23 06:21:29 +00:00 · 2026-05-23 06:18:37 +00:00
4 changed files with 315 additions and 1 deletions
--- a/ideas/native-org-knowledge-base.org
+++ b/ideas/native-org-knowledge-base.org
@@ -0,0 +1,81 @@
+:PROPERTIES:
+:ID:       7f4e6b9a-2c1d-5e8f-9a3b-6d7c4e5f2a1b
+:CREATED:  [2026-05-23 Sat]
+:END:
+#+title: Passepartout Native Org-Mode Knowledge Base
+#+filetags: :passepartout:roadmap:knowledge:org:gbrain:
+
+** What
+
+Passepartout should be able to use Org-mode files directly as its
+knowledge base — no pandoc conversion, no markdown intermediary.
+
+Currently gbrain provides vector search + entity linking over markdown,
+but we bridge via a conversion layer (org → pandoc → markdown → gbrain).
+This loses Org-mode semantics: properties drawers become flat YAML, tag
+inheritance is lost, file: links become relative markdown links, TODO
+states vanish, and the tree structure (headings with content subtrees)
+collapses into flat markdown headings.
+
+** Why
+
+Org-mode's data model is strictly richer than markdown's. A Passepartout
+that can ingest, index, and query org files natively has:
+- Property-based entity extraction (no separate links: frontmatter needed)
+- Tag-inheritance for automatic categorization
+- TODO/priority/timestamps for knowledge freshness signals
+- ID-based stable cross-references (org-id) that survive file moves
+- Heading-level chunking (one heading = one knowledge unit)
+- The same file format for everything — no split between "authoring format"
+  and "knowledge base format"
+
+** What it replaces
+
+The current pipeline: org file → pandoc → markdown file → gbrain import →
+gbrain embed → gbrain query. This is four serial steps (pandoc, import,
+embed, query), with a conversion at each boundary that degrades the data
+model.
+
+The target: org file → (Passepartout-native indexer) → query. Zero
+conversion, zero data loss.
+
+** Architecture sketch
+
+A Passepartout-native knowledge module that directly ingests
+ideas/*.org:
+
+- Parser: extract each heading as a chunk. Preserve:
+  - Heading path (H1 → H2 → H3) as a hierarchical path
+  - Properties drawer as structured metadata
+  - file: links as typed entity references
+  - org-id as stable identifier
+  - Tags (inherited from parent headings)
+  - TODO state, priority, timestamps
+
+- Embedder: vector-embed each heading chunk with metadata prefix
+
+- Query: hybrid search over headings + full-text over content.
+  Result includes the heading path + sibling headings for context.
+
+- Cross-reference graph: build a typed entity graph from:
+  - file: links → typed reference
+  - org-id links → stable cross-doc reference
+  - Tag co-occurrence → implicit relationship
+  - Same-property values → attribute-based grouping
+
+- Dream cycle: auto-discover entities from org properties and file:
+  links. Enrich thin sections. Flag sections with stale timestamps.
+
+** Priority
+
+Below the gate stack and ACL2 planner (v1.0.0 dependencies) but above
+the Lisp Machine hardware. Target: v0.8.0-v0.9.0 range, once Screamer
+planner is stable enough to route queries through the knowledge base.
+
+The short-term bridge (current) is gbrain with nightly org→md sync.
+This is adequate while the gate stack and planner are the priority.
+The native org module replaces gbrain entirely once built.
+
+** See also
+[[file:../../concepts/compliance-framework-mapping.org][Compliance framework mapping]]
+[[file:../../ideas/passepartout-economics.org][Passepartout economics]]
--- a/methodology/AGENTS.md
+++ b/methodology/AGENTS.md
@@ -1,3 +1,13 @@
+---
+type: methodology
+title: AGENTS.md — Development Cycle
+created: 2026-05-11
+tags:
+  - methodology
+  - common-lisp
+  - development-workflow
+---
+
 # AGENTS.md

 ## Development Cycle (every change)
@@ -86,7 +96,7 @@
    completion date. This is a separate commit on the branch:
    #+begin_src org
    :LOGBOOK:
-    - State "DONE"  from "TODO"  [YYYY-MM-DD Day]
+    - State "DONE"  from "TODO"  [2026-05-10 Sun]
    :END:
    #+end_src

--- a/methodology/CLOUDFLARE-SETUP.md
+++ b/methodology/CLOUDFLARE-SETUP.md
@@ -1,3 +1,14 @@
+---
+type: methodology
+title: Cloudflare Infrastructure Setup
+created: 2026-05-11
+tags:
+  - methodology
+  - cloudflare
+  - infrastructure
+  - networking
+---
+
 # Cloudflare Infrastructure Setup

 ## Architecture Overview
--- a/scripts/org-to-gbrain.py
+++ b/scripts/org-to-gbrain.py
@@ -0,0 +1,212 @@
+#!/usr/bin/env python3
+"""Convert brain Org-mode files to markdown + YAML frontmatter and sync into gbrain."""
+import subprocess, re, os, sys
+
+BRAIN = "/root/brain"
+GBRAIN_SRC = "/mnt/hermes/brain"
+PANDOC = "/usr/bin/pandoc"
+BUN = os.path.expanduser("~/.bun/bin/gbrain")
+
+def extract_org_properties(src_path):
+    """Extract :PROPERTIES: drawer and #+title/#+filetags from an org file."""
+    props = {}
+    with open(src_path) as f:
+        content = f.read()
+    
+    # Extract title
+    m = re.search(r'^#\+title:\s+(.+)$', content, re.MULTILINE)
+    if m:
+        props['title'] = m.group(1).strip()
+    
+    # Extract tags
+    m = re.search(r'^#\+filetags:\s+(.+)$', content, re.MULTILINE)
+    if m:
+        tags = [t for t in re.split(r'[:\s]+', m.group(1)) if t]
+        props['tags'] = tags
+    
+    # Extract ID from PROPERTIES drawer
+    m = re.search(r':ID:\s+([^\s]+)', content)
+    if m:
+        props['org_id'] = m.group(1)
+    
+    # Extract CREATED
+    m = re.search(r':CREATED:\s+\[([^\]]+)\]', content)
+    if m:
+        props['created'] = m.group(1)
+    
+    return props
+
+def strip_org_header(src_path):
+    """Strip the Org-mode header block (PROPERTIES drawer + #+ directives)
+    before feeding to pandoc, so it doesn't produce raw {=org} blocks."""
+    with open(src_path) as f:
+        lines = f.readlines()
+    
+    # Find first non-header line
+    in_properties = False
+    start = 0
+    for i, line in enumerate(lines):
+        if line.strip() == ':PROPERTIES:':
+            in_properties = True
+        if in_properties and line.strip() == ':END:':
+            in_properties = False
+            start = i + 1
+            continue
+        if not in_properties:
+            # Skip #+ lines
+            if line.startswith('#+'):
+                start = i + 1
+                continue
+            # First real content
+            if line.strip():
+                start = i
+                break
+            start = i + 1
+    
+    return ''.join(lines[start:])
+
+def pandoc_convert(clean_body):
+    """Convert org body to markdown via pandoc (stdin mode)."""
+    result = subprocess.run(
+        [PANDOC, "-f", "org", "-t", "markdown-smart"],
+        input=clean_body, capture_output=True, text=True
+    )
+    if result.returncode != 0:
+        print(f"  ERROR pandoc: {result.stderr[:200]}")
+        return None
+    return result.stdout.strip()
+
+def build_frontmatter(props):
+    """Build YAML frontmatter string from extracted properties."""
+    lines = ['---']
+    if 'title' in props:
+        lines.append(f'title: "{props["title"]}"')
+    if 'tags' in props:
+        tags_str = ', '.join(props['tags'])
+        lines.append(f'tags: [{tags_str}]')
+    if 'created' in props:
+        lines.append(f'created: {props["created"]}')
+    lines.append('---')
+    return '\n'.join(lines)
+
+def postprocess_links(md_text):
+    """Convert pandoc's markdown links to gbrain-friendly format."""
+    # Pandoc converts [[file:foo.org][desc]] to [desc](foo.org)
+    # Strip .org extensions from relative links
+    md_text = re.sub(r'\(([\w./-]+?)\.org\)', r'(\1)', md_text)
+    return md_text
+
+ROUTING = {
+    # Concepts — triad architecture, security, economics theory
+    "triad-overview": "concepts",
+    "agora": "concepts",
+    "stoa": "concepts",
+    "triad-index": "concepts",
+    "domain-gate-packages": "concepts",
+    "verification-appliance": "concepts",
+    "verification-monopoly": "concepts",
+    "infrastructure-lock-in": "concepts",
+    "evaluation-harness": "concepts",
+    "collective-regression-suite": "concepts",
+    "lisp-machine-security": "concepts",
+    "common-logic-iso-24707": "concepts",
+    "self-driving-lisp-machine": "concepts",
+    "lisp-economics": "concepts",
+    "sufficiency-flip": "concepts",
+    "time-estimates": "concepts",
+    "cost-structure": "concepts",
+    "gate-rule-encoding": "concepts",
+    "biology-parallels": "concepts",
+    "comparison-with-symbolics": "concepts",
+    "upgrade-lifecycle": "concepts",
+    "ai-industry-impact": "concepts",
+    "moats": "concepts",
+    "patent-strategy": "concepts",
+    "licensing": "concepts",
+    "verified-skill-marketplace": "concepts",
+    "compute-marketplace": "concepts",
+    "agora-usernames": "concepts",
+    "pds-as-a-service": "concepts",
+    "investment-thesis": "concepts",
+    "compliance-framework-mapping": "concepts",
+    # Ideas — strategy, competitive analysis
+    "competitive-analysis-2026-05": "ideas",
+    "passepartout-economics": "ideas",
+}
+
+def main():
+    # Ensure MECE directories exist
+    for d in ["concepts", "ideas"]:
+        os.makedirs(f"{GBRAIN_SRC}/{d}", exist_ok=True)
+    
+    imported = []
+    
+    for slug, category in ROUTING.items():
+        src_path = f"{BRAIN}/ideas/{slug}.org"
+        if not os.path.exists(src_path):
+            print(f"  SKIP {slug}: not found")
+            continue
+        
+        dst_dir = f"{GBRAIN_SRC}/{category}"
+        dst_path = f"{dst_dir}/{slug}.md"
+        
+        # Extract frontmatter from org properties
+        props = extract_org_properties(src_path)
+        
+        # Strip org header and convert body to markdown
+        clean = strip_org_header(src_path)
+        md = pandoc_convert(clean)
+        if md is None:
+            continue
+        
+        md = postprocess_links(md)
+        
+        # Assemble: YAML frontmatter + markdown body
+        frontmatter = build_frontmatter(props)
+        full = frontmatter + '\n\n' + md + '\n'
+        
+        with open(dst_path, 'w') as f:
+            f.write(full)
+        
+        imported.append(f"{category}/{slug}.md")
+        print(f"  OK  {category}/{slug}")
+    
+    print(f"\nConverted {len(imported)} files.")
+    
+    # Commit to git
+    subprocess.run(["git", "-C", GBRAIN_SRC, "add", "-A"], capture_output=True)
+    subprocess.run(
+        ["git", "-C", GBRAIN_SRC, "commit", "--allow-empty",
+         "-m", "gbrain: sync converted org-mode brain files"],
+        capture_output=True, text=True
+    )
+    
+    # Import into gbrain
+    print("\nImporting into gbrain...")
+    env = {**os.environ, "PATH": f"{os.path.expanduser('~')}/.bun/bin:{os.environ['PATH']}"}
+    result = subprocess.run(
+        [BUN, "import", GBRAIN_SRC],
+        capture_output=True, text=True, env=env
+    )
+    # Show the last 25 lines of stdout, filtering out batch-size noise
+    out_lines = result.stdout.strip().split('\n')
+    for line in out_lines[-25:]:
+        if line.strip() and 'batch caps' not in line and 'max_batch_tokens' not in line:
+            print(f"  {line}")
+    
+    if result.returncode != 0:
+        print(f"  gbrain import exit code: {result.returncode}")
+        sys.exit(result.returncode)
+    
+    # Embed
+    print("\nGenerating embeddings...")
+    result2 = subprocess.run(
+        [BUN, "embed", "--all"],
+        capture_output=True, text=True, env=env
+    )
+    for line in result2.stdout.strip().split('\n')[-10:]:
+        if line.strip():
+            print(f"  {line}")
+
+if __name__ == "__main__":
+    main()
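
Note on link handling: `postprocess_links` in the script above works on the markdown that pandoc has already produced, whereas the roadmap note in this change proposes building a cross-reference graph from `file:` and org-id links in the Org source itself. A minimal sketch of that extraction step, assuming only the standard bracket link syntax `[[target][description]]`; `ORG_LINK` and `extract_links` are hypothetical names, not part of gbrain or Passepartout.

```python
import re

# Matches [[target][description]] and [[target]].
ORG_LINK = re.compile(r'\[\[([^\]\[]+)\](?:\[([^\]\[]*)\])?\]')


def extract_links(text):
    """Return (kind, target, description) tuples for file: and id: links.

    Other link types (http, mailto, and so on) are ignored in this sketch.
    """
    edges = []
    for target, desc in ORG_LINK.findall(text):
        if target.startswith('id:'):
            edges.append(('id', target[len('id:'):], desc or None))
        elif target.startswith('file:'):
            edges.append(('file', target[len('file:'):], desc or None))
    return edges
```

Each tuple can serve as a typed edge: `id` targets stay valid when files move, while `file` targets would still need to be resolved relative to the file that contains the link.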