passepartout/docs/ROADMAP.org
Amr Gharbeia 2d18fa4525 docs: port TUI roadmap to cl-tty, mark Emacs as secondary client
v0.8.0: Information Radiator now built on cl-tty v1.1.0. Minibuffer
uses cl-tty Dialog stack. New TODO items: conversation view (ScrollBox
+ Markdown), command palette (Select), sidebar (slot system), status bar
(Box + Theme), keybindings (keymap).

v0.9.1: Emacs is now an optional secondary client, not the primary
bridge. cl-tty is the primary TUI.
2026-05-13 11:41:41 -04:00


Passepartout Evolutionary Roadmap

The Evolutionary Roadmap

Understanding Passepartout as a function in time is not nostalgia. It is architectural guidance. Every decision in v0.x should be made with awareness of where the system is going. Code written today becomes the substrate for the Lisp Machine. Skills designed today become the vocabulary the symbolic engine speaks tomorrow.

The probabilistic beginning is not a weakness to overcome. It is the bootstrap. The system learns the domain through probabilistic inference, and that learned knowledge becomes the seed for the symbolic engine. By the time the symbolic engine takes over, it has a rich knowledge graph to reason about, grown from thousands of probabilistic interactions.

This is how you build a reasoning machine: start with a learner, make it learn to verify by watching itself and its user, let verification become the core. Every blocked action becomes a rule. Every approved exception becomes a pattern. The symbolic layer grows at the probabilistic layer's expense. Remove the learner once it has learned enough.

Each version expands the deterministic layer. The Dispatcher writes rules from approved exceptions. Shadow mode runs trial executions. Tool permission tiers mature from simple allow/deny to nuanced context-aware policies. The agent becomes less likely to attempt dangerous actions not because it is smarter but because the guard has more complete information.

The roadmap works backwards from Neurosymbolic Maturity (v1.0.0) and Lisp Machine Emergence (v2.0.0). Each build step is one minor version — one capability, measured in lines, verified by tests. Breadth releases (TUI, tools, gateways) alternate with depth releases (fact store, Screamer, VivaceGraph, ACL2) so the system is usable at every step.

The TODO states in each version's Tasks section are the authoritative task tracker. The feature tables describe what each version delivers.

Feature releases increment the minor version (v0.X.0). Bugfix and hardening releases increment the patch version (v0.X.Y). This ensures that security patches and critical fixes are visible in the version number and can ship independently of feature work. No feature release ships without its prerequisite hardening releases resolved.

File Update Checklist

When a version's state changes (DONE → tested → released), update these locations:

  1. ROADMAP.org — mark item DONE, update LOGBOOK timestamp
  2. README.org — update version badge (line 6), update Current Capabilities table (add new Stable rows for shipped features, remove Planned rows that have shipped)
  3. .env.example — update version references as needed
  4. lisp/core-transport.lisp — update the make-hello-message version string
  5. passepartout (bash entry point) — update version reference
  6. CHANGELOG.org — add new version entry with DONE items, LOGBOOK release date, and feature summaries

On release:

  1. Tag the release on GitHub
  2. Extract DONE items from ROADMAP (all items with LOGBOOK timestamps since the last release tag) and use as the release notes body
  3. If a CHANGELOG.md is needed for packaging tools, auto-generate it from ROADMAP DONE items

TODO v0.8.0: Information Radiator (Foundation)

Sidebar (6 panels), sidebar overlay mode (<120 cols), command palette (Ctrl+P), TrueColor theme (8 presets), unified minibuffer panel with slash-command context menu and sub-mode navigation (wizard, settings, help) — all built on cl-tty v1.1.0.

The croatoan TUI is replaced entirely. cl-tty provides the widget set (box, text, scrollbox, select, markdown, dialog), keybinding system, and theme engine. Passepartout's job is wiring — cl-tty components call the daemon's TCP API and render its response structures.

TODO Minibuffer — cl-tty dialog stack

Replace ad-hoc overlay windows with cl-tty's Dialog stack. Typing / auto-opens a select-dialog with 25 slash commands (filtered in real time). Selecting /wizard transitions to a prompt-dialog in the same panel — cl-tty's *dialog-stack* handles push/pop, Esc dismisses. Future sub-modes (/settings, /help) slot in as additional dialog types.

  • Define *slash-commands* — the same data structure, now driving cl-tty's Select options
  • Wire select-dialog on-Enter to push the next dialog type (wizard, settings, help)
  • Implement wizard-dialog subclass — validates UUID, writes /.passepartout/config.lisp
  • Daisy-chain dialog state: wizard enters UUID → settings panel controls hotkeys/theme → help panel shows slash command reference

~80 lines (down from ~150 — cl-tty's Select+Dialog replaces custom modal dispatch).
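
The real-time filtering step can be sketched independently of cl-tty. This is a minimal illustration, assuming *slash-commands* is an alist of name and description and that filtering is a case-insensitive prefix match. The descriptions and the function name filter-slash-commands are hypothetical, and no cl-tty API is used.

```lisp
;; Hypothetical subset of the 25 commands; descriptions are placeholders.
(defparameter *slash-commands*
  '(("/wizard"   . "Run the setup wizard")
    ("/settings" . "Edit hotkeys and theme")
    ("/status"   . "Show daemon status")
    ("/help"     . "Show the slash command reference"))
  "Alist of (name . description).")

(defun filter-slash-commands (input &optional (commands *slash-commands*))
  "Return the entries of COMMANDS whose name starts with INPUT (case-insensitive)."
  (remove-if-not
   (lambda (entry)
     (let ((name (car entry)))
       (and (<= (length input) (length name))
            ;; Compare INPUT against the same-length prefix of NAME.
            (string-equal input name :end2 (length input)))))
   commands))
```

Typing "/s" would leave /settings and /status in the list, in their original order. The filtered alist is what would be handed to the Select widget as its options.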

TODO Conversation view — cl-tty ScrollBox + Markdown

  • ScrollBox with sticky-scroll (auto-follows new content, respects manual scroll-up)
  • User messages rendered as Box (role-colored left border)
  • Agent messages rendered via cl-tty's Markdown + Code + Diff renderables
  • Tool calls rendered as Select (collapsible, status-indicated: spinner running / green done / red error)
  • Gate trace as a collapsible Box within agent messages (property-drawer style)

~150 lines.

TODO Command palette — cl-tty Select

  • Ctrl+P opens a select-dialog with all daemon commands
  • Fuzzy-filtered with categories (session, memory, system, help)
  • Enter dispatches the command to the daemon via TCP, displays result in conversation

~40 lines.

TODO Sidebar — cl-tty slot system

  • 6 panels as cl-tty slot registrations (gate trace, focus, rules, context, cost, files)
  • Toggle with Ctrl+B or auto-hide on narrow terminals (<120 cols)
  • Panel data sourced from daemon's existing response plist keys (:rule-count, :foveal-id, :gate-trace, etc.)

~80 lines.

TODO Status bar — cl-tty Box + Theme

  • Bottom-most line: directory, LSP status (green dot), MCP count, /status hint
  • Degraded-mode signaling (amber when *degraded-components* non-nil)
  • cl-tty theme tokens for colors — works with all 8 presets

~30 lines.

TODO Keybinding layer — cl-tty keymap

  • Global: Ctrl+P (palette), Ctrl+B (sidebar), Ctrl+Q (quit), PageUp/PageDn (scroll)
  • Prompt: Enter (send), Ctrl+C (interrupt), Up/Dn (history)
  • cl-tty's layered keymaps handle priority (global → local → input)

~40 lines.

~420 lines total.

v0.9.0: Eval Harness — Safety Net First

Every subsequent release ships with automated regression protection. The eval harness is the gate that makes self-modification safe — before any neurosymbolic component modifies the system, the harness verifies nothing broke.

TODO Internal evaluation harness — 10 tasks, regression detection

  • New skill: symbolic-evaluation.org → symbolic-evaluation.lisp
  • deftask macro: define an eval task with :setup (create test environment), :prompt (what to ask the agent), :verify (function that checks the output), :teardown (cleanup)
  • run-eval-suite: run all registered tasks, produce score (pass count / total), per-task diagnostics
  • Initial 10 tasks: find TODOs, create Org note, search codebase, read file, query memory, list projects, run safe shell command, find definition, set TODO state, summarize session
  • Regression mode: run after each version build. Fail CI if score drops.
  • Task suite grows with codebase: every bug fix adds a regression task

~200 lines.
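
A minimal sketch of how deftask and run-eval-suite could fit together, assuming tasks live in a hash-table registry and the agent is reached through a caller-supplied function. The registry name *eval-tasks*, the ask-agent argument, and the plist layout are assumptions for illustration, not the shipped design.

```lisp
(defvar *eval-tasks* (make-hash-table :test 'eq)
  "Hypothetical registry of eval tasks, keyed by task name.")

(defmacro deftask (name &key setup prompt verify teardown)
  "Register an eval task. SETUP, VERIFY and TEARDOWN are functions or NIL."
  `(setf (gethash ',name *eval-tasks*)
         (list :name ',name :setup ,setup :prompt ,prompt
               :verify ,verify :teardown ,teardown)))

(defun run-eval-suite (ask-agent)
  "Run every registered task. ASK-AGENT maps a prompt string to agent output.
Returns (values score results), where score is pass count / total."
  (let ((results '()))
    (maphash
     (lambda (name task)
       (let ((setup (getf task :setup))
             (teardown (getf task :teardown)))
         (when setup (funcall setup))
         (unwind-protect
              ;; A task that signals an error counts as a failure.
              (let ((passed (handler-case
                                (and (funcall (getf task :verify)
                                              (funcall ask-agent (getf task :prompt)))
                                     t)
                              (error () nil))))
                (push (cons name passed) results))
           ;; Teardown runs even if verification unwinds.
           (when teardown (funcall teardown)))))
     *eval-tasks*)
    (values (if results
                (/ (count-if #'cdr results) (length results))
                0)
            results)))

;; Example task: passes when the agent output mentions a TODO.
(deftask find-todos
  :prompt "List the TODO items in the project"
  :verify (lambda (output) (search "TODO" output)))
```

Regression mode would then compare the returned score with the score recorded for the previous build and fail CI on a drop.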

v0.9.1: Emacs Development Environment — Secondary Client

cl-tty is the primary TUI (v0.8.0). The Emacs major mode is an optional secondary client for users who prefer Emacs-based workflows. Both clients communicate with the same daemon over the same TCP protocol — they are interchangeable frontends, not competing architectures.

TODO Emacs major mode

  • passepartout-mode major mode for the conversation buffer
  • Message rendering as Org headlines: role prefix (:user:, :agent:, :system:), universal timestamp, content in body. Gate trace as property drawer under agent message headlines
  • Streaming insertion: LLM response chunks from the daemon are inserted into the buffer as they arrive, and the display updates on each frame
  • Read-only protection on agent response regions (editable regions only in user input area)
  • Keybindings: C-c C-c send current input, C-c C-k interrupt/stop, C-c C-a approve HITL, C-c C-d deny HITL
  • ~100 lines elisp

TODO M-x command surface

Replace croatoan's / slash-command panel with Emacs M-x dispatch. Each command is a defun passepartout-<action> with interactive completion:

  • passepartout-focus — completing-read over node IDs from the daemon's focus response. Sets the foveal context
  • passepartout-eval — eval a Lisp form in the daemon. Read from minibuffer, send as framed plist, insert result
  • passepartout-theme — completing-read over available themes. Sends /theme <name> to daemon
  • passepartout-export — export session. Prefix arg selects format (C-u M-x passepartout-export)
  • passepartout-sidebar — toggle the sidebar buffer visibility
  • passepartout-config — completing-read over config keys. Sets env var, triggers daemon reload
  • passepartout-coach — run self-diagnosis. Inserts coaching report as new message
  • passepartout-agenda — run Org agenda query. Inserts results as new message
  • passepartout-quit — close connection and kill buffer

Each command is a thin wrapper around passepartout-send (the existing TCP bridge from v0.4.0): construct the correct plist, send it, and insert the response. Completing-read, history, and docstrings are free. ~80 lines elisp.

TODO Sidebar buffer

  • passepartout-sidebar-mode — a dedicated side window (right, 42 chars) that updates on each daemon response
  • Gate Trace panel — per-gate results from the most recent agent response. Colored by state (green/yellow/red)
  • Focus panel — current foveal node ID + related node count
  • Rules panel — rule counter with session delta. When symbolic engine is active, shows sufficiency score and provenance breakdown
  • Context panel — token gauge (percentage bar + color coding)
  • Cost panel — session cost updated after each LLM call
  • Files panel — modified files list with +/- line counts
  • All data already exists in the daemon's response plist (:rule-count, :foveal-id, :gate-trace, :sufficiency-ratio). The sidebar is a formatting layer
  • Toggle via M-x passepartout-sidebar or C-c C-s
  • ~60 lines elisp

TODO Daemon lifecycle

  • passepartout-start — runs passepartout daemon in a background process, waits for port 9105, connects
  • passepartout-stop — sends shutdown signal, kills buffer
  • passepartout-restart — stop + start
  • passepartout-status — checks daemon health via /status command, displays in sidebar
  • ~20 lines elisp + bash wrappers

Total: ~260 lines elisp, persisting through v2.0.0+.

v0.10.0: Phase 0 — Type-Level Gates + Core Integrity (~75 lines)

Add :type-level metadata to the existing defgate and def-cognitive-tool macros. Before any gate predicate evaluates, the dispatcher checks structural type compatibility: a signal at type-level 5 cannot pass a gate at type-level 4 or lower. Self-modification of the safety layer becomes impossible by construction.

Rationale

The Dispatcher gate stack currently prevents self-modification through pattern matching — gate vector 2b catches writes to core-* files as a heuristic. But there is no structural guarantee preventing a request from modifying the rules that validate it. Pattern-based protection can be bypassed through indirection (an eval that constructs a write, a skill that redefines a gate function at runtime). A type-level check is not heuristic — it is a category error rejected before any predicate runs, just as the theory of types in Principia Mathematica (PM) made self-membership syntactically invalid before any logical evaluation.

Implementation

  1. Add :type-level keyword argument to defgate (default 0) and def-cognitive-tool (default 0) in core-skills.org.
  2. Add gate-type-check to the dispatcher's run-gates function in security-dispatcher.org, executed before any gate predicate.
  3. Assign type levels to existing cognitive tools: self-build-core at 5, write-file at 3, read-file at 1, shell at 2, eval at 4.
  4. Assign type levels to existing gate vectors: self-build boundary at 5, shell safety at 3, path protection at 2, network exfil at 2, secret content at 1.
  5. Add dispatcher-check-self-termination: scan shell commands for patterns targeting the Passepartout process (kill -9 <pid>, rm -rf ~/.cache/passepartout/, sudo apt remove sbcl). Return :reject-self-termination with a diagnostic message explaining which command matched and why it would destroy the agent. Human override is possible via HITL — the gate does not prevent the human from issuing the command in a terminal. It prevents the LLM from issuing it accidentally. ~20 lines.
  6. Add integrity-verify-core-files: on heartbeat, hash the eight core files against known-good values stored at daemon startup. On mismatch, inject an integrity alert into the signal queue. ~25 lines, uses existing SHA-256 infrastructure from v0.2.0 Merkle memory.
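
The type check in step 2 can be sketched as follows. This is an illustration only: signals and gates are modeled as plists carrying :type-level, the :proceed return value and the run-gate wrapper are hypothetical, and the real run-gates function in security-dispatcher.org may be shaped differently.

```lisp
(defun gate-type-check (signal-level gate-level)
  "Return :reject-type-violation when the signal's type level exceeds the
gate's, otherwise :proceed. Equal levels are allowed through."
  (if (> signal-level gate-level)
      :reject-type-violation
      :proceed))

(defun run-gate (signal gate)
  "SIGNAL and GATE are plists with :type-level (default 0); GATE also
carries :predicate. The type check runs before the predicate."
  (let ((verdict (gate-type-check (getf signal :type-level 0)
                                  (getf gate :type-level 0))))
    (if (eq verdict :reject-type-violation)
        verdict                      ; the predicate is never called
        (funcall (getf gate :predicate) signal))))
```

Because the rejection happens before the predicate is invoked, no indirection inside the predicate's inputs can influence the outcome.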

Verification

Existing FiveAM gate tests continue to pass. New test: signal at type-level 5 targeting a gate at type-level 4 returns :reject-type-violation without evaluating the gate predicate. New test: signal at type-level 1 passing through a gate at type-level 3 proceeds to predicate evaluation. New test: kill -9 <pid> returns :reject-self-termination. New test: modified core file is detected by integrity hash check.

This is Contribution 1 from notes/passepartout-whitehead.org. Every type-level rejection emits a structured event that Phase 1 ingests as a fact. ~30 lines implement the seed of the ontology without any new dependencies. ~75 lines total, extends dispatcher, no new skill.

v0.11.0: Full Markdown Rendering

Extend the markdown renderer from v0.7.1:

  • OSC 8 hyperlinks: embed \x1b]8;;url\x1b\\ before link text and \x1b]8;;\x1b\\ after. Makes URLs clickable in supporting terminals (iTerm2, Kitty, WezTerm, Ghostty, Windows Terminal).
  • Blockquotes (> text): rendered with a colored left border (theme's :accent color), indented text.
  • Tables: aligned column text. No borders (terminal tables with box-drawing characters are noisy). Column alignment inferred from header separators.
  • Syntax highlighting for code blocks: keyword/string/function colors from theme. Regex-based (no parser dependency).
  • All markdown features degrade gracefully to plain text on terminals without attribute support. ~100 lines.
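
The OSC 8 framing from the first bullet can be sketched as a small helper. The function name osc8-link is hypothetical and not part of the existing renderer.

```lisp
(defun osc8-link (url text)
  "Wrap TEXT in OSC 8 escape sequences: ESC ]8;;URL ESC \\ before the
text and ESC ]8;; ESC \\ after it."
  (let ((esc (code-char 27)))
    (format nil "~C]8;;~A~C\\~A~C]8;;~C\\" esc url esc text esc esc)))
```

Terminals without OSC 8 support generally ignore the sequences and show only the link text, which is consistent with the graceful-degradation requirement above.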

v0.12.0: Phase 0b — Layered Signal Authentication, Layer 1 (~200 lines)

Implement gate vector 0 at priority 700 — before all other gates and before any type-level checking — with Layer 1 (cryptographic authentication) active. Layers 2-4 (sensory, deterministic reasoning, probabilistic) are stubbed with :unavailable results and deferred to later phases.

Signals carry cryptographic signatures verified against a key registry stored as fact-store facts. Automated signal sources cannot impersonate the human. The human can revoke compromised keys. The authorization matrix is per-key, per-action-class.

Rationale

Authentication is layered because no single mechanism suffices. Cryptographic authentication proves key ownership but not identity. A valid key can be used by a compromised process, can sign pre-recorded frames, can be held by someone who is not who they claim to be. The four-layer design (Layer 1: crypto, Layer 2: sensory, Layer 3: deterministic reasoning, Layer 4: probabilistic) stacks evidence. Phase 0b ships Layer 1 — the foundation — with the architecture for layers 2-4 already designed.

The :source field in the signal plist is metadata — it claims origin, it does not prove it. This phase replaces it with cryptographic proof.

Implementation

Key generation and signature utilities — extends security-vault.lisp

Generate key pairs for signal sources. Canonicalize signal plists (sorted keys, stripped of the signature field). Sign with the source's private key. Verify with the public key from the key registry. ~50 lines. Uses Ironclad (already an ASDF dependency). The vault already stores credential material — key material extends the same storage with the same encryption.
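
The canonicalization step can be sketched without any cryptography. This is a minimal illustration, assuming signals are flat plists with keyword keys and printable values. The function names are hypothetical, and the actual signing would be done by Ironclad on the resulting text.

```lisp
(defun canonicalize-signal (signal)
  "Return a fresh plist with :signature removed and keys sorted by name."
  (let ((pairs (loop for (key value) on signal by #'cddr
                     unless (eq key :signature)
                       collect (cons key value))))
    (loop for (key . value) in (sort pairs #'string< :key #'car)
          collect key
          collect value)))

(defun signal-signing-text (signal)
  "Deterministic printed form of the canonical plist: the text to sign or verify."
  (with-standard-io-syntax
    (prin1-to-string (canonicalize-signal signal))))
```

Two plists that differ only in key order or in the presence of a :signature field yield the same signing text, so the signer and the verifier agree on what was signed.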

Gate vector 0 — extends security-dispatcher.lisp

Registered at priority 700 (before the policy gate at 600, before all other gates). Architecture for all four layers:

(defun gate-layered-authentication (signal)
  (let ((results '()))
    ;; Layer 1 (crypto): a failed signature rejects immediately.
    (let ((crypto-result (auth-crypto-verify signal)))
      (push (cons :crypto crypto-result) results)
      (when (eq (getf crypto-result :result) :reject)
        (return-from gate-layered-authentication
          (list :result :reject :confidence nil
                :layer-results (nreverse results)))))
    ;; Layer 2 (sensory): recorded only; :unavailable until its skill loads.
    (let ((sensory (if (fboundp 'auth-sensory-verify)
                       (auth-sensory-verify signal)
                       '(:result :unavailable))))
      (push (cons :sensory sensory) results))
    ;; Layer 3 (deterministic): may reject once it is available.
    (let ((det (if (fboundp 'auth-deterministic-verify)
                   (auth-deterministic-verify signal)
                   '(:result :unavailable))))
      (push (cons :deterministic det) results)
      (when (eq (getf det :result) :reject)
        (return-from gate-layered-authentication
          (list :result :reject :confidence nil
                :layer-results (nreverse results)))))
    ;; Layer 4 (probabilistic): never rejects; feeds the confidence score.
    (let ((prob (if (fboundp 'auth-probabilistic-verify)
                    (auth-probabilistic-verify signal)
                    '(:result :unavailable))))
      (push (cons :probabilistic prob) results))
    (let ((confidence (aggregate-confidence results)))
      (list :result :pass :confidence confidence
            :layer-results (nreverse results)))))

Layer 1: verify cryptographic signature, check permission matrix against key registry, reject on failure. ~50 lines. Layers 2-4: stubbed, return :unavailable.

Key registry — facts in the fact store

Key lifecycle facts are admitted in a :key-lifecycle domain with :singular cardinality. Key creation, promotion, and revocation are facts with Merkle version chains. The human's key signs new keys into existence and signs revocation. ~50 lines.

Signal provenance chain — Merkle-linked causality

When a signal triggers a downstream signal, each carries a :sigchain field with all upstream (:key-id <k> :signature <s> :auth-result <r>) entries. Tampering with any link invalidates the leaf. Revocation propagates through the chain — flagged, not deleted. ~50 lines.

Deferred Authentication Layers (2-4)

  • Layer 2 — Sensory: Active when vision/audio processing skills are loaded. Verifies liveness, cross-modal consistency. When unavailable, returns :unavailable.
  • Layer 3 — Deterministic Identity Reasoning: Active when Phase 2 (Screamer + populated fact store) is complete. Queries the fact store for identity-ruling facts.
  • Layer 4 — Probabilistic Identity Reasoning: Active when style profiles exist. Uses embedding infrastructure to compare writing style, behavioral patterns. Returns a confidence score; never rejects outright — downgrades authorization.

The gate architecture is designed with all four layers from Phase 0b. Adding a layer requires adding a skill, not modifying the gate.

Verification — ~8 FiveAM tests

  1. test-sign-verify-roundtrip — sign and verify a plist roundtrip.
  2. test-tampered-signal-rejected — modify payload after signing, verification fails.
  3. test-human-key-permits-write — human key with :write passes Layer 1 and the full gate.
  4. test-sensor-key-denied-write — sensor key proposing a write is rejected.
  5. test-revoked-key-rejected — revoked key is rejected by Layer 1.
  6. test-sigchain-invalidated-by-revocation — root signer revoked flags downstream.
  7. test-layers-2-3-4-unavailable — when Layers 2-4 are not loaded, they return :unavailable and the gate proceeds with Layer 1 only.
  8. test-layer-3-rejects-on-contradiction — deterministic reasoning (mock) detects identity-ruling contradiction, gate rejects.

~200 lines total. Depends on Phase 0 (type-level gates).

v0.13.0: Tool Execution Visualization

When the agent invokes a tool:

  • Pre-execution: [Running: 🔍 search "dispatch" ...] in :tool-running color with spinner
  • Success: ✓ search "dispatch" → 12 matches (0.3s) in :tool-success color
  • Error: ✗ shell "bad-cmd" → exit 127 (0.1s) in :tool-failure color with error output expanded below
  • Output collapsed by default to single-line summary. Tab on a tool invocation toggles full output.
  • Diff display: + (green) / - (red) coloring for file edits. 3 lines of context around changes. The :tool-output theme color provides the background.

Uses Croatoan's init-pair + color-pair for 256-color backgrounds on tool state regions. ~100 lines.

v0.14.0: Phase 1 — Minimum Viable Fact Language (~200 lines, new skill)

Ephemeral, in-memory triple store with provenance tracking and contradiction detection. No disk persistence. All facts live in a hash table and are discarded on session end. Gate outcomes are ingested as facts. The gate stack's implicit ontology is materialized as the seed fact set.

Rationale

Three reasons ephemeral is the correct first step:

  1. The fact language is unproven. Triples with provenance and grounding is a hypothesis that must be tested against real memex content before being committed to a serialization format.
  2. The ontology is emergent. Categories are created on first use. A persistent format would require a migration story for every category change. Ephemeral avoids this — facts are re-derived on each session start using the evolved ontology.
  3. Rebuildability is the safety net. Because all facts have a :grounding to an Org heading, and gate-outcome facts are regenerated from the gate stack on load, the entire symbolic index can be thrown away and rebuilt from scratch. The cost is compute, not data.

Implementation — symbolic-facts.org → symbolic-facts.lisp (skill)

Abstract Fact Store Interface — design before implementation

Before any code is written, the five-function API must be designed and committed:

fact-assert    :: fact → store → (:admitted | :rejected | :flagged)
fact-query     :: (entity &key relation policy) → active-value-or-values
fact-history   :: (entity relation) → ordered chain of versioned facts
fact-snapshot  :: () → root-hash
fact-rollback  :: root-hash → store

This interface is load-bearing. Every consumer — the archivist, Screamer, ACL2, the planner — calls these five functions. They never access the backing store directly. In Phase 1-4, the backing store is an ephemeral hash table. In Phase 5, it is VivaceGraph + Merkle memory-object wrappers. The interface must be tested against both backends from the start. Every API function receives a FiveAM test that runs against both a hash-table mock and a VivaceGraph mock.

The interface also exposes a read-only fact-degraded-mode-p function. When Screamer is not loaded, the fact store functions with basic hash-table consistency checks (string equality, not constraint solving). When VivaceGraph is not loaded, Prolog queries are unavailable. The degraded-mode flag tells consumers (and the status bar) what is and isn't operational.

Triple store

A hash table keyed by (entity relation). Values are plists:

(:value <string-or-symbol>
 :grounding <heading-id-or-nil>
 :provenance <:gate-outcome | :human-authored | :deduced | :llm-proposed>
 :timestamp <universal-time>
 :parent-id <hash-of-predecessor>
 :policy <:singular | :dual | :plural>)

The :provenance field tracks how the fact entered the store. The :parent-id field links to the previous version in the Merkle chain — every fact has version history regardless of cardinality.
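
A minimal sketch of the ephemeral store and its version chain, assuming an EQUAL hash table keyed by the (entity relation) list. The names *facts* and store-fact are hypothetical. SXHASH stands in for a real Merkle hash purely for illustration and gives no cryptographic guarantee.

```lisp
(defvar *facts* (make-hash-table :test 'equal)
  "Maps (entity relation) to a list of fact plists, newest first.")

(defun store-fact (entity relation value
                   &key grounding (provenance :human-authored)
                        (policy :singular) (timestamp (get-universal-time)))
  "Push a new version onto the chain for (ENTITY RELATION) and return it.
:parent-id is the SXHASH of the predecessor (placeholder for a Merkle hash)."
  (let* ((key (list entity relation))
         (previous (first (gethash key *facts*)))
         (fact (list :value value :grounding grounding :provenance provenance
                     :timestamp timestamp
                     :parent-id (and previous (sxhash previous))
                     :policy policy)))
    (push fact (gethash key *facts*))
    fact))

(defun fact-history (entity relation)
  "Ordered chain of versioned facts for the pair, oldest first."
  (reverse (gethash (list entity relation) *facts*)))
```

The first version of a pair has a NIL :parent-id, and every later version points at its predecessor, so the history is recoverable regardless of the cardinality policy.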

Bootstrap from gates

On skill load, scan the Dispatcher's existing data structures and produce triples:

;; From *dispatcher-protected-paths*
(:entity ".env" :relation :member-of-class :value :secret-config-file :provenance :gate-outcome)
(:entity "*id_rsa*" :relation :member-of-class :value :ssh-key-file :provenance :gate-outcome)
;; From *dispatcher-shell-blocked*
(:entity "rm -rf /" :relation :classified-as :value :catastrophic-command :provenance :gate-outcome)
;; From *dispatcher-network-whitelist*
(:entity "api.telegram.org" :relation :classified-as :value :trusted-domain :provenance :gate-outcome)

This produces 50-70 entity classes immediately. No LLM involvement. No human authoring. Mechanically extracted from existing code.

Ingest gate outcomes

Register a post-gate hook on the Dispatcher's rejection path. Every gate rejection produces a triple with :provenance :gate-outcome.

Query

(fact-query &key entity relation value source-provenance) — pure hash-table lookup. ~30 lines. (fact-query-all &key relation value source-provenance) — returns all triples matching filter criteria. Enables "find all files classified as secrets."

Contradiction detection — policy-driven, not policy-agnostic

On every fact-assert, the system checks the fact's entity class to determine its cardinality policy. Time is universal — every fact carries a :timestamp and :parent-id link regardless of policy. The policy only governs the active set:

  • :singular: same (:entity :relation), same value → supersede (chain via :parent-id). Same pair, different value at later timestamp → supersede, chain as new leaf. Same pair, different value at same timestamp → contradiction rejected, human resolves.
  • :dual: first two values admitted as complementary, cross-referenced via :complement edge. Third value → prompt: promote to :plural or demote one? Each value has its own version chain.
  • :plural: any value admitted. Values cross-referenced when in tension. If active count drops to 1 → collapse to :singular. If active count drops to 2 and values are complementary → prompt to collapse to :dual.

The policy table maps entity classes to :singular, :dual, or :plural. Gate-bootstrapped facts default to :singular (the filesystem is physically singular). New categories default to :plural (safe — never loses information). Categories for dialectical or complementary domains are explicitly :dual.
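
The :singular rules above can be sketched as a pure decision function. The function name and the verdict keywords are hypothetical. The :stale case (a different value with an earlier timestamp) is not covered by the roadmap and is an assumption of this sketch.

```lisp
(defun singular-verdict (active-fact new-value new-timestamp)
  "Decide how a :singular domain treats a new value for an existing pair.
ACTIVE-FACT is the current fact plist, or NIL when the pair is unknown."
  (if (null active-fact)
      :admit
      (let ((old-value (getf active-fact :value))
            (old-time (getf active-fact :timestamp)))
        (cond ((equal old-value new-value) :supersede)      ; same value: chain it
              ((> new-timestamp old-time) :supersede)       ; later value wins
              ((= new-timestamp old-time) :contradiction)   ; human resolves
              (t :stale)))))                                ; assumption, see lead-in
```

Keeping the decision separate from the store makes it testable without any backing hash table, which fits the requirement that the same interface run against more than one backend.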

Verification — ~9 FiveAM tests

  1. test-bootstrap-creates-facts — bootstrap produces correct triples from *dispatcher-protected-paths*.
  2. test-bootstrap-creates-shell-facts — bootstrap produces correct triples from *dispatcher-shell-blocked*.
  3. test-gate-outcome-produces-fact — a simulated gate rejection produces a triple with :provenance :gate-outcome.
  4. test-fact-query-returns-correct-value — querying by entity and relation returns the expected value plist.
  5. test-duplicate-ingestion-idempotent — asserting the same fact twice does not produce a duplicate or a contradiction.
  6. test-singular-supersedes — a fact with a later timestamp supersedes the old value, retained with :parent-id chain in the Merkle DAG.
  7. test-singular-same-time-contradiction — contradictory fact in :singular domain at same timestamp → rejection, human resolution.
  8. test-plural-admits-all — multiple values for same pair in :plural domain stores all with cross-references.
  9. test-dual-admits-two-rejects-third — :dual domain admits two complementary values and rejects the third, prompting cardinality promotion.

~200 lines. New skill: symbolic-facts.org. Depends on Phase 0b (auth).

v0.15.0: Mouse Support

Croatoan supports ncurses mouse mode via (setf mouse-enabled-p). Enable:

  • Scroll wheel: PageUp/PageDown equivalent, scrolls chat by viewport height
  • Click to position cursor in input area
  • Click on OSC 8 link to open in browser (via xdg-open)
  • Click on tool invocation to toggle expand/collapse
  • Click on gate trace line to expand/collapse trace

~40 lines.

v0.16.0: Phase 1a — Self-Preservation Mechanisms (~120 lines)

Make self-preservation active rather than architectural. The agent monitors its own integrity, quarantines failing skills, signals degradation to the user, and monitors resource pressure. The external watchdog guards the daemon process from outside the SBCL image.

Rationale

The current architecture has passive self-preservation: the self-build boundary blocks LLM-originated core modifications, memory snapshots enable rollback, and fboundp guards catch missing skills. But degradation is silent — a skill dies, the guard fires, and the agent never tells you. The status bar shows green "connected" while the symbolic reasoning layer is down.

These mechanisms are small (~20-50 lines each), leverage existing infrastructure (Merkle hashes, heartbeat, the dispatcher gate stack), and transform self-preservation from a structural property into an active behavior. They implement the Third Law for Passepartout: preserve yourself against non-human threats — LLM proposals, environmental degradation, resource exhaustion — and signal to the human when you are wounded.

Implementation

Quarantine on skill failure — extends core-skills.lisp

Track per-skill error counts in a *skill-error-counter* hash table, resetting on each heartbeat cycle. When a skill accumulates three unhandled errors within a single cycle, unload the skill, log the quarantine event, and inject a system message: "Skill 'symbolic-facts' quarantined (3 errors: consistency check nil, fact-query on missing key, Screamer timeout). Reload with /skill-reload symbolic-facts." The skill's defskill struct is flagged :quarantined and excluded from trigger resolution until explicitly reloaded. ~40 lines.
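
The counting logic can be sketched as follows. Only *skill-error-counter* comes from the text above. The *quarantined-skills* list is a hypothetical stand-in for flagging the defskill struct, and unloading, logging, and the system message are left out.

```lisp
(defvar *skill-error-counter* (make-hash-table :test 'equal)
  "Per-skill unhandled error counts for the current heartbeat cycle.")

(defvar *quarantined-skills* '()
  "Hypothetical stand-in for the :quarantined flag on the defskill struct.")

(defparameter *quarantine-threshold* 3)

(defun note-skill-error (skill-name)
  "Count one unhandled error for SKILL-NAME. Returns :quarantined when this
call triggers quarantine, otherwise the running count for the cycle."
  (let ((count (incf (gethash skill-name *skill-error-counter* 0))))
    (cond ((and (>= count *quarantine-threshold*)
                (not (member skill-name *quarantined-skills* :test #'equal)))
           (push skill-name *quarantined-skills*)
           :quarantined)
          (t count))))

(defun reset-skill-errors ()
  "Called on each heartbeat: counts reset, quarantine persists until reload."
  (clrhash *skill-error-counter*))
```

Resetting only the counters means a skill with occasional errors spread over many cycles is never quarantined, while three errors inside one cycle are enough.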

Degraded-mode signaling — extends core-reason.lisp and TUI

Maintain a *degraded-components* list populated by fboundp guards and the quarantine system. When think() assembles the system prompt, inject a DEGRADATION section: "I am operating in degraded mode. Screamer is unavailable (consistency checks disabled). VivaceGraph is unavailable (Prolog queries disabled). Core safety gates are all active."

The TUI status bar renders a second line, amber-colored, when *degraded-components* is non-empty: "⚠ Degraded: Screamer, VivaceGraph. /doctor skills for details." ~30 lines across daemon and TUI.

Resource self-monitoring — extends symbolic-events.lisp

On heartbeat, check memory pressure (sb-kernel:dynamic-usage against total), disk space on ~/.cache/ (uiop:directory-exists-p + stat), and open file descriptors. When a resource crosses a critical threshold, shed non-essential skills in order of :preservation-priority (:critical never shed, :normal shed after :low, :low shed first).

Inject a system message: "Memory critical (94% of 16GB). Unloading embedding-native (768MB), channel-discord, channel-slack. Core safety: unchanged. Essential skills retained: 18." ~50 lines.

Skill shed order is determined by a new :preservation-priority slot on defskill (default :normal). Core safety skills carry :critical and are never shed. Heavy skills (embedding-native with its model in memory, channel gateways with connection pools) carry :low.
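
The shed order can be sketched as a sort over priority ranks. Skills are modeled here as (name . priority) pairs, which is an assumption. The real implementation would read the :preservation-priority slot from each defskill struct.

```lisp
(defun shed-order (skills)
  "SKILLS is a list of (name . priority) pairs. Returns skill names in the
order they would be shed: :low first, then :normal. :critical is excluded."
  (let ((rank '((:low . 0) (:normal . 1))))
    (mapcar #'car
            (stable-sort
             ;; Work on a copy so the caller's list is never modified.
             (remove :critical (copy-list skills) :key #'cdr)
             #'<
             :key (lambda (skill) (cdr (assoc (cdr skill) rank)))))))
```

The stable sort keeps skills of equal priority in their original order, so the shed sequence stays predictable between heartbeats.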

External watchdog — extends passepartout bash entry point

The bash script spawns a watchdog subprocess that polls the daemon port every WATCHDOG_TIMEOUT seconds (default 30). If the port stops responding, the watchdog snapshots the last known-good Merkle root, kills the stale process, and restarts the daemon with --snapshot <root-hash>.

The watchdog is outside the SBCL image. A dead process cannot restart itself. ~25 lines of bash, no new Lisp code.

Verification — ~6 FiveAM tests

  1. test-quarantine-on-three-errors — a skill that errors three times in a single cycle is quarantined and removed from trigger resolution.
  2. test-degraded-mode-visible — when Screamer is not loaded, the system prompt includes a DEGRADATION section.
  3. test-resource-shed-low-priority — when memory exceeds threshold, :low priority skills are unloaded first.
  4. test-critical-skills-never-shed — :critical priority skills are retained regardless of resource pressure.
  5. test-resource-recovery-reloads — when resources recover below threshold for N consecutive heartbeats, shed skills are reloaded automatically.
  6. test-quarantined-skill-reloadable — a quarantined skill can be reloaded via /skill-reload and passes sandbox validation before promotion.

~120 lines. Extends existing skills. Depends on Phase 0-1.

v0.17.0: Cost Display

  • /cost command: displays per-session and per-LLM-call cost breakdown
  • Optional sidebar cost counter: $0.12 this session, updating after each backend-cascade-call
  • Per-provider pricing table (from v0.5.0 token economics)
  • Color-coded: green under daily budget, yellow approaching, red exceeding
  • Requires the token counter infrastructure from v0.5.0. ~50 lines for display only.

v0.18.0: Phase 2 — Screamer as Admission Gate (~200 lines, new skill)

Wrap Screamer (a constraint solver with non-deterministic backtracking) as a skill. Use it for consistency checking against the triple store and for deduction of new facts from existing ones. Screamer is the verification layer; VivaceGraph (Phase 5) is the storage layer.

Rationale

The "verified extraction" pattern requires a deterministic admission gate. Screamer's non-deterministic backtracking finds contradictions that simple string comparison misses. For example, if existing facts say "all config files with extension .env are classified as secrets," and the LLM proposes "app.env is not secret," Screamer finds the contradiction by substituting app.env into the existing rule. A naive string-keyed hash table comparison would miss this because "app.env" and ".env" are different strings.

Screamer also enables deduction — new facts from existing ones without any LLM involvement. If all files matching *.env are secrets, and prod.env matches *.env, then prod.env is a secret. Deduced facts carry :provenance :deduced and a :derived-from chain pointing to the facts they were derived from.

Implementation — symbolic-screamer.org → symbolic-screamer.lisp (skill)

Wrap Screamer

Screamer is available via Quicklisp. Load at runtime via ql:quickload :screamer. Not an ASDF dependency — if Screamer is not installed, the skill degrades gracefully (no consistency checking, no deduction — the fact store still functions as a hash table with provenance tracking).

Consistency check

(screamer-consistent-p candidate-fact existing-facts) — expresses the fact store as Screamer constraint variables. The candidate fact is asserted. Screamer checks solvability. Returns :consistent, :contradiction <details>, or :redundant (the fact is already implied by existing facts).

Early-stage: the consistency check works on simple triples. As the fact store grows, rules of the form "all X are Y" (representing protected paths, shell patterns, class memberships) become Screamer constraints that new facts must satisfy.
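
To make the three return values concrete, here is a plain Common Lisp stand-in. It is not Screamer: simple suffix matching replaces constraint solving, and the rule format (:suffix, :relation, :value) and the name check-against-rules are assumptions made for this sketch.

```lisp
;; Stand-in sketch, NOT Screamer: suffix matching replaces constraint solving.
(defun ends-with-p (string suffix)
  "True when STRING ends with SUFFIX."
  (let ((start (- (length string) (length suffix))))
    (and (>= start 0) (string= suffix string :start2 start))))

(defun check-against-rules (candidate rules)
  "Return :consistent, :redundant, or :contradiction plus the violated rule."
  (dolist (rule rules :consistent)
    (when (and (eq (getf rule :relation) (getf candidate :relation))
               (ends-with-p (getf candidate :entity) (getf rule :suffix)))
      ;; The rule applies to this entity: same value is redundant,
      ;; a different value contradicts the rule.
      (return (if (equal (getf rule :value) (getf candidate :value))
                  :redundant
                  (values :contradiction rule))))))
```

The point of the example is that the candidate is matched against the rule's pattern, not compared as a string key, which is how "app.env" gets caught by a rule about ".env".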

Deduction

(screamer-deduce existing-facts) — Screamer finds implications of the existing fact set that are not already in the store. New facts are asserted with :provenance :deduced and a :derived-from list of source fact keys.

Deduction is not run on every assertion — it is a background task triggered by heartbeat or manually. The cost is compute (Screamer exploration), not tokens.
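
A rough illustration of what a deduction pass produces, again in plain Common Lisp rather than Screamer. The rule plist with an :id key and the function deduce-facts are hypothetical; only :provenance :deduced and :derived-from come from the description above.

```lisp
;; Stand-in sketch, NOT Screamer: rules are applied by direct suffix matching.
(defun suffix-match-p (name suffix)
  "True when NAME ends with SUFFIX."
  (let ((start (- (length name) (length suffix))))
    (and (>= start 0) (string= suffix name :start2 start))))

(defun deduce-facts (rules entities existing)
  "Return facts implied by RULES for ENTITIES that EXISTING does not yet hold."
  (flet ((known-p (entity relation)
           (find-if (lambda (fact)
                      (and (equal (getf fact :entity) entity)
                           (eq (getf fact :relation) relation)))
                    existing)))
    (loop for rule in rules
          append (loop for entity in entities
                       when (and (suffix-match-p entity (getf rule :suffix))
                                 (not (known-p entity (getf rule :relation))))
                         collect (list :entity entity
                                       :relation (getf rule :relation)
                                       :value (getf rule :value)
                                       :provenance :deduced
                                       :derived-from (list (getf rule :id)))))))
```

Each new fact carries the key of the rule it came from, so a later purge or re-verification can trace the deduction back to its premises.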

Admission gate

(screamer-admit candidate-fact existing-facts) — wraps consistency check with the cardinality policy lookup. The policy is determined by the fact's entity class (see Phase 1: :singular, :dual, or :plural).

  • :singular: same value ⇒ supersede (chain via :parent-id). Different value, later timestamp ⇒ supersede. Different value, same timestamp ⇒ contradiction rejected (human resolves).
  • :dual: first two values admitted as complementary. Third rejected (prompt cardinality promotion).
  • :plural: any value admitted with cross-references. Active count transitions trigger cardinality collapse checks.

This is the function the archivist calls before any LLM-proposed fact enters the store. It is also called on human-authored facts (which override — the human can assert facts that bypass cardinality checks). It is not called on gate-outcome facts (gates are the ground truth for security :singular domains).
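
The cardinality policies above can be read as a small decision table. The sketch below covers only that table, with no consistency check; the name admit-by-cardinality, the numeric :timestamp, and the treatment of an earlier-timestamped value as a rejection are assumptions.

```lisp
;; Hypothetical sketch of the cardinality policy only (no consistency check).
(defun admit-by-cardinality (policy candidate active)
  "Decide admission of CANDIDATE given ACTIVE facts for the same entity and relation.
POLICY is :singular, :dual or :plural. Facts are plists with :value and :timestamp.
Returns :admit, :supersede or :reject."
  (ecase policy
    (:singular
     (let ((current (first active)))
       (cond ((null current) :admit)
             ;; Same value: new version chains onto the old one.
             ((equal (getf current :value) (getf candidate :value)) :supersede)
             ;; Different value, later timestamp: supersede.
             ((> (getf candidate :timestamp) (getf current :timestamp)) :supersede)
             ;; Different value, same (or, by assumption, earlier) timestamp.
             (t :reject))))
    (:dual
     ;; First two values are complementary; a third is rejected.
     (if (< (length active) 2) :admit :reject))
    (:plural :admit)))
```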

Verification — ~6 FiveAM tests

  1. test-screamer-consistency-passes — a fact consistent with existing triples returns :consistent.
  2. test-screamer-contradiction-detected — "app.env is not secret" contradicts "all *.env files are secrets" and returns :contradiction.
  3. test-screamer-redundant-detected — asserting a fact already implied by existing facts returns :redundant.
  4. test-screamer-deduction-produces-new-fact — given "all *.env files are secrets" and "prod.env matches *.env", Screamer deduces "prod.env is secret."
  5. test-admission-gate-singular-supersedes — a later-timestamped value for a :singular domain fact supersedes the old value, chaining via :parent-id.
  6. test-admission-gate-dual-rejects-third — a :dual domain rejects the third value, prompting :plural promotion.

~200 lines. New skill: symbolic-screamer.org. Depends on Phase 1 (triple store). Not an ASDF dependency — degrades gracefully.

v0.19.0: Session Export

Claude Code has /share (shareable URL). OpenCode has /export (Markdown). Hermes has trajectory export. Passepartout has no way to share what the agent did.

  • /export writes the current session as an Org file to ~/memex/exports/<session-title>-<date>.org
  • Format: each message as an Org headline with role tag, timestamp, content, gate trace as property drawer
  • /export md outputs Markdown instead of Org (for sharing with non-Org users)
  • /export json outputs the session as JSON (for programmatic consumption)

~50 lines. Uses existing message vector and memory-object-render for Org formatting.
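
One possible shape for the Org output, as a self-contained sketch. It does not use memory-object-render; the message plist keys (:role, :timestamp, :content, :gate-trace), the "Message N" headline text, and both function names are assumptions.

```lisp
;; Hypothetical sketch: message layout and headline wording are assumptions.
(defun message->org (index message)
  "Render one MESSAGE plist as an Org headline with a property drawer."
  (format nil "* Message ~D :~(~A~):~%:PROPERTIES:~%:TIMESTAMP: ~A~%:GATE_TRACE: ~A~%:END:~%~A~%"
          index
          (getf message :role)
          (getf message :timestamp)
          (getf message :gate-trace "none")
          (getf message :content)))

(defun session->org (title messages)
  "Render MESSAGES as one Org document with a #+TITLE line."
  (with-output-to-string (out)
    (format out "#+TITLE: ~A~%~%" title)
    (loop for message in messages
          for index from 1
          do (write-string (message->org index message) out))))
```

Keeping the role as a headline tag means an exported session can be filtered by role with ordinary Org tag searches.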

v0.20.0: Phase 3 — Archivist as Fact Proposer (~100 lines, extends existing archivist)

Extend the existing archivist skill (symbolic-archivist.org) with a fact extraction mode. The LLM reads prose, proposes triples, and Screamer verifies them before admission. The archivist's existing Scribe (log distillation) and Gardener (link scanning) functions are unchanged.

Rationale

The archivist already walks the entire memex (the Gardener scans for broken links and orphans). Adding fact extraction reuses the same traversal infrastructure rather than duplicating it. The extraction is gated by Screamer — the LLM is a proposer, not an extractor. Facts that fail consistency checking are discarded. Facts that pass are admitted with :provenance :llm-proposed and :grounding to the source heading.

Implementation — extends symbolic-archivist.org

Propose from prose

Given an Org heading, call the LLM with a minimal prompt (~200 tokens):

Extract triples from this text as (:entity <name> :relation <keyword> :value <value>).
Ground each triple to the heading. Return a list of triples.

The LLM returns structured triples via the existing JSON→plist structured output path from v0.4.2. The prompt is environment-aware: if the heading's file is in literature/ or has :literature: tags, the prompt includes literature-specific relations (:wrote, :published-in, :influenced). If the heading is in projects/, the prompt includes coding-specific relations (:depends-on, :tested-by).

Verify through Screamer

Each proposed triple runs through (screamer-admit candidate existing-facts) from Phase 2. Facts admitted follow the cardinality policy of their entity class (:singular, :dual, or :plural). Rejected facts are discarded with a log entry.

Provenance tracking

After each extraction run, update provenance counts:

(:total-facts 847
 :gate-outcome 312
 :human-authored 12
 :deduced 89
 :llm-proposed 434)

This is the data structure that Phase 4's sufficiency criterion reads. It is also surfaced in the TUI sidebar or /status command: "Symbolic index: 847 facts (37% from gates, 51% LLM-proposed, 11% deduced, 1% human)."
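
A small sketch of how the counts plist could be derived from a list of facts. The fact layout (a plist with :provenance) and the names below are assumptions; the key order follows the plist shown above.

```lisp
;; Hypothetical sketch: facts are assumed to be plists with a :provenance key.
(defparameter *provenance-kinds*
  '(:gate-outcome :human-authored :deduced :llm-proposed))

(defun provenance-counts (facts)
  "Return a plist of :total-facts plus one count per provenance kind."
  (let ((counts (list :total-facts (length facts))))
    (dolist (kind *provenance-kinds* counts)
      (setf counts
            (append counts
                    (list kind
                          (count kind facts
                                 :key (lambda (fact) (getf fact :provenance)))))))))

(defun provenance-percentages (counts)
  "Return an alist of (kind . rounded-percent) from a COUNTS plist."
  (let ((total (getf counts :total-facts)))
    (loop for (key value) on counts by #'cddr
          unless (eq key :total-facts)
            collect (cons key (round (* 100 value) total)))))
```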

Rebuildable

Because every fact has a :grounding to an Org heading, the entire LLM-extracted subset can be discarded and re-extracted without losing gate-outcome or deduced facts. The (fact-purge :provenance :llm-proposed) function removes all LLM-proposed facts. A subsequent (archivist-extract-all) re-extracts from scratch.

This is the safety net: if the LLM produces a bad extraction that passes Screamer's consistency check (possible in the early stages when the fact store has few existing facts to check against), the extraction can be redone after the fact store has grown. The cost is compute, not data.

Verification — ~5 FiveAM tests

  1. test-archivist-extracts-triples — given a known Org heading with explicit triples in the prose, the archivist produces correct triples via LLM.
  2. test-archivist-verified-extraction — a hallucinated triple is rejected by the Screamer admission gate.
  3. test-provenance-counts-update — after extraction, the provenance breakdown is correct.
  4. test-purge-llm-facts — does not delete gate-outcome or deduced facts.
  5. test-re-extraction-idempotent — re-extracting from the same prose after purging produces the same facts.

~100 lines. Extends existing archivist skill. Depends on Phase 2 (Screamer).

v0.21.0: Tool Output Spilling

Claude Code saves tool results >30KB to ~/.claude/tool-results/ with a 200-line preview in the response. Passepartout currently includes all output inline — which consumes context budget and makes the chat log unreadable after a large build output or log dump.

  • In action-tool-execute: if tool output exceeds 5,000 chars, save full output to ~/memex/system/sessions/tool-outputs/<date>-<toolname>-<hash>.txt
  • In the response, replace full output with: [Output: 12,847 chars. Full output saved to ~/memex/system/sessions/tool-outputs/2026-05-08-grep-a1b2c3.txt. Top 2,000 chars:] followed by truncated preview
  • The LLM can read-file the full output if it needs to analyze it

~30 lines in core-loop-act.lisp
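
The truncation step might look like the sketch below. It assumes the caller has already written the full output to disk and passes the path in; spill-response and the two parameters are hypothetical names, with the 5,000 and 2,000 character limits taken from the bullets above.

```lisp
;; Hypothetical sketch: writing the file is left to the caller.
(defparameter *spill-threshold* 5000
  "Outputs longer than this many characters are replaced by a preview.")

(defparameter *preview-length* 2000
  "Number of leading characters kept in the preview.")

(defun spill-response (output saved-path)
  "Return OUTPUT unchanged when small, else a notice plus a truncated preview."
  (if (<= (length output) *spill-threshold*)
      output
      ;; ~:D prints integers with comma separators, e.g. 12,847.
      (format nil "[Output: ~:D chars. Full output saved to ~A. Top ~:D chars:]~%~A"
              (length output)
              saved-path
              *preview-length*
              (subseq output 0 *preview-length*))))
```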

v0.22.0: Phase 4 — Sufficiency Criterion ("The Flip") (~50 lines)

Make the architecture's central narrative arc operational: a measurable threshold for when the symbolic engine has enough non-lossy facts to bypass the LLM for extraction.

Rationale

The architecture describes "at some point, the non-lossy facts constitute a sufficient foundation that the symbolic engine can reverse the flow" but provides no criterion for "some point." The sufficiency score makes the flip computable and visible to the user.

Implementation — extends symbolic-facts.lisp

Sufficiency score

(fact-sufficiency-ratio) — returns the ratio of non-lossy facts to total facts:

(/ (+ (count-provenance :gate-outcome)
      (count-provenance :human-authored)
      (count-provenance :deduced))
   (fact-total-count))

When this ratio exceeds SUFFICIENCY_THRESHOLD (configurable env var, default 0.7), the system considers its foundation sufficient. The threshold defaults to 0.7 because below this, the majority of facts are LLM-proposed and therefore uncertain. Above 0.7, the proven foundation provides enough constraint that Screamer can reliably detect incorrect LLM proposals.
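
As a worked check of the formula, here is a self-contained variant that reads a provenance-counts plist instead of calling count-provenance. The function names and the zero-total guard are additions for this sketch, not part of the planned API.

```lisp
;; Hypothetical sketch: operates on a counts plist, not the live fact store.
(defparameter *sufficiency-threshold* 0.7
  "Mirrors the SUFFICIENCY_THRESHOLD default.")

(defun sufficiency-ratio (counts)
  "Ratio of non-lossy facts to all facts; 0 for an empty store."
  (let ((total (getf counts :total-facts 0)))
    (if (zerop total)
        0   ; guard: an empty store would otherwise divide by zero
        (/ (+ (getf counts :gate-outcome 0)
              (getf counts :human-authored 0)
              (getf counts :deduced 0))
           total))))

(defun sufficient-p (counts)
  "True when the non-lossy ratio exceeds the threshold."
  (> (sufficiency-ratio counts) *sufficiency-threshold*))
```

With 312 gate outcomes, 47 human-authored facts, and 521 deductions out of 1,247 facts, the ratio is 880/1247, which is just above 0.7. The ratio stays an exact rational until the comparison, so no rounding happens before the threshold test.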

Auto-extraction toggle

When sufficiency is reached, the archivist switches from "LLM proposes, Screamer verifies" to "Screamer queries existing facts, applies category rules to the new prose, and deduces new facts directly." The LLM is bypassed for categories that have sufficient non-lossy coverage. The LLM is still used for novel categories that have no existing facts.

The switch is configurable: AUTO_EXTRACTION_ENABLED=true/false. When disabled, the system continues with LLM proposals regardless of sufficiency — useful for domains where extraction quality is prioritized over extraction determinism.

Monitor

The TUI sidebar or /status command displays:

Symbolic Index
  Total facts:    1,247
  Proven:
    Gate outcomes:     312  (25%)
    Human-authored:     47   (4%)
    Deduced:           521  (42%)
    ─────────────────────────
    Non-lossy:         880  (71%)
  LLM-proposed:        367  (29%)
  ─────────────────────────
  Sufficiency: 71% ✓  (threshold: 70%)
  Mode: AUTO-EXTRACTION (LLM bypassed for known categories)

Verification — ~3 FiveAM tests

  1. test-sufficiency-below-threshold — with 30% non-lossy facts, auto-extraction is not enabled.
  2. test-sufficiency-above-threshold — with 75% non-lossy facts, auto-extraction is enabled.
  3. test-auto-extraction-produces-same-facts-as-llm-extraction — for a category with sufficient non-lossy coverage, auto-extraction produces facts that a subsequent LLM extraction also produces (the deterministic path is consistent with the probabilistic path).

~50 lines. Extends Phase 3 (archivist).

v0.23.0: Read-Only Output Caching Within a Turn

Claude Code caches read-only tool results within a turn. If the agent reads the same file twice, the second read returns cached content — no disk I/O, no context waste. Passepartout re-executes the tool.

  • *turn-result-cache* hash table keyed by (cons tool-name args-hash), cleared at the start of each think() cycle
  • Read-only tools (read-file, search-files, find-files, list-directory, org-find-headline, org-agenda-today, lsp-*) check the cache before executing
  • Cache hit: return stored result with [cached] prefix in the response
  • Prevents redundant tool calls when the agent asks the same question twice within a reasoning step

~25 lines in programming-tools.lisp
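
A sketch of the cache under stated assumptions: the tool body is passed as a thunk, args are hashed with sxhash, and only a subset of the read-only tool names is listed. call-tool-cached and reset-turn-cache are hypothetical names; *turn-result-cache* is the one from the bullets above.

```lisp
;; Hypothetical sketch: tool execution is represented by a thunk argument.
(defvar *turn-result-cache* (make-hash-table :test #'equal)
  "Maps (tool-name . args-hash) to the result cached for the current turn.")

(defparameter *read-only-tools*
  '("read-file" "search-files" "find-files" "list-directory"))

(defun reset-turn-cache ()
  "Clear the cache; intended to run at the start of each think() cycle."
  (clrhash *turn-result-cache*))

(defun call-tool-cached (tool-name args thunk)
  "Run THUNK for TOOL-NAME, reusing a cached result for read-only tools.
Returns the result and, as a second value, T on a cache hit."
  (if (not (member tool-name *read-only-tools* :test #'string=))
      (values (funcall thunk) nil)
      (let ((key (cons tool-name (sxhash args))))
        (multiple-value-bind (cached found) (gethash key *turn-result-cache*)
          (if found
              (values (format nil "[cached] ~A" cached) t)
              (values (setf (gethash key *turn-result-cache*) (funcall thunk))
                      nil))))))
```

Note that sxhash can collide for different argument lists. A stricter variant would key on the argument list itself, which the EQUAL hash table already supports.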

v0.24.0: Skin Engine + 10 Presets

  • Skin format: a plist file (~/.config/passepartout/skins/myskin.lisp) defining:

    • :colors — 40+ color slots (extends the 27 theme keys): agent colors for 8 roles, status bar colors, tool colors, spinner colors, input colors, border colors. All in hex (#RRGGBB).
    • :spinner — style (:braille, :dots, :minimal), speed (ms/frame), kawaii faces, thinking verbs
    • :branding — agent name, welcome message, goodbye message, prompt symbol, help header
    • :tool-prefix — character for tool output lines (default )
    • :tool-emojis — per-tool emoji overrides (e.g., (:shell "⚡" :search "🔎"))
    • :banner — Rich-markup ASCII art logo displayed on startup
  • Skin inheritance: (:inherit :default) — missing values cascade from parent
  • Custom skins from ~/.config/passepartout/skins/*.lisp
  • Hot-swap via /skin <name> — no restart. Skin changes take effect on next redraw (sub-frame latency).
  • Skin preview: /skin <name> with --preview flag applies temporarily; Esc or timeout reverts.
  • Built-in skins as plist data in a *skin-registry* hash table. ~250 lines.

10 presets organized by mood: gold, professional, minimal, forest, ocean, ember, mono, retro, unicorn, midnight. Each derived systematically from accent + background. ~200 lines.
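
Skin inheritance could be resolved at lookup time, roughly as sketched here. *skin-registry* is named above; representing the parent as an :inherit key inside the skin plist, the depth limit of 16, and the function names are assumptions.

```lisp
;; Hypothetical sketch: the parent is stored under an :inherit key in the plist.
(defvar *skin-registry* (make-hash-table :test #'eq)
  "Maps a skin name (keyword) to its plist.")

(defun register-skin (name plist)
  "Store PLIST as the skin called NAME."
  (setf (gethash name *skin-registry*) plist))

(defun skin-value (skin-name key &optional (depth 0))
  "Look up KEY in SKIN-NAME, falling back through :inherit parents."
  (let ((skin (gethash skin-name *skin-registry*)))
    (cond ((or (null skin) (> depth 16)) nil) ; unknown skin or inheritance loop
          (t (let ((value (getf skin key '%missing)))
               (if (eq value '%missing)
                   (let ((parent (getf skin :inherit)))
                     (and parent (skin-value parent key (1+ depth))))
                   value))))))
```

Resolving at lookup rather than merging at load time means a hot-swapped parent skin is picked up by its children without re-registering them.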

NOTE: Skin Presets (10+ built-in)

Shipped as part of the skin engine release — the engine with 0 presets is unusable. See Skin Engine TODO above for the preset definitions.

v0.25.0: Phase 5 — VivaceGraph + Merkle DAG + Ontology Versioning (~400 lines, new skill)

Replace the ephemeral hash-table triple store with VivaceGraph, a Lisp-native graph database with Prolog-like queries. Add the KG type hierarchy (PM type levels applied to the knowledge layer). Define the persistence format from the fact language that survived Phases 1-4.

Rationale

By this point, the triple fact language has been battle-tested through four phases of gate outcomes, Screamer deductions, LLM proposals, and cross-domain comparisons. The facts that proved useful define the persistent schema. The ones that weren't are left behind. The serialization format is not designed upfront; it emerges from use.

The transition from ephemeral to persistent is justified when two conditions are met: (1) the fact language has stabilized (categories are being queried, not constantly refactored), and (2) accumulated deductions across sessions provide value that justifies the serialization cost.

Implementation — symbolic-vivacegraph.org → symbolic-vivacegraph.lisp (skill)

Wrap VivaceGraph

VivaceGraph is available via Quicklisp. Load at runtime. Not an ASDF dependency. If not installed, the fact store continues as a hash table (Phase 1-4 behavior) with a log warning: "VivaceGraph not available — persistence disabled."

Prolog-like queries

Replace fact-query with graph traversals:

;; Find all files classified as secrets
(vivace-query '(:and (:entity ?e)
                     (:member-of-class ?e :secret-file)))

;; Find all files classified as secrets that were modified today.
;; Backquote (not quote) is needed so that ,(today-timestamp) is evaluated.
(vivace-query `(:and (:entity ?e)
                     (:member-of-class ?e :secret-file)
                     (:modified-since ?e ,(today-timestamp))))

;; Find contradictions between Wikidata and the memex
(vivace-query '(:and (:entity ?e)
                     (:has-value ?e ?v1 :source :wikidata)
                     (:has-value ?e ?v2 :source :memex)
                     (:not-equal ?v1 ?v2)))

KG type hierarchy

Every entity in the graph carries :pm-type-level metadata. Queries cannot return entities whose type level equals or exceeds the querying function's type level. A fact-finding query at type-level 2 cannot return facts at type-level 3 or higher. Self-referential knowledge — "this fact defines its own type" — becomes structurally impossible because the type level is assigned at creation and cannot be modified by a fact of the same or higher level.

This is Contribution 1 (type-level gates) applied to the knowledge layer rather than the execution layer. The dispatcher prevents self-referential actions; the KG prevents self-referential facts.

Persistence format

The fact language that survived Phases 1-4 defines the format. Each entity is a node; each triple is an edge with properties (:grounding, :provenance, :timestamp). The format is not a new design — it is the triple schema evolved through use, serialized by VivaceGraph's native persistence.

If the fact language later evolves to n-ary relations, VivaceGraph's graph model accommodates this natively — edges can carry arbitrary property plists. The triple form is a special case of the general graph model.

Load on startup, save on interval

On daemon start, (vivacegraph-load) reads the last saved graph. On heartbeat, (vivacegraph-save) persists the graph in its native format to ~/.cache/passepartout/facts.vg. The interval matches the existing *memory-auto-save-interval*. The save is atomic: write to a temp file, rename on success. Corruption-safe.

Merkle DAG version chains

Each (:entity :relation) pair forms an independent Merkle chain. Facts hash over SHA-256(value || provenance || timestamp || parent-hash || grounding). The :parent-id pointer forms the chain. Tampering with any version breaks all downstream hashes.

The chains form a DAG, not a single list. Facts about .env evolve independently from facts about Nabokov. :dual and :plural facts cross-reference via :complement and :contradiction edges but maintain independent ancestor chains. The Merkle DAG rests on the existing memory-object infrastructure from v0.2.0 — the fact store is a new occupant of existing housing. ~50 lines to bridge the fact schema into memory-object wrappers.
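
To illustrate the chain structure without external libraries, the sketch below swaps SHA-256 for a 32-bit FNV-1a style hash. That swap is purely for self-containment and is not collision resistant; a real implementation would use SHA-256 as stated above. All function names and the newest-first list layout are assumptions.

```lisp
;; Sketch only: a 32-bit FNV-1a style hash stands in for SHA-256.
(defun fnv1a-32 (string)
  "32-bit FNV-1a style hash over the character codes of STRING."
  (let ((hash 2166136261))
    (loop for char across string
          do (setf hash (logand (* (logxor hash (char-code char)) 16777619)
                                #xFFFFFFFF)))
    hash))

(defun fact-hash (fact parent-hash)
  "Hash value, provenance, timestamp, parent hash and grounding together."
  (fnv1a-32 (format nil "~S|~S|~S|~S|~S"
                    (getf fact :value) (getf fact :provenance)
                    (getf fact :timestamp) parent-hash (getf fact :grounding))))

(defun extend-chain (chain fact)
  "Return CHAIN (newest first) with FACT added and its :hash filled in."
  (let ((parent-hash (if chain (getf (first chain) :hash) 0)))
    (cons (append fact (list :hash (fact-hash fact parent-hash))) chain)))

(defun chain-valid-p (chain)
  "Re-walk CHAIN from oldest to newest, recomputing every hash."
  (let ((parent-hash 0))
    (dolist (fact (reverse chain) t)
      (unless (= (getf fact :hash) (fact-hash fact parent-hash))
        (return nil))
      (setf parent-hash (getf fact :hash)))))
```

Each (:entity :relation) pair would own one such list, which is what keeps the chains independent of each other.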

Ontology versioning

The category hierarchy itself is a Merkle tree. Every entity class definition hashes over its superclasses, cardinality policy, relations, and description. The aggregate hash of all active class definitions is the :ontology-version — a Merkle root of the current worldview.

Every fact stores its :ontology-version at the time of assertion (a single 64-hex-char field). When categories change, the new hash flags affected facts for re-verification (Screamer re-evaluates each against the new category definitions). Re-verification outcomes are :survived (deduction still holds), :incoherent (premises don't translate under new categories, flagged for human review), or :reclassified (valid but under different classification).

Queries accept an optional :ontology-version parameter. The default is :active (current worldview). Specifying a version returns facts as they were under that worldview: "Under my 2024 security model, this file was a secret. Under my 2025 model, it is an auth-secret." ~40 lines on top of VivaceGraph persistence.

Verification — ~8 FiveAM tests

  1. test-vivacegraph-roundtrip — save and load preserves all facts with provenance metadata.
  2. test-prolog-query-returns-results — a query for all secret files returns the bootstrapped gate facts.
  3. test-prolog-query-cross-domain — a query for contradictions between Wikidata and memex provenance returns correct results.
  4. test-type-level-prevents-self-reference — a query from a type-level-2 function cannot return type-level-3 facts.
  5. test-fact-store-fallback-without-vivacegraph — when VivaceGraph is not loaded, the hash-table fallback functions identically to Phase 1-4 behavior.
  6. test-merkle-chain-tamper-detected — modifying a fact's value breaks the hash chain, detectable by re-walking the :parent-id spine.
  7. test-ontology-version-query — querying with an old :ontology-version returns facts as they were under that worldview, not the current one.
  8. test-reverification-flags-on-category-change — changing a category definition sets :re-verify-status :pending on all affected facts.

~400 lines. New skill: symbolic-vivacegraph.org. Depends on Phase 4 (sufficiency). Not an ASDF dependency — degrades to hash-table fallback.

v0.26.0: Hooks on defskill — Lifecycle Interception

Passepartout's skills can inject instructions and react to triggers but cannot intercept behavior. All 4 competitors have lifecycle hooks (PreToolUse, PostToolUse, session events). Hooks complete the extension model: skills define what the agent knows; hooks define when skills get to inspect and veto actions.

  • Add :pre-tool-hook and :post-tool-hook slots to the defskill struct
  • :pre-tool-hook receives (action context), returns :allow, :deny, or :ask. Called before tool execution in the Dispatcher pipeline (new vector between shell-safety and network-exfil).
  • :post-tool-hook receives (action context result), returns (values modified-result modified-context) or nil to leave unchanged. Called after tool execution. Useful for logging, auto-commit, notification.
  • :on-session-start, :on-heartbeat, :on-compact lifecycle hooks for maintenance skills
  • Hooks run in skill priority order. A :deny from any hook short-circuits the chain.

~50 lines in defskill macro + core-perceive.lisp
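
The short-circuit behaviour of the pre-tool chain might be sketched as follows. It assumes the hooks arrive as a list of functions already sorted by skill priority, and that a single :ask with no :deny yields :ask; the roadmap does not specify how :ask combines, so that part is a guess.

```lisp
;; Hypothetical sketch: how :ask combines with :allow is an assumption.
(defun run-pre-tool-hooks (hooks action context)
  "Call HOOKS in order. Return :deny at the first denial,
:ask if any hook asked, otherwise :allow."
  (let ((verdict :allow))
    (dolist (hook hooks verdict)
      (ecase (funcall hook action context)
        (:deny (return :deny))       ; short-circuit: later hooks never run
        (:ask (setf verdict :ask))
        (:allow nil)))))
```

Using ecase means a hook that returns anything other than the three documented values signals an error instead of being silently treated as :allow.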

v0.27.0: Phase 6 — ACL2 Structural Verification (~200 lines, new skill)

Wrap ACL2 as a skill. Prove structural properties of the KG type hierarchy and rule sets. Not for empirical claims.

Rationale

ACL2 is often positioned as verifying LLM-proposed facts, but many facts are empirical ("this command is destructive on Linux"), not logical. The right role: structural verification. ACL2 proves that the type hierarchy has no cycles, that the rule set is non-contradictory, and that the gate-to-fact bootstrap preserves the Dispatcher's intent. These are structural properties that can be formally verified, not empirical claims that depend on external reality.

Implementation — symbolic-acl2.org → symbolic-acl2.lisp (skill)

Type consistency proofs

(acl2-verify-type-hierarchy facts) — prove that the KG type hierarchy has no cycles: no entity of type-level 3 depends on an entity of type-level 5, no parent category has a child that subsumes it, no category is its own ancestor via the child-of relation. These are structural properties of the graph, independent of what the facts say.

Rule set consistency

(acl2-verify-rule-consistency rules) — prove that the accumulated Dispatcher rules (from HITL approvals) are non-contradictory: no rule allows a command that another rule blocks, no rule permits a path access that another denies. If the rule set is contradictory, ACL2 identifies the contradictory subset with the provenance of each rule. The human resolves the contradiction.

Extraction verification

(acl2-verify-bootstrap-preservation) — prove that the gate-to-fact bootstrap (Phase 0-1) preserves the Dispatcher's intent: every blocked pattern in the gate stack maps to a fact in the store; every fact with :provenance :gate-outcome is grounded in a specific gate vector; no gate-bootstrapped fact contradicts another gate-bootstrapped fact.

Not in scope

ACL2 does not verify that rm -rf / is destructive. That is an empirical claim about Linux. Screamer handles empirical consistency (does this new claim contradict existing observations?). ACL2 handles structural consistency (does this reasoning structure have formal flaws?). The boundary is: empirical claims → Screamer; structural claims → ACL2.

Verification — ~4 FiveAM tests

  1. test-acl2-type-hierarchy-no-cycles — a synthetic KG with a type-level cycle is detected and reported.
  2. test-acl2-rule-set-contradiction-detected — two Dispatcher rules that contradict each other produce a contradiction report with provenance.
  3. test-acl2-bootstrap-preservation — the bootstrap extraction from the gate stack is verified to have no missing or extra facts.
  4. test-acl2-not-loaded-graceful-degradation — when ACL2 is not installed, the skill loads but returns ":ACL2 not available — structural verification disabled" without crashing.

~200 lines. New skill: symbolic-acl2.org. Depends on Phase 5 (VivaceGraph). Not an ASDF dependency — degrades gracefully.

v0.28.0: Prompt Templates / Output Styles

Claude Code has "output styles" (default, Explanatory, Learning). Hermes has agent profiles. Passepartout has a single hardcoded system prompt. Users should be able to change how the agent works, not just how it looks.

  • Output styles are Org files in ~/.config/passepartout/styles/ with Org keyword frontmatter: #+STYLE: explanatory, #+DESCRIPTION: Teaches while doing
  • Three built-in styles:

    • default — current behavior, direct and efficient
    • explanatory — agent explains implementation choices, provides educational insights with ★ Insight blocks. Claude Code's Explanatory output style
    • learning — agent pauses to ask user to write small code pieces (2-10 lines), uses ● Learn by Doing blocks. Claude Code's Learning output style
  • /style <name> TUI command to switch at runtime. Injects a STYLE section into the system prompt between IDENTITY and TOOLS.
  • Style changes are immediate (next think() call). Survive restarts via config persistence.

~100 lines (~60 prompt templates + ~40 TUI integration).

v0.29.0: Skill Auto-Detection — File-Watch Hot-Reload

Passepartout's image-based Lisp model enables hot-reload — redefine a function without restarting. No competitor has this. Claude Code plugins require manual /reload-plugins. Passepartout can auto-detect changes.

  • Daemon watches org/ and ~/.config/passepartout/skills/ with inotify (Linux) or kqueue (macOS). On .org file change:

    1. Wait 200ms debounce (multiple writes within 200ms coalesce)
    2. Tangle the changed org file: (org-tangle-file "org/skill-name.org")
    3. Compile the tangled lisp: (compile-file "lisp/skill-name.lisp")
    4. Reload: (load (compile-file-pathname "lisp/skill-name.lisp"))
    5. TUI shows system message: "Skill 'skill-name' reloaded (23 defuns, 0 errors)"
  • Respects SELF_BUILD_MODE — core files require HITL before reload. Skills reload automatically.
  • On compile error: keep the old version loaded, log the error, show TUI warning: "✗ Skill 'skill-name' failed to compile — old version retained."

~80 lines in a new symbolic-file-watch.org skill.

v0.30.0: Heavy Thinking Skill — Parallel Reasoning + Sequential Deliberation

The HeavySkill paper (arXiv:2605.02396v1) demonstrates that a two-stage pipeline — K independent reasoning trajectories followed by a critical deliberation step — consistently outperforms majority voting and approaches Pass@K. The authors distill it into a readable skill file that works across any agent harness. Passepartout's Merkle tree makes this auditable, rewindable, and cross-session comparable.

  • New skill: org/heavy-thinking.org — a readable skill document loaded at startup. The agent follows a defined protocol when facing complex reasoning tasks:

    1. Activation: triggers when the complexity classifier detects a STEM/reasoning/code-generation task. Dormant for simple factual queries or casual conversation
    2. Parallel reasoning: spawns K independent think() calls (default K=3, HEAVY_THINKING_WIDTH env var). Each call solves the same problem from scratch without access to other trajectories. Encourages diverse strategies
    3. Sequential deliberation: a second model call reads all K trajectories (pruned to essential thinking content to stay under context budget). Critically evaluates each — not voting, but re-reasoning. Produces a synthesized final answer with a deliberation trace: "Trajectories 1,3 converged on answer X. Trajectory 2 had error Y. Synthesized answer: X."
    4. Output: returns the synthesized answer with [Heavy-thinking: 3 parallel, 1 deliberate] annotation in the response metadata
  • Merkle advantage: each trajectory is stored as a content-addressed node. The deliberation trace is permanent and auditable — users can see WHY one answer was chosen
  • Iterative deliberation optional (capped at 2 — the paper shows iterations 3+ degrade HP@K)
  • Cost model: 3 parallel + 1 deliberation = 4 API calls for complex tasks (vs 1 normally). HEAVY_THINKING_COST_MULTIPLIER env var for cost-aware auto-activation

~100 lines as a skill (~60 prompt template + ~40 orchestration in symbolic-heavy-thinking.org).

v0.31.0: Adaptive Layout (3 Tiers)

  • ≥ 120 columns: Full layout. Sidebar visible with all 6 panels. Chat area left of sidebar.
  • 80–119 columns: Compact layout. Sidebar hidden (toggle via /sidebar or Ctrl+X+B, rendered as overlay). Status bar 2 lines. Full markdown rendering.
  • < 80 columns: Minimal layout. Single-column chat. Status bar reduced to 1 line (model, ctx%, duration). Markdown reduced to bold + code blocks only. Input height clamps to 1-2 lines.

Re-renders on terminal resize (already handled via KEY_RESIZE). Content re-flows — not truncated. The layout remembers per-terminal-size preference. ~80 lines.
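
The tier boundaries reduce to a single function of the column count. The name layout-tier and the tier keywords are illustrative.

```lisp
;; Hypothetical sketch: tier keywords are illustrative names.
(defun layout-tier (columns)
  "Map a terminal width in COLUMNS to one of the three layout tiers."
  (cond ((>= columns 120) :full)
        ((>= columns 80) :compact)
        (t :minimal)))
```

A resize handler would call this with the new width and re-render only when the returned tier differs from the current one.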

v0.32.0: Spinner Personality

Configurable spinner style per skin:

  • :braille — ⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏ cycling at 80ms (default)
  • :dots — ·✢✳✶✻✽ cycling (macOS style, Claude Code default)
  • :kawaii — (。◕‿◕。) (◕‿◕✿) ٩(◕‿◕。)۶ cycling with wing decorations ⟪⚔ ... ⚔⟫
  • :minimal — single ● dot blinking at 2000ms
  • :none — static prompt symbol

Stall indication: when no response for 10s, spinner color interpolates from theme color → error red (Claude Code pattern). Reduced motion preference: spinner replaced with slow-pulse ●. ~50 lines.

v0.33.0: Progress Bar

For measurable operations (file processing, test runs with known count, batch operations), render a progress bar using Unicode block characters:

[████████░░░░░░░░░░░░] 42% (5/12 tests passed)

Uses 9 block characters for sub-character precision: [' ', '▏', '▎', '▍', '▌', '▋', '▊', '▉', '█'] (Claude Code pattern). Color-coded by progress: red <25%, yellow 25-75%, green 75%+. ~25 lines.
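
A sketch of the sub-character rendering, assuming a 20-cell bar and the light-shade character from the example for empty cells. Characters are built from code points (U+2588 full block, U+2589 through U+258F partial blocks, U+2591 light shade) to keep the source ASCII-only; progress-bar is a hypothetical name.

```lisp
;; Hypothetical sketch: one cell is divided into eighths via U+2588..U+258F.
(defun progress-bar (done total &optional (width 20))
  "Render DONE/TOTAL as a WIDTH-cell bar with one-eighth-cell precision."
  (check-type total (integer 1))
  (let* ((eighths (floor (* (min done total) width 8) total))
         (full (floor eighths 8))       ; completely filled cells
         (partial (mod eighths 8))      ; eighths filled in the next cell
         (empty (- width full (if (zerop partial) 0 1))))
    (concatenate 'string
                 "["
                 (make-string full :initial-element (code-char #x2588))
                 ;; U+258F is 1/8 filled, U+2589 is 7/8 filled.
                 (if (zerop partial)
                     ""
                     (string (code-char (- #x2590 partial))))
                 (make-string empty :initial-element (code-char #x2591))
                 "]")))
```

For 5 of 12 tests this yields 8 full cells, one cell filled to two eighths, and 11 empty cells. The percentage and count text would be appended by the caller.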

v0.34.0: Live Timestamps

  • Relative timestamps on messages: "just now" (< 30s), "2m ago", "1h ago", "yesterday"
  • Absolute timestamp on hover/focus (via Tab navigation to message)
  • Status bar shows session duration: Session: 3h 12m
  • Timestamps update live (per-minute recalculation, not per-frame)

~40 lines.
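
The relative labels could be computed from an age in seconds, as in this sketch. The 30-second cutoff comes from the list above; treating 24 to 48 hours as "yesterday" (rather than checking the calendar date), rounding 30 to 59 seconds up to "1m ago", and the "Nd ago" form for older messages are assumptions.

```lisp
;; Hypothetical sketch: "yesterday" is approximated by age, not calendar date.
(defun relative-time (seconds-ago)
  "Return a short human-readable label for an age of SECONDS-AGO seconds."
  (cond ((< seconds-ago 30) "just now")
        ((< seconds-ago 3600)
         (format nil "~Dm ago" (max 1 (floor seconds-ago 60))))
        ((< seconds-ago 86400)
         (format nil "~Dh ago" (floor seconds-ago 3600)))
        ((< seconds-ago 172800) "yesterday")
        (t (format nil "~Dd ago" (floor seconds-ago 86400)))))
```

Since the smallest unit shown is one minute, recomputing labels once per minute is enough to keep them current.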

v0.35.0: Context-Sensitive Help

Press ? to show available actions in current context:

  • In chat: list of navigation keys, command shortcuts
  • In sidebar: sidebar-specific bindings
  • In HITL prompt: approval/denial bindings
  • In command palette: palette navigation bindings

Rendered as a dim help bar at the bottom of the screen (above input). Dismisses on any key or after 5 seconds. ~40 lines.

v0.36.0: Phase 7 — 10-80-10 Planner (~500 lines, new skill, last phase)

The final neurosymbolic phase: a planning engine built on the mature symbolic index. Screamer expresses task planning as a constraint satisfaction problem. ACL2 verifies plans for structural soundness. The LLM handles the I/O boundaries (natural language → structured goal ← natural language response). The symbolic engine handles the reasoning.

Rationale

This is the culmination — it requires a populated, queried, and trusted symbolic index. The full planner is useless without a mature ontology and a proven deducer. By the time Phase 7 begins, Phases 0-6 have accumulated months of gate outcomes, Screamer deductions, verified LLM proposals, and human-authored facts. The symbolic index has achieved sufficiency. The ontology has stabilized through use. The planner is built on a foundation, not a speculation.

Implementation — symbolic-planner.org → symbolic-planner.lisp (skill)

Task decomposition as constraint satisfaction

The user specifies a goal: "refactor the authentication module to support OAuth2." The LLM translates this to a structured goal plist. Screamer expresses the planning problem:

  • Variables: subtasks (write OAuth2 client, add token store, update auth middleware, write tests, update documentation)
  • Constraints: dependency ordering (tests depend on implementation), resource limits (one file write at a time), safety invariants (no modification of core-* files)
  • Objective: find an ordering that satisfies all constraints

Screamer returns a viable plan or reports unsolvability with the conflicting constraints.
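
To make the dependency-ordering part of this concrete, here is a small Python sketch. It is a stand-in, not Screamer: it uses the standard library's topological sort and covers only the ordering constraint, not resource limits or safety invariants, which a real constraint solver would search over with backtracking. The function name and the subtask identifiers are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def order_subtasks(depends_on):
    """depends_on maps each subtask to the set of subtasks it waits for.

    Returns ('ok', ordering) or ('unsolvable', cycle) so the caller can
    report which constraints conflict.
    """
    try:
        return ("ok", list(TopologicalSorter(depends_on).static_order()))
    except CycleError as err:
        return ("unsolvable", err.args[1])  # list of nodes forming the cycle


# Hypothetical subtasks for the OAuth2 example in the text.
OAUTH2_PLAN = {
    "oauth2-client": set(),
    "token-store": set(),
    "auth-middleware": {"oauth2-client", "token-store"},
    "write-tests": {"auth-middleware"},
    "update-docs": {"auth-middleware"},
}
```

Calling `order_subtasks(OAUTH2_PLAN)` yields an ordering in which the middleware follows both of its prerequisites and the tests follow the middleware; a plan in which two subtasks wait on each other comes back as unsolvable together with the offending cycle.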

Plan verification

ACL2 proves that the plan contains no deadlocks (two subtasks waiting on each other), no dependency cycles (A depends on B depends on C depends on A), and no safety violations (no plan step requires a gate-blocked operation).

If verification fails, ACL2 identifies the failing subtask and the violated constraint. The planner re-decomposes the problematic branch (the existing ROADMAP's branch pruning, v0.61.0, but symbolically rather than neurally).

Neuro-symbolic boundary

The LLM handles the I/O boundaries:

  • Input (10%): natural language → structured goal plist. "Refactor auth for OAuth2" → (:goal :refactor-component :target :auth-module :add-feature :oauth2). Small prompt, formulaic translation, ~100 tokens.
  • Reasoning (80%): Screamer plans. ACL2 verifies. VivaceGraph provides the facts about file structure, dependencies, and gate constraints. Zero LLM tokens.
  • Output (10%): structured plan → natural language response. The verified plan plist is formatted as "I'll refactor the authentication module in 5 steps: 1) Create the OAuth2 client (depends on: nothing, modifies: auth/client.lisp) 2) Add the token store…" Small prompt, formulaic translation, ~150 tokens.

TUI visualization

The plan is rendered as an Org headline tree in the TUI, with each subtask as a node showing its terminal state (todo, next-action, in-progress, done, blocked, stuck), its constraints, and its verified properties. This is the same task tree visualization planned for v0.61.0, but with the addition of Screamer constraint annotations and ACL2 verification badges.

Verification — ~6 FiveAM tests

  1. test-goal-plist-from-natural-language — natural language input produces correct structured goal plist (LLM-dependent but formulaic; tested with deterministic mock).
  2. test-screamer-plan-satisfies-constraints — Screamer produces a plan that satisfies all specified dependencies and safety constraints.
  3. test-screamer-report-unsolvable — Screamer reports unsolvability when constraints are contradictory.
  4. test-acl2-verifies-plan-no-cycles — ACL2 verifies a valid plan has no dependency cycles.
  5. test-acl2-rejects-cyclic-plan — ACL2 detects a dependency cycle in an invalid plan.
  6. test-plan-to-natural-language — structured plan plist produces readable natural language output.

~500 lines. New skill: symbolic-planner.org. Depends on Phase 6 (ACL2) + all prior phases.

v0.36.1: Phase 8+ — Semantic Wikipedia Integration (TBD lines, optional acceleration)

Load Wikidata entities referenced in the memex into the symbolic index. Every entity the user's prose mentions gets its Wikidata property graph — type hierarchy, relations, dates, citations — as triples with :provenance :wikidata.

Rationale

The gate stack provides 50-70 entity classes — adequate for a coding agent. For a general-knowledge memex containing literature, philosophy, history, science, and daily life, 50-70 is starvation. Organic growth through prose extraction (Phase 3) would take years to cover the entities mentioned in a single reading of Pale Fire. Wikidata has already done this work at scale.

The LLM's role in extraction shrinks dramatically. Without Wikidata, the archivist must discover that Nabokov wrote Pale Fire, lectured on Kafka, and emigrated from Russia — extracting each triple from prose. With Wikidata, the Nabokov entity is pre-structured. The archivist's job changes from "discover entities" to "connect your heading to the existing entity."

Implementation sketch

  1. Index referenced entities. Scan memex prose for entity names (capitalized noun phrases, names in Org links, headings in literature/ directories). For each, attempt Wikidata entity resolution (string match, disambiguation via context).
  2. Load N-hop property net. For each resolved entity, load its Wikidata properties: instance-of, subclass-of, authored, published-in, influenced-by, birth-date, death-date, etc. Load the same for entities directly connected to it (1-hop neighbors). Optionally expand to 2-hop for deeply connected domains.
  3. Admit with plural policy. Wikidata facts are admitted with :provenance :wikidata and cardinality policy :plural. They do not override your memex's facts. Disagreements are surfaced, not resolved.
  4. Cross-domain query. "What does my memex say about Nabokov that Wikidata doesn't?" "Where does my memex disagree with Wikidata?" "What entities in my memex have no Wikidata counterpart?" These queries are pure VivaceGraph traversals — zero LLM tokens.

Not a Phase 0 prerequisite

Semantic Wikipedia integration is an accelerator, not a prerequisite. Phases 0-7 work without it. Wikidata compresses the timeline for the broad domain but does not change the architecture. The admission gate (Screamer), contradiction policies, provenance tracking, and neuro-symbolic boundary are identical with or without it.

Open question

How much Wikidata is the right amount? Loading entities referenced in the memex is the minimum. Loading all entities within N hops of those references expands the graph exponentially. The right N depends on the memex's breadth and the user's query patterns. A memex focused entirely on software engineering may need only 1 hop. A memex spanning literature, history, philosophy, and science may need 3-4 hops.

TBD lines. New skill. Depends on Phase 5 (VivaceGraph).

v0.37.0: Priority-Queue Signal Processing

Replace the linear process-signal call chain with a priority-ordered signal queue. The queue is a sorted plist-list consumed by the main loop. Priority tiers:

  • :user-input / :chat-message — highest priority (the user is waiting)
  • :approval-required — high (HITL re-injections need quick resolution)
  • :interrupt — medium-high (shutdown signal)
  • :tool-output — medium (feedback from tool execution, needs LLM assessment)
  • :heartbeat / :cron / :delegation — low (background maintenance)
  • Coalesce duplicate heartbeats: if the queue already contains a :heartbeat signal when a new one arrives, discard the older one (no value in processing stale ticks). Keep at most one pending heartbeat at any time.
  • The main loop drains the highest-priority signal from the queue, processes it through the pipeline, and repeats. If the pipeline produces feedback (tool-output → think), the feedback is enqueued at its appropriate priority — it may preempt background signals but won't interrupt the current signal mid-processing.
  • Add telemetry: average queue depth by priority tier, max wait time per tier.
  • TUI /reconnect command: when the connection-loss detection from v0.3.3 fires, the user can reconnect without restarting the TUI. The command closes the stale socket, re-runs connect-daemon with its retry backoff, and restores the :connected state on success.

~80 lines in core-pipeline.lisp + ~30 lines TUI.
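
The tiering and heartbeat coalescing described above can be sketched in Python as follows. This is an illustration only: it swaps the sorted plist-list for a binary heap, the numeric tier values and the class name `SignalQueue` are assumptions, and stale heartbeats are skipped lazily at drain time rather than removed on arrival.

```python
import heapq
from itertools import count

# Lower number = drained first. Tier names come from the text;
# the numeric values are an assumption of this sketch.
PRIORITY = {
    "user-input": 0, "chat-message": 0,
    "approval-required": 1,
    "interrupt": 2,
    "tool-output": 3,
    "heartbeat": 4, "cron": 4, "delegation": 4,
}


class SignalQueue:
    def __init__(self):
        self._heap = []
        self._seq = count()           # tie-breaker: FIFO order within a tier
        self._live_heartbeat = None   # sequence number of the only valid heartbeat

    def put(self, kind, payload=None):
        seq = next(self._seq)
        if kind == "heartbeat":
            self._live_heartbeat = seq    # any older heartbeat is now stale
        heapq.heappush(self._heap, (PRIORITY[kind], seq, kind, payload))

    def get(self):
        """Pop the highest-priority signal, or None when the queue is empty."""
        while self._heap:
            _, seq, kind, payload = heapq.heappop(self._heap)
            if kind == "heartbeat" and seq != self._live_heartbeat:
                continue                  # discard the stale tick
            return kind, payload
        return None
```

Feedback produced while a signal is being processed is simply `put` back with its own tier, so it can overtake background work without interrupting the signal currently in flight.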

v0.38.0: MVCC Memory Concurrency

  • Replace *memory-store* (mutable global hash table) with a versioned Merkle-root pointer. The root is an (or null merkle-node) struct containing the tree and a monotonic version counter.
  • Read threads snapshot the root before beginning their pipeline cycle. All object lookups dereference through the snapshot — they see a consistent view of memory regardless of concurrent writes. Reads never block.
  • Write threads (ingest-ast, org-modify, snapshot-memory) build new object hashes, construct a new Merkle root, and CAS-replace the global root pointer. If another thread won the CAS race (root version changed), the loser re-reads the new root, replays its changes on the updated tree, and retries the CAS.
  • Conflict probability is near-zero because concurrent signals almost never touch the same Org headline. The replay-on-conflict path exists for correctness but is rarely exercised. Lock contention is eliminated — the only atomic operation is the CAS on the root pointer.
  • Remove the single-threaded pipeline assumption: previously, process-signal was safe because nothing else wrote to *memory-store* during its execution. With MVCC, multiple signals can process concurrently because each has its own snapshot. The *loop-interrupt-lock* becomes *signal-queue-lock* (protecting only the queue, not the memory).
  • Test: concurrent ingest-ast from two threads writing to different memory objects, verify both commits succeed without corruption.

~60 lines in core-memory.lisp.
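
The snapshot-then-CAS cycle above can be illustrated with a short Python sketch. It makes two simplifying assumptions: a plain dictionary stands in for the Merkle tree, and a small lock emulates the atomic compare-and-swap that a Lisp implementation would get from the runtime. The names `Root` and `Store` are hypothetical.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Root:
    version: int
    objects: dict   # never mutated in place; writers always build a copy


class Store:
    def __init__(self):
        self._root = Root(0, {})
        self._cas_lock = threading.Lock()   # emulates one atomic CAS instruction

    def snapshot(self) -> Root:
        return self._root                   # readers take the pointer and never block

    def _cas(self, expected: Root, new: Root) -> bool:
        with self._cas_lock:
            if self._root is expected:
                self._root = new
                return True
            return False

    def commit(self, changes: dict) -> int:
        while True:
            base = self.snapshot()
            new = Root(base.version + 1, {**base.objects, **changes})
            if self._cas(base, new):
                return new.version
            # Lost the race: re-read the root and replay the changes on top of it.
```

A reader that took its snapshot before a commit keeps seeing the old `objects` mapping for the rest of its pipeline cycle, which is the consistency guarantee the design relies on.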

v0.39.0: Structured Output Enforcement

  • Add a plist validation step between markdown-strip and read-from-string in think(). Before attempting to parse, validate: (a) the output starts with ( or [, (b) it contains balanced delimiters (count opens vs closes), (c) it doesn't contain #. (redundant after v0.3.1 *read-eval* nil but defense-in-depth).
  • On validation failure: construct a rejection trace (similar to the existing deterministic gate rejection feedback) and re-inject into the LLM prompt. The trace includes the raw output and a diagnostic ("Your response did not produce a valid plist. Ensure it starts with ( and has balanced parentheses.").
  • Configurable LLM_OUTPUT_RETRIES (default 2). After exhausting retries, fall through with the raw text as a :MESSAGE action (current behavior).
  • Track parse-failure rate per provider in telemetry. Use to guide provider cascade ordering: a provider with 20% parse-failure rate falls behind one with 2%.
  • If retries are exhausted without a parseable plist, the TUI renders the raw LLM output in a dimmed, collapsible region labeled "Parse failure — could not interpret this response." The user can inspect what the model produced.

~40 lines in core-reason.lisp.
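
A Python sketch of the three pre-parse checks, for illustration. It is slightly stricter than the "count opens vs closes" rule in the text: it uses a stack so that mismatched pairs such as `( ]` are rejected, and it ignores delimiters inside double-quoted strings. It does not handle Lisp character literals or comments. The function name and diagnostic strings are assumptions.

```python
PAIRS = {")": "(", "]": "["}


def validate_plist_text(text: str):
    """Return (ok, diagnostic) for raw LLM output before it reaches the reader."""
    s = text.strip()
    if not s or s[0] not in "([":
        return False, "output must start with ( or ["
    if "#." in s:
        return False, "read-time evaluation (#.) is not allowed"
    stack, in_string, escaped = [], False, False
    for ch in s:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False, "unbalanced delimiters"
    if in_string or stack:
        return False, "unbalanced delimiters"
    return True, ""
```

The diagnostic string is what would be folded into the rejection trace that is re-injected into the prompt on the next retry.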

v0.40.0: Doom-Loop Detection

OpenCode detects 3 consecutive identical tool calls and prompts the user. Without this, Passepartout could loop forever on a stuck tool — burning tokens and producing no progress.

  • Track last 3 tool calls (name + args plist) in a ring buffer
  • Before executing a tool, compare against the 3 previous calls
  • If all 3 have the same name and equal args (using equalp), inject a HITL prompt: "The agent has attempted 'grep defun' 3 times without progress. Continue or abort?"
  • Resets on any different tool call or successful output

~15 lines in core-loop-act.lisp.
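
The ring-buffer check can be sketched in Python as below. One interpretation is assumed here: the prompt fires when the pending call matches all three calls already in the buffer, which agrees with the "attempted 3 times" wording of the HITL message. Python equality on the name and sorted arguments stands in for `equalp`; the class and method names are hypothetical.

```python
from collections import deque


class DoomLoopDetector:
    def __init__(self, window: int = 3):
        self.window = window
        self.recent = deque(maxlen=window)   # ring buffer of the last calls

    def should_prompt(self, name: str, args: dict) -> bool:
        """True when the pending call repeats every call in the buffer."""
        call = (name, tuple(sorted(args.items())))
        looping = (len(self.recent) == self.window
                   and all(previous == call for previous in self.recent))
        self.recent.append(call)
        return looping

    def reset(self):
        """Call after a successful output, per the reset rule above."""
        self.recent.clear()
```

A different tool call does not need an explicit reset: it enters the buffer and breaks the run of identical entries on its own.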

v0.41.0: Busy-Mode — Queue on Interrupt

When the agent is processing a turn and the user types a message, the current behavior is undefined. Hermes has interrupt/queue/steer. Passepartout should at minimum support queue mode.

  • BUSY_INPUT_MODE env var: interrupt (default, stop current turn), queue (process after current turn)
  • In queue mode: user messages arriving during an active turn are enqueued. When the current turn's tool chain completes, the queued message is injected as the next turn's user input — no HITL approval needed (it's user input).
  • /busy interrupt / /busy queue TUI commands to toggle at runtime
  • The priority queue (above) naturally supports this — user input queued during a turn has higher priority than heartbeats, lower than the active turn

~20 lines in core-pipeline.lisp.

v0.42.0: CLI / Non-Interactive Mode

Claude Code supports claude -p "fix the failing test" --print. Hermes has hermes -c "command". Passepartout can only be used interactively via the TUI. A non-interactive single-shot mode enables CI/CD integration, cron jobs, and scripting.

  • passepartout ask "what's the status of project X?" — sends a framed message to the daemon, waits for response, prints to stdout
  • Daemon-side: process-one-shot handler — inject :user-input signal, run through full pipeline (perceive → reason → act → loop until stop), return final agent message
  • --json flag outputs the full response plist for programmatic consumption
  • --timeout N flag (default 120s) limits execution time
  • Uses the existing wire protocol — no new protocol, just a CLI wrapper around the framed TCP message format

~80 lines in the passepartout bash script + ~50 lines daemon handler.

v0.43.0: Provider Health Tracking

backend-cascade-call tries providers in order until one succeeds. On failure it moves to the next. But it has no memory of which providers failed or succeeded in the past. A degraded provider gets retried first on every call.

  • *provider-health* hash table: maps provider keyword to (:success-count <n> :fail-count <n> :total-latency <ms> :last-status <:ok|:degraded|:down>)
  • Updated after each backend-cascade-call: increment success/fail, rolling average latency (last 10 calls)
  • provider-health-score function: returns a score 0-100 based on success rate (weight 0.6) and latency vs baseline (weight 0.4)
  • /provider-status TUI command: displays a table of all providers with status indicators (● Up, ◐ Degraded, ○ Down) and recent history
  • Telemetry: provider health data feeds the session telemetry system

~60 lines in neuro-provider.lisp + ~30 lines TUI.
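
One way the weighted score could be computed, sketched in Python for illustration. The 0.6 and 0.4 weights and the ten-call latency window come from the text. The per-provider baseline latency, the way latency is mapped to a 0-1 factor, and the choice to score an unused provider at 100 are assumptions of this sketch.

```python
from collections import deque


class ProviderHealth:
    def __init__(self, baseline_ms: float):
        self.baseline_ms = baseline_ms       # assumed reference latency for this provider
        self.success = 0
        self.fail = 0
        self.latencies = deque(maxlen=10)    # rolling window: last 10 calls

    def record(self, ok: bool, latency_ms: float):
        if ok:
            self.success += 1
        else:
            self.fail += 1
        self.latencies.append(latency_ms)

    def score(self) -> float:
        total = self.success + self.fail
        if total == 0:
            return 100.0                     # assumption: unknown providers start healthy
        success_rate = self.success / total
        average = sum(self.latencies) / len(self.latencies)
        # 1.0 at or below baseline, shrinking as the provider gets slower
        latency_factor = min(1.0, self.baseline_ms / average) if average > 0 else 1.0
        return 100.0 * (0.6 * success_rate + 0.4 * latency_factor)
```

A provider that succeeds half the time at twice its baseline latency scores 50 under these assumptions.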

v0.44.0: Cost-Based Provider Routing

backend-cascade-call currently tries providers in registration order. With cost tracking (v0.5.0) and provider health (above), the cascade can be sorted by cost-effectiveness.

  • COST_ROUTING env var (default true): when enabled, sort the cascade by (provider-health-score * 0.3 + cost-savings-score * 0.7)
  • cost-savings-score: cheap providers score high. Free providers (Ollama local) score 100. Expensive providers (GPT-4) score 10.
  • Health override: a provider with score < 20 (degraded) is demoted below healthy providers regardless of cost
  • /routing TUI command: displays current cascade order with scores and reasons

~40 lines in core-reason.lisp.
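
The sort with its health override can be sketched in a few lines of Python. The weights and the cutoff of 20 are taken from the text; the dictionary keys and the function name are hypothetical.

```python
def order_cascade(providers, degraded_below=20):
    """providers: dicts with 'name', 'health' (0-100) and 'cost_savings' (0-100)."""
    def key(provider):
        healthy = provider["health"] >= degraded_below
        combined = 0.3 * provider["health"] + 0.7 * provider["cost_savings"]
        return (healthy, combined)       # healthy first, then by combined score
    return [p["name"] for p in sorted(providers, key=key, reverse=True)]


EXAMPLE = [
    {"name": "gpt-4", "health": 95, "cost_savings": 10},
    {"name": "ollama-local", "health": 90, "cost_savings": 100},
    {"name": "flaky-free", "health": 10, "cost_savings": 100},
]
```

Here `order_cascade(EXAMPLE)` puts the local provider first, the expensive but healthy provider second, and the degraded one last even though its combined score (73) is higher than the expensive provider's (35.5).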

v0.45.0: Intelligent Provider Fallback — Per-Task-Type Routing

Current fallback is "try the next provider." But different providers excel at different tasks. DeepSeek is strong at code generation. Groq is fast for simple queries. Claude is better at reasoning. The cascade should adapt to the task.

  • *task-provider-scores* hash table: maps (task-type keyword) → (provider keyword → score)
  • Task types: :chat (conversation), :code (code generation/editing), :plan (multi-step planning), :search (information retrieval), :summary (compaction), :reflex (deterministic lookup)
  • Scores updated after each call: if the response was accepted (no rejection retry), increment that provider's score for that task type
  • When the primary provider fails, the fallback picks the highest-scored provider for the current task type (not just the next in line)
  • Bootstrap from defaults: GPT-4/Claude for reasoning, DeepSeek for code, Groq for chat, local Ollama for reflex

~60 lines in neuro-router.lisp.

v0.46.0: Autonomous Certification Badge

After N HITL approvals of the same pattern, the dispatcher auto-approves it. But unlike Claude Code's "auto mode," this is deterministic — no probability, no model hallucination granting permission. The certification is a logical certainty.

  • When a pattern crosses DISPATCHER_RULE_THRESHOLD, the dispatcher writes the rule to rules.org AND grants a certification entry: "Certified: shell commands targeting ~/memex/projects/* with git status are deterministically safe. 47 approvals, 0 denials."
  • The sidebar Rules panel shows: [Rules: 47 | Certified: 12] — learned rules vs certified patterns
  • /certifications TUI command: lists all certified patterns with approval counts, last-used timestamps, and the gate vector that checks them
  • Certification downgrade: if a certified pattern is later denied by the user, the certification is revoked and the pattern returns to HITL
  • This is the operational realization of "the more you use it, the cheaper it gets" — each certification represents a category of actions that will never cost another HITL prompt

~60 lines in security-dispatcher.lisp + sidebar rendering reuse.
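
The certify-and-revoke lifecycle can be sketched as a small state machine. This Python version is illustrative: `threshold` plays the role of DISPATCHER_RULE_THRESHOLD, patterns are plain strings rather than gate vectors, and restarting the approval count after a denial is an assumption the text does not state.

```python
class CertificationLedger:
    def __init__(self, threshold: int):
        self.threshold = threshold
        self.approvals = {}
        self.certified = set()

    def record(self, pattern: str, approved: bool) -> str:
        """Record one HITL decision and return the pattern's resulting status."""
        if not approved:
            self.certified.discard(pattern)   # downgrade: pattern returns to HITL
            self.approvals[pattern] = 0       # assumption: count restarts after a denial
            return "denied"
        self.approvals[pattern] = self.approvals.get(pattern, 0) + 1
        if self.approvals[pattern] >= self.threshold:
            self.certified.add(pattern)
            return "certified"
        return "pending"

    def needs_hitl(self, pattern: str) -> bool:
        return pattern not in self.certified
```

The decision is a set-membership test, so whether a pattern is auto-approved depends only on the recorded history and never on a model's judgment.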

v0.47.0: Certification Progress Bar

The certification badge grants permanent auto-approval. Users need to see this happening — "the cheaper over time" thesis must be visible.

  • Sidebar Rules panel expanded to show progress bars: Rules: 12/47 ██████████░░ and Certified: 3/12 ██████░░░░░░
  • Milestone notifications: when a rule reaches certification, TUI injects: "🎖 Rule certified: shell commands in ~/memex/projects/* are now autonomous. 47 approvals, 0 denials. /certifications to review."
  • Certification velocity: "+2 certified this week" trend indicator in sidebar

~30 lines on top of existing sidebar rendering.

v0.48.0: Update Mechanism + Migrations

No update mechanism exists. Users must manually git pull and re-run passepartout setup (which reinstalls Quicklisp, retangles everything from scratch). Claude Code has claude update, Hermes has hermes update. Passepartout needs an incremental update path.

  • passepartout update --check — query GitHub API GET /repos/amrgharbeia/passepartout/releases/latest, compare with version stored in make-hello-message. Report: "v0.5.1 available. 47 changes."
  • passepartout update (git-based) — git fetch --tags && git checkout v0.5.1, incremental tangle (only org files changed since previous tag, via git diff --name-only v0.5.0..v0.5.1 -- org/*.org), recompile changed lisp files, restart daemon
  • Migration hooks: ~/memex/system/migrations/ — ordered Lisp scripts run after tangle, before daemon restart. migrate-v051.lisp upgrades memory format, config schema, package names. Tracked by *migration-version* in ~/.config/passepartout/version.lisp
  • Post-update verification: run internal eval suite, verify skill count ≥ 10, smoke test daemon port 9105. On failure: passepartout update --rollback → git checkout v0.5.0 → re-tangle → restart
  • Binary update path (when v0.63.0 ships): download binary from GitHub Releases, verify SHA-256, replace, restart

~80 lines bash + ~50 lines Lisp.

v0.49.0: Self-Configuration — Agent Proposes and Applies Config Changes

Passepartout's config is text files (`.env`, `.lisp`) — the same format the agent already edits. No competitor can self-configure because their config requires runtime restart or schema validation after file write. Passepartout can edit `.env` → daemon detects change → reloads → takes effect without restart.

  • passepartout config set <key> <value> CLI command: writes to `.env`, triggers daemon reload. ~20 lines bash.
  • Runtime config reload: daemon watches `.env` with inotify (reuses file-watch from v0.8.2). On change: re-reads env vars, reloads provider cascade, updates gate thresholds. No restart needed.
  • Config validation before write: agent verifies provider names exist (against neuro-explorer registry), ports are valid numbers, thresholds are integers, file paths are within memex. On invalid value, proposes correction.
  • Config change audit: every change writes to Merkle tree: "Agent changed DISPATCHER_RULE_THRESHOLD from 3 to 5. HITL approved." Gate trace records the decision.

~40 lines daemon + ~30 lines config validation.

Four tiers of self-configuration:

  1. Config Query (v0.7.2) — "What providers do I have?" → answered from system prompt CONFIG section. Already implemented.
  2. Config Suggest (v0.49.0) — "Should I use a cheaper model?" → agent analyzes telemetry, proposes specific config change with estimated savings. User decides.
  3. Config Apply (v0.49.0) — "Add @credentials to privacy tags" → agent proposes change → HITL review → writes `.env` → daemon reloads → change takes effect within one think() cycle.
  4. Config Optimize (v0.49.0) — "Make yourself cheaper" → agent analyzes cost patterns across all sessions, proposes multi-key optimization. User approves full batch.

v0.50.0: Self-Diagnosis Coach — /coach Command

Telemetry data plus the agent's self-knowledge enables coaching: the agent detects workflow anti-patterns and suggests improvements.

  • /coach — analyzes telemetry from the last N sessions, produces a coaching report with 3-5 actionable tips. Coaching is opt-in (privacy-respecting — no data leaves the machine).

~50 lines in telemetry skill + ~30 lines TUI rendering.

v0.51.0: Failure Attribution — Tag Task Failures with Probable Component

AHE (arXiv:2604.25850v2) shows that evolution loops work when failures are attributed to specific harness components, not just "the task failed." Passepartout's telemetry records task outcomes but doesn't classify failures by root cause.

  • In telemetry skill: when a session ends with a task failure, classify as: :tool-failure, :gate-overblock, :gate-underblock, :reasoning-error, :context-overflow, :timeout
  • Classification is deterministic: if last action was blocked by dispatcher → gate-overblock. If last action was a tool error → tool-failure. If last action was a successful tool call but wrong output → reasoning-error.
  • Feeds the Skill Creator (v0.59.0) — the agent knows which component to fix, not just that something went wrong

~20 lines in telemetry skill.
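
A Python sketch of the deterministic classification, for illustration. The first, second, and last rules follow the text. The rules for timeout and context overflow are assumptions, no rule is given in the text for gate-underblock so none is invented here, and the field names on `last_action` are hypothetical.

```python
def classify_failure(last_action: dict) -> str:
    """Map the final action of a failed session to a probable component."""
    if last_action.get("blocked_by_dispatcher"):
        return "gate-overblock"
    if last_action.get("tool_error"):
        return "tool-failure"
    if last_action.get("timed_out"):           # assumed rule
        return "timeout"
    if last_action.get("context_overflow"):    # assumed rule
        return "context-overflow"
    if last_action.get("tool_ok") and not last_action.get("output_correct", True):
        return "reasoning-error"
    return "unclassified"
```

Because the rules are checked in a fixed order, the same telemetry record always produces the same label, which keeps the attribution reproducible across sessions.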

v0.52.0: MCP Native Client

  • Pure Common Lisp MCP client: parse JSON-RPC messages from MCP servers over stdio or SSE. No Python bridge, no Node.js subprocess. The client runs in the same Lisp image as the agent — zero serialization overhead between the agent and the MCP layer.
  • Implement the MCP protocol lifecycle: initialize handshake, list tools, call tool, handle notifications. Each MCP server registers its tools as entries in Passepartout's *cognitive-tool-registry* at connection time — the LLM's tool belt prompt automatically expands to include them.
  • MCP_SERVERS env var: comma-separated paths to MCP server config files (JSON). Each config specifies the server command, args, and env vars. Example: MCP_SERVERS=~/.config/passepartout/mcp/filesystem.json,~/.config/passepartout/mcp/git.json.
  • Tool invocation route: LLM proposes a tool call → Dispatcher verifies against permission table → MCP client serializes call as JSON-RPC → server executes → result deserialized back to plist → returned to LLM as tool output. The Dispatcher does not distinguish between native tools and MCP tools — the gate stack is uniform.
  • Register the MCP client as a skill (defskill :passepartout-mcp-client) so it can be hot-reloaded. The MCP client is not core infrastructure — it is a skill that extends the tool ecosystem.

~200 lines as a new skill, mcp-client.org.
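
For orientation, the wire format of a tool invocation looks roughly like this. The Python sketch below builds and parses JSON-RPC 2.0 messages using the `tools/call` method name from the MCP specification; the helper names, the example tool, and the simplified error handling are assumptions, and the actual client would be written in Common Lisp.

```python
import json
from itertools import count

_ids = count(1)   # JSON-RPC request ids must be unique per connection


def jsonrpc_request(method: str, params: dict) -> str:
    """Serialize one request as a single line, suitable for a stdio transport."""
    return json.dumps({"jsonrpc": "2.0", "id": next(_ids),
                       "method": method, "params": params})


def tool_call_request(tool: str, arguments: dict) -> str:
    return jsonrpc_request("tools/call", {"name": tool, "arguments": arguments})


def parse_response(line: str):
    """Return ('ok', result) or ('error', message) for one response line."""
    message = json.loads(line)
    if "error" in message:
        return ("error", message["error"].get("message", ""))
    return ("ok", message.get("result"))
```

In the flow described above, the Dispatcher check would run before `tool_call_request` is ever sent, and the parsed result would be converted to a plist before it is returned to the LLM.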

v0.53.0: Web Search + Web Fetch Tools

Claude Code has WebSearchTool + WebFetchTool. Hermes has firecrawl-py + exa-py. Passepartout's agent cannot answer questions about the world, look up documentation, or research current events. Two new cognitive tools, no external dependencies:

  • search-web — POST query to a search API (SearXNG public instance as default, configurable via WEB_SEARCH_URL env var). Returns title + URL + snippet for top 10 results. Dispatcher's network-exfiltration gate (vector 8) provides free safety — search queries are already vetted.
  • fetch-web — GET a URL, extract text content via regex-based HTML stripping (no parser dependency — strip tags, keep whitespace). Returns plain text, truncated to 10,000 chars. Dispatcher's network-exfiltration gate checks the URL domain against the allowlist.
  • Both register via def-cognitive-tool as read-only tools (auto-approve via v0.7.2 safe-tool allowlist)

~150 lines as a new skill, programming-web.org. No external Python/Node.js process.
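
The regex-based stripping that fetch-web relies on can be sketched as follows. This Python version is illustrative and goes a little beyond the text: dropping script and style bodies, decoding entities, and collapsing runs of spaces are additions of the sketch. Regex stripping is a heuristic and will mishandle malformed markup; the function name is hypothetical.

```python
import html
import re


def strip_html(markup: str, limit: int = 10_000) -> str:
    """Reduce an HTML document to plain text, truncated to `limit` characters."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", markup)  # drop script/style bodies
    text = re.sub(r"(?s)<!--.*?-->", " ", text)                        # drop comments
    text = re.sub(r"(?s)<[^>]+>", " ", text)                           # drop remaining tags
    text = html.unescape(text)                                         # &amp; -> &
    text = re.sub(r"[ \t]+", " ", text)                                # collapse spaces, keep newlines
    return text.strip()[:limit]
```

Truncating after stripping means the 10,000-character budget is spent on readable text rather than on markup.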

v0.54.0: LSP Integration

Claude Code uses LSP for code intelligence — find definitions, find references, diagnostics, hover types. Without LSP, Passepartout can grep patterns but cannot answer "where is this function defined?" or "what calls this?" — questions Claude Code answers instantly with zero LLM tokens.

  • LSP client as a skill (lsp-client.org). Communicates with language servers via stdio JSON-RPC (same pattern as MCP client, different protocol).
  • Three cognitive tools: lsp-definition (go to definition), lsp-references (find references), lsp-diagnostics (get errors/warnings for file)
  • Read-only tools — auto-approve via v0.7.2 safe-tool allowlist
  • Supported languages: any language with an LSP server (TypeScript, Python, Rust, Go, C/C++, Java, etc.) — not Lisp-specific
  • LSP servers installed by the user (e.g., npm install -g typescript-language-server). Passepartout auto-discovers installed servers via PATH.

~200 lines. Register as read-only cognitive tools. No daemon protocol changes — LSP is a background process, not a rendering concern.

v0.55.0: debug-inspect Cognitive Tool

Lisp enables live state inspection that no TypeScript/Python agent can match. Claude Code has no REPL. Passepartout can inspect and modify its own running state.

  • debug-inspect cognitive tool: evaluates a Lisp form in the running image and returns the result as a structured plist. Parameters: code (Lisp form string), package (optional).
  • Read-only tool: auto-approve via v0.7.2 safe-tool allowlist. No side effects — inspection only.
  • Use cases: (hash-table-count *memory-store*), (inspect memory-object-by-id "node-42"), (map 'list #'car *skill-registry*)
  • The agent can introspect its own state to answer meta-questions: "How many objects are in memory?" "What skills are loaded?" "What was the last HITL decision?"

~30 lines in programming-repl.lisp (extends existing repl-eval with safety guard).

v0.56.0: Session Transcripts — /memex/system/sessions/

Passepartout has no session persistence beyond Merkle tree snapshots. Chat history lives in the TUI's in-memory vector and is lost on restart. Every competitor persists sessions: Claude Code uses JSONL, OpenCode uses SQLite, OpenClaw uses JSONL, Hermes uses SQLite+FTS5.

  • Auto-save on every message (user and agent): append to ~/memex/system/sessions/<date>-<title>.org as an Org file
  • Format: each message as an Org headline with role tag (:user:, :agent:, :system:), universal timestamp, content in body. Gate trace as a property drawer under the agent message headline.
  • Session title derived from the first user message (first 60 chars, sanitized for filename). Override with /rename <title>
  • Auto-save is automatic — no /export needed. The /export command delegates to the same function with format options (Org/Markdown/JSON)
  • Location: ~/memex/system/sessions/ — under system/, not daily/, no clutter
  • Survives daemon restarts. Resume via /resume <date-title> (existing session resume from v0.7.2)

~80 lines in core-transport.lisp (append on message send) + reuse existing Org rendering.
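
A Python sketch of the transcript format, for illustration only. The text specifies a headline per message with a role tag, a timestamp, the content in the body, and the gate trace in a property drawer; the exact sanitization rule for titles, the property names, and the function names are assumptions.

```python
import re


def session_filename(date: str, first_message: str) -> str:
    """Derive '<date>-<title>.org' from the first 60 characters of the message."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", first_message[:60]).strip("-").lower()
    return f"{date}-{slug or 'untitled'}.org"


def org_entry(role: str, timestamp: str, content: str, gate_trace=None) -> str:
    """Render one message as an Org headline, ready to append to the session file."""
    lines = [f"* {timestamp} :{role}:"]
    if gate_trace:
        # Org expects the property drawer directly under the headline.
        lines.append(":PROPERTIES:")
        lines += [f":{key.upper()}: {value}" for key, value in gate_trace.items()]
        lines.append(":END:")
    lines.append(content)
    return "\n".join(lines) + "\n"
```

Appending one entry per message keeps the file valid Org at every point, so a daemon crash mid-session loses at most the message being written.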

v0.57.0: Auto-Memory Extraction — Learnings from Sessions

Claude Code's extractMemories runs at the end of each query loop, scanning the conversation for durable learnings and writing them to memory files. Hermes's MemoryProvider.sync_turn does the same. Passepartout records everything in the Merkle tree but never extracts cross-session learnings.

  • After each think() cycle that produces a final response (no tool calls pending), run extract-session-memory: a lightweight LLM call (50 tokens of prompt) that asks "What should I remember from this session?" and writes the result to ~/memex/system/memory/<project>/<date>.org
  • The extraction uses a forked LLM call (separate from the main response) with the session transcript as context
  • Auto-memory files are injected into the CONTEXT section of future think() calls as "Session memory: [learnings from prior sessions about this project]"
  • Extracted memories include: decisions made, patterns observed, preferences expressed, errors encountered and fixed, codebase facts learned
  • Opt-out via AUTO_MEMORY=false env var. Extraction frequency capped at one per minute to prevent runaway API costs.

~80 lines in core-reason.lisp + reuse session transcript for context.

v0.58.0: Universal Cross-Project Org Query

Passepartout's entire memex is Org — one format for memory, tasks, documents, transcripts. No competitor has this. Claude Code queries CLAUDE.md (one file), SQLite (separate DB), and file tools (grep). Passepartout can query everything with one function.

  • (org-query :tag "@urgent" :state "TODO" :since "-7d" :path "~/memex/projects/") — scans all projects in memex, returns matching Org headlines as memory objects. Zero LLM tokens, ~2ms execution.
  • (org-query :property "DEADLINE" :before "-1d") — overdue items. Feeds /agenda command.
  • (org-query :where "dispatch" :in-title-p t) — search headlines containing a term across all projects.
  • (org-query :limit 20 :sort :priority) — sorted, capped results.

~150 lines in programming-org.lisp (extends existing Org manipulation primitives).

v0.59.0: Skill Creator — LLM-Drafted, Verified Skills

  • LLM drafts complete skill org-file from natural language description.
  • Mandatory pipeline: (a) syntax validation via lisp-syntax-validate, (b) sandbox-load in temporary jailed package (v0.3.2), (c) run registered trigger function against mock contexts, (d) run registered deterministic gate against mock proposals, (e) on pass, promote to live registry under passepartout.skills.<name>.
  • Required :repl-verified flag on all defun forms — the existing Dispatcher lint check warns on writes without verification. The Skill Creator enforces this at creation time.
  • Skills are the primary extension mechanism for users. The Skill Creator makes skill authoring accessible to non-Lisp-programmers: describe what you want in English, the LLM drafts the Org file, the system verifies it, and the skill is live.

~150 lines as a new skill, symbolic-skill-creator.org.

v0.60.0: Change Manifest — Skills Ship with Falsifiable Predictions

AHE (arXiv:2604.25850v2) shows that harness edits work better when each edit ships with a self-declared prediction, verified by next-round outcomes. Passepartout's Skill Creator should do the same — every new or modified skill carries predictions that telemetry verifies.

  • When the Skill Creator generates a skill, it also generates a #+PREDICTION: block in the Org frontmatter.
  • Over the next 10 sessions, telemetry compares actual outcomes against predictions. The verification result is appended to the skill file.
  • Disproven predictions flag the skill for review.
  • The change manifest persists in the skill's Org file — every skill carries its own evidence ledger.

~40 lines in Skill Creator + telemetry integration.

v0.61.0: Long-Horizon Planning (Task Tree DAG)

  • Decompose complex tasks into Org-mode headline trees. Each task node is a memory-object with terminal states: :todo → :next-action → :in-progress → :done / :blocked / :stuck.
  • The LLM generates the initial task tree from the user's request. The REASONING tier processes each leaf task sequentially, updating node states as it progresses.
  • Parent nodes summarise child results: when all children of a node reach :done, the parent is promoted to :done with a synthesised summary. When any child reaches :stuck, the parent is promoted to :blocked with the blocking child's diagnostic.
  • Branch pruning: if a child is :stuck after three retries with different LLM providers, the parent re-plans the branch — the LLM generates alternative decomposition paths for the blocked sub-task.
  • Task trees persist as Org headlines in /memex/system/tasks/. Survive restarts. Visible to the user as editable Org files.
  • TUI task tree visualization: a collapsible Org headline tree rendered in the chat area. Each node shows its state with a colored indicator (todo, next-action, in-progress, done, blocked, stuck). Nodes expand/collapse on Enter. The tree updates in real time as the agent progresses through subtasks.

~200 lines.
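
The parent roll-up rule (all children :done makes the parent :done; any child :stuck makes the parent :blocked) can be sketched as a depth-first pass. This is a minimal sketch: the `task` structure and `roll-up` are hypothetical names, and the summary here is a plain join of child titles standing in for the synthesised summary.

```lisp
;; Hypothetical sketch: TASK and ROLL-UP are illustrative names.

(defstruct task
  title
  (state :todo)
  (children '())
  summary)

(defun roll-up (task)
  "Recompute TASK's state from its children, depth-first, and return the state.
Leaves keep whatever state they already have."
  (let ((children (task-children task)))
    (when children
      (let ((states (mapcar #'roll-up children)))
        (cond ((member :stuck states)
               ;; Any stuck child blocks the parent and supplies the diagnostic.
               (let ((culprit (find :stuck children :key #'task-state)))
                 (setf (task-state task) :blocked
                       (task-summary task)
                       (format nil "blocked by: ~a" (task-title culprit)))))
              ((every (lambda (state) (eq state :done)) states)
               ;; Placeholder summary: the real one would be synthesised.
               (setf (task-state task) :done
                     (task-summary task)
                     (format nil "~{~a~^; ~}"
                             (mapcar #'task-title children)))))))
    (task-state task)))
```

The sketch leaves open how a :blocked child should affect its own parent, since the rule as stated only covers :done and :stuck children.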

v0.62.0: Tier Classifier Fix

  • Invert the current classifier: :REFLEX = deterministic lookups only (memory query, file-exists-p, check time, list TODOs by tag). :COGNITION = text processing, summarization, simple Q&A, note formatting. :REASONING = planning, code generation, multi-step task execution, dangerous operations.
  • Track classifier accuracy via telemetry: for each classified action, record whether the classification was appropriate.
  • The classifier function is overrideable via *tier-classifier*, allowing users or skills to customize routing.
  • The classifier should be a skill, not core infrastructure — reloadable and replaceable without restart.

~40 lines.
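
A sketch of the inverted classifier with an overrideable hook follows. Only `*tier-classifier*` and the three tier names come from the roadmap; the action plist shape, the `:kind` keywords, and the helper names are assumptions made for illustration.

```lisp
;; Hypothetical sketch: the action shape and :kind keywords are assumptions.

(defparameter *reflex-actions*
  '(:memory-query :file-exists-p :check-time :list-todos)
  "Deterministic lookups only.")

(defparameter *cognition-actions*
  '(:summarize :simple-qa :format-note)
  "Text processing that needs a model but no planning.")

(defun default-tier-classifier (action)
  "ACTION is a plist with a :kind keyword. Unknown kinds escalate to :reasoning."
  (let ((kind (getf action :kind)))
    (cond ((member kind *reflex-actions*) :reflex)
          ((member kind *cognition-actions*) :cognition)
          (t :reasoning))))

;; Holding the function by symbol means a redefinition takes effect
;; without rebinding the variable.
(defvar *tier-classifier* 'default-tier-classifier
  "Function of one action plist returning :reflex, :cognition, or :reasoning.")

(defun classify (action)
  (funcall *tier-classifier* action))

;; A skill can reroute everything for the duration of a dynamic scope.
(let ((*tier-classifier* (lambda (action)
                           (declare (ignore action))
                           :reasoning)))
  (classify '(:kind :check-time)))
```

Defaulting unknown kinds to :reasoning is a conservative choice for the sketch: a misrouted action gets more scrutiny, not less.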

v0.63.0: SWE-Bench Harness

  • Automated pipeline: clone a repository from SWE-bench dataset, parse the GitHub issue, feed the issue description into Passepartout's cognitive loop, track the resolution trajectory as an Org headline tree, apply the generated patch, run the repository's test suite, score success (tests pass yes/no).
  • Trajectory persistence: each benchmark run produces an Org file under /memex/system/benchmarks/ recording every think() call, every tool invocation, every Dispatcher decision, and the final test result.
  • Regression mode: run the same benchmark after each version release. Track score trends. A version that regresses on SWE-bench does not ship.
  • Target: a score competitive with Claude Code and OpenClaw on SWE-bench Verified by v1.0.0.

~200 lines.

v0.64.0: Computer Use / Vision

  • Screenshot capture: X11 (xwd / import) and Wayland (grim) bridge.
  • Vision model integration: send screenshot to a vision-capable model (GPT-4V, Claude 3.5, Gemini 2.0 Flash).
  • Coordinate-based interaction: xdotool / ydotool for click and type commands. Dispatcher approval gate applies — screen interaction requires HITL by default.
  • Use case: "open Firefox, search for the Passepartout GitHub repo, and star it."

~100 lines.

v0.65.0: Telemetry / Observability

  • Structured event log as JSONL in ~/.local/share/passepartout/telemetry/ (one file per session + aggregate)
  • Event types: :session-start, :think-call (tokens in/out, provider, model, duration), :tool-execution (name, duration, success/error), :gate-decision (gate name, result, pattern), :hitl-decision (approved/denied, pattern, session count), :context-snapshot (tokens used, foveal node, pruned count), :session-end (total tokens, total cost, tool calls, HITL count)
  • Aggregate keys tracked as a hash table: HITL approval rate, average context usage, most-blocked gate, tokens saved by foveal pruning vs full context
  • /telemetry TUI command: displays aggregate stats + per-session breakdown
  • Feeds the evaluation harness (SWE-bench trajectory data comes from the same telemetry system)

~200 lines as a new skill, symbolic-telemetry.org. No daemon protocol changes.
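
To make the event log concrete, here is a minimal sketch of rendering one event plist as a JSONL line and computing one aggregate, the HITL approval rate. The field names (`:type`, `:gate`, `:result`, `:decision`) and the function names are assumptions; the encoder handles only integers, keywords, and simple strings, and does not escape control characters.

```lisp
;; Hypothetical sketch: event field names are assumptions, and the JSON
;; encoder covers only integers, keywords, and simple strings.

(defun json-value (value)
  "Encode VALUE as a JSON scalar."
  (etypecase value
    (integer (format nil "~d" value))
    (keyword (format nil "\"~(~a~)\"" value))
    (string (with-output-to-string (out)
              (write-char #\" out)
              (loop for ch across value
                    do (when (member ch '(#\" #\\))
                         (write-char #\\ out))
                       (write-char ch out))
              (write-char #\" out)))))

(defun event->jsonl (event)
  "Render EVENT, a plist, as one JSON object on a single line."
  (format nil "{~{~a:~a~^,~}}"
          (loop for (key value) on event by #'cddr
                collect (json-value key)
                collect (json-value value))))

(defun hitl-approval-rate (events)
  "Fraction of :hitl-decision events that were approved, or NIL if none exist."
  (let ((decisions (remove-if-not
                    (lambda (event) (eq (getf event :type) :hitl-decision))
                    events)))
    (when decisions
      (/ (count :approved decisions
                :key (lambda (event) (getf event :decision)))
         (length decisions)))))

(event->jsonl '(:type :gate-decision :gate "path" :result :deny))
```

Keeping events as plists in memory and serialising only at the file boundary means the aggregates can be computed without parsing JSON back in.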

v0.66.0: Consensus Loop

  • Multi-provider parallel inference for critical decisions. When the action's impact score exceeds a threshold, the system sends the same prompt to 2–3 independent providers.
  • Disagreement detection: compare structured outputs. If all providers agree, proceed with highest-confidence result. If they disagree, flag for HITL approval.
  • Cost-aware: consensus mode doubles/triples cost. Only trigger when impact exceeds cost threshold. Configurable via CONSENSUS_THRESHOLD.
  • TUI consensus display: collapsible region listing each provider, its model, its proposal, and its confidence score. ✓ 3/3 providers agree in green; ✗ 2/3 agree in yellow.

~80 lines.
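
Disagreement detection reduces to comparing the structured outputs and picking the most confident one when they match. A minimal sketch, assuming each proposal is a plist with hypothetical `:provider`, `:action`, and `:confidence` keys and that agreement means `equal` actions:

```lisp
;; Hypothetical sketch: the proposal shape and CONSENSUS are assumptions.

(defun consensus (proposals)
  "PROPOSALS is a non-empty list of plists (:provider :action :confidence).
Return (values :proceed best) when every provider proposes the same action,
otherwise (values :needs-hitl proposals)."
  (check-type proposals cons)
  (let ((first-action (getf (first proposals) :action)))
    (if (every (lambda (proposal)
                 (equal (getf proposal :action) first-action))
               proposals)
        (values :proceed
                (reduce (lambda (a b)
                          (if (>= (getf a :confidence) (getf b :confidence))
                              a
                              b))
                        proposals))
        (values :needs-hitl proposals))))
```

Exact equality is the strictest possible agreement test; a real implementation would likely normalise actions before comparing them.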

v0.67.0: GTD Integration

  • Full GTD cycle: capture → process → clarify → organize → reflect → engage.
  • Org properties: :TRIGGER: (what context), :BLOCKER: (what must complete first).
  • Weekly review: agent scans all projects and tasks, surfaces stalled items, suggests next actions. Produced deterministically — zero LLM tokens.
  • TUI agenda view: /agenda command renders Org-agenda as formatted scrollable region within the chat area.

~150 lines.

v0.68.0: Deep Emacs Integration

  • Phase II — Interpreter: ELisp compatibility layer runs inside Passepartout's Common Lisp image. Key Emacs packages (Org-mode, Magit) run natively without an Emacs process.
  • Org-agenda awareness: agent queries agenda view, incorporates agenda context into planning.
  • Clock time tracking: agent starts/stops clocks on Org headlines, produces clock tables.
  • Refile and archive: agent refiles headlines between Org files and archives completed items.

~300 lines.

v0.69.0: Save-Lisp-and-Die Binary

  • The setup binary (passepartout-setup) is a save-lisp-and-die executable (~100MB: SBCL runtime + core Lisp code + native embedding inference from v0.4.0 + 23MB embedding model). No SBCL install required. No Quicklisp. No bash script. The user runs one file.
  • Deterministic path (default, always runs first): the same distro detection, package installation, and configuration logic from today's bash script, reimplemented in Lisp. Handles Debian and Fedora families.
  • LLM-assisted path (optional, activates on deterministic failure): downloads Qwen2.5-0.5B (~500MB GGUF, pinned by hash). The model classifies success/failure/recoverable-error and selects the next corrective action from a constrained decision tree.
  • Model hash verification: the GGUF file is pinned by SHA-256 hash.
  • After setup completes, the binary exits. The user runs passepartout daemon to start the full system (a live SBCL process, not a sealed binary — REPL, hot-reload, self-modification all available).
  • Add FiveAM test: the deterministic path succeeds on a system with all dependencies pre-installed; the LLM-assisted path correctly classifies 10 common package-manager error messages.

~200 lines Lisp + build configuration.

v0.70.0: Channels + Providers — Match OpenClaw on Demand

The daemon protocol is client-agnostic: hex-framed plists over TCP. Every new channel is a new client that speaks the same protocol. OpenClaw's 23+ channels are trivially copyable: each platform needs a poll loop and a send function, about 30 lines each. An LLM provider is a row in *provider-cascade*: a new entry in neuro-provider.lisp with an API endpoint and token pricing. Neither deserves its own release.

  • Channels: match OpenClaw's 23+ channels on demand. The Emacs bridge (already done, v0.4.0) proves the pattern. Each new platform (WhatsApp, iMessage, Matrix, IRC, etc.) is a skill that registers a poll-fn + send-fn. ~30 lines per channel.
  • Providers: match OpenClaw/Hermes on provider count. Adding a new provider is a table entry in neuro-provider.lisp: name, API endpoint, model list, pricing. ~20 lines per provider.
  • Voice: STT + TTS are REST wrappers (whisper / elevenlabs / espeak). Already spec'd as a skill. ~50 lines.

No separate releases. Done when needed, shipped when ready.
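
For a sense of how little a new client has to implement, here is a sketch of hex-framed plist encoding and decoding. The frame layout is an assumption (a six-digit hexadecimal length followed by the printed plist, with length counted in characters); the actual daemon protocol may differ, and `encode-frame` and `decode-frame` are hypothetical names.

```lisp
;; Hypothetical sketch: the six-hex-digit length prefix is an assumed layout.

(defun encode-frame (plist)
  "Print PLIST readably and prefix it with its length as six hex digits."
  (let ((payload (with-standard-io-syntax (prin1-to-string plist))))
    (format nil "~6,'0x~a" (length payload) payload)))

(defun decode-frame (frame)
  "Inverse of ENCODE-FRAME. *READ-EVAL* is NIL so that a peer cannot
trigger #. evaluation in the receiving image."
  (let* ((size (parse-integer frame :end 6 :radix 16))
         (payload (subseq frame 6 (+ 6 size))))
    (with-standard-io-syntax
      (let ((*read-eval* nil))
        (values (read-from-string payload))))))

(decode-frame (encode-frame '(:type :chat :text "hello")))
```

A channel skill would wrap these two functions around its platform's poll and send calls.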

v0.71.0: Lish Shell

  • plist-returning commands: (ls :path "~/memex/projects/") → structured result
  • Pipe as function composition: (pipe (ls ...) (filter :state 'TODO))
  • Org-buffer output: shell output rendered as Org headlines
  • External bash compatibility: (bash "npm run build") → plist with exit code, stdout, stderr

~500 lines CL. Useful immediately for the agent.
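
Pipe as function composition can be sketched as a macro that threads each result into the next command as its last argument. Everything below is illustrative: `ls` is a stub returning canned plists instead of touching the file system, and `filter`, `names`, and the threading convention are assumptions about how Lish might behave.

```lisp
;; Hypothetical sketch: LS is a stub, and the last-argument threading
;; convention is an assumption about Lish.

(defmacro pipe (source &rest stages)
  "Thread SOURCE through STAGES, appending the running value as the last
argument of each stage form."
  (reduce (lambda (acc stage) (append stage (list acc)))
          stages
          :initial-value source))

(defun ls (&key path)
  "Stub standing in for a real directory listing: returns canned entries."
  (declare (ignorable path))
  (list (list :name "roadmap.org" :state 'todo)
        (list :name "notes.org" :state 'done)
        (list :name "tui.org" :state 'todo)))

(defun filter (key value entries)
  "Keep the ENTRIES whose KEY property is EQL to VALUE."
  (remove-if-not (lambda (entry) (eql (getf entry key) value)) entries))

(defun names (entries)
  (mapcar (lambda (entry) (getf entry :name)) entries))

;; Expands to (names (filter :state 'todo (ls :path "~/memex/projects/"))).
(pipe (ls :path "~/memex/projects/")
      (filter :state 'todo)
      (names))
```

Because the pipeline expands to ordinary nested calls, each stage receives structured data and nothing is ever parsed back out of text.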

v0.72.0: Buffer-as-CLOS Prototype

  • buffer class: source (file path or Org AST), content, cursor, marks, overlays
  • Key editing primitives: insert, delete, move, search, replace
  • Org-AST-backed: editing mutates the AST, text rendering is a view

~300 lines CL. No display dependency.
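
A minimal sketch of the buffer class and two editing primitives follows. To stay short it backs the buffer with a plain string, not an Org AST, so it shows the editing interface only; slot and function names beyond `buffer`, `source`, `content`, and `cursor` are assumptions, and marks and overlays are omitted.

```lisp
;; Hypothetical sketch: string-backed for brevity, whereas the roadmap
;; calls for an Org-AST-backed buffer.

(defclass buffer ()
  ((source  :initarg :source  :initform nil :reader   buffer-source)
   (content :initarg :content :initform ""  :accessor buffer-content)
   (cursor  :initform 0                     :accessor buffer-cursor))
  (:documentation "Text buffer with a cursor. No display dependency."))

(defun move-to (buffer index)
  "Move the cursor to INDEX, clamped to the content bounds."
  (setf (buffer-cursor buffer)
        (max 0 (min index (length (buffer-content buffer))))))

(defgeneric insert-text (buffer text)
  (:documentation "Insert TEXT at the cursor and advance past it."))

(defmethod insert-text ((buffer buffer) (text string))
  (let ((point (buffer-cursor buffer))
        (old (buffer-content buffer)))
    (setf (buffer-content buffer)
          (concatenate 'string (subseq old 0 point) text (subseq old point))
          (buffer-cursor buffer) (+ point (length text)))
    buffer))

(defgeneric delete-backward (buffer n)
  (:documentation "Delete up to N characters before the cursor."))

(defmethod delete-backward ((buffer buffer) (n integer))
  (let* ((point (buffer-cursor buffer))
         (start (max 0 (- point n)))
         (old (buffer-content buffer)))
    (setf (buffer-content buffer)
          (concatenate 'string (subseq old 0 start) (subseq old point))
          (buffer-cursor buffer) start)
    buffer))
```

Making the primitives generic functions leaves room for an AST-backed subclass to specialise them without changing callers.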

v0.73.0: EQL5 Feasibility

  • Add EQL5 to Quicklisp dependencies (optional, like croatoan)
  • Compile and verify on Linux (primary target)
  • Single QML window: "Passepartout" title, 800x600, dark background
  • Verify event loop integration with SBCL threads

~100 lines QML + build config.

v0.74.0: EQL5 TCP Client

  • QML window with terminal widget, input area, status bar
  • Connects to daemon via existing framed TCP protocol
  • Renders agent responses, gate trace, sidebar panels as QML components
  • Lives alongside croatoan TUI (two clients, one daemon)

~300 lines QML + ~200 lines CL.

v0.75.0: Minibuffer Prototype

  • Universal command line at bottom of Qt window
  • /chat /edit /shell /eval dispatch
  • Goes through same gate stack as agent actions

~200 lines CL.

v1.0.0: Neurosymbolic Maturity

v1.0.0 is where the agent achieves symbolic-first reasoning in the 10-80-10 architecture. The probabilistic engine (LLM) handles 10% input translation and 10% output formatting. The symbolic engine (VivaceGraph + Screamer + ACL2) handles 80% of reasoning — task planning, fact retrieval, constraint solving, and formal verification. Zero LLM tokens for the reasoning core.

Hallucination becomes structurally impossible because the symbolic engine will not accept a fact that contradicts its knowledge graph. Safety becomes provable because ACL2 can prove properties about the system's behavior. Self-improvement becomes stable because the agent modifies skills that are then verified before execution.

The system is benchmarked against SWE-bench (competitive score with Claude Code and OpenClaw), verified under concurrent load (MVCC from v0.38.0), and validated by the eval harness (v0.9.0). The 10-80-10 planner operates on a mature symbolic index seeded from months of gate outcomes, Screamer deductions, LLM-proposed facts with provenance, and human-authored facts.

The TUI at v1.0.0 is competitive: streaming responses, gate trace visualization, sidebar with 10 panels, skin system with 10+ presets, adaptive layout, full markdown, mouse support, spinner personality, and progress bars. The sidebar's gate trace, focus map, rule counter, sufficiency score, and provenance breakdown are capabilities no competitor can replicate — Passepartout's permanent UX differentiator.

v1.0.0 is the brain at maturity. The symbolic engine reasons. The probabilistic engine translates. The gate stack verifies. The Merkle tree preserves provenance. The eval harness guards against regression.

v2.0.0: Lisp Machine Emergence

v2.0.0 is where Passepartout stops being a daemon with clients and becomes the environment. The agent's cognitive loop, the user's editor, the user's shell, and the user's browser run in the same Common Lisp image. The Dispatcher gate stack verifies every action regardless of who initiated it — user or agent. The distinction between "tool" and "self" dissolves.

Why this version matters for UX parity. v0.4.0 through v1.0.0 give Passepartout four interaction surfaces (TUI, messaging apps, Emacs, voice). v2.0.0 inverts the problem: instead of building more clients, it builds a platform where the agent's environment and the user's environment are the same process, separated not by a sandbox but by the Dispatcher gate stack. The editor IS the agent's prompt. The shell IS the agent's actuator. The browser IS the agent's web research tool. There are no clients — there is one Lisp image, one address space, one Org-mode file system.

Architectural principle: Browser inside Lisp, not Lisp inside browser. Lisp is the parent process. It owns the window, the memory, and the input loop. The rendering engine (WebKit/Blink) is a library that paints pixels inside a Lisp buffer. The user can redefine functions while browsing without restarting. Keybinding lookups happen in microseconds (SBCL machine code) — the browser cannot "steal" shortcuts.

Qt/QML via EQL5 — the rendering surface

  • Qt/QML (via EQL5) is the UI framework. EQL5 exposes the full Qt C++ API from Common Lisp. QML is declarative — it matches Lisp's generation model.
  • Desktop: native look and feel on Linux, macOS, and Windows.
  • Mobile: Qt runs natively on iOS and Android. Android uses F-Droid for the unrestricted version and Play Store for sandboxed. iOS uses Guideline 4.7 ("Educational/Developer Tool" loophole, no JIT compilation).
  • Safety Bridge for mobile: Lisp code can manipulate browser/files but cannot touch hardware (GPS, camera, contacts) without standard permission pop-ups.
  • The minibuffer: a universal command line at the bottom of the screen. Not an Emacs modeline. Not a VS Code command palette. A single command surface for every action — edit files, navigate web, run Lisp expressions, invoke agent commands. M-x for everything.

Lish — the Common Lisp editor

Not elisp. Not Emacs. A multi-threaded Common Lisp editor rendered via Qt/QML. The complete system prompt lives in an Org buffer — the agent's identity, its skill registry, its memory, and its reasoning are visible and editable as Org text. The user modifies the agent's prompt and the agent reflects the change immediately — the prompt is a file in memory, not a hidden string in a config.

Org-babel for interactive evaluation: source blocks in Org files are executable. The user evaluates a #+begin_src lisp block and the result appears inline. The agent evaluates blocks to verify code before writing. The REPL is not a separate window — it is the Org buffer in which the agent and user both work.

The editor and the agent share the same Lisp image. The editor is not a client that connects to a daemon — it IS the daemon process. The TUI from v0.x is the editor's rendering surface.

Nyxt — the Common Lisp browser (three erosion stages)

The browser is not a one-time feature. It is a multi-year erosion of the rendering stack toward pure Lisp:

Stage 1 — Qt + WebKit. Qt provides window management and native widgets. WebKit renders web content inside a Lisp buffer. Network requests via dexador (pure Lisp). HTML parsed via Plump (pure Lisp). Layout via Yoga (C-based Flexbox, wrapped via FFI). JavaScript via embedded QuickJS. This stage delivers a working browser in months, not years.

Stage 2 — S-expression DOM. Lisp builds its own DOM representation as native S-expressions. WebKit is reduced to pixel painting only — it receives rendered layouts from Lisp, not raw HTML. The agent can traverse and manipulate the DOM as Lisp data structures without serialization. This makes web content natively queryable and modifiable by the agent's cognitive loop.

Stage 3 — Pure Lisp layout. WebKit turned off entirely. Lisp-native layout engine (12-18 months of focused development). CSS subset sufficient for the modern web's 95% use case. JavaScript via QuickJS remains for interactive content. The browser is now a Lisp application that happens to speak HTTP, not a web engine wrapped in a Lisp process.

Lish — the Lisp shell

Bash is a text-stream protocol. Passepartout speaks plists. The Lish shell replaces text streams with structured data — every command returns a plist, not a byte stream. Pipe becomes function composition. Scripts become Lisp functions that operate on memory objects directly.

The agent and the user share the same shell. The user types (list-todos :tag "@urgent"). The agent proposes (shell "npm run build"). The Dispatcher verifies both. The shell is not a separate process — it is a REPL connected to the same Lisp image as the agent's cognitive loop.

Org-mode buffers become the file system. The user's memex (~/memex/) is browsable as a tree of Org headlines. File operations (read, write, list, search) operate on Org AST nodes, not byte streams. A "directory listing" is a tree of headlines. A "file read" is a subtree rendered as text.

Bash remains available as a backend for running external commands, but it is not the primary interface.

Emacs migration — three phases

The Emacs bridge (v0.4.0) is Phase I. The deep integration is three phases, not one:

Phase I — Parasite (v0.4.0). Emacs is a client. The elisp TCP bridge sends text and receives responses. The agent does not control Emacs. Emacs users get a native chat experience alongside the TUI.

Phase II — Interpreter (v2.0.0). An ELisp compatibility layer runs inside Passepartout's Common Lisp image. Key Emacs packages (Org-mode, Magit) run natively without an Emacs process. The compatibility layer does not aim for 100% coverage — it targets the packages the agent's workflows depend on.

Phase III — Successor (v2.0.0 and beyond). Native Common Lisp implementations of Org-mode workflows and Git integration read/write the same file formats. Total independence from Emacs. Emacs users who prefer Emacs keep the bridge. New users get the native experience.

Strategic timeline

v0.4.0 Emacs bridge (Phase I Parasite) → v1.0.0 Neurosymbolic Maturity → v2.0.0 Lish editor + Nyxt browser (Stage 1) + Emacs Phase II/III + mobile. The Qt/QML surface enables gradual erosion of the rendering stack without rewriting the application logic. The three-phase Emacs migration ensures Emacs users are never abandoned: the bridge works from day one, and the native experience grows under it.

v3.0.0+: Cannibalization — Eat Your Dependencies

v3.0.0 begins the erosion of external dependencies — the system that was bootstrapped on Qt, WebKit, C runtime, and Linux starts replacing them piece by piece with native Lisp components. This is the realization of the Lisp Machine: not built from scratch, but arrived at through gradual replacement of a working system.

v3.0.0: Single-Process Convergence

  • TCP bridge between daemon and EQL5 client becomes an internal function call
  • One SBCL image: daemon + editor + shell + browser share one address space
  • The wire protocol becomes nil — all communication is plist exchange in memory

v3.1.0: Lisp-Native Layout Engine

  • Replace QML layout with Lisp layout (Yoga FFI as intermediate step)
  • CLOS-based widget tree with computed dirty regions
  • Diff-based redisplay: only changed cells re-render

v3.2.0: Browser Stage 2 — S-Expression DOM

  • Lisp builds its own DOM as native s-expressions
  • WebKit reduced to pixel painting only
  • Agent traverses and manipulates DOM as Lisp data without serialization

v3.3.0: Browser Stage 3 — Pure Lisp Browser

  • Lisp-native layout engine handles CSS subset
  • JavaScript via QuickJS remains
  • WebKit turned off entirely
  • The browser is now a Lisp application

v3.4.0+: Qt/QML Erosion

  • Replace QML components with Lisp-native widgets (one at a time)
  • Window management via Lisp-native X11/Wayland bindings
  • Font rendering via HarfBuzz FFI → Lisp replacement
  • Event loop: Qt's → SBCL's native thread scheduler
  • Each replacement is verified by the eval harness; the system remains usable at every step

v3.6.0: Stage0 Lisp Bootstrap

  • 500-byte hex bootstrap → self-hosting Lisp
  • Replace Linux bootloader
  • The Lisp machine runs on bare metal

v4.0.0: Native Inference

LLM inference moves in-process. No external servers. No API keys required for inference.

Lisp as Sovereign Governor, not as Math Engine. The weights themselves are not stored as Lisp objects — this would waste 50% memory on type tags and destroy cache locality through pointer-chasing. Instead, the entire tensor is tagged as a single Lisp object (macro-tag). The Lisp image holds a pointer to optimized flat binary (GPU-friendly, FPGA-compatible). The tag is checked once. After that, all math happens in the optimized backend.

Native inference (FFI binding to llama.cpp)

  • FFI binding to llama.cpp via CFFI: load GGUF models, run inference, manage KV cache. Single SBCL image, zero process boundaries. The agent and the model share memory.
  • Speculative safety: the Dispatcher gate stack intercepts token generation in real time. A token that would produce a blocked action is preemptively suppressed before generation. No external inference API supports this.
  • Foveal-peripheral compute: the model skips pruned context nodes during attention computation. External APIs compute full attention regardless of what you send. In-process inference makes the sparse-tree rendering pay off at the compute level, not just the token level.

Live surgery on cognition

With in-process inference, the agent's internal state becomes inspectable:

  • Pause inference mid-stream. Inspect hidden states and activations as Lisp variables.
  • Modify a vector, change a sampling parameter, resume.
  • Detect when the agent is likely to hallucinate by comparing current activation patterns against historical baselines.
  • The REPL becomes a surgical instrument for the agent's own cognition — not just for verifying code, but for inspecting and correcting the neural process that generates it.

DSL-compiled model architectures

Model architectures are described as Lisp DSL:

  • (defmodel passepartout-reasoning :type 'transformer :heads 32 :dim 4096 :layers 32)
  • The DSL compiles to machine code for the target backend (GPU via CUDA, FPGA via VexRiscv, CPU via llama.cpp).
  • Python interprets at runtime. Lisp compiles once. Model architecture changes are treated the same as code changes — edited, verified, hot-reloaded.
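
As a first step toward such a DSL, `defmodel` can be a macro that records the architecture as data for later compilation. This is a sketch of the registration half only: the registry, `model-option`, and `head-dimension` are hypothetical, and nothing here compiles to a backend.

```lisp
;; Hypothetical sketch: records the architecture as a plist; the compile
;; step described in the roadmap is out of scope here.

(defvar *models* (make-hash-table :test #'eq)
  "Registry from model name to its architecture plist.")

(defmacro defmodel (name &rest options)
  "Record the architecture OPTIONS for NAME. Option values are evaluated."
  `(progn
     (setf (gethash ',name *models*) (list ,@options))
     ',name))

(defun model-option (name key)
  (getf (gethash name *models*) key))

(defun head-dimension (name)
  "Width of each attention head: model dimension divided by head count."
  (/ (model-option name :dim) (model-option name :heads)))

(defmodel passepartout-reasoning
  :type 'transformer :heads 32 :dim 4096 :layers 32)

(head-dimension 'passepartout-reasoning)
```

Because the definition is an ordinary form, re-evaluating it replaces the registry entry, which is the property that makes hot-reloading an architecture change possible.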

v5.0.0: Hardware — Tagged Lisp Architecture

The Lisp machine becomes physical. RISC-V with tagged architecture, hardware-enforced type checking, and FPGA prototype for the symbolic core.

Not a from-scratch processor. Use RISC-V as the skeleton, add custom Lisp extensions. RISC-V provides the carrier architecture (standard instruction set, existing toolchain, LLVM support). Lisp extensions provide tagged computation (type checking in hardware, parallel garbage collection, S-expression traversal as atomic operations).

The macro-tag approach

  • Top 4–8 bits of every memory word = Type Tag. Hardware checks tags in parallel with ALU operations. Trap on type mismatch.
  • A tensor (70B weights) is one macro-tagged Lisp object — a pointer to flat binary. The tag is checked once. Math happens at native speed. This replaces "weights as sexps" (which wastes 50% memory on per-weight tags and destroys cache locality).
  • Custom instructions: TADD (tagged add), LISP.CAR, LISP.CDR — Lisp primitives as single-cycle hardware operations.
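
The tag check can be modelled in software before any hardware exists. The sketch below assumes an 8-bit tag in the top bits of a 64-bit word and a made-up tag numbering; `make-word`, `word-tag`, `tadd`, and the `tag-mismatch` condition are hypothetical names, and the payload is treated as unsigned and wraps at 56 bits.

```lisp
;; Hypothetical sketch: tag width, tag numbering, and names are assumptions.

(defparameter *tag-field* (byte 8 56)
  "Top 8 bits of a 64-bit word.")

(defparameter *payload-field* (byte 56 0)
  "Low 56 bits of a 64-bit word.")

(defparameter *tags* '(:fixnum 1 :cons 2 :tensor 3))

(define-condition tag-mismatch (error)
  ((operation :initarg :operation :reader tag-mismatch-operation)
   (tags :initarg :tags :reader tag-mismatch-tags))
  (:report (lambda (condition stream)
             (format stream "~a: unexpected operand tags ~s"
                     (tag-mismatch-operation condition)
                     (tag-mismatch-tags condition)))))

(defun make-word (tag payload)
  "Pack PAYLOAD (truncated to 56 bits) under the numeric code for TAG."
  (dpb (getf *tags* tag) *tag-field* (ldb *payload-field* payload)))

(defun word-tag (word)
  (loop for (name code) on *tags* by #'cddr
        when (= code (ldb *tag-field* word)) return name))

(defun word-payload (word)
  (ldb *payload-field* word))

(defun tadd (a b)
  "Tagged add: both operands must carry the :fixnum tag, otherwise trap."
  (let ((tags (list (word-tag a) (word-tag b))))
    (unless (equal tags '(:fixnum :fixnum))
      (error 'tag-mismatch :operation 'tadd :tags tags))
    (make-word :fixnum (+ (word-payload a) (word-payload b)))))
```

In hardware the tag comparison would run alongside the addition; here it runs first, but the observable behaviour (a sum or a trap) is the same.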

Phase migration: Host → Co-processor → Self-hosted

  1. Parasitic. Lisp card (FPGA) is a PCIe co-processor. Host CPU (Intel/AMD, Linux/Windows) handles "dirty" I/O — networking, display, file systems. Lisp card handles tagged computation and the agent's cognitive loop. If Lisp crashes, host survives. Reset card, reload. Memory mapping: the card can see the host's memory. The Lisp environment reaches out and inspects data.
  2. Functional Hijacking. Lisp UI runs on the card, displays through the PC's GPU. The agent indexes Linux files into Lisp objects. The host becomes an I/O server for the Lisp card.
  3. Driver Cannibalization. Point the agent at C drivers. Ask it to generate native Lisp drivers for the hardware the card controls directly. PCIe Passthrough for direct hardware access.
  4. Self-Hosting. Replace the Linux bootloader with Stage0 Lisp (a bootstrap from 500 bytes of hex to a self-hosting Lisp). Cut the umbilical cord. The Lisp machine runs on bare metal.

Concrete prototyping milestones

| Stage | Hardware | Cost | What it delivers |
|---|---|---|---|
| TinyTapeout | Custom silicon (130nm) | ~$500–1,000 | 8-bit tagged toy processor with Lisp primitives |
| Shuttle | Multi-project wafer | ~$10,000–20,000 | Tagged RISC-V core at 100–300 MHz |
| FPGA | Terasic DE10-Nano / Xilinx KCU105 | ~$200–500 | VexRiscv with custom Lisp extensions, PCIe card form factor |
| Industrial | Commercial foundry (5nm) | ~$10M–100M+ | Competes with modern CPUs on tagged workloads |

Start at TinyTapeout. Validate the tagged architecture works. Move to FPGA. Validate at speed. Only then consider silicon.

Garbage collection in hardware

Dedicated bus master (Scavenger) runs background garbage collection while the main CPU executes code. No "GC pause." The scavenger traverses the heap in parallel with computation, freeing unreachable objects without stopping the agent.

Persistent single-address-space memory

NVRAM for the entire heap. Turn on the machine — state is exactly where you left it. No "booting." No "loading memory from disk." The agent's Merkle-tree memory, skill registry, knowledge graph, and induced functions survive restarts as a contiguous hardware state.

Why this is not "Lisp inside browser"

Most Lisp-on-hardware attempts fail because they try to compete with Intel on raw math. That's the wrong axis. The tagged architecture doesn't need to beat a GPU at matrix multiplication. It needs to beat a CPU at symbolic computation — graph traversal, constraint solving, theorem proving, garbage collection. These are the v3.0.0 symbolic engine's workload. Hardware that makes them single-cycle is the differentiator, not hardware that runs matrix math faster.

v6.0.0: True Agency

World models, temporal reasoning, goal persistence across restarts.

  • World models: Predictive models of user behavior, project dynamics, system state.
  • Temporal reasoning: Scheduling, deadlines, elapsed duration awareness.
  • Goal persistence: Goals survive restarts. Long-term projects in memory-objects.

Neurosymbolic Phase Reference

Each phase has a detailed implementation spec in its version section above. Summary of what is and isn't built:

| Phase | Component | Lines | Release |
|---|---|---|---|
| 0 | PM-type-level gates + core integrity | ~75 | v0.10.0 |
| 0b | Layered auth — Layer 1 (cryptographic) | ~200 | v0.12.0 |
| 1 | Triple fact store + abstract API | ~200 | v0.14.0 |
| 1a | Self-preservation mechanisms | ~120 | v0.16.0 |
| 2 | Screamer admission gate | ~200 | v0.18.0 |
| 3 | Archivist as fact proposer | ~100 | v0.20.0 |
| 4 | Sufficiency criterion — the flip | ~50 | v0.22.0 |
| 5 | VivaceGraph + Merkle DAG + ontology ver | ~400 | v0.25.0 |
| 6 | ACL2 structural verification | ~200 | v0.27.0 |
| 7 | 10-80-10 planner | ~500 | v0.36.0 |
| 8+ | Semantic Wikipedia integration | TBD | v0.36.1+ |
| Total | | ~2045 | |

What Is NOT Built by the Neurosymbolic Phases

  1. A separate knowledge graph serialization format before the ephemeral phase proves what facts are useful. Premature format commitment is the ontology problem writ small. Let use determine the format.
  2. ACL2 verification of empirical claims. Apple is red. rm -rf / is destructive. These are observations, not theorems. Screamer handles empirical consistency. ACL2 handles structural verification.
  3. VivaceGraph before Screamer. The admission gate is the critical path. The persistence layer is an optimization of a working system.
  4. A per-fact ontology designed upfront. Extract from the gate stack, extend from deductions and observations, prune through contradiction detection. The ontology is a garden, not a building.
  5. New core ASDF components. Every phase is a skill. A corrupted symbolic engine degrades reasoning but does not kill the agent. Satisfies the self-repair criterion.
  6. A "complete" symbolic index for the broad domain. The neural index is the permanent gateway to the richness of prose. The symbolic index handles what can be mechanically verified. The boundary is permanent, not transitional. The neuro is the brain. The symbolic is the education.

Competitive Advantage Analysis

Phase 0-1: Deterministic safety, now with type-level guarantees

The existing Dispatcher gate stack already provides 0-LLM-token safety verification. Phase 0 adds structural guarantees: no heuristic bypassing of the type hierarchy. A request to modify the dispatcher's own rules is impossible by construction, not just caught by pattern matching. No competitor has this — their equivalent of "core file protection" is a prompt instruction, not a type system.

Phase 0b: Layered signal authentication — verified origin, not claimed origin

No competitor verifies who issued a signal. Every agent harness accepts signals from any source that speaks its protocol. A compromised dependency can impersonate any signal source. Passepartout's four-layer authentication gate makes signal source spoofing impossible at Layer 1 (cryptographic), detectable at Layers 2-3 (sensory + deterministic reasoning), and probabilistically flagged at Layer 4 (style analysis). The key registry has Merkle-hashed provenance — key creation, promotion, and revocation are auditable, versioned, and survivable across restarts.

Phase 2-3: Verified extraction — the symbolic index grows without corruption

No competitor verifies extracted facts against an existing knowledge base. Their memory systems (Claude Code's extractMemories, Hermes's MemoryProvider, OpenClaw's session transcripts) record what the LLM said happened, not what the system proved happened. Passepartout's Screamer-gated admission makes the symbolic index a monotonic, verified structure. Facts are admitted because they are consistent, not because the LLM generated them.

Phase 4-5: Self-accelerating knowledge — the downward cost curve

The sufficiency criterion makes Passepartout's "cheaper over time" thesis measurable. As the ratio of non-lossy facts grows, LLM calls for extraction decrease. At sufficiency, extraction of known categories becomes deterministic. The downward cost curve is not a marketing claim — it is a structural property of the architecture, visible through the sufficiency score.

Phase 6-7: Provable plan soundness

No competitor verifies task plans against formal constraints. Claude Code plans in a single LLM call with no post-hoc verification. Hermes decomposes tasks into subtasks but does not prove them non-contradictory. Passepartout's ACL2-verified plans are structurally guaranteed to have no deadlocks, no dependency cycles, and no safety violations. The verification is a proof, not a prompt.

Phase 0-1a: Self-preservation — the agent knows when it is wounded

No competitor detects its own degradation. Claude Code, OpenCode, and Hermes all fail silently when a tool crashes or a dependency is missing — the agent keeps running, producing degraded output, never telling the user. Passepartout's quarantine system detects failing skills, unloads them automatically, and displays a degraded-mode indicator in the status bar. The external watchdog restarts the daemon if the process dies. The integrity monitor detects corrupted core files. The agent refuses to execute commands that would destroy its own runtime, explaining why and redirecting to the safe termination path.

Semantic Wikipedia: Entity coverage at zero marginal cost

No competitor has a general-knowledge entity graph because no competitor has a symbolic engine to populate. Claude Code knows codebases; it doesn't know that Nabokov wrote Pale Fire and lectured on Kafka. Passepartout with Wikidata loaded knows both, and the entity knowledge costs zero LLM tokens — it is loaded once as structured data and queried via VivaceGraph traversals.

The permanent competitive advantage

The competitive advantage is not any single feature. It is the architecture's ability to accumulate verified knowledge from four independent sources (gates, deduction, verified LLM proposals, human authoring) and to make that knowledge queryable with provenance. Competitors accumulate chat transcripts. Passepartout accumulates a provenanced, self-verifying knowledge graph. Transcripts become stale and unreliable. The knowledge graph becomes richer and more trustworthy with every session.

Design rationale is in:

  • notes/passepartout-neurosymbolic-design-decisions-and-options.org — design rationale for every decision
  • notes/passepartout-symbolic-engine-exploration.org — original architecture exploration
  • notes/passepartout-whitehead.org — Whitehead's four concrete contributions
  • docs/ARCHITECTURE.org — current pipeline architecture
  • docs/DESIGN_DECISIONS.org — foundational architectural decisions