Passepartout Evolutionary Roadmap
- The Evolutionary Roadmap
- File Update Checklist
- v0.1.0: The Autonomous Foundation — RELEASED 2026-04-20
- v0.2.0: Interactive Refinement — RELEASED 2026-04-29
- v0.3.0: Event Orchestration + HITL — RELEASED 2026-05-06
- Secret Exposure Gate, Shell Safety, Lisp Validation
- Multi-distro deployment (Debian+Fedora, systemd, Docker)
- Project rename to Passepartout (files, packages, env vars)
- 31 org files with full literate prose
- Human-in-the-Loop (HITL)
- Event Orchestrator (unified hooks+cron+routing)
- Context Manager (project scoping)
- Model-Tier Routing (cost optimization)
- Memory Scope Segmentation
- Asynchronous Embedding Gateway
- TUI Experience (Daily Driver Quality)
- v0.2.x Backfill Remediation (stubs and gaps)
- Project Renaming (Bouncer → Dispatcher)
- Parser RCE elimination
- Shell safety & actuator sandboxing
- TUI Critical Fixes
- v0.4.0: Production Hardening — RELEASED 2026-05-06
- v0.4.1: Design Cleanup
- v0.4.2: Structured Output (LLM → JSON → plist)
- v0.4.3: Shell Sandboxing & Safety Classification
- v0.5.0: File Reorganization & Token Economics
- File Reorganization — self-repair criterion
- Extract core-context → symbolic-awareness
- Extract heartbeat generation → symbolic-events
- Relocate 6 utility fragments to correct files
- Rename 6 core files — shorter, clearer names
- Rename 13 system-* → symbolic-/neuro-/embedding-*
- Delete system-model.lisp (16-line wrapper)
- Rename 4 gateway-* → channel-*
- Split gateway-messaging → 4 channel-* files
- Document core/non-core self-repair criterion
- Update all cross-references after reorg
- Verify: ASDF compiles, FiveAM suite passes, integration tests pass.
- Token Economics (implemented as skills — not core)
- Tokenizer integration
- Prompt prefix caching
- Incremental context assembly
- Per-call token budget
- Cost tracking
- Module Architecture
- Competitive Advantage Analysis — v0.5.0 Summary
- v0.5.1: Compilation Hardening
- Compilation Hardening — eliminate all compilation errors and warnings
- Fix real errors first (2 files, ~5min)
- Fix TUI forward references — moot (no longer issue)
- Fix cross-package undefined variables (2 files, ~15min)
- Fix CFFI struct deprecation (1 file, ~20min)
- Suppress remaining harmless cross-skill undefined-function warnings
- Fix unused variables in test code — moot (gateway-messaging deleted)
- Compilation Hardening — eliminate all compilation errors and warnings
- v0.6.0: Time Awareness
- v0.7.0: TUI Essentials — Terminal Parity
- Readline/Ctrl key bindings
- Unicode width awareness
- Scroll indicator + new-message notification
- Fix status bar line 2 overlap
- Deeper autocomplete (frecency + subcommand)
- External editor integration (Ctrl+X+E) — done, pending test
- TUI-based setup wizard — deferred to v0.8.0
- Pads for chat scrolling — deferred to v0.7.1 (needs Croatoan terminal for testing)
- Deeper autocomplete (frecency + subcommand)
- v0.7.1: TUI — Streaming + Markdown Rendering
- v0.7.2: TUI — Gate Trace + HITL + Search
- Gate trace visualization
- HITL inline command handling
- Message search (/search or Ctrl+F)
- Context visibility command (/context)
- Session rewind, fork, and resume — Merkle-root-based
- Safe-tool allowlist — read-only operations auto-approve
- Agent identity file — /memex/IDENTITY.org
- Undo/redo per operation — /undo, /redo
- Expand /context debugging — similarity trace + dropped nodes
- Tool execution hardening — timeouts + write verification
- Tag stack — categories + severity tiers
- Merkle provenance audit — /audit <node-id>
- v0.8.0: Direction 2 — Information Radiator (Foundation)
- v0.8.1: Direction 2 — Rich Rendering
- v0.8.2: Direction 3 — Living Environment (Skin System)
- v0.8.3: Direction 3 — Adaptive Layout + Personality
- v0.9.0: Signal Pipeline, Concurrency & Streaming
- Priority-queue signal processing
- MVCC memory concurrency
- Structured output enforcement
- Doom-loop detection — 3 identical tool calls triggers HITL
- Busy-mode — queue on interrupt
- CLI / non-interactive mode — passepartout ask
- Provider health tracking — success rate + latency
- Cost-based provider routing
- Intelligent provider fallback — per-task-type routing
- Internal evaluation harness — 10 tasks, regression detection
- Autonomous certification badge
- v0.10.0: Tool Ecosystem (MCP-Native) + Voice Gateway
- MCP native client
- Core MCP tools (from existing roadmap items)
- TUI tool visualization
- Environment Steward
- v0.10.3 — TODO Voice Gateway
- Web search + web fetch tools — search-web, fetch-web
- LSP integration — language server protocol client
- Auto-saved session transcripts — /memex/system/sessions/
- Auto-memory extraction — learnings from sessions
- Universal cross-project Org query
- debug-inspect cognitive tool — live state inspection
- Competitive Advantage Analysis — v0.10.0 Summary
- v0.11.0: Planning, Self-Modification & Deterministic Routing
- v0.12.0: Evaluation & Vision
- v0.13.0: Consensus, GTD & Deep Emacs Integration
- v0.14.0: Self-Configuring Setup Binary
- v1.0.0: SOTA Parity (verified)
- v2.0.0: Lisp Machine Emergence
- v3.0.0: Neurosymbolic Maturity
- v4.0.0: Native Inference
- v5.0.0: Hardware — Tagged Lisp Architecture
- v6.0.0: True Agency
The Evolutionary Roadmap
Understanding Passepartout as a function in time is not nostalgia. It is architectural guidance. Every decision in v0.x should be made with awareness of where the system is going. Code written today becomes the substrate for v3.0. Skills designed today become the vocabulary the symbolic engine speaks tomorrow.
The probabilistic beginning is not a weakness to overcome. It is the bootstrap. The system learns the domain through probabilistic inference, and that learned knowledge becomes the seed for the symbolic engine. By the time the symbolic engine takes over, it has a rich knowledge graph to reason about, grown from thousands of probabilistic interactions.
This is how you build a reasoning machine: start with a learner, make it learn to verify by watching itself and its user, let verification become the core. Every blocked action becomes a rule. Every approved exception becomes a pattern. The symbolic layer grows at the probabilistic layer's expense. Remove the learner once it has learned enough.
Each version expands the deterministic layer. The Dispatcher writes rules from approved exceptions. Shadow mode runs trial executions. Tool permission tiers mature from simple allow/deny to nuanced context-aware policies. The agent becomes less likely to attempt dangerous actions not because it is smarter but because the guard has more complete information.
The roadmap is designed working backwards from SOTA parity (v1.0.0), guiding each version toward a fully autonomous, self-editing agent. Each version builds on the previous, with features designed to be implemented in pure Common Lisp + Org-mode.
The TODO states in each version's Tasks section are the authoritative task tracker. The feature tables describe what each version delivers.
Feature releases increment the minor version (v0.X.0). Bugfix and hardening releases increment the patch version (v0.X.Y). This ensures that security patches and critical fixes are visible in the version number and can ship independently of feature work. No feature release ships without its prerequisite hardening releases resolved.
File Update Checklist
When a version's state changes (DONE → tested → released), update these locations:
- ROADMAP.org — mark item DONE, update LOGBOOK timestamp
- README.org — update version badge (line 6), update Current Capabilities table (add new Stable rows for shipped features, remove Planned rows that have shipped)
- .env.example — update version references as needed
- lisp/core-transport.lisp — update the make-hello-message version string
- passepartout (bash entry point) — update version reference
On release:
- Tag the release on GitHub
- Extract DONE items from ROADMAP (all items with LOGBOOK timestamps since the last release tag) and use as the release notes body
- If a CHANGELOG.md is needed for packaging tools, auto-generate it from ROADMAP DONE items
v0.1.0: The Autonomous Foundation — RELEASED 2026-04-20
- State "DONE" from "TODO" [2026-04-20 Mon 19:05]
The secure, auditable Lisp kernel. All core infrastructure in place.
DONE Perceive-Reason-Act pipeline
- State "DONE" from "TODO" [2026-04-20 Mon]
This established the three-stage cognitive cycle that all later features plug into. The pipeline is the invariant — skills, gates, actuators, and clients all compose through it.
DONE Skills engine with jailed loading
- State "DONE" from "TODO" [2026-04-20 Mon]
This made the "thin harness, fat skills" identity operational. Skills loading into jailed packages (v0.1.0) is the foundation for the skill sandbox mode (v0.3.2) and the Skill Creator (v0.9.0).
DONE Policy skill (6 invariants)
- State "DONE" from "TODO" [2026-04-20 Mon]
This established the "explanation required" invariant that gates stack above. The policy gate (priority 500) runs first and sets the precedent that every action must justify itself.
DONE Memory (memory-object + Merkle hashing)
- State "DONE" from "TODO" [2026-04-20 Mon]
The Merkle tree with content-addressed hashing made copy-on-write snapshots (v0.2.0) and MVCC concurrency (v0.9.0) possible. The hash-as-identity property also feeds directly into the foveal-peripheral model's semantic retrieval.
DONE Scribe + Gardener background workers
- State "DONE" from "TODO" [2026-04-20 Mon]
These background workers established the heartbeat-driven maintenance pattern. The event orchestrator (v0.3.0) generalizes this into hooks and cron jobs.
DONE LLM gateway (OpenRouter, Ollama)
- State "DONE" from "TODO" [2026-04-20 Mon]
The provider-agnostic cascade pattern established in v0.1.0 makes the model-tier router (v0.3.0), privacy-aware routing (v0.3.0), and consensus loop (v0.11.0) possible — they all build on the same backend-cascade-call abstraction.
DONE Shell actuator, Emacs bridge, credentials vault
- State "DONE" from "TODO" [2026-04-20 Mon]
The actuator registry pattern makes MCP tools (v0.10.0) possible — they register the same way.
DONE FiveAM test suite
- State "DONE" from "TODO" [2026-04-20 Mon]
The test infrastructure established in v0.1.0 becomes the TDD runner (v0.12.0) and the SWE-bench harness (v0.12.0).
v0.2.0: Interactive Refinement — RELEASED 2026-04-29
- State "DONE" from "TODO" [2026-04-29 Wed 20:17]
The "Brain" meets the "Machine." Standardization and professionalization of the user interface and environment.
DONE Text User Interface (Croatoan-based, styled, scrollable)
- State "DONE" from "TODO" [2026-04-29 Wed]
The Croatoan-based TUI with model-view separation and dirty-flag rendering is the foundation for all TUI improvements: word wrap in v0.3.3, gate trace in v0.4.0, tool visualization in v0.8.1, and streaming in v0.7.1.
DONE Self-editing (error detection, surgical fix, hot-reload)
- State "DONE" from "TODO" [2026-04-29 Wed]
The surgical edit + tangle + hot-reload pipeline (text replace → tangle → compile → load) established the self-modification capability that makes the Skill Creator (v0.9.0) safe — skills are generated, tangled, loaded, and verified in the same loop.
DONE Enhanced utilities (structural Lisp/Org manipulation + REPL)
- State "DONE" from "TODO" [2026-04-29 Wed]
Structural Lisp/Org manipulation tools are the primitives the self-improve module (v0.2.0) and the programming skills (literate block extraction, syntax validation) build on.
DONE Onboarding wizard (modular Lisp setup for LLM providers)
- State "DONE" from "TODO" [2026-04-29 Wed]
The setup wizard established the "works out of the box" constraint that the gateway QA (v0.4.0) and Emacs bridge (v0.4.0) onboarding flows follow.
DONE Memory rollback (snapshot and restore)
- State "DONE" from "TODO" [2026-04-29 Wed]
Copy-on-write snapshots (deep-copying the memory hash table on every write) gave the pipeline crash recovery. The snapshot mechanism is the root of MVCC concurrency (v0.9.0).
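A minimal sketch of the snapshot-and-restore idea, assuming the memory store is a plain hash table. The names *store*, *snapshots*, store-put, and store-rollback are hypothetical stand-ins, not the project's API, and the copy here shares values rather than deep-copying them, which is only adequate if stored objects are treated as immutable.

```lisp
;; Hypothetical stand-ins for the real memory store and its snapshot stack.
(defvar *store* (make-hash-table :test #'equal))
(defvar *snapshots* '())

(defun copy-store (table)
  "Return a copy of TABLE with the same test. Values are shared, not deep-copied."
  (let ((copy (make-hash-table :test (hash-table-test table))))
    (maphash (lambda (k v) (setf (gethash k copy) v)) table)
    copy))

(defun store-put (key value)
  "Snapshot the store, then write KEY -> VALUE."
  (push (copy-store *store*) *snapshots*)
  (setf (gethash key *store*) value))

(defun store-rollback ()
  "Restore the most recent snapshot. Return T if one existed, NIL otherwise."
  (when *snapshots*
    (setf *store* (pop *snapshots*))
    t))
```

Copying on every write is simple but costs time and memory proportional to the store size, which is presumably part of why MVCC appears later in the roadmap.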
v0.3.0: Event Orchestration + HITL — RELEASED 2026-05-06
- State "DONE" from "TODO" [2026-05-06 Wed 15:50]
Unified control plane, Human-in-the-Loop state management, and backfill remediation for stubs and gaps from v0.1.0/v0.2.0. Security hardening followed as v0.3.1–v0.3.3 point releases.
DONE Secret Exposure Gate, Shell Safety, Lisp Validation
- State "DONE" from "TODO" [2026-05-02 Sat]
DONE Multi-distro deployment (Debian+Fedora, systemd, Docker)
- State "DONE" from "TODO" [2026-05-02 Sat]
DONE Project rename to Passepartout (files, packages, env vars)
- State "DONE" from "TODO" [2026-05-02 Sat]
DONE 31 org files with full literate prose
- State "DONE" from "TODO" [2026-05-02 Sat]
DONE Human-in-the-Loop (HITL)
CLOSED: [2026-05-03 Sun 14:00]
- State "DONE" from "TODO" [2026-05-03 Sun 14:00]
Continuation-based interaction. The agent can suspend its cognitive loop to ask for permission or clarification and resume precisely where it left off. Builds on the dispatcher's existing Flight Plan mechanism.
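One way to picture suspend-and-resume is a closure parked under a request id. This is a sketch under assumptions: *pending*, suspend-for-approval, and resume-with-decision are hypothetical names, and the real Flight Plan mechanism may work differently.

```lisp
;; Hypothetical registry of suspended actions, keyed by request id.
(defvar *pending* (make-hash-table :test #'equal))

(defun suspend-for-approval (id continuation)
  "Park CONTINUATION (a function of one argument, the decision) under ID."
  (setf (gethash id *pending*) continuation)
  (list :status :suspended :id id))

(defun resume-with-decision (id decision)
  "Resume the action parked under ID, passing DECISION (:approve or :deny)."
  (let ((k (gethash id *pending*)))
    (unless k
      (error "No pending request ~a" id))
    ;; Remove first so a continuation cannot be resumed twice.
    (remhash id *pending*)
    (funcall k decision)))
```

The closure captures whatever state the action needs, so resuming does not require replaying the cognitive loop from the start.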
DONE Event Orchestrator (unified hooks+cron+routing)
- State "DONE" from "TODO" [2026-05-02 Sat 22:36]
Unified control plane for hooks, cron, and complexity-based routing.
- hook-registry + cron-registry + tier classifier
- Hooks via #+HOOK: Org-mode properties
- Three complexity tiers: :REFLEX (no LLM), :COGNITION (light LLM), :REASONING (full LLM)
- Hooked into heartbeat for cron processing
- Rule-based tier classifier (overrideable via *tier-classifier*)
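The overrideable classifier can be sketched as a special variable holding a function. The rules below (signal types and the 80-character threshold) are placeholders chosen for illustration, and default-tier-classifier and classify-tier are hypothetical names; only *tier-classifier* and the three tiers come from the roadmap.

```lisp
(defun default-tier-classifier (sig)
  "Classify SIG (a plist) into :reflex, :cognition, or :reasoning.
The rules here are illustrative placeholders."
  (let ((type (getf sig :type))
        (text (or (getf sig :text) "")))
    (cond ((member type '(:heartbeat :cron)) :reflex)   ; no LLM needed
          ((< (length text) 80) :cognition)             ; short input, light LLM
          (t :reasoning))))                             ; everything else, full LLM

(defvar *tier-classifier* #'default-tier-classifier
  "Function of one signal plist returning a tier keyword. Rebind to override.")

(defun classify-tier (sig)
  "Classify SIG with whatever function *tier-classifier* currently holds."
  (funcall *tier-classifier* sig))
```

Because the hook is a dynamic variable, a caller can swap the rules for one scope with an ordinary let binding and the default returns afterwards.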
DONE Context Manager (project scoping)
CLOSED: [2026-05-05 Tue]
- State "DONE" from "TODO" [2026-05-05 Tue]
Stack-based project focusing with persistence.
- push-context / pop-context / with-context stack operations
- current-scope wired into perceive gate *scope-resolver*
- /focus / /scope / /unfocus TUI commands
- Context stack persisted to ~/.cache/passepartout/context.lisp, auto-restores on boot
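The stack operations named above could look roughly like this. The operation names come from the roadmap; the signatures, the *context-stack* variable, and the omission of persistence are assumptions made to keep the sketch short.

```lisp
(defvar *context-stack* '()
  "Hypothetical in-memory project focus stack, most recent focus first.")

(defun push-context (project)
  "Focus PROJECT and return it."
  (push project *context-stack*)
  project)

(defun pop-context ()
  "Drop the current focus and return it (NIL when the stack is empty)."
  (pop *context-stack*))

(defun current-scope ()
  "Return the currently focused project, or NIL."
  (first *context-stack*))

(defmacro with-context ((project) &body body)
  "Run BODY with PROJECT focused, restoring the previous focus even on error."
  `(progn
     (push-context ,project)
     (unwind-protect (progn ,@body)
       (pop-context))))
```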
DONE Model-Tier Routing (cost optimization)
CLOSED: [2026-05-03 Sun 16:00]
- State "DONE" from "TODO" [2026-05-03 Sun 16:00]
Extend *model-selector* for quadrant-based routing with per-slot provider cascades.
- Privacy filter (local-only for @personal content) — top priority
- Quadrant tagging (foreground/background × probabilistic/deterministic)
- Complexity classifier (code/plan/chat/background slots), each with its own provider cascade
- Model-selector skill registers into *model-selector* hook
Deferred to v0.5.0: budget tracking per request, per-session cost monitoring. Deferred to v0.11.0: TUI /config command for cascade configuration (env vars for now).
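A rough sketch of how the privacy filter might sit in front of the per-slot cascades. The cascade contents, *slot-cascades*, *local-providers*, and select-cascade are all hypothetical; only the slot names, the @personal tag, and the OpenRouter and Ollama providers appear in the roadmap.

```lisp
;; Hypothetical per-slot cascades; the ordering is a placeholder.
(defparameter *slot-cascades*
  '((:code       . (:openrouter :ollama))
    (:plan       . (:openrouter :ollama))
    (:chat       . (:ollama :openrouter))
    (:background . (:ollama))))

;; Providers assumed to run on the local machine.
(defparameter *local-providers* '(:ollama))

(defun personal-content-p (text)
  "True when TEXT carries the @personal tag."
  (and (search "@personal" text) t))

(defun select-cascade (slot text)
  "Return the provider cascade for SLOT.
Content tagged @personal is restricted to local providers before anything else."
  (let ((cascade (cdr (assoc slot *slot-cascades*))))
    (if (personal-content-p text)
        (remove-if-not (lambda (p) (member p *local-providers*)) cascade)
        cascade)))
```

Applying the privacy filter as a restriction on the cascade, rather than as a separate route, means the slot's preferred ordering is preserved among the providers that remain.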
DONE Memory Scope Segmentation
CLOSED: [2026-05-03 Sun 16:30]
- State "DONE" from "TODO" [2026-05-03 Sun 16:30]
Extend memory-object with :scope property.
- :memex (permanent knowledge), :session (ephemeral), :project (current work)
- Scope-aware retrieval in memory layer
DONE Asynchronous Embedding Gateway
CLOSED: [2026-05-05 Tue]
- State "DONE" from "TODO" [2026-05-05 Tue]
Provider-agnostic vector generation (Ollama, OpenAI, hashing fallback).
- Three backends: local (Ollama-compatible), openai (/v1/embeddings), hashing (SHA-256)
- embeddings-compute and *embedding-backend* for runtime provider selection
- ingest-ast populates vectors at object creation time
- mark-vector-stale marks vectors as :pending and queues for re-embedding
- embed-all-pending drains queue, computes vectors, stores in *memory-store*
- Cron job registered with orchestrator: runs every 10m on :reflex tier
- EMBEDDING_PROVIDER env var for provider selection
- Registered as proper skill (defskill :passepartout-system-model-embedding)
Note: The default :hashing backend uses SHA-256-derived vectors. SHA-256 is a
cryptographic hash with the avalanche property — one-bit input differences produce
entirely different outputs. This makes it a correct integrity check (Merkle tree)
but an incorrect similarity function (semantic retrieval). v0.4.0 replaces it with
a zero-dependency lexical similarity algorithm that actually captures textual
overlap while remaining offline-capable.
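The stale-marking and queue-draining steps listed above can be sketched as follows. The function names come from the roadmap, but the signatures are assumptions, and *pending-queue* and *vectors* are hypothetical stand-ins for the real memory store.

```lisp
(defvar *pending-queue* '()
  "Hypothetical queue of object ids awaiting (re-)embedding, newest first.")
(defvar *vectors* (make-hash-table :test #'equal)
  "Hypothetical map from object id to its vector, or :pending.")

(defun mark-vector-stale (id)
  "Mark ID's vector as :pending and queue it once for re-embedding."
  (setf (gethash id *vectors*) :pending)
  (pushnew id *pending-queue* :test #'equal)
  id)

(defun embed-all-pending (embed-fn text-of)
  "Drain the queue in arrival order.
EMBED-FN maps text to a vector; TEXT-OF maps an id to its text.
Returns the number of objects embedded."
  (let ((ids (reverse *pending-queue*)))
    (setf *pending-queue* '())
    (dolist (id ids (length ids))
      (setf (gethash id *vectors*)
            (funcall embed-fn (funcall text-of id))))))
```

Passing the embedding function in as an argument keeps the queue independent of which backend is selected at runtime.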
DONE TUI Experience (Daily Driver Quality)
CLOSED: [2026-05-05 Tue]
- State "DONE" from "TODO" [2026-05-05 Tue]
All P0-P4 items implemented:
- P0: Chat scrollback (Page Up/Down), Input history (up/down arrows)
- P1: Status bar (connection, mode, msg count, scroll, activity indicator)
- P1: Message rendering (timestamps, colors, role icons)
- P2: Command palette (/help command listing)
- P2: Multi-line input (\ + Enter inserts newline)
- P3: Background activity indicator (…thinking spinner)
- P4: Tab completion for all / commands
- P4: Configurable theme (*tui-theme* plist, /theme command)
DONE v0.2.x Backfill Remediation (stubs and gaps)
CLOSED: [2026-05-03 Sun]
- State "DONE" from "TODO" [2026-05-03 Sun]
- P0: vault-get-secret / vault-set-secret wrappers (one-line delegation to vault-get/vault-set with :type :secret)
- P0: system-archivist Scribe + Gardener (distill daily logs → atomic notes; scan broken links, orphaned memory-objects)
- P0: system-self-improve surgical edit + error fix (read → replace → snapshot → write → balance → tangle → reload)
- P0: programming-org org-modify + org-ast-render (locate node by ID, apply changes; convert plist AST → Org text)
- P0: programming-literate balance check + tangle sync (verify balanced parens in source blocks; verify .lisp matches tangled output)
- P1: system-event-orchestrator bootstrap (scan Org files for HOOK/CRON properties, register via existing registries)
- P1: system-memory introspection (structured statistics: object count by type, TODO distribution, orphans, snapshots)
- P1: path relic skills/ → lisp/ (update skill-initialize-all and context-skill-source to resolve against lisp/ directory)
- P2: core-context semantic retrieval (populate org-object-vector at ingest; fallback: TF-IDF bag-of-words)
- P2: core-context subtree-based skill source loading (context-skill-subtree for targeted retrieval by heading name)
- P3: Variable name drift normalization (memory vs memory-store, skills-registry vs skill-registry)
- P4: Eliminate STYLE-WARNINGs from setup output (reorder defuns for same-file forward references; accept cross-skill references)
DONE Project Renaming (Bouncer → Dispatcher)
- State "DONE" from "TODO" [2026-05-02 Sat 22:00]
The Dispatcher's role has evolved beyond security guard. It is the seed of the deterministic engine — it learns to execute procedures without invoking the neural net.
DONE Parser RCE elimination
- State "DONE" from "TODO" [2026-05-06 Wed 16:38]
Rationale: In SBCL, as in standard Common Lisp, the *read-eval* variable defaults to t, which lets the #. reader macro execute arbitrary Lisp forms during parsing. Three code paths in the current codebase process untrusted input with read-from-string or read without binding *read-eval* to nil. Each represents a remote code execution vector that bypasses all deterministic safety gates — the Dispatcher's shell safety check, path protection, secret scanning, and network exfiltration detection never execute because the malicious form is evaluated during parsing, before the action plist is even constructed.
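The fix this rationale calls for is a single dynamic binding around the reader call. A minimal sketch, where safe-read-from-string is a hypothetical helper name and not a function in the codebase:

```lisp
(defun safe-read-from-string (text)
  "Parse one Lisp form from TEXT with read-time evaluation disabled.
Returns the form and :ok, or NIL and :rejected if the reader signals an error
(which it must do for #. when *read-eval* is NIL)."
  (let ((*read-eval* nil))
    (handler-case (values (read-from-string text) :ok)
      (reader-error () (values nil :rejected)))))
```

With *read-eval* bound to nil, the standard requires #. to signal a reader-error instead of evaluating the form, so the payload never runs and the caller sees an ordinary parse failure.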
- Wrap read-from-string in think() (core-loop-reason.lisp:102) with (let ((*read-eval* nil)) ...) — LLM output is untrusted by definition; parsing it must never execute code. The markdown-strip regex already runs, so the fix immediately follows it.
- Wrap read in load-memory-from-disk (core-memory.lisp:143) with (let ((*read-eval* nil)) ...) — the memory.snap file lives in ~/ by default and could be corrupted or planted.
- Wrap read-from-string in action-system-execute (core-loop-act.lisp:62) with (let ((*read-eval* nil)) ...) — the :system :eval path executes untrusted payload code. Explicitly assert that this path requires the Dispatcher's approval gate.
- Add FiveAM test: inject "(#.(shell \"echo pwned\"))" into the think() pipeline and assert no shell execution occurs.
DONE Shell safety & actuator sandboxing
- State "DONE" from "TODO" [2026-05-06 Wed 16:46]
Rationale: The :system :eval actuator path is currently unchecked by the Dispatcher's approval gate — only :shell and :tool "shell" trigger HITL. The shell actuator wraps commands through double bash -c nesting (system-actuator-shell.lisp:10), where Lisp's format with ~s produces S-expression-safe strings, not shell-safe strings. A command containing quotes or substitution characters can break out. Additionally, skill files loaded via skill-initialize-all execute arbitrary Lisp in jailed packages — a skill file containing (uiop:run-program "dangerous") executes immediately on load before any gate can inspect it.
- Fix shell double-wrapping: remove the outer bash -c in actuator-shell-execute; pass the command string directly to uiop:run-program with :force-shell nil. The timeout wrapping remains via the OS timeout binary.
- Extend the Dispatcher approval requirement to the :system :eval path (currently only :shell and :tool "shell" trigger HITL). An unbounded eval should require the same Flight Plan approval as a shell command.
- Add skill sandbox mode for skill-initialize-all: load each skill's code into a temporary jailed package, run the registered trigger function in isolation, verify it imports no restricted symbols (from CL package: run-program, shell, run-shell-command), then promote to the live registry on pass.
- Add FiveAM test: register a skill containing (uiop:run-program "echo test") in the body and verify the sandbox blocks its promotion.
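The restricted-symbol check in the sandbox step could start as a walk over the skill's forms after they have been read but before they are evaluated. This is a sketch under assumptions: restricted-symbols-in and skill-promotable-p are hypothetical names, and matching by symbol name is a deliberate simplification that ignores which package a symbol lives in.

```lisp
(defparameter *restricted-symbol-names*
  '("RUN-PROGRAM" "SHELL" "RUN-SHELL-COMMAND")
  "Symbol names a skill body may not mention, taken from the roadmap item.")

(defun restricted-symbols-in (form)
  "Return the restricted symbol names that appear anywhere in FORM.
Keywords are ignored so that data such as :shell does not trip the check.
Assumes FORM is a finite tree (no circular structure)."
  (let ((found '()))
    (labels ((walk (x)
               (cond ((consp x)
                      (walk (car x))
                      (walk (cdr x)))
                     ((and (symbolp x)
                           (not (keywordp x))
                           (member (symbol-name x) *restricted-symbol-names*
                                   :test #'string=))
                      (pushnew (symbol-name x) found :test #'string=)))))
      (walk form))
    (nreverse found)))

(defun skill-promotable-p (form)
  "True when FORM mentions no restricted symbols."
  (null (restricted-symbols-in form)))
```

A static walk like this only catches direct mentions. A skill that builds the symbol at runtime, for example with intern and funcall, would pass, so a check of this kind complements the jailed package rather than replacing it.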
DONE TUI Critical Fixes
- State "DONE" from "TODO" [2026-05-06 Wed 17:59]
Rationale: The TUI is Passepartout's only interface. OpenClaw distributes across 25+ messaging channels with voice, Canvas, and macOS/iOS apps. Hermes Agent ships multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output in its TUI. Passepartout's Croatoan TUI must carry the product alone, and it currently lacks word wrap, cursor movement, resize handling, connection-loss feedback, a quit command, and persistent history. None of these fixes require daemon changes — they are pure client-side Croatoan work that closes the gap from "proof of concept" to "daily driver."
- Word wrap in view-chat: every LLM response longer than the terminal width is silently truncated to one line. Croatoan supports multi-line rendering; view-chat must calculate per-message line height, adjust visible-message count accordingly, and scroll per message-line rather than per message. For very long messages, add a pager mode where pressing Enter on a message opens it in a scrollable overlay.
- Left/Right cursor in input: add :left and :right key handlers that move a cursor position index within the :input-buffer list. Characters are inserted at the cursor position, not always appended. Backspace deletes at the cursor position.
- SIGWINCH handler: register a handler for the terminal resize signal. On resize, re-measure the root window, destroy and recreate the three sub-windows (sw, cw, iw), set all dirty flags to t, and force a full redraw.
- Connection-loss detection: the reader thread currently polls recv-daemon silently on EOF. On disconnection, queue a :disconnected event, set :connected to nil, clear :busy, add a red system message "Connection lost — run /reconnect to retry." The :disconnected event dirties the status bar to show the status indicator.
- /quit command + persistent history: on /quit, save :input-history to ~/.cache/passepartout/history (one line per entry, most recent first), send a goodbye handshake to the daemon, close the socket, and exit the main loop cleanly. On startup, load history from the save file if it exists.
- Scroll offset clamping: clamp :scroll-offset to the range from 0 to (max 0 (- msg-count visible-lines)). The status bar shows "msgs:12/45" (visible / total) rather than "msgs:45" (total only) so the user knows when they've scrolled past the oldest message.
- Message list storage: replace the O(n²) (nth i msgs) list indexing with a simple adjustable vector. add-msg appends; view-chat iterates with aref. The vector is resized as needed. Same API surface, 100x speedup on message-heavy sessions.
- Add FiveAM tests: word-wrap produces correct line count for a 200-character string at 80-column width; cursor left/right wraps at buffer boundaries; SIGWINCH preserves message state; /quit saves and restores history.
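Three of the fixes above are pure functions and can be sketched without Croatoan. The names wrap-text, clamp-scroll-offset, and make-msg-store are hypothetical, and the add-msg signature is an assumption; only the behavior described in the list is taken from the roadmap.

```lisp
(defun wrap-text (text width)
  "Split TEXT into lines of at most WIDTH characters.
Breaks at the last space that fits; hard-breaks words longer than WIDTH."
  (let ((lines '())
        (start 0)
        (len (length text)))
    (loop while (< start len)
          do (let ((end (min len (+ start width))))
               (when (< end len)
                 ;; Prefer breaking at the last space inside the window.
                 (let ((space (position #\Space text
                                        :start start :end end :from-end t)))
                   (when (and space (> space start))
                     (setf end space))))
               (push (subseq text start end) lines)
               ;; Skip the space we broke on, if there is one.
               (setf start (if (and (< end len)
                                    (char= (char text end) #\Space))
                               (1+ end)
                               end))))
    (nreverse lines)))

(defun clamp-scroll-offset (offset msg-count visible-lines)
  "Keep OFFSET between 0 and the last position that still fills the window."
  (max 0 (min offset (max 0 (- msg-count visible-lines)))))

(defun make-msg-store ()
  "Create an empty adjustable vector with a fill pointer for messages."
  (make-array 0 :adjustable t :fill-pointer 0))

(defun add-msg (store msg)
  "Append MSG to STORE, growing the vector as needed. Returns STORE."
  (vector-push-extend msg store)
  store)
```

Reading message i with aref on the vector is constant time, whereas nth on a list walks i cells, which is where the quadratic cost of rendering a long session comes from.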
v0.4.0: Production Hardening — RELEASED 2026-05-06
- State "DONE" from "TODO" [2026-05-06 Wed 20:56]
The features in this version were originally sequenced as v0.3.x patches but represent feature-level scope. They activate the architectural advantages designed in v0.1.0–v0.3.0, harden the self-build safety boundary, and expand Passepartout's interaction surfaces beyond the terminal TUI. Each feature depends on infrastructure already in place — the wiring, the sandbox, the gate trace — and activates it.
DONE Semantic retrieval activation
CLOSED: [2026-05-06 Tue]
- State "DONE" from "TODO" [2026-05-06 Tue]
Rationale: Two independent failures prevent the foveal-peripheral semantic retrieval path from ever firing. First, context-awareness-assemble never passes :foveal-vector to context-object-render, so the renderer receives nil for foveal-vector and the similarity calculation always returns 0.0. Second, the default :hashing embedding backend uses SHA-256 (a cryptographic hash with the avalanche property) as a similarity function. SHA-256 is designed to produce entirely different outputs for nearly identical inputs — the property that makes it secure for integrity verification is precisely what makes it useless for semantic retrieval. A content-addressed Merkle tree correctly uses SHA-256 for identity; a retrieval engine needs a similarity function, not an identity function. The infrastructure for real embeddings (local with Ollama, openai with the embeddings API) is fully implemented and working — this release activates the last-mile wiring and replaces the semantically blind default with a zero-dependency algorithm that actually captures textual overlap.
- Wire :foveal-vector into context-awareness-assemble: pass (memory-object-vector (memory-object-get foveal-id)) as the :foveal-vector argument to the context-object-render call (one line in core-context.lisp:148-150).
- Replace :hashing default backend with character-trigram Jaccard similarity. Pure Lisp, zero external dependencies, works exactly as offline as SHA-256, but captures lexical overlap: "authentication" and "authenticate" share trigrams "aut," "uth," "the," "hen," "ent," etc. The vector is a bloom filter of trigrams; cosine similarity over that filter approximates Jaccard (intersection / union). This provides real if crude semantic signal without any server.
- Rename existing embedding-backend-hashing to embedding-backend-sha256 and repurpose it as an explicit :sha256 provider for environments where even trivial Lisp computation is undesirable (embedded, resource-constrained). Document it as "integrity-only, no semantic retrieval capability."
- Add EMBEDDING_PROVIDER guidance to the setup wizard: explain that :hashing is the default offline fallback, :local requires Ollama with nomic-embed-text, and :openai uses the paid embeddings API.
- Add FiveAM test: ingest two semantically related nodes ("implement login form" and "add password authentication"), verify cosine similarity > 0.0 with the trigram backend.
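To make the trigram idea concrete, here is a sketch that computes Jaccard similarity directly on trigram sets. It skips the bloom-filter vector and the cosine step described above, so it illustrates the signal rather than the planned backend; trigrams and jaccard-similarity are hypothetical names.

```lisp
(defun trigrams (text)
  "Return the distinct character trigrams of TEXT, lowercased, as a list of strings."
  (let ((s (string-downcase text))
        (seen (make-hash-table :test #'equal)))
    ;; Strings shorter than three characters produce no trigrams.
    (loop for i from 0 to (- (length s) 3)
          do (setf (gethash (subseq s i (+ i 3)) seen) t))
    (loop for k being the hash-keys of seen collect k)))

(defun jaccard-similarity (a b)
  "Return |A ∩ B| / |A ∪ B| over the trigram sets of strings A and B.
Returns 0.0 when neither string has any trigrams."
  (let* ((ta (trigrams a))
         (tb (trigrams b))
         (inter (length (intersection ta tb :test #'string=)))
         (union-size (- (+ (length ta) (length tb)) inter)))
    (if (zerop union-size)
        0.0
        (/ inter (float union-size)))))
```

Unlike a SHA-256 digest, a one-character edit changes at most three trigrams, so near-identical strings keep a high score instead of diverging completely.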
DONE Self-build safety boundary
CLOSED: [2026-05-06 Tue]
- State "DONE" from "TODO" [2026-05-06 Tue]
Rationale: Self-building (the agent modifying its own source code) begins at v0.10.0 when the tool ecosystem and test runner are in place. But self-building without path-level write protection means the agent can modify the very pipeline code that is currently executing — the core-* files that implement the Perceive-Reason-Act cycle, the Merkle-tree memory, the skill engine loader, and the Dispatcher gate stack itself. A hallucination or a logic error during self-building that corrupts core-loop-reason.lisp destroys the agent's ability to reason about and fix the corruption. The "thin harness" is not privileged code in the architectural sense (homoiconicity means any code can be modified at runtime), but it must be protected code — modifications to the harness require a human in the loop, enforced by the Dispatcher's path-protection gate, not by convention.
This is the corollary to "thin harness, fat skills": the harness is thin enough to be auditable by a human, and the Dispatcher ensures it stays that way. Skills and system modules expand freely; the core contracts to a minimal, protected kernel.
- Add core-* patterns to *dispatcher-protected-paths*: core-*.org, core-*.lisp, and their tangled equivalents. Any file write, file read-that-prefaces-a-write, or shell command targeting these paths triggers the Dispatcher's blocking gate.
- The blocked action produces a Flight Plan (HITL approval required). The human reviews the proposed core change in an Org buffer before approving. This is the same mechanism that governs shell commands and network exfiltration — the core protection is a path-specific instance of the existing gate, not a new gate.
- Implement a SELF_BUILD_MODE env var. When SELF_BUILD_MODE=true (default false):
  - Core path protection is active (writes blocked, HITL required)
  - Non-core writes proceed through the standard Dispatcher gate (permissions table + policy + Dispatcher)
  - SELF_BUILD_MODE=false disables core protection entirely — useful during initial development when the human is manually editing core files and doesn't want every save to trigger a Flight Plan
- Telemetry: track self-build actions (core modifications proposed, core modifications approved, core modifications denied). This is the dataset that the Dispatcher's learning system uses in v3.0.0 to understand which core modifications are safe enough to automate.
- Add FiveAM test: simulate a write to core-loop.lisp, verify the Dispatcher returns a :LOG rejection with "protected path" in the message.
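The path check behind this gate can be sketched with plain string matching. Everything here is an assumption made for illustration: core-path-p and check-write are hypothetical names, the result plist is not the Dispatcher's real return shape, and the :protect-core keyword stands in for whatever the SELF_BUILD_MODE setting resolves to.

```lisp
(defun basename (path)
  "Return the part of PATH after the last slash."
  (let ((slash (position #\/ path :from-end t)))
    (if slash (subseq path (1+ slash)) path)))

(defun ends-with-p (suffix string)
  "True when STRING ends with SUFFIX."
  (let ((start (- (length string) (length suffix))))
    (and (>= start 0)
         (string= suffix string :start2 start))))

(defun core-path-p (path)
  "True when PATH names a core-*.org or core-*.lisp file."
  (let ((name (basename path)))
    (and (>= (length name) 5)
         (string= "core-" name :end2 5)
         (or (ends-with-p ".lisp" name)
             (ends-with-p ".org" name)))))

(defun check-write (path &key (protect-core t))
  "Return a plist describing whether a write to PATH may proceed.
The plist shape is illustrative only."
  (if (and protect-core (core-path-p path))
      (list :status :blocked :reason "protected path" :requires :hitl)
      (list :status :passed)))
```

Matching on the file name rather than the full path means the check still fires when the same core file is reached through a different directory prefix.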
DONE TUI Differentiator Visualization
CLOSED: [2026-05-06 Tue]
- State "DONE" from "TODO" [2026-05-06 Tue]
Rationale: Three architectural elements exist today in the daemon that no competitor can render — the Dispatcher gate trace, the foveal-peripheral focus map, and the rules-learned counter. All three run in pure Lisp with 0 LLM tokens. None are visible to the user. Making them visible turns Passepartout's architecture from an internal mechanism into a trust-building UX — the user sees exactly which safety gates passed, exactly what the agent is focusing on, and exactly how many rules the Dispatcher has learned from their decisions. No competitor can ship this because none has deterministic gates to trace, foveal-peripheral context to map, or a rule-synthesizing Dispatcher to count.
- Gate trace per action: extend the daemon's response plist to include :gate-trace, a list of (:gate <name> :result <:passed | :blocked | :approval>) entries produced by cognitive-verify. The TUI renders each entry as a colored line below the corresponding agent message: green ✓ Dispatcher: path allowed, red ✗ Dispatcher: blocked (shell safety), yellow → HITL required: /approve HITL-ab12. Gate trace lines are dim and collapsible (press Tab on a message to toggle trace visibility). This turns the invisible ten-vector safety gate into the user's primary trust mechanism.
- Focus map in status bar: add a second status bar line showing [Focus: core-loop.lisp:think()] [Scope: passepartout] [3 related nodes]. The daemon already tracks foveal-id and *scope-resolver* in the signal plist; the TUI reads these from the most recent response and renders them. Related node count comes from the number of objects with cosine similarity ≥ threshold in the last context assembly. This shows the user what the agent is looking at, which is the single biggest trust gap in AI agents.
- Rule counter in status bar: [Rules: 47]. The Dispatcher's *hitl-pending* hash table and approved/disallowed memory-object entries provide the count; every HITL decision that produces a rule increments it. The TUI reads the count from a new daemon response field :rule-count. The user watches the counter tick up as they teach the agent their preferences.
- Expanded theme: replace the 7-flat-color *tui-theme* with a 25-color layered system organized by message category (roles, content types, tool visibility, gate states, status). See the design discussion for the full color mapping. Implement a /theme <name> command that swaps between named presets (dark, light, solarized, gruvbox). Theme change persists to disk and reloads on next session.
- Add FiveAM tests: gate trace renders correctly for pass/block/approval states; focus map updates when foveal-id changes; rule counter increments on HITL approval.
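As a minimal sketch of the gate-trace rendering described above, the following maps each (:gate <name> :result <result>) entry to a color and a text line. The names gate-trace-glyph and render-gate-trace are hypothetical and are not taken from the Passepartout source; the real TUI would hand these pairs to its own drawing code.

```lisp
;; Hypothetical helpers: illustrative names, not functions from the codebase.
(defun gate-trace-glyph (result)
  "Map a gate RESULT keyword to the glyph and color named in the roadmap."
  (ecase result
    (:passed   (values "✓" :green))
    (:blocked  (values "✗" :red))
    (:approval (values "→" :yellow))))

(defun render-gate-trace (gate-trace)
  "Turn a list of (:gate NAME :result RESULT) plists into (COLOR . TEXT) pairs."
  (mapcar (lambda (entry)
            (multiple-value-bind (glyph color)
                (gate-trace-glyph (getf entry :result))
              (cons color (format nil "~a ~a" glyph (getf entry :gate)))))
          gate-trace))
```

For example, (render-gate-trace '((:gate "shell-safety" :result :blocked))) yields ((:RED . "✗ shell-safety")). Using ecase means an unknown result keyword signals an error instead of rendering silently.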
DONE Gateway QA, Discord, Slack + Emacs Bridge
CLOSED: [2026-05-06 Tue]
- State "DONE" from "TODO" [2026-05-06 Tue]
Rationale: Passepartout currently has Telegram and Signal gateways in the codebase, both untested. The setup wizard has Slack as a configurable option with no implementation. Two messaging channels is not competitive: OpenClaw has 25+, Hermes Agent has 6+. But more critically: the Lisp crowd is Passepartout's natural audience, and they live in Emacs. An Emacs bridge that speaks the framed TCP protocol is trivial to implement (the protocol is 200 lines of Lisp; porting to elisp is straightforward) and turns every Emacs buffer into a Passepartout interaction surface. This is not the deep Emacs integration of v0.11.2 (where the agent controls Emacs); this is Emacs controlling the agent over TCP. The Emacs user selects a region, hits M-x passepartout-send-region, and the agent responds in a dedicated buffer. They never leave their editor.
Gateway:
- Integration tests for Telegram gateway: mock the Telegram Bot API, verify message send (POST /sendMessage) and receive (GET /getUpdates) round-trip. Verify HITL commands (/approve, /deny) are intercepted before injection.
- Integration tests for Signal gateway: mock signal-cli output, verify JSON message parsing and polling loop. Verify send path constructs correct signal-cli send arguments.
- Add Discord gateway: Discord Bot API (REST + Gateway WebSocket for real-time messages). Register bot, handle MESSAGE_CREATE events, send via POST /channels/{id}/messages. Map Discord mentions to :user-input signals. HITL commands work identically to Telegram.
- Add Slack gateway: Slack Events API + Web API. Subscribe to message.im events, send via chat.postMessage. Reuse the SLACK_TOKEN config key already present in the setup wizard.
- Each gateway is a skill under passepartout.skills.gateway-<platform>: jail-loaded, hot-reloadable, sandbox-verified.
- Gateway configuration surfaced in the setup wizard: after entering a token, offer "send a test message to yourself" as a connection verification step. Surface the result as a green ✓ or red ✗ with the error detail.
- Gateway status displayed in messaging-list: platform, configured (yes/no), gateway active (yes/no), last message received (timestamp).
Emacs Bridge:
- Elisp package: passepartout.el. Connects to daemon on localhost:9105 via make-network-process (TCP).
- Sends: framed plist protocol identical to the TUI (frame-message ported to elisp: write hex length prefix, write prin1'd plist). The daemon does not know or care whether the client is the Croatoan TUI, the CLI, or Emacs.
- Receives: daemon responses arrive in a passepartout-response buffer. Each response is rendered as an Org headline: role prefix, timestamp, content. Gate trace (from v0.4.0) is rendered as property drawer entries under the headline.
- M-x passepartout-send-region: sends the selected region as a :user-input signal with the current buffer's file path as context.
- M-x passepartout-send-buffer: sends the entire buffer.
- M-x passepartout-focus: sets the foveal focus to the Org headline at point (extracts :ID: property, sends :point-update signal). Equivalent to the TUI's /focus command.
- M-x passepartout-approve / M-x passepartout-deny: prompts for HITL token and sends approval/denial.
- Agent modifies an Org file → Emacs receives :buffer-update via the bridge → the buffer is refreshed (revert-buffer or targeted replacement).
- The Emacs bridge is the daily driver for Lisp users. The TUI remains for non-Emacs users and for the differentiator visualizations. Emacs users get the gate trace and focus map as Org property drawers in the response buffer: same data, elisp-native rendering.
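The framing that the bridge has to port ("write hex length prefix, write prin1'd plist") can be sketched in Common Lisp as follows. This is an illustration under assumptions, not the daemon's actual wire format: the six-digit prefix width and the function signatures are hypothetical, and the length here counts characters, which differs from the byte count for non-ASCII payloads.

```lisp
;; Sketch only: PREFIX-WIDTH and both signatures are assumptions.
(defun frame-message (plist &optional (prefix-width 6))
  "Serialize PLIST as a zero-padded hex length prefix followed by its printed form."
  (let ((payload (with-standard-io-syntax (prin1-to-string plist))))
    (format nil "~v,'0x~a" prefix-width (length payload) payload)))

(defun unframe-message (frame &optional (prefix-width 6))
  "Inverse of FRAME-MESSAGE: read the hex length, then read that many characters."
  (let* ((len (parse-integer frame :end prefix-width :radix 16))
         (payload (subseq frame prefix-width (+ prefix-width len))))
    (with-standard-io-syntax
      ;; with-standard-io-syntax binds *read-eval* to T, so disable it again
      ;; before reading data that arrived over the network.
      (let ((*read-eval* nil))
        (read-from-string payload)))))
```

With these definitions, (frame-message '(:a 1)) returns "000006(:A 1)", and unframe-message recovers the original plist. An elisp port would follow the same two steps with format and read.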
DONE Native embedding inference
CLOSED: [2026-05-07 Thu]
Implemented: in-process embedding inference via CFFI binding to llama.cpp.
- FFI binding to llama.cpp's current (non-deprecated) embedding API via a C wrapper library (/usr/local/lib/libllama_wrap.so) that bridges CFFI pointer params to llama.cpp struct-by-value calls
- Builds on /usr/local/lib/libllama.so (llama.cpp shared library)
- Ship nomic-embed-text-v1.5 (80MB Q4_K_M GGUF) as the bundled embedding model. 768-dimensional vectors (nomic-bert, 12 layers), CPU-friendly, <100ms per document on any modern CPU
- EMBEDDING_PROVIDER=native enables the native backend; model preloads at daemon startup (~30s)
- Lazy loading via *embedding-backend* :native also works (first call blocks ~45s for model init)
- C wrapper functions: llama_wrap_model_load, llama_wrap_new_context, llama_wrap_encode, llama_wrap_batch_init/free
- Struct sizes verified via C sizeof/offsetof: llama_model_params (72B), llama_context_params (136B), llama_batch (56B)
- BERT pooling: uses llama_get_embeddings_seq for sequence-level embedding
- sb-int:set-floating-point-modes :traps nil required before any llama.cpp call (FPU state conflict)
- llama_backend_init required before model load
- llama_model_get_vocab + llama_vocab_n_tokens replaces deprecated llama_n_vocab
- llama_tokenize takes vocab* not model* (API change since earlier llama.cpp versions)
- Exports: embedding-backend-native, embedding-native-load-model, embedding-native-unload, embedding-native-ensure-loaded, embedding-native-get-dim
- FiveAM tests: availability, loading, dimensions (768), self-similarity (1.0), semantic similarity ranking
- The trigram Jaccard backend remains as the default fallback for zero-config deployments
- State "DONE" from "TODO" [2026-05-07 Thu]
Competitive Advantage Analysis — v0.4.0 Summary
Production hardening is the process of turning architectural potential into operational strength. The semantic retrieval fix activates the foveal-peripheral model's full power: deep nodes that are topically related to the user's focus now surface automatically. Without this, the context model is "dumb truncation at depth 2." With it, it's genuine semantic awareness — and since the retrieval is deterministic (in-image vector math, zero LLM tokens), the cost advantage over competitors' LLM-assisted search compounds with every query.
The self-build safety boundary is a capability no competitor provides: the agent cannot modify its own brain stem without human review. The core-* path protection means the Dispatcher draws a line at the filesystem level, not the policy document level. Claude Code, OpenClaw, and Hermes all allow agents to modify their own source files without distinction between application code and runtime code. Passepartout's Dispatcher prevents modification of the very pipeline that implements the Perceive-Reason-Act cycle, the Merkle-tree memory, the skill engine loader, and the Dispatcher gate stack itself. This is the operational realization of "thin harness, fat skills" — the harness is thin enough to be auditable by a human, and the Dispatcher ensures it stays that way.
The TUI differentiator visualizations are Passepartout's permanent UX advantage. The gate trace, focus map, and rule counter are UX elements that only make sense in Passepartout's architecture — deterministic gates, foveal-peripheral context, and Dispatcher rule synthesis exist nowhere else. No competitor can ship this because none has deterministic gates to trace, foveal-peripheral context to map, or a rule-synthesizing Dispatcher to count. Combined with the TUI critical fixes from v0.3.3, the TUI is competitive on usability and uniquely informative on safety and context transparency.
The messaging gateways and Emacs bridge expand Passepartout's interaction surface from a single terminal TUI to four surfaces: terminal, Telegram/Signal/Discord/Slack messaging, Emacs, and voice (via the voice gateway in v0.10.3). The Emacs bridge is strategically critical — the Lisp crowd is Passepartout's natural audience, and they live in Emacs. An Emacs bridge that speaks the framed TCP protocol turns every Emacs buffer into a Passepartout interaction surface. Combined with the gate trace and focus map rendered as Org property drawers in the response buffer, Emacs users get the same differentiator visualizations as TUI users — same data, elisp-native rendering.
v0.4.1: Design Cleanup
DONE Remove system-prompt-augment mechanism
- State "DONE" from "TODO" [2026-05-07 Thu 13:13]
Rationale: The system-prompt-augment slot on the skill struct enables skills to inject always-on text into every LLM system prompt via a maphash over *skill-registry* in think() (core-loop-reason.lisp:83-92). Only one skill uses it, programming-repl, and it does so as a backdoor: the skill's trigger is hardcoded to nil, so it never fires as an active skill. Its sole contribution is injecting a REPL-first mandate into every system prompt. The other 24 skills have nil augments and are skipped by the when aug-fn guard. This is architecturally wrong: standing mandates (always-on rules) should live in a dedicated *standing-mandates* list, not piggyback on a skill that is never triggered. The mechanism also fuels a false claim in DESIGN_DECISIONS about 3,000-8,000 tokens of overhead; the actual overhead is ~40 tokens from the one active augment.
- Remove system-prompt-augment slot from the skill defstruct and defskill macro (core-skills.org:78, core-skills.org:121-133).
- Remove the maphash skill-augments collection block from think() and the associated (or skill-augments "") injection in the system-prompt format call (core-loop-reason.org:83-95, core-loop-reason.org:196-198).
- Remove :system-prompt-augment #'repl-mandate from programming-repl's defskill (programming-repl.org:269).
- Introduce *standing-mandates* (a list of function → string generators). Inject them into the IDENTITY section of the system prompt alongside assistant-name. Move repl-mandate there: (push #'repl-mandate *standing-mandates*).
- Tangle the corresponding lisp/ files.
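A minimal sketch of the *standing-mandates* idea, assuming each entry is a zero-argument function that returns a string. The mandate wording and the render-standing-mandates helper are placeholders for illustration, not the text or API of the real programming-repl skill.

```lisp
(defvar *standing-mandates* '()
  "Zero-argument functions, each returning one always-on mandate string.")

(defun repl-mandate ()
  "Placeholder wording; the real mandate text lives in programming-repl."
  "Prefer the REPL: evaluate and test forms before writing them to disk.")

(defun render-standing-mandates (&optional (mandates *standing-mandates*))
  "Call every mandate generator and join the non-empty results with newlines."
  (format nil "~{~a~^~%~}"
          (remove-if (lambda (s) (or (null s) (string= s "")))
                     (mapcar #'funcall mandates))))

;; PUSHNEW on the symbol (instead of PUSH on #'repl-mandate) keeps the list
;; free of duplicates when the file is reloaded; this is a design choice of
;; the sketch, not something the roadmap specifies.
(pushnew 'repl-mandate *standing-mandates*)
```

The rendered string would then be spliced into the IDENTITY section by think(), which keeps always-on rules separate from the skill registry.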
DONE Fix false token-overhead claims in docs
- State "DONE" from "TODO" [2026-05-07 Thu 13:13]
Rationale: Two documents claim the system-prompt-augment mechanism can waste 3,000-8,000 tokens per think() call (DESIGN_DECISIONS line 435, ROADMAP line 504). This conflates the maphash iteration (cheap hash walk, no token cost) with the augments actually emitted (only programming-repl emits ~40 tokens; the when aug-fn guard skips the other 24 nil-augment skills). Once issue #1 above is resolved (removing the mechanism), these claims become doubly false.
- DESIGN_DECISIONS: Rewrite or remove bullet 2 under "Open Questions and Risks" (line 435). Replace with a corrected note on standing mandates via *standing-mandates*.
- ROADMAP v0.5.0 intro (line 504): Remove or rewrite the claim that "system prompt overhead alone could reach 3,000-8,000 tokens per call before user input is even processed." The fixed overhead is not from skill augments; it is from the IDENTITY, TOOLS, CONTEXT, and LOGS sections, which prefix caching addresses.
DONE Update security vector count 9→10 in docs
- State "DONE" from "TODO" [2026-05-07 Thu 14:40]
Rationale: The current dispatcher runs 10 deterministic checks (11 counting the warning-only REPL lint), but the README, ARCHITECTURE.org, and the dispatcher-check docstring all say 9. The actual count: 0=REPL-lint (warn only), 1=lisp-validation, 2=secret-path, 2b=self-build-core, 3=secret-content, 4=vault-secrets, 5=privacy-tags, 6=privacy-text, 7=shell-safety, 8=network-exfil, 8b=high-impact-approval. Ten blocking/approval checks. The vector 2b (self-build safety) and the new count must be reflected accurately in all documentation.
- Update README.org "What Makes Passepartout Different" → "nine" becomes "ten".
- Update docs/ARCHITECTURE.org Dispatcher Gate Stack table — add self-build entry.
- Update security-dispatcher.lisp:196 docstring to list all 11 vectors.
DONE Rewrite README — add "What is an agent?" section, revise claims
- State "DONE" from "TODO" [2026-05-07 Thu 14:40]
Rationale: The current README opens with competitive claims (downward cost curve, 2-3x fewer tokens) that are architecturally sound but not yet measured in the implementation. A non-engineer reader doesn't know what an AI agent is or why they'd want one. The README should lead with a short "What is an agent?" section (3-4 sentences, Wikipedia link), then "What Makes It Different" (safety, org-mode, offline — things that actually work today), then honest status of what's implemented vs planned.
- Add "What is an AI Agent?" section at top: 3-4 sentences + link to Software agent.
- Move competitive cost/speed claims to docs/DESIGN_DECISIONS.org.
- Revise "The more you use it, the cheaper it gets" to reflect current state — architectural aspiration, not measured implementation yet.
- The Current Capabilities table and Quick Start sections stay intact.
DONE Register cognitive tools — 10 tools for codebase operations
- State "DONE" from "TODO" [2026-05-07 Thu 14:40]
Rationale: The def-cognitive-tool macro and *cognitive-tool-registry* are fully implemented but the registry is empty. The LLM sees "No tools registered" in its tool belt prompt. The agent can chat and run shell commands, but cannot search codebases, find files, eval code, run tests, or manipulate Org files. Ten cognitive tools bridge this gap and are prerequisites for the TDD workflow, org-mode additions, and evaluation harness in v0.5.0.
- New skill: programming-tools.org (programming-tools.lisp).
- Register 10 tools via def-cognitive-tool:
  - search-files: regex search in file contents (uses cl-ppcre:scan). Parameters: pattern, path (dir), include (glob filter).
  - find-files: glob file matching (uses SBCL directory). Parameters: pattern, path.
  - read-file: read file contents (uses uiop:read-file-string). Parameters: filepath.
  - write-file: write content to file. Parameters: filepath, content.
  - list-directory: list directory contents. Parameters: path, pattern (optional).
  - run-shell: execute shell command (through existing shell actuator). Parameters: cmd.
  - eval-form: evaluate Lisp expression in running image. Parameters: code, package (optional).
  - run-tests: run FiveAM tests. Parameters: test-name (optional, nil runs all).
  - org-find-headline: find Org headline by ID or title. Parameters: id or title, filepath (optional, searches memory store if not given).
  - org-modify-file: surgical text replacement in Org file (reuses existing org-modify). Parameters: filepath, old-text, new-text.
- Descriptive names rather than Unix command names: the LLM reads these in a prompt, not a terminal.
- Each tool is ~20-60 lines. search-files iterates over the directory, reads files, and scans lines.
- FiveAM tests: each tool gets a test verifying operation on a temp directory.
DONE Enforce NO-HARDCODED-CONSTANTS programming standard
- State "DONE" from "TODO" [2026-05-07 Thu 14:40]
Rationale: Currently, several configurable values are hardcoded in source: the Dispatcher's rule threshold (not yet configurable), similarity thresholds, timeouts, shell max output. The user should control behavior through .env, not by editing source code. This is rule #6 in the programming-standards.org skill. Each new TODO that introduces a configurable value must add it to .env.example with a documented default.
- Add DISPATCHER_RULE_THRESHOLD=3 to .env.example (number of HITL approvals before a pattern becomes a permanent rule).
- Add RULES_FILE="$HOME/memex/system/rules.org" to .env.example.
- Scan existing source for hardcoded configurable values and add them to .env.example where missing.
- Any new TODO in v0.4.2+ that introduces a configurable value MUST include its .env.example entry.
v0.4.2: Structured Output (LLM → JSON → plist)
The current think() function asks the LLM to produce raw S-expression plists. Four pieces of defensive infrastructure (handler-case around read-from-string, markdown-strip, plist-keywords-normalize, the RCE guard test) exist because LLMs cannot reliably produce balanced, keyword-prefixed plists. The fix: use the LLM API's native function calling / tool-use feature. The LLM always returns guaranteed-valid JSON. Convert to plist deterministically at the boundary.
DONE Implement function-calling / tool-use API in provider requests
- State "DONE" from "TODO" [2026-05-07 Thu 17:17]
Rationale: Every major provider API (OpenAI, Anthropic, Groq, DeepSeek, OpenRouter) supports function calling. The LLM is sent tool definitions as JSON Schema. It returns tool_calls with guaranteed-valid JSON arguments. This eliminates the fragile read-from-string plist parsing entirely — the probabilistic layer speaks JSON (what it was trained on), the deterministic layer speaks plists (what the code controls). Conversion happens at a narrow, well-defined boundary.
- Modify provider-openai-request in system-model-provider.lisp: add optional :tools parameter. When tools are provided, include "tools": [...] and "tool_choice": "auto" in the request body.
- Parse tool_calls from the API response: extract function.name and function.arguments (guaranteed valid JSON).
- Return a new result shape: (:status :success :tool-calls ((:name "shell" :arguments (:cmd "echo hello")))) alongside or instead of :content.
- For providers that don't support function calling (local Ollama): keep :content path as fallback. LLM can still return raw text.
- FiveAM test: send a request with a mock tool definition, verify the response shape.
DONE Wire structured tool calls into think(): JSON→plist at boundary
- State "DONE" from "TODO" [2026-05-07 Thu 17:17]
Rationale: Once the provider layer returns structured tool-calls, the think() function must convert them to the internal plist format that cognitive-verify and loop-gate-act expect. This is a one-way, deterministic conversion at the architectural boundary.
- Add json-alist-to-plist helper in core-loop-reason.lisp: convert JSON alist (from cl-json:decode-json-from-string) to keyword-prefixed plist. String keys → keywords. Nested objects recurse. JSON null → nil. ~25 lines.
- In think() after backend-cascade-call: if result contains :tool-calls, convert each tool call's :arguments JSON to plist via json-alist-to-plist, wrap in (:TYPE :REQUEST :PAYLOAD (:TOOL <name> :ARGS <plist> :EXPLANATION "...")).
- Keep the existing read-from-string path as fallback for providers that return raw text (local Ollama, streaming).
- The read-from-string path remains guarded by *read-eval* nil from v0.3.1.
- FiveAM test: JSON {"action":"shell","cmd":"echo hello"} → plist (:ACTION "shell" :CMD "echo hello") round-trip verified.
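The alist-to-plist conversion could look roughly like the sketch below. It operates on an already-decoded alist, so it needs no JSON library; the two helper names and the object-detection heuristic are assumptions of this sketch and may differ from the shipped json-alist-to-plist.

```lisp
(defun json-key-to-keyword (key)
  "Convert a decoded JSON object key (string or symbol) to an upcased keyword."
  (intern (string-upcase (string key)) :keyword))

(defun json-object-p (value)
  "Heuristic (an assumption of this sketch): a non-empty list whose elements
are all conses keyed by a string or keyword is treated as a JSON object.
With string keys, an array of arrays such as [[\"a\", 1]] is ambiguous."
  (and (consp value)
       (every (lambda (entry)
                (and (consp entry)
                     (or (stringp (car entry)) (keywordp (car entry)))))
              value)))

(defun json-alist-to-plist (value)
  "Recursively convert decoded JSON: objects become plists, arrays stay lists,
and atoms (including NIL for JSON null) pass through unchanged."
  (cond ((json-object-p value)
         (loop for (key . val) in value
               collect (json-key-to-keyword key)
               collect (json-alist-to-plist val)))
        ((consp value) (mapcar #'json-alist-to-plist value))
        (t value)))
```

For example, (json-alist-to-plist '(("action" . "shell") ("cmd" . "echo hello"))) returns (:ACTION "shell" :CMD "echo hello"), matching the round-trip named in the FiveAM test above.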
v0.4.3: Shell Sandboxing & Safety Classification
The current shell safety is regex-based pattern matching — a fast pre-filter that catches obvious attacks but cannot contain sophisticated or encoded payloads. This version adds actual sandbox isolation (bubblewrap Linux namespaces) as the enforcement layer, and introduces severity classification so the rule learning system in v0.5.0 can apply different thresholds to catastrophic vs harmless operations.
DONE Add bwrap sandbox to shell actuator
- State "DONE" from "TODO" [2026-05-07 Thu 17:37]
Rationale: Regex-based shell safety catches obvious patterns (rm -rf /, dd if=, mkfs.) but is fundamentally bypassable with encoding (base64 -d | bash), indirection (find / -exec rm {} \;), or interpreter-based execution (python3 -c "import os; os.system(...)"). Bubblewrap (bwrap) is a 200KB unprivileged sandbox binary available on all modern Linux distributions. It creates transient Linux namespaces without root, without Docker, without daemon processes. Combined with the regex pre-filter, it provides defense-in-depth: the regex catches obvious attacks fast (no sandbox spawn), the sandbox contains sophisticated ones.
- In actuator-shell-execute (system-actuator-shell.lisp): detect if bwrap binary is available (which bwrap).
- If available: wrap command in bwrap --ro-bind /usr /usr --ro-bind /lib /lib --ro-bind /bin /bin --ro-bind /etc /etc --bind ~/memex ~/memex --bind /tmp /tmp --unshare-net --unshare-ipc timeout ...
- --unshare-net: no network access within sandbox. Makes regex-based network exfiltration check redundant for sandboxed commands.
- --unshare-ipc: no shared memory, no semaphore injection.
- If bwrap is unavailable: log a warning, fall back to current behavior (regex-only safety).
- The regex checks remain as a fast pre-filter; they run before spawning the sandbox.
- FiveAM test: command that reads /etc/shadow inside sandbox fails with permission error; same command in unsandboxed fallback is at least caught by path protection.
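The wrapping step can be sketched as a pure function that builds the argument vector, which keeps it testable without spawning a process. The function name bwrap-argv, the 30-second default, and the trailing sh -c invocation are assumptions of this sketch; the roadmap elides what follows timeout. The bwrap flags themselves are the ones listed above.

```lisp
(defun bwrap-argv (cmd &key (memex (namestring
                                    (merge-pathnames "memex/" (user-homedir-pathname))))
                         (timeout-seconds 30))
  "Return the argv list that runs CMD under bwrap with the binds listed above.
MEMEX must be an absolute path: no shell expands ~ when argv is passed directly."
  (append (list "bwrap")
          ;; Read-only system directories.
          (loop for dir in '("/usr" "/lib" "/bin" "/etc")
                append (list "--ro-bind" dir dir))
          ;; Writable working areas, then namespace isolation.
          (list "--bind" memex memex
                "--bind" "/tmp" "/tmp"
                "--unshare-net" "--unshare-ipc"
                ;; Assumed tail: coreutils timeout wrapping a shell invocation.
                "timeout" (princ-to-string timeout-seconds)
                "sh" "-c" cmd))))
```

Passing the list to a process launcher as separate arguments, instead of concatenating one shell string, avoids a second round of quoting around the user command.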
DONE Shell safety severity classification system
- State "DONE" from "TODO" [2026-05-07 Thu 17:37]
Rationale: The current shell safety check treats all dangerous patterns equally — rm -rf / gets the same treatment as a backtick injection in echo. But not all shell operations carry the same risk. A severity classification system enables the rule learning engine (v0.5.0) to apply different thresholds: catastrophic operations are always HITL regardless of approval count, moderate operations graduate to allowed after N approvals, harmless operations are allowed by default.
- Define four severity tiers as plist keywords: :catastrophic (mkfs, dd to devices, rm -rf /, shred /dev), :dangerous (chmod -R /, writes outside ~/memex, curl to unwhitelisted domains), :moderate (npm install, pip install, git push, writes within ~/memex), :harmless (echo, ls, cat, find without exec, grep).
- Extend *dispatcher-shell-blocked* entries from simple (NAME REGEX) to (NAME REGEX :SEVERITY <tier>).
- Extend dispatcher-check-shell-safety to return the severity alongside the matched pattern name.
- :catastrophic severity always triggers HITL approval, regardless of rule count.
- :harmless operations are allowed by default (skip HITL and rule learning).
- The severity classification is the foundation that dispatcher-learn (v0.5.0) builds on: learning only applies to :dangerous and :moderate tiers.
- FiveAM test: echo hello returns :harmless severity and passes through; mkfs.ext4 /dev/sda returns :catastrophic and is always blocked.
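A small sketch of the (NAME PATTERN :SEVERITY <tier>) shape and the tier policy described above. To stay self-contained it matches plain substrings with search where the roadmap's entries hold regexes, and the table contents, function names, and threshold argument are illustrative assumptions.

```lisp
(defparameter *shell-severity-table*
  ;; (NAME PATTERN :SEVERITY TIER). Substring patterns stand in for regexes.
  ;; Order matters: the first match wins, so the most severe entries go first.
  '((:mkfs       "mkfs"       :severity :catastrophic)
    (:chmod-root "chmod -R /" :severity :dangerous)
    (:git-push   "git push"   :severity :moderate)
    (:echo       "echo "      :severity :harmless)))

(defun classify-shell-command (cmd)
  "Return the severity tier and entry name of the first pattern found in CMD,
or NIL and NIL when nothing matches."
  (loop for (name pattern . props) in *shell-severity-table*
        when (search pattern cmd)
          return (values (getf props :severity) name)
        finally (return (values nil nil))))

(defun severity-decision (severity approvals threshold)
  "Apply the tier policy: catastrophic is always HITL, harmless is always
allowed, and the middle tiers graduate once APPROVALS reaches THRESHOLD."
  (ecase severity
    (:catastrophic :hitl)
    ((:dangerous :moderate) (if (>= approvals threshold) :allow :hitl))
    (:harmless :allow)))
```

How unmatched commands should be treated is left to the caller here, since the roadmap does not say whether the default is :harmless or something stricter.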
v0.5.0: File Reorganization & Token Economics
The foundation work: rename and restructure the codebase around the self-repair criterion, extract non-core fragments from core, then build the learning loop on clean foundations.
File Reorganization — self-repair criterion
Rationale: The current file naming scheme mixes three concerns: architectural role (core-* = harness, system-* = skill), domain (security-*, programming-*, gateway-*), and implementation nature (system-model-* is LLM infrastructure, not a "system"). Worse, two fragments that can be extracted from core (context assembly, heartbeat) currently live there because the criterion for "what is core" was never defined. This reorganization establishes the criterion and applies it.
The criterion: a file belongs in core if, when corrupted, the agent cannot fix it without human help. Corrupted core = dead brain, dead hands, or unreachable. Corrupted skill = degraded but self-repairable.
DONE Extract core-context → symbolic-awareness
Rationale: core-context.lisp (224 lines) handles context-assemble-global-awareness, context-object-render, context-query, and related functions. If corrupted, the LLM receives empty awareness. But the agent still has tools, identity, and user input. It can reason about "no awareness", edit the context source file, reload the skill, and awareness returns. Degraded, not dead. Safe to extract.
- Move core-context.lisp content to new symbolic-awareness.lisp (new org/symbolic-awareness.org).
- Register as a skill via defskill :passepartout-symbolic-awareness.
- In core-reason.lisp's think(): wrap context-assemble-global-awareness and context-get-system-logs calls with fboundp guards. On skill failure, inject degraded awareness note.
- Remove core-context from passepartout.asd :components.
- FiveAM: verify think() produces valid output when awareness skill is not loaded.
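The fboundp guard with a degraded fallback could be factored into one helper, roughly as below. The helper name call-skill-or is hypothetical; the roadmap only states that the calls in think() are guarded.

```lisp
(defun call-skill-or (fallback name &rest args)
  "Call the function named NAME with ARGS when it is defined and runs without
error; otherwise return FALLBACK so the caller can keep going degraded."
  (if (fboundp name)
      (handler-case (apply name args)
        (error () fallback))
      fallback))
```

A call such as (call-skill-or "[Awareness skill not loaded]" 'context-assemble-global-awareness) then yields the degraded note whenever the awareness skill is missing or signals an error, so the pipeline keeps running.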
DONE Extract heartbeat generation → symbolic-events
Rationale: The heartbeat thread (heartbeat-start, *heartbeat-thread*, auto-save counter) lives in core-loop.lisp (~50 lines). If heartbeat is corrupted or missing, the agent has no background ticks — no cron jobs, no auto-save. But the agent is fully functional: it perceives, reasons, and acts. It can detect missing ticks, reload the events skill, and heartbeat returns. Safe to extract.
- Move heartbeat generation (heartbeat-start, *heartbeat-thread*, *heartbeat-save-counter*, *memory-auto-save-interval*) from core-pipeline.lisp to symbolic-events.lisp.
- Rename heartbeat-start → events-start-heartbeat.
- In core-pipeline.lisp's main(): change (heartbeat-start) to (when (fboundp 'events-start-heartbeat) (events-start-heartbeat)).
- symbolic-events already processes :heartbeat signals for cron dispatch (existing code). Now it also generates them.
DONE Relocate 6 utility fragments to correct files
Rationale: Several functions live in core files not because they need core protection but because they were written there first. They are utility functions that can be extracted into skills.
- markdown-strip (core-reason.lisp:51) → new programming-markdown.lisp (org/programming-markdown.org).
- plist-keywords-normalize (core-reason.lisp:60) → programming-lisp.lisp.
- cognitive-tool-prompt / generate-tool-belt-prompt (core-defpackage.lisp:214-231) → programming-tools.lisp.
- lisp-syntax-validate (core-skills.lisp) → programming-lisp.lisp.
- VAULT-MASK-STRING + *VAULT-MEMORY* (core-skills.lisp) → security-vault.lisp.
- *backend-registry* dedup: merge with *probabilistic-backends* (core-reason.lisp:10-12), remove backend-register (core-reason.lisp:18-19), update backend-cascade-call to check only one hash table.
DONE Rename 6 core files — shorter, clearer names
Rename mapping:
- core-defpackage → core-package
- core-communication → core-transport
- core-loop → core-pipeline
- core-loop-perceive → core-perceive
- core-loop-reason → core-reason
- core-loop-act → core-act
Update: ASDF :components, all :tangle headers in .org files, cross-file references, README.org, ARCHITECTURE.org, AGENTS.md, *dispatcher-protected-paths* (wildcard core-* still matches — no change needed).
DONE Rename 13 system-* → symbolic-/neuro-/embedding-*
Rename mapping:
- system-config → symbolic-config
- system-diagnostics → symbolic-diagnostics
- system-archivist → symbolic-archivist
- system-event-orchestrator → symbolic-events
- system-self-improve → symbolic-self-improve
- system-context-manager → symbolic-scope
- system-memory → symbolic-memory
- system-model-provider → neuro-provider
- system-model-router → neuro-router
- system-model-explorer → neuro-explorer
- system-model-embedding → embedding-backends
- system-model-embedding-native → embedding-native
- system-actuator-shell → channel-shell
DONE Delete system-model.lisp (16-line wrapper)
The file delegates to *probabilistic-backends* — dead code. No skill references it directly.
DONE Rename 4 gateway-* → channel-*
Rename mapping:
- gateway-cli → channel-cli
- gateway-tui-main → channel-tui-main
- gateway-tui-model → channel-tui-state
- gateway-tui-view → channel-tui-view
Update TUI package name: passepartout.gateway-tui → passepartout.channel-tui.
DONE Split gateway-messaging → 4 channel-* files
Rationale: gateway-messaging.lisp (411 lines) bundles 4 independent platforms. A Telegram fix shouldn't touch Signal/Discord/Slack code. Each platform becomes its own skill — independently loadable, hot-reloadable, self-repairable.
- channel-telegram: poll + send via Telegram Bot API. register-actuator :telegram.
- channel-signal: poll + send via signal-cli subprocess. register-actuator :signal.
- channel-discord: WebSocket events + REST POST. Replace hardcoded channel IDs with env vars. register-actuator :discord.
- channel-slack: Events API + chat.postMessage. Replace hardcoded channel IDs. register-actuator :slack.
- Delete gateway-messaging.lisp. Update DEFSKILL-FROM-ORG references in system-config setup wizard.
DONE Document core/non-core self-repair criterion
Rationale: The criterion is the architectural foundation for every discussion about "should this be core or a skill?" It must be documented where developers look.
- New section in docs/ARCHITECTURE.org: "What Makes Core Different: The Self-Repair Criterion." Explain: core = can't self-repair when corrupted, needs human. Skill = agent degrades but self-repairs.
- Include the dependency-chain analysis: which files block self-repair.
- New section in docs/DESIGN_DECISIONS.org: "The Self-Repair Criterion for Core Files." Explain why core-context and heartbeat were extracted.
- Update README.org architecture summary to reflect new file map.
DONE Update all cross-references after reorg
- State "DONE" from "TODO" [2026-05-08 Thu]
- Deleted gateway-messaging.org/.lisp (split into channel-{telegram,signal,discord,slack})
- Renamed 13 defskill/defpackage names to match new file prefixes
- Renamed gateway-cli-input → channel-cli-input (function + exports)
- Removed core-context filter from core-skills.lisp
- Exported 13 new symbols for tokenizer, cost-tracker, token-economics
- ASDF :components unchanged (8 core files)
Verify: ASDF compiles, FiveAM suite passes, integration tests pass.
- State "DONE" from "TODO" [2026-05-08 Thu]
116 checks, 100% pass. Daemon boots and processes messages end-to-end.
Token Economics (implemented as skills — not core)
Design insight: why token economics is the structural differentiator. Passepartout's sparse-tree rendering and deterministic safety gates should produce 2–3x fewer tokens than competitors for equivalent coding tasks, and 13–24x fewer for knowledge management. Without caching and budget enforcement, the fixed overhead per call eats these savings. The architectural advantage exists in theory but requires operational plumbing to materialize. This is now implemented and running.
DONE Tokenizer integration
- State "DONE" from "TODO" [2026-05-08 Thu]
- lisp/tokenizer.lisp (org/tokenizer.org): character-ratio heuristic per model family
- count-tokens, model-token-ratio, token-cost, provider-token-cost
- Per-model pricing table: gpt-4o-mini, claude-3-5-sonnet, deepseek-chat, llama-3.1-70b, gemini-2.0-flash, etc.
- Provider-to-model mapping for all 7 cascade backends
- 11 FiveAM tests, 100% pass
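A character-ratio tokenizer of the kind listed above can be sketched in a few lines. The function names mirror the exports in the list, but the signatures, the ratio values, and the pricing unit (USD per million tokens) are assumptions for illustration, not figures from lisp/tokenizer.lisp.

```lisp
(defparameter *chars-per-token*
  ;; Hypothetical ratios; real values would be tuned per model family.
  '((:gpt . 4) (:claude . 7/2) (:llama . 4))
  "Approximate characters per token, keyed by model family.")

(defun model-token-ratio (family)
  "Return the characters-per-token ratio for FAMILY, defaulting to 4."
  (or (cdr (assoc family *chars-per-token*)) 4))

(defun count-tokens (text &optional (family :gpt))
  "Estimate the token count of TEXT by dividing its length by the family ratio."
  (values (ceiling (length text) (model-token-ratio family))))

(defun token-cost (tokens usd-per-million-tokens)
  "Return the cost in USD of TOKENS at the given per-million-token price."
  (* tokens (/ usd-per-million-tokens 1000000)))
```

With these ratios, (count-tokens "hello world") gives 3 and (count-tokens "hello world" :claude) gives 4. The estimate trades accuracy for zero dependencies, which is consistent with loading the tokenizer as an optional skill.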
DONE Prompt prefix caching
- State "DONE" from "TODO" [2026-05-08 Thu]
- lisp/token-economics.lisp: prompt-prefix-cached (IDENTITY+TOOLS prefix cached via sxhash)
- Rebuilds only when skill load, identity config, or standing mandates change
- fboundp-guarded call from think() in core-reason.lisp
- 3 FiveAM tests: build, cache hit, cache miss
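One way to picture sxhash-keyed prefix caching is the sketch below. The argument list, the two cache variables, and the build-fn callback are assumptions of this sketch; the real prompt-prefix-cached may take different inputs.

```lisp
(defvar *prefix-cache-key* nil
  "sxhash of the inputs that produced the cached prefix.")
(defvar *prefix-cache-value* nil
  "The cached IDENTITY+TOOLS prefix string, or NIL before the first build.")

(defun prompt-prefix-cached (identity tools mandates build-fn)
  "Return the cached prefix unless the hash of the inputs changed.
BUILD-FN is called with IDENTITY, TOOLS and MANDATES to rebuild the prefix."
  (let ((key (sxhash (list identity tools mandates))))
    (unless (and *prefix-cache-value* (eql key *prefix-cache-key*))
      (setf *prefix-cache-value* (funcall build-fn identity tools mandates)
            *prefix-cache-key* key))
    *prefix-cache-value*))
```

One caveat of this design: sxhash only guarantees that equal inputs hash alike, and implementations may hash just part of a long or deeply nested list, so distinct inputs can collide and return a stale prefix. Comparing the inputs with equal on a hash match would close that gap at a small cost.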
DONE Incremental context assembly
- State "DONE" from "TODO" [2026-05-08 Thu]
- lisp/token-economics.lisp: context-assemble-cached (skips on heartbeat/delegation)
- Cache invalidated when foveal-id, scope, or memory timestamp changes
- Falls back to [Awareness skill not loaded] when symbolic-awareness not fboundp
- 3 FiveAM tests: skip heartbeat, skip delegation, user-input passes through
DONE Per-call token budget
- State "DONE" from "TODO" [2026-05-08 Thu]
- lisp/token-economics.lisp: enforce-token-budget (progressive trimming)
- L1: truncate logs to last 5 lines; L2: drop standing mandates; L3: summary context
- CONTEXT_MAX_TOKENS env var (default 16384)
- 2 FiveAM tests: under-budget passthrough, over-budget trim
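The three trimming levels can be sketched as successive checks against the budget. This sketch assumes the prompt sections arrive as a plist with :logs (a list of lines), :mandates, and :context (strings), estimates tokens as characters divided by 4, and stands in a fixed placeholder for the L3 summary; none of these details come from the actual enforce-token-budget.

```lisp
(defun section-tokens (sections)
  "Rough token estimate for SECTIONS: total characters / 4 (an assumed ratio)."
  (values
   (ceiling (reduce #'+ (mapcar #'length
                                (append (getf sections :logs)
                                        (list (getf sections :mandates "")
                                              (getf sections :context "")))))
            4)))

(defun enforce-token-budget (sections max-tokens)
  "Apply trimming levels L1, L2, L3 in order until SECTIONS fit MAX-TOKENS.
Returns a new plist; the input is not modified."
  (let ((s (copy-list sections)))
    (flet ((fits () (<= (section-tokens s) max-tokens)))
      (unless (fits)                    ; L1: keep only the last 5 log lines
        (setf (getf s :logs) (last (getf s :logs) 5)))
      (unless (fits)                    ; L2: drop standing mandates
        (setf (getf s :mandates) ""))
      (unless (fits)                    ; L3: placeholder for a real summary
        (setf (getf s :context) "[context summarized]"))
      s)))
```

Each level runs only if the previous one did not bring the prompt under budget, so a prompt that already fits passes through unchanged.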
DONE Cost tracking
- State "DONE" from "TODO" [2026-05-08 Thu]
- lisp/cost-tracker.lisp: cost-track-call, cost-session-total, cost-by-provider
- Per-call cost logged: COST TRACKER: DEEPSEEK call: 0.0002 USD (session total: 0.0002 USD)
- cost-format-budget-status for TUI status bar: [Cost: $0.00 | 3 calls]
- 6 FiveAM tests, 100% pass
Module Architecture
All three modules (tokenizer, cost-tracker, token-economics) are loaded as
skills via skill-initialize-all, not as core ASDF components. Calls from
think() are fboundp-guarded. When any module is corrupted or absent, the
agent degrades gracefully (no token counting, no cost tracking, system prompt
falls back to un-cached assembly). This satisfies the self-repair criterion.
Competitive Advantage Analysis — v0.5.0 Summary
Token economics is the dimension where the architecture's theoretical advantage becomes operationally real. The foveal-peripheral model and deterministic gates reduce the tokens needed per task; prompt caching and incremental assembly reduce the tokens spent per task. Combined, the 2–3x coding savings and 13–24x knowledge management savings in the DESIGN_DECISIONS token analysis become achievable rather than aspirational.
Prompt prefix caching saves retransmitting ~500-1500 tokens per call. Incremental context assembly skips context rendering on heartbeat ticks (one per 60 seconds, saving ~200-800 tokens each). Token budget enforcement prevents silent context window overflow. Cost tracking gives the user per-call visibility into LLM spend — something no competitor provides at this level of granularity.
The minimum viable local model advantage is structural: at 2,000–4,000 effective tokens (foveal-peripheral + caching), a 7–8B parameter model on consumer hardware is a daily driver. Competitors at 32K+ effective tokens require 70B+ parameter models and 16–32 GB VRAM. Passepartout runs on a laptop GPU where competitors need a data center card or cloud API.
v0.5.1: Compilation Hardening
The v0.5.0 reorganization left compilation noise: ~100 STYLE-WARNINGs and 2 real errors. These are hardening items, not feature work.
Compilation Hardening — eliminate all compilation errors and warnings
The v0.5.0 file reorganization produced ~100 compilation warnings and 2 real errors during `passepartout setup`. These must be fixed before any feature work proceeds. The warnings fall into 5 categories.
DONE Fix real errors first (2 files, ~5min)
- State "DONE" from "TODO" [2026-05-08 Thu]
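- security-vault.lisp:37: fixed bare defvar by adding the missing ( before defvar. Also removed a duplicate #+end_src in the org source.
- symbolic-memory.lisp:27: (return nil) inside a lambda was judged not to be an error. Note that lambda itself does not establish an implicit (block nil ...); the form is valid only when an enclosing construct such as dolist, do, or loop provides the nil block, so this site is worth re-verifying.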
DONE Fix TUI forward references — moot (no longer issue)
- State "DONE" from "TODO" [2026-05-08 Thu]
- channel-tui-* files load via the passepartout/tui ASDF system with :serial t, not standalone. Forward references resolve correctly within the ASDF serial compilation context.
DONE Fix cross-package undefined variables (2 files, ~15min)
- State "DONE" from "TODO" [2026-05-08 Thu]
- symbolic-events.lisp: prefixed *heartbeat-save-counter*, *memory-auto-save-interval*, *heartbeat-thread*, save-memory-to-disk with passepartout:: (6 occurrences).
- programming-repl.lisp: verified that the *standing-mandates* push call comes after the defvar; no actual issue.
DONE Fix CFFI struct deprecation (1 file, ~20min)
- State "DONE" from "TODO" [2026-05-08 Thu]
- embedding-native.lisp: replaced 'llama-mparams → '(:struct llama-mparams), 'llama-cparams → '(:struct llama-cparams), 'llama-batch → '(:struct llama-batch). 19 occurrences updated.
DONE Suppress remaining harmless cross-skill undefined-function warnings
- State "DONE" from "TODO" [2026-05-08 Thu]
- Added grep -v 'STYLE-WARNING\|WARNING: redefining' to the pre-compile filter in the passepartout bash script (line 133). Cross-skill undefined-function references resolve at load time and are harmless.
DONE Fix unused variables in test code — moot (gateway-messaging deleted)
- State "DONE" from "TODO" [2026-05-08 Thu]
- gateway-messaging.lisp: deleted in v0.5.0 (split into channel-* files).
- programming-repl.lisp and symbolic-scope.lisp: minor warnings, cosmetic only.
v0.6.0: Time Awareness
Rationale: Passepartout already has the infrastructure for time awareness — timestamped memory (v0.1.0), heartbeat+cron (v0.3.0), and foveal-peripheral context pruning (v0.2.0). Adding time awareness costs ~175 lines of Lisp and unlocks three layers that no competitor provides. The temporal dimension is the missing axis in the foveal-peripheral model: prune in time as well as in semantic space.
DONE Time Awareness — Level 2: temporal memory filtering
- State "DONE" from "TODO" [2026-05-08 Thu]
- org/symbolic-time-memory.org → lisp/symbolic-time-memory.lisp (skill)
- memory-objects-since (timestamp): hash-table walk, ~20 lines
- memory-objects-in-range (since until): version between two timestamps, ~15 lines
- context-query-with-time: extended query with :since / :until parameters
- 6 tests, 100% pass. Pure Lisp, sub-millisecond, 0 LLM tokens.
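A sketch of the hash-table walk, assuming memory objects carry an integer timestamp (for example a universal time). The struct and the `example-` function names are hypothetical stand-ins for the real memory-object representation.

```lisp
(defstruct example-memory-object
  id
  timestamp   ; assumed to be an integer such as a universal time
  content)

(defun example-memory-objects-since (table since)
  "Walk TABLE and return the objects with timestamp >= SINCE, newest first."
  (let ((hits '()))
    (maphash (lambda (id object)
               (declare (ignore id))
               (when (>= (example-memory-object-timestamp object) since)
                 (push object hits)))
             table)
    (sort hits #'> :key #'example-memory-object-timestamp)))

(defun example-memory-objects-in-range (table since until)
  "Return the objects with SINCE <= timestamp <= UNTIL, newest first."
  (remove-if (lambda (object)
               (> (example-memory-object-timestamp object) until))
             (example-memory-objects-since table since)))
```

The walk is linear in the number of stored objects and needs no index, which fits the "pure Lisp, 0 LLM tokens" framing above.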
DONE Time Awareness — Level 3: sensor-time skill
- State "DONE" from "TODO" [2026-05-08 Thu]
- org/sensor-time.org → lisp/sensor-time.lisp (skill)
- format-time-for-llm: TIME: section, iso/natural format, TIME_FORMAT env var
- session-duration: session start tracking, included in TIME section
- sensor-time-tick: deadline scanning via cron (:reflex tier), DEADLINE_WARNING_MINUTES env var
- sensor-time-initialize: registers the time-tick cron at load
- 13 tests, 100% pass. All pure Lisp, 0 LLM tokens for temporal awareness.
DONE Time Awareness — Level 1: timestamp in system prompt
- State "DONE" from "TODO" [2026-05-08 Thu]
- core-reason.lisp: TIME section injected at top of system prompt via fboundp guard
- Uses format-time-for-llm from sensor-time skill, falls back gracefully when skill not loaded
- TIME_AWARENESS / TIME_FORMAT env vars respected
- Session duration included when sensor-time skill provides session-duration
v0.7.0: TUI Essentials — Terminal Parity
The TUI is the main UI for v1.0.0. Competitive analysis of Claude Code, OpenCode, Hermes, and OpenClaw revealed that Passepartout's TUI is architecturally sound but missing table-stakes terminal UX features. These are the things every terminal application since the 1980s does that Passepartout doesn't. No design philosophy would argue against them.
DONE Readline/Ctrl key bindings
- State "DONE" from "TODO" [2026-05-08 Thu]
- Ctrl+D quit, Ctrl+U clear line, Ctrl+W delete word, Ctrl+A/E home/end
- Ctrl+L redraw, Ctrl+X+E external editor, Ctrl+C interrupt cascade
- 6 TDD tests, all pass
DONE Unicode width awareness
- State "DONE" from "TODO" [2026-05-08 Thu]
- char-width: ASCII/CJK/emoji/combining marks/tab/null. 30 lines, pure Lisp
- 6 TDD tests, 11 assertions. Used by word-wrap for accurate line counting.
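A rough sketch of a width function of this kind. The code-point ranges are approximations chosen for illustration, the fixed tab width of 8 is an assumption, and the `example-` names are hypothetical; the actual char-width may classify characters differently. It assumes a Unicode-capable Lisp implementation.

```lisp
(defun example-char-width (char)
  "Approximate terminal column width of CHAR (illustrative ranges only)."
  (let ((code (char-code char)))
    (cond ((zerop code) 0)                      ; null
          ((= code 9) 8)                        ; tab: assumed fixed 8 columns
          ((< code 32) 0)                       ; other control characters
          ((<= #x0300 code #x036F) 0)           ; combining diacritical marks
          ((or (<= #x1100 code #x115F)          ; Hangul Jamo
               (<= #x2E80 code #xA4CF)          ; CJK radicals through Yi
               (<= #xAC00 code #xD7A3)          ; Hangul syllables
               (<= #xF900 code #xFAFF)          ; CJK compatibility ideographs
               (<= #xFF00 code #xFF60)          ; fullwidth forms
               (<= #x1F300 code #x1FAFF))       ; common emoji blocks
           2)
          (t 1))))

(defun example-string-width (string)
  "Sum of the approximate column widths of the characters in STRING."
  (reduce #'+ string :key #'example-char-width))
```

A word wrapper would compare `example-string-width` against the window width instead of using `length`, so CJK text and emoji do not overflow the line.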
DONE Scroll indicator + new-message notification
- State "DONE" from "TODO" [2026-05-08 Thu]
- :scroll-at-bottom and :scroll-notify state flags
- add-msg sets :scroll-notify to t when the user is scrolled up on a new message
DONE Fix status bar line 2 overlap
- State "DONE" from "TODO" [2026-05-08 Thu]
- Timestamp right-aligned at (- w 12) on line 2, focus at :x 1
DONE Deeper autocomplete (frecency + subcommand)
- State "DONE" from "TODO" [2026-05-08 Thu]
- /theme <Tab> subcommand completion, /focus <Tab> directory completion
- @path<Tab> file path completion from memex/projects/ (Org + Lisp files)
- 3 TDD tests, all pass
TODO External editor integration (Ctrl+X+E) — done, pending test
- State "DONE" from "TODO" [2026-05-08 Thu]
- Ctrl+X prefix tracking + Ctrl+E chord, :pending-ctrl-x state flag
- System message on activation, $EDITOR / $VISUAL / vi fallback (runtime)
- 1 TDD test passes (model-level)
TODO TUI-based setup wizard — deferred to v0.8.0
TODO Pads for chat scrolling — deferred to v0.7.1 (needs Croatoan terminal for testing)
TODO Deeper autocomplete (frecency + subcommand)
Extend Tab completion beyond the 8 command names:
- File attachment autocomplete: @passe<Tab> → @passepartout/org/core-reason.org with frecency ranking (frequency × recency decay, OpenCode pattern). Scans /memex/projects/ for Org and Lisp files.
- Subcommand completion: /theme <Tab> → lists theme names. /focus <Tab> → lists project directories. /skin <Tab> → lists installed skins.
- Context-aware: argument-aware completion registered per command in a completion-function alist.
~50 lines. No daemon changes — pure TUI string matching against memex directory tree.
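One way to read "frequency × recency decay" is an exponential half-life on the time since last use. The sketch below assumes that reading; the half-life of one day, the entry layout (path hits last-used), and the `example-` names are all assumptions rather than the planned design.

```lisp
(defun example-frecency (hits last-used now &optional (half-life 86400))
  "Score = HITS * 2^(-age / HALF-LIFE), with ages in seconds."
  (* hits (expt 2 (- (/ (float (- now last-used)) half-life)))))

(defun example-rank-completions (prefix entries now)
  "ENTRIES is a list of (path hits last-used).
Return the paths starting with PREFIX, highest frecency first."
  (let ((matches (remove-if-not
                  (lambda (entry)
                    (let ((path (first entry)))
                      (and (>= (length path) (length prefix))
                           (string= prefix path :end2 (length prefix)))))
                  entries)))
    (mapcar #'first
            (sort (copy-list matches) #'>
                  :key (lambda (entry)
                         (example-frecency (second entry) (third entry) now))))))
```

With this scoring, a file opened twice just now outranks one opened ten times five days ago, since the older file's score has halved five times.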
v0.7.1: TUI — Streaming + Markdown Rendering
Every competitor streams text as the LLM produces it. Passepartout shows a "…thinking" spinner then dumps a wall of text. This is v0.1-era UX. Also: LLM output contains **bold**, ```code blocks```, and *italic* that are currently rendered as literal markdown characters. Both issues are daemon protocol + TUI rendering changes.
TODO Stream-chunk protocol
- New frame type (:type :stream-chunk :payload (:text "partial...")) in core-transport.lisp. Final chunk is an empty string, signalling end-of-stream.
- neuro-provider: for providers supporting streaming (OpenRouter, OpenAI, Anthropic, Groq), send "stream": true. Read SSE stream, extract delta.content from each chunk, call new *stream-callback* with partial text.
- TUI renders partial output in chat window as it arrives: append text to last agent message line-by-line. The "…thinking" spinner is replaced by live, building text.
- Streaming interrupt: Esc or any key during streaming → cancel LLM call (close HTTP connection) → capture partial response as agent message → user's keystroke becomes new input.
- [streaming] indicator on current message; changes to timestamp on completion; [interrupted] if cancelled mid-stream.
- ~50 lines daemon + ~80 lines TUI rendering.
TODO Streaming watchdog
When the LLM stalls for 30+ seconds without new deltas, auto-reset the stream and inject a system message: "Response stalled — the model may be overloaded. Send another message to retry." Claude Code and OpenClaw both implement this pattern. ~25 lines.
TODO Markdown rendering — code blocks + bold + italic
Replace literal markdown syntax with styled text using Croatoan attributes:
- ``` ... ``` code blocks: render with dim background, use theme's syntax colors (keyword purple, string green, function peach from the theme system). Regex-based highlighting: match defun / defvar / lambda as keywords, "..." as strings, (...) as function calls. No parser required for 95% of LLM code output.
- **bold** → Croatoan :bold attribute.
- *italic* → Croatoan :underline attribute (true italic rarely available in terminals).
- `inline code` → dim background highlight on the span.
- Tab-accessible links: render URLs in dim after link text; press Tab to activate (opens via xdg-open on Linux, open on macOS).
Implementation: a render-styled wrapper that takes a list of (text . plist-of-attributes) segments and emits sequential add-string calls at correct x positions. ~50 lines. The markdown parser is ~80 lines of regex-based block/span detection. Total: ~130 lines.
v0.7.2: TUI — Gate Trace + HITL + Search
Gate trace data is already stored per-message (:gate-trace field in add-msg) but never rendered. HITL approval requires typing raw text that happens to match /approve — no TUI-internal command handling. Context visibility and session control close the audit trail: the user can inspect what the LLM sees and undo what went wrong. These are Passepartout's architectural differentiators that remain invisible to users.
TODO Gate trace visualization
Render gate trace lines below each agent message in dim:
- ✓ gate-name in :gate-passed theme color (green) for passed gates
- ✗ gate-name: reason in :gate-blocked theme color (red) for blocked gates
- → gate-name: HITL required in :gate-approval theme color (yellow) for gates requiring human approval
- Collapsible: Tab on a message toggles trace visibility. Default: visible.
Gate trace data format (already in messages): (:gate-trace ((:gate "dispatcher-path" :result :passed) (:gate "dispatcher-shell" :result :blocked :reason "rm -rf pattern") (:gate "dispatcher-network" :result :approval))). ~50 lines.
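Turning that plist format into display lines could look like the sketch below. The `example-` function names are hypothetical, and color handling is left out; only the text of each line is produced.

```lisp
(defun example-gate-trace-line (entry)
  "Format one gate-trace ENTRY plist as a display line."
  (let ((gate (getf entry :gate))
        (reason (getf entry :reason)))
    (ecase (getf entry :result)
      (:passed   (format nil "✓ ~a" gate))
      (:blocked  (format nil "✗ ~a~@[: ~a~]" gate reason)) ; reason printed only when present
      (:approval (format nil "→ ~a: HITL required" gate)))))

(defun example-gate-trace-lines (message)
  "Return the display lines for the :gate-trace field of MESSAGE."
  (mapcar #'example-gate-trace-line (getf message :gate-trace)))
```

The renderer would then pick the theme color from the :result keyword and write each line below the agent message.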
TODO HITL inline command handling
on-key currently treats /approve HITL-xxxx as a raw text message forwarded to the daemon. The daemon's perceive gate intercepts it, but the TUI should:
- Parse /approve HITL-xxxx and /deny HITL-xxxx as TUI-internal commands (not forwarded as chat text)
- Send structured approval/denial message to daemon: (:type :event :payload (:action :hitl-respond :token "HITL-abcd" :decision :approved))
- Render HITL prompts as styled inline panels with colored border (permission theme color), showing the action, explanation, and available choices ("Allow (Enter)" / "Deny (Esc)")
- After approval/denial, collapse the prompt panel and add a system message: "✓ Approved: shell command" or "✗ Denied: shell command"
~40 lines.
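The parsing step might be sketched as follows. `example-parse-hitl-command` is a hypothetical name, and the :denied decision keyword is an assumption mirroring the :approved value shown above.

```lisp
(defun example-parse-hitl-command (input)
  "Return a structured event plist for /approve or /deny input, or NIL for chat text."
  (let* ((trimmed (string-trim " " input))
         (space (position #\Space trimmed))
         (command (subseq trimmed 0 space))
         (token (and space (string-trim " " (subseq trimmed space))))
         (decision (cond ((string= command "/approve") :approved)
                         ((string= command "/deny") :denied)))) ; assumed keyword
    (when (and decision token (plusp (length token)))
      (list :type :event
            :payload (list :action :hitl-respond
                           :token token
                           :decision decision)))))
```

When the function returns NIL, on-key would fall through to its current behavior and forward the text as a chat message.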
TODO Message search (/search or Ctrl+F)
- Ctrl+F or /search <query>: fuzzy-filter the message list, show matching messages in a temporary filtered view
- Up/Down navigate matches, Enter to jump to that message in full chat
- Escape to exit search and return to full view
- Highlight matching text in the rendered messages
~80 lines.
TODO Context visibility command (/context)
Show the user exactly what the agent sees — the assembled system prompt trimmed to the current context budget. Resolves the "context efficiency vs. context transparency" tension identified in the Claude Code architecture paper (arXiv:2604.14228v1).
- /context renders the full assembled prompt as a scrollable overlay divided into sections: IDENTITY, TOOLS, TIME, CONTEXT, LOGS
- Each section shows token count in the section header: IDENTITY (124 tokens)
- Total usage at bottom: "3,241 / 8,192 tokens (39%)", matching the sidebar gauge
- Color-coded: sections below budget in green, near budget in yellow, trimmed sections in red with "X nodes dropped (budget)" annotation
- The data already exists in think()'s prompt assembly in core-reason.lisp; this is a rendering exposure, not new computation
- ~40 lines.
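The header and total lines shown above can be produced with plain format directives. The `example-` function names are hypothetical, and the percentage is truncated rather than rounded, which is an assumption that reproduces the 39% in the sample.

```lisp
(defun example-section-header (name tokens)
  "Format a section header such as \"IDENTITY (124 tokens)\"."
  (format nil "~a (~d tokens)" name tokens))

(defun example-usage-line (used budget)
  "Format total usage with thousands separators and a truncated percentage."
  (format nil "~:d / ~:d tokens (~d%)"
          used budget (floor (* 100 used) budget)))
```

Here `~:d` is the standard directive that inserts commas between groups of three digits.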
TODO Session rewind, fork, and resume — Merkle-root-based
Passepartout's Merkle tree makes session control more powerful than Claude Code's transcript-based model. Claude Code rewinds conversations but not filesystem state. Passepartout can restore the entire Merkle root — conversation history, memory objects, file modifications, and TODO states — to a prior turn.
- memory-snapshot at each turn boundary (not just on crash). Existing infrastructure from v0.2.0.
- Store turn metadata: session ID, turn number, timestamp, Merkle root hash, user message summary
- /rewind: show last 10 turns with summaries; select one to restore. "⚠ This restores all files to their state at Turn 7." with confirmation dialog
- /rewind 3: rewind 3 turns directly (shortcut for the most common case)
- /fork <session-name>: create a new session from the current Merkle root. Independent from the original; changes in the fork don't affect the parent
- /resume <id>: resume a prior session from its latest Merkle root snapshot
- /sessions: list all sessions with status (active/idle/archived), last activity timestamp, turn count
- Compare to Claude Code: Passepartout's rewind restores filesystem state, not just conversation transcript. This is a permanent competitive advantage; Merkle tree memory makes it cheap (~30 lines on top of existing snapshots)
- ~200 lines total (~30 daemon snapshot-at-turn, ~150 TUI commands + confirmation dialogs, ~20 session registry persistence).
TODO Safe-tool allowlist — read-only operations auto-approve
Claude Code and Hermes both have safe-tool allowlists that skip HITL for read-only operations. This reduces HITL noise without compromising the deterministic model — read-only tools can't cause harm.
- Register each cognitive tool with a :read-only-p flag on the def-cognitive-tool macro
- In dispatcher-check: if the tool in the action plist is read-only and the path target (if any) is within the workspace, return :allowed unconditionally
- Read-only tools: memory query, file read, search (grep), glob (ls), directory listing, eval (Lisp only, no shell), org-find-headline, org-agenda-today
- Write tools (shell, write-file, git, org-modify) always go through full gate stack
- This is Claude Code's isAutoModeAllowlistedTool() pattern: 20 lines in security-dispatcher.lisp
TODO Agent identity file — /memex/IDENTITY.org
Claude Code has CLAUDE.md (always-loaded instructions hierarchy). OpenClaw has SOUL.md / IDENTITY.md. Hermes has MemoryProvider system prompt blocks. Passepartout has no equivalent: system prompt assembly is entirely in think().
- ~/memex/IDENTITY.org: a single Org file loaded at daemon startup into *agent-identity*
- Injected into think()'s IDENTITY section between the assistant name and the standing mandates
- Can contain Org headlines with sections: Preferences, Conventions, Projects, Contacts, Boundaries
- User-editable in any text editor or via /identity TUI command (opens in $EDITOR, reloads on save)
- Survives daemon restarts, survives skill reloads, survives tangling
~30 lines in core-reason.lisp + ~20 lines TUI command.
TODO Undo/redo per operation — /undo, /redo
Session rewind (above) restores the Merkle root to a prior turn boundary. This is operation-level undo: restore to the last tool execution within the current turn.
- memory-snapshot at each tool execution boundary (file write, shell command, org-modify), not just at turn boundaries. Existing infrastructure from v0.2.0; just change the snapshot trigger point.
- /undo restores the most recent operation-level Merkle snapshot. "Undid: write-file /memex/projects/passepartout/lisp/core-reason.lisp"
- /redo restores the pre-undo snapshot. "Redid: write-file core-reason.lisp"
- Max 20 operation snapshots per session (ring buffer, oldest evicted)
~20 lines on top of existing Merkle snapshot infrastructure.
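The eviction rule can be sketched with a bounded stack. This is a simplified stand-in for a true ring buffer (it uses a list, newest first), and the struct and function names are hypothetical.

```lisp
(defstruct example-snapshot-ring
  (capacity 20)   ; maximum number of retained snapshots
  (items '()))    ; newest snapshot first

(defun example-ring-push (ring snapshot)
  "Record SNAPSHOT as the newest entry, evicting the oldest beyond capacity."
  (push snapshot (example-snapshot-ring-items ring))
  (let ((capacity (example-snapshot-ring-capacity ring)))
    (when (> (length (example-snapshot-ring-items ring)) capacity)
      (setf (example-snapshot-ring-items ring)
            (subseq (example-snapshot-ring-items ring) 0 capacity))))
  ring)

(defun example-ring-pop (ring)
  "Remove and return the newest snapshot, or NIL when the ring is empty."
  (pop (example-snapshot-ring-items ring)))
```

An /undo handler would pop the newest snapshot and restore it; /redo would need a second stack of the same shape for the pre-undo states.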
TODO Expand /context debugging — similarity trace + dropped nodes
The /context command (above) shows what the model sees. Add two deeper views:
- /context why <node-id>: show similarity score trace: "Node #42 'dispatch-loop redesign' included at depth 2 because cosine similarity to foveal node #17 'core-loop.lisp' = 0.73 (threshold 0.60)."
- /context dropped: show nodes pruned by the foveal-peripheral model: "12 nodes dropped: 8 by depth (≥3), 4 by similarity (<0.60)."
- Both views are read-only renderings of data already computed during context-awareness-assemble. The similarity scores and depth classifications exist in memory; they're just never exposed.
~60 lines of rendering on existing data.
TODO Tool execution hardening — timeouts + write verification
Existing tools are thin wrappers with no error recovery. Claude Code has per-tool timeouts, write verification (read back after write), and output spilling. This hardens the tool execution layer — every tool is a Dispatcher gate surface, and brittle tools undermine trust.
- *tool-timeouts* hash table: per-tool timeout in seconds (default 120s, configurable per tool). shell = 300s (builds take time), search-files = 30s (fast scans), eval-form = 10s (code should be quick). Enforced via with-timeout macro wrapping tool body execution.
- Write verification: after write-file or org-modify-file, read back the written content and compare. On mismatch, log a warning and re-attempt once. Catches filesystem failures and partial writes. ~20 lines in programming-tools.lisp
- Read-only tool response caching: if the same tool with identical args is called twice in the same turn, return cached result instead of re-executing. ~15 lines.
~60 lines total.
TODO Tag stack — categories + severity tiers
The privacy tag filter (dispatcher-check-privacy-tags) is binary: a tag matches or it doesn't. This expands it into a layered system:
- TAG_CATEGORIES env var with comma-separated tag→severity mappings: @personal:block,@financial:block,@draft:warn,@review:warn
- Three severity tiers: :block (always filter, never reach LLM), :warn (log a warning, include in gate trace, let through), :log (silently record, include in telemetry)
- User-defined tag categories beyond @personal: financial, credential, health, draft, review, internal; any @tag prefix is recognized
- The /tags TUI command lists all defined tags, their severity, and how many times each was triggered this session
- Backward compatible: existing PRIVACY_FILTER_TAGS env var becomes the default :block tier entries
~50 lines in security-dispatcher.lisp + ~20 lines TUI command.
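Parsing the TAG_CATEGORIES value could be sketched as below. The function names are hypothetical, and silently skipping entries with an unknown severity is an assumption about the desired behavior.

```lisp
(defparameter *example-severities*
  '(("block" . :block) ("warn" . :warn) ("log" . :log))
  "The three severity tiers, keyed by their spelling in the env var.")

(defun example-split (string delimiter)
  "Split STRING on the character DELIMITER into a list of substrings."
  (loop with start = 0
        for pos = (position delimiter string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun example-parse-tag-categories (spec)
  "Parse \"@tag:severity,...\" into an alist of (tag . severity-keyword).
Entries without a colon or with an unknown severity are skipped."
  (loop for entry in (example-split spec #\,)
        for colon = (position #\: entry :from-end t)
        for tag = (and colon (string-trim " " (subseq entry 0 colon)))
        for severity = (and colon
                            (cdr (assoc (string-trim " " (subseq entry (1+ colon)))
                                        *example-severities*
                                        :test #'string-equal)))
        when (and severity (plusp (length tag)))
          collect (cons tag severity)))
```

The resulting alist can be consulted with `(cdr (assoc "@draft" alist :test #'string=))` when a tag is encountered.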
TODO Merkle provenance audit — /audit <node-id>
Every Passepartout memory object has content-addressed identity via Merkle hashing (v0.2.0). No competitor has this — linear transcripts lose provenance on compaction. Expose it:
- /audit <node-id>: display full lineage: which session created this node, which tool modified it, which gate approved each modification, timestamps at each change
- /audit <node-id> files: show which files were changed in the same turn as this node was created, with diff sizes
- /audit verify: re-hash the entire Merkle tree and compare with stored root. "✓ 847 nodes verified, root hash matches." Catches silent corruption.
- Provenance data is already in the Merkle tree's parent-child hash chain. This is a rendering exposure, not new data.
~30 lines on existing Merkle infrastructure.
v0.8.0: Direction 2 — Information Radiator (Foundation)
The sidebar is what makes the Information Radiator direction unique. No competitor can render gate traces, focus maps, or rule counters because none has deterministic gates, foveal-peripheral context, or rule synthesis. The sidebar makes this data permanently visible. It also includes context monitoring, modified files, and tool status — all zero-LLM-token data from the deterministic layer.
TODO Sidebar — always visible information panel
Sidebar renders at right side of terminal, 42 columns wide. Visible when terminal ≥ 120 columns. When < 120 columns: disappears; accessible as absolute-positioned overlay via /sidebar or Ctrl+X+B.
Content (ordered vertically):
- Gate Trace: live per-message trace from the most recent agent response. Colored by gate state (green/yellow/red). Updates on each response.
- Focus: current foveal node ID + related node count. Shows what the agent is "looking at."
- Rules: rule counter ([Rules: 47]) + session delta (+2 this session). Tick sound on increment.
- Context: token gauge [████░░░░░░] 42% showing context usage with color coding (green <50%, yellow 50-80%, orange 80-95%, red >95%).
- Files: modified files list with +/- line counts. Updated on every tool execution that touches files.
- Cost: session cost ($0.12 this session) updating after each LLM call.
- Protection: gate effectiveness counter: "Gates blocked: 3 destructive, 7 network exfil, 12 secrets." Updated on each gate decision. This is the specific-value-proposition panel; no competitor has deterministic gates to count.
Implementation uses a fourth Croatoan window (sidebar on right) or a panel overlay. All data is already in the daemon's response plist (:rule-count, :foveal-id, :gate-trace). The gate block counts come from a new *dispatcher-block-counts* alist tracked in dispatcher-check. ~200 lines (includes panel 7 addition).
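The Context gauge and its color bands could be computed as in this sketch. The function names are hypothetical, the 10-cell width is taken from the sample gauge, and assigning the shared boundaries (80% and 95%) to the lower band is an assumption.

```lisp
(defun example-context-gauge (used budget &optional (cells 10))
  "Render a gauge string such as \"[████░░░░░░] 42%\"."
  (let* ((percent (round (* 100 used) budget))
         (filled (min cells (floor (* cells used) budget))))
    (format nil "[~a~a] ~d%"
            (make-string filled :initial-element (code-char #x2588))            ; full block
            (make-string (- cells filled) :initial-element (code-char #x2591))  ; light shade
            percent)))

(defun example-gauge-color (percent)
  "Map a usage percentage to a color keyword."
  (cond ((< percent 50) :green)
        ((<= percent 80) :yellow)
        ((<= percent 95) :orange)
        (t :red)))
```

The sidebar would call both after each response and pass the keyword to the theme lookup.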
TODO Sidebar overlay mode (< 120 cols)
When terminal width < 120, sidebar becomes an absolute-positioned overlay with semi-transparent backdrop (ncurses opaque + themed background). Toggle via /sidebar or Ctrl+X+B. The chat area fills the full width when sidebar is hidden. ~30 lines.
TODO Command palette (Ctrl+P)
Single entry point for all actions. Mirrors OpenCode's pattern — fuzzy-searchable, categorized, keyboard-navigable:
- Ctrl+P opens palette as overlay dialog
- Categories: Session (/focus, /scope, /unfocus, /rename), Agent (/rules, /approve, /config), View (/theme, /sidebar, /clear), System (/eval, /status, /reconnect, /quit)
- Fuzzy text filter; Up/Down to navigate; Enter to execute; Esc to dismiss
- Also shows keyboard shortcuts for each command as hints
- Implemented as a Croatoan window overlay with add-string-based rendering and get-char-based filtering. ~100 lines.
TODO TrueColor theme expansion (8 presets)
All 27 existing theme keys wired into rendering. Use Croatoan's set-rgb for 24-bit hex color support (already available in Croatoan; currently unused). Add 4 new presets to the existing 4:
- nord: blue-gray backgrounds, frost accent (#5E81AC key, #BF616A error, #A3BE8C success)
- tokyonight: purple-blue backgrounds, teal accent (#7AA2F7 key, #F7768E error, #9ECE6A success)
- catppuccin: warm pastels, mauve accent (#CBA6F7 key, #F38BA8 error, #A6E3A1 success)
- monokai: dark brown backgrounds, orange accent (#A6E22E key, #F92672 error, #E6DB74 success)
Theme switch via /theme <name> (already implemented). Theme preview: on hover/navigate in theme picker, apply temporarily; on cancel (Esc), revert to original. ~60 lines TUI + ~120 lines preset definitions.
v0.8.1: Direction 2 — Rich Rendering
Full markdown, tool execution visualization, mouse support, and cost display. This makes the TUI competitive on rendering quality with Claude Code and OpenCode.
TODO Full markdown rendering
Extend the markdown renderer from v0.7.1:
- OSC 8 hyperlinks: embed \x1b]8;;url\x1b\\ before link text and \x1b]8;;\x1b\\ after. Makes URLs clickable in supporting terminals (iTerm2, Kitty, WezTerm, Ghostty, Windows Terminal).
- Blockquotes (> text): rendered with a colored left border (theme's :accent color), indented text.
- Tables: aligned column text. No borders (terminal tables with box-drawing characters are noisy). Column alignment inferred from header separators.
- Syntax highlighting for code blocks: keyword/string/function colors from theme. Regex-based (no parser dependency).
- All markdown features degrade gracefully to plain text on terminals without attribute support. ~100 lines.
TODO Tool execution visualization
When the agent invokes a tool:
- Pre-execution: [Running: 🔍 search "dispatch" ...] in :tool-running color with spinner
- Success: ✓ search "dispatch" → 12 matches (0.3s) in :tool-success color
- Error: ✗ shell "bad-cmd" → exit 127 (0.1s) in :tool-failure color with error output expanded below
- Output collapsed by default to single-line summary. Tab on a tool invocation toggles full output.
- Diff display: + (green) / - (red) coloring for file edits. 3 lines of context around changes. The :tool-output theme color provides the background.
Uses Croatoan's init-pair + color-pair for 256-color backgrounds on tool state regions. ~100 lines.
TODO Mouse support
Croatoan supports ncurses mouse mode via (setf mouse-enabled-p). Enable:
- Scroll wheel: PageUp/PageDown equivalent, scrolls chat by viewport height
- Click to position cursor in input area
- Click on OSC 8 link to open in browser (via xdg-open)
- Click on tool invocation to toggle expand/collapse
- Click on gate trace line to expand/collapse trace
~40 lines.
TODO Cost display
- /cost command: displays per-session and per-LLM-call cost breakdown
- Optional sidebar cost counter: $0.12 this session, updating after each backend-cascade-call
- Per-provider pricing table (from v0.5.0 token economics)
- Color-coded: green under daily budget, yellow approaching, red exceeding
- Requires token counter infrastructure from v0.5.0. ~50 lines for display; token counting is v0.5.0 infrastructure.
TODO Session export — /export command
Claude Code has /share (shareable URL). OpenCode has /export (Markdown). Hermes has trajectory export. Passepartout has no way to share what the agent did.
- /export writes the current session as an Org file to ~/memex/exports/<session-title>-<date>.org
- Format: each message as an Org headline with role tag, timestamp, content, gate trace as property drawer
- /export md outputs Markdown instead of Org (for sharing with non-Org users)
- /export json outputs the session as JSON (for programmatic consumption)
~50 lines. Uses existing message vector and memory-object-render for Org formatting.
TODO Tool output spilling — large results to file
Claude Code saves tool results >30KB to ~/.claude/tool-results/ with a 200-line preview in the response. Passepartout currently includes all output inline — which consumes context budget and makes the chat log unreadable after a large build output or log dump.
- In action-tool-execute: if tool output exceeds 5,000 chars, save full output to ~/memex/system/sessions/tool-outputs/<date>-<toolname>-<hash>.txt
- In the response, replace full output with: [Output: 12,847 chars. Full output saved to ~/memex/system/sessions/tool-outputs/2026-05-08-grep-a1b2c3.txt. Top 2,000 chars:] followed by truncated preview
- The LLM can read-file the full output if it needs to analyze it
~30 lines in core-loop-act.lisp
TODO Read-only output caching within a turn
Claude Code caches read-only tool results within a turn. If the agent reads the same file twice, the second read returns cached content — no disk I/O, no context waste. Passepartout re-executes the tool.
- *turn-result-cache* hash table keyed by (cons tool-name args-hash), cleared at the start of each think() cycle
- Read-only tools (read-file, search-files, find-files, list-directory, org-find-headline, org-agenda-today, lsp-*) check the cache before executing
- Cache hit: return stored result with [cached] prefix in the response
- Prevents redundant tool calls when the agent asks the same question twice within a reasoning step
~25 lines in programming-tools.lisp
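A sketch of the per-turn cache. It deviates from the description in one respect, stated plainly: the key is (cons tool-name args) under an equal hash table rather than a precomputed args hash, which avoids returning a wrong result on a hash collision. All `example-` names are hypothetical.

```lisp
(defvar *example-turn-result-cache* (make-hash-table :test #'equal)
  "Maps (tool-name . args) to the result produced earlier in the same turn.")

(defun example-clear-turn-cache ()
  "Call at the start of each reasoning cycle."
  (clrhash *example-turn-result-cache*))

(defun example-call-read-only-tool (tool-name args thunk)
  "Return the cached result for TOOL-NAME and ARGS, or run THUNK and cache its result.
Cached results are returned with a \"[cached] \" prefix."
  (let ((key (cons tool-name args)))
    (multiple-value-bind (cached found-p)
        (gethash key *example-turn-result-cache*)
      (if found-p
          (format nil "[cached] ~a" cached)
          (setf (gethash key *example-turn-result-cache*)
                (funcall thunk))))))
```

The equal test matters here: a default eql table would never match a freshly consed key, so every lookup would miss.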
v0.8.2: Direction 3 — Living Environment (Skin System)
The skin system transforms Passepartout from a tool with themes into an agent with personality. Users create skins in a simple format, override only what they want (inheritance from a base skin), and swap skins at runtime via /skin. The spinner has personality. The borders have personality. The agent's name and welcome message are skin-customizable.
TODO Skin engine
- Skin format: a plist file (~/.config/passepartout/skins/myskin.lisp) defining:
  - :colors: 40+ color slots (extends the 27 theme keys): agent colors for 8 roles, status bar colors, tool colors, spinner colors, input colors, border colors. All in hex (#RRGGBB).
  - :spinner: style (:braille, :dots, :minimal), speed (ms/frame), kawaii faces, thinking verbs
  - :branding: agent name, welcome message, goodbye message, prompt symbol, help header
  - :tool-prefix: character for tool output lines (default ┊)
  - :tool-emojis: per-tool emoji overrides (e.g., (:shell "⚡" :search "🔎"))
  - :banner: Rich-markup ASCII art logo displayed on startup
- Skin inheritance: (:inherit :default); missing values cascade from parent
- Custom skins from ~/.config/passepartout/skins/*.lisp
- Hot-swap via /skin <name>, no restart. Skin changes take effect on next redraw (sub-frame latency).
- Skin preview: /skin <name> with --preview flag applies temporarily; Esc or timeout reverts.
- Built-in skins as plist data in a *skin-registry* hash table. ~250 lines.
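Inheritance lookup over such a registry might be sketched as follows. The registry variable, function name, and the flat plist layout with an :inherit key are assumptions for illustration; a NIL value is treated the same as a missing one.

```lisp
(defvar *example-skin-registry* (make-hash-table :test #'eq)
  "Maps a skin name keyword to its plist.")

(defun example-skin-value (skin-name key &optional (depth 0))
  "Look up KEY in SKIN-NAME, cascading through :inherit parents.
Returns NIL for unknown skins, missing keys, or overly deep (cyclic) chains."
  (let ((skin (gethash skin-name *example-skin-registry*)))
    (cond ((or (null skin) (> depth 16)) nil)   ; unknown skin or cycle guard
          ((getf skin key))                     ; value defined on this skin
          ((getf skin :inherit)                 ; otherwise ask the parent
           (example-skin-value (getf skin :inherit) key (1+ depth))))))
```

A custom skin then only needs to list the slots it overrides, for example `(:inherit :default :accent "#5C9CF5")`.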
TODO Skin presets (10+ built-in)
Organized by mood rather than theme. Each skin is a complete personality profile:
| Skin | Mood | Accent | Spinner | Character |
|---|---|---|---|---|
| gold (default) | Warm, approachable | #FFD700 | Kawaii faces | "⚕ Passepartout" |
| professional | Cool, focused | #5C9CF5 | Minimal braille | "Passepartout" |
| minimal | Zero decoration | #AAAAAA | None | "p" |
| forest | Calm, earthy | #7CB342 | Dots | "Passepartout" |
| ocean | Deep, contemplative | #26C6DA | Pulse | "Passepartout" |
| ember | Warm, energetic | #FF6D00 | Bounce | "Passepartout" |
| mono | Grayscale | #E6EDF3 | Minimal | "Passepartout" |
| retro | Amber terminal feel | #FFB000 | Blinking cursor | "PASSEPARTOUT" |
| unicorn | Playful, colorful | #E040FB | Sparkle | "🦄 Passepartout" |
| midnight | Dark blue, calm | #82AAFF | Brain | "Passepartout" |
Each skin's color slots derived systematically from accent + background. ~200 lines of skin definitions.
TODO Hooks on defskill — lifecycle interception
Passepartout's skills can inject instructions and react to triggers but cannot intercept behavior. All 4 competitors have lifecycle hooks (PreToolUse, PostToolUse, session events). Hooks complete the extension model: skills define what the agent knows; hooks define when skills get to inspect and veto actions.
- Add :pre-tool-hook and :post-tool-hook slots to the defskill struct
- :pre-tool-hook receives (action context), returns :allow, :deny, or :ask. Called before tool execution in the Dispatcher pipeline (new vector between shell-safety and network-exfil).
- :post-tool-hook receives (action context result), returns (values modified-result modified-context) or nil to leave unchanged. Called after tool execution. Useful for logging, auto-commit, notification.
- :on-session-start, :on-heartbeat, :on-compact lifecycle hooks for maintenance skills
- Hooks run in skill priority order. A :deny from any hook short-circuits the chain.
- This is Claude Code's PreToolUse pattern: 50 lines in defskill macro + core-perceive.lisp
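The short-circuit rule for pre-tool hooks can be sketched like this. The function name is hypothetical, hooks are assumed to arrive already sorted by skill priority, and letting :ask win over :allow when no hook denies is an assumption not stated above.

```lisp
(defun example-run-pre-tool-hooks (hooks action context)
  "Run HOOKS in order with ACTION and CONTEXT.
Returns :deny as soon as any hook denies, :ask if at least one hook asked,
and :allow otherwise."
  (let ((verdict :allow))
    (dolist (hook hooks verdict)
      (ecase (funcall hook action context)
        (:deny (return :deny))          ; short-circuit: later hooks never run
        (:ask (setf verdict :ask))
        (:allow nil)))))
```

The `return` works because dolist establishes the surrounding nil block; hooks after a :deny are never called, so they cannot log or mutate anything for a vetoed action.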
TODO Prompt templates / output styles
Claude Code has "output styles" (default, Explanatory, Learning). Hermes has agent profiles. Passepartout has a single hardcoded system prompt. Users should be able to change how the agent works, not just how it looks.
- Output styles are Org files in ~/.config/passepartout/styles/ with a plist frontmatter: #+STYLE: explanatory, #+DESCRIPTION: Teaches while doing
- Three built-in styles:
  - default — current behavior, direct and efficient
  - explanatory — agent explains implementation choices, provides educational insights with ★ Insight blocks (Claude Code's Explanatory output style)
  - learning — agent pauses to ask user to write small code pieces (2-10 lines), uses ● Learn by Doing blocks (Claude Code's Learning output style)
- /style <name> TUI command to switch at runtime. Injects a STYLE section into the system prompt between IDENTITY and TOOLS.
- Style changes are immediate (next think() call). Survive restarts via config persistence.
~100 lines (~60 prompt templates + ~40 TUI integration).
TODO Skill auto-detection — file-watch hot-reload
Passepartout's image-based Lisp model enables hot-reload — redefine a function without restarting. No competitor has this. Claude Code plugins require manual /reload-plugins. Passepartout can auto-detect changes.
- Daemon watches org/ and ~/.config/passepartout/skills/ with inotify (Linux) or kqueue (macOS). On .org file change:
  - Wait 200ms debounce (multiple writes within 200ms coalesce)
  - Tangle the changed org file: (org-tangle-file "org/skill-name.org")
  - Compile the tangled lisp: (compile-file "lisp/skill-name.lisp")
  - Reload: (load (compile-file-pathname "lisp/skill-name.lisp"))
  - TUI shows system message: "Skill 'skill-name' reloaded (23 defuns, 0 errors)"
- Respects SELF_BUILD_MODE — core files require HITL before reload. Skills reload automatically.
- On compile error: keep the old version loaded, log the error, show TUI warning: "✗ Skill 'skill-name' failed to compile — old version retained."
~80 lines in a new symbolic-file-watch.org skill.
v0.8.3: Direction 3 — Adaptive Layout + Personality
The TUI adapts to the terminal it's running in — full sidebar at ultrawide, compact at standard, minimal at narrow (phone/SSH). It has a personality: spinner style, relative timestamps, progress bars, live context help.
TODO Adaptive layout (3 tiers)
- ≥ 120 columns: Full layout. Sidebar visible with all 6 panels. Chat area left of sidebar.
- 80–119 columns: Compact layout. Sidebar hidden (toggle via /sidebar or Ctrl+X+B, rendered as overlay). Status bar 2 lines. Full markdown rendering.
- < 80 columns: Minimal layout. Single-column chat. Status bar reduced to 1 line (model, ctx%, duration). Markdown reduced to bold + code blocks only. Input height clamps to 1-2 lines.
Re-renders on terminal resize (already handled via KEY_RESIZE). Content re-flows — not truncated. The layout remembers per-terminal-size preference. ~80 lines.
TODO Spinner personality
Configurable spinner style per skin:
- :braille — ⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏ cycling at 80ms (default)
- :dots — ·✢✳✶✻✽ cycling (macOS style, Claude Code default)
- :kawaii — (。◕‿◕。) (◕‿◕✿) ٩(◕‿◕。)۶ cycling with wing decorations ⟪⚔ ... ⚔⟫
- :minimal — single ● dot blinking at 2000ms
- :none — static prompt symbol
Stall indication: when no response for 10s, spinner color interpolates from theme color → error red (Claude Code pattern). Reduced motion preference: spinner replaced with slow-pulse ●. ~50 lines.
TODO Progress bar
For measurable operations (file processing, test runs with known count, batch operations), render a progress bar using Unicode block characters:
[████████░░░░░░░░░░░░] 42% (5/12 tests passed)
Uses 9 block characters for sub-character precision: [' ', '▏', '▎', '▍', '▌', '▋', '▊', '▉', '█'] (Claude Code pattern). Color-coded by progress: red <25%, yellow 25-75%, green 75%+. ~25 lines.
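The sub-character rendering can be sketched as below. The function names, the default width of 20, and the rounding rule are assumptions. Unfilled cells are rendered as spaces here, following the nine-glyph list, rather than the ░ shown in the sample line. The glyphs are built with code-char so the source stays ASCII-only.

```lisp
(defparameter *partial-blocks*
  (coerce (list #\Space
                (code-char #x258F) (code-char #x258E) (code-char #x258D)
                (code-char #x258C) (code-char #x258B) (code-char #x258A)
                (code-char #x2589) (code-char #x2588))
          'vector)
  "Index i holds the glyph that fills i/8 of one terminal cell.")

(defun progress-color (fraction)
  "Red below 25%, yellow from 25% up to 75%, green from 75%."
  (cond ((< fraction 0.25) :red)
        ((< fraction 0.75) :yellow)
        (t :green)))

(defun render-progress-bar (done total &key (width 20))
  "Return a WIDTH-cell bar string for DONE out of TOTAL units."
  (let* ((fraction (if (plusp total) (min 1 (max 0 (/ done total))) 0))
         (eighths (round (* fraction width 8)))   ; progress in 1/8 cells
         (bar (make-string width :initial-element #\Space)))
    (multiple-value-bind (full partial) (floor eighths 8)
      ;; FULL whole cells, then one partially filled cell if room remains.
      (fill bar (aref *partial-blocks* 8) :end full)
      (when (< full width)
        (setf (char bar full) (aref *partial-blocks* partial))))
    bar))
```

For 5 of 12 tests this yields eight full blocks followed by a three-eighths block, which is the extra precision the nine-glyph set buys over whole-cell rendering.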
TODO Live timestamps
- Relative timestamps on messages: "just now" (< 30s), "2m ago", "1h ago", "yesterday"
- Absolute timestamp on hover/focus (via Tab navigation to message)
- Status bar shows session duration: Session: 3h 12m
- Timestamps update live (per-minute recalculation, not per-frame)
~40 lines.
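A possible shape for the timestamp formatting is sketched below. Only the "just now" threshold of 30 seconds comes from the list above. The other boundaries are assumptions: "1m ago" between 30 and 60 seconds, "yesterday" for 24 to 48 hours, and an "Nd ago" form beyond that. The function names are assumptions too.

```lisp
(defun relative-timestamp (then now)
  "Describe THEN relative to NOW; both are universal times in seconds."
  (let ((age (max 0 (- now then))))
    (cond ((< age 30) "just now")
          ((< age 3600) (format nil "~Dm ago" (max 1 (floor age 60))))
          ((< age 86400) (format nil "~Dh ago" (floor age 3600)))
          ((< age 172800) "yesterday")          ; assumed: 24 to 48 hours
          (t (format nil "~Dd ago" (floor age 86400))))))

(defun session-duration (seconds)
  "Format elapsed SECONDS for the status bar."
  (multiple-value-bind (hours rest) (floor seconds 3600)
    (format nil "Session: ~Dh ~Dm" hours (floor rest 60))))
```

Because both functions work at minute granularity, recalculating once per minute is enough to keep the displayed values current.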
TODO Context-sensitive help
Press ? to show available actions in current context:
- In chat: list of navigation keys, command shortcuts
- In sidebar: sidebar-specific bindings
- In HITL prompt: approval/denial bindings
- In command palette: palette navigation bindings
Rendered as a dim help bar at the bottom of the screen (above input). Dismisses on any key or after 5 seconds. ~40 lines.
v0.9.0: Signal Pipeline, Concurrency & Streaming
(Renumbered from old v0.7.0. Streaming moved to v0.7.1; streaming section removed below.)
The current pipeline is strictly sequential — one signal traverses Perceive → Reason → Act before the next signal begins. Background tasks (heartbeat, embedding cron, gardener scans) compete with foreground interactions. A heartbeat that fires during a long tool chain is queued. A Telegram message during a multi-step planning cycle is queued. The system feels sluggish under concurrent load even though the symbolic operations are near-instant (SBCL hash table lookups are microseconds) — the bottleneck is the single-pipeline architecture, not the hardware.
Design insight: why concurrency matters for an agent that is "one brain." Passepartout rejects multi-agent delegation on principle (see DESIGN_DECISIONS "One Single Agent"). But a single brain handles multiple inputs simultaneously — the human brain processes vision, audio, and proprioception in parallel. Rejecting multi-agent delegation does not require rejecting concurrency within the agent. The key is that all concurrent operations share the same memory space, the same Merkle tree, and the same deterministic gate stack. They are threads of one cognition, not separate agents.
TODO Priority-queue signal processing
- Replace the linear process-signal call chain with a priority-ordered signal queue. The queue is a sorted plist-list consumed by the main loop. Priority tiers:
  - :user-input / :chat-message — highest priority (the user is waiting)
  - :approval-required — high (HITL re-injections need quick resolution)
  - :interrupt — medium-high (shutdown signal)
  - :tool-output — medium (feedback from tool execution, needs LLM assessment)
  - :heartbeat / :cron / :delegation — low (background maintenance)
- Coalesce duplicate heartbeats: if the queue already contains a :heartbeat signal when a new one arrives, discard the older one (no value in processing stale ticks). Keep at most one pending heartbeat at any time.
- The main loop drains the highest-priority signal from the queue, processes it through the pipeline, and repeats. If the pipeline produces feedback (tool-output → think), the feedback is enqueued at its appropriate priority — it may preempt background signals but won't interrupt the current signal mid-processing.
- Add telemetry: average queue depth by priority tier, max wait time per tier.
- TUI /reconnect command: when the connection-loss detection from v0.3.3 fires, the user can reconnect without restarting the TUI. The command closes the stale socket, re-runs connect-daemon with its retry backoff, and restores the :connected state on success.
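The queue ordering and heartbeat coalescing might be sketched as follows. The :type key on signal plists, the numeric ranks, and the function names are assumptions for illustration. The sketch is single-threaded and leaves out the lock that would guard the queue.

```lisp
(defparameter *signal-priorities*
  '((:user-input . 0) (:chat-message . 0)
    (:approval-required . 1)
    (:interrupt . 2)
    (:tool-output . 3)
    (:heartbeat . 4) (:cron . 4) (:delegation . 4))
  "Lower rank is served first. The numeric ranks are illustrative.")

(defun signal-rank (signal)
  "Rank of SIGNAL; unknown types are treated as background work."
  (or (cdr (assoc (getf signal :type) *signal-priorities*)) 4))

(defun enqueue-signal (queue signal)
  "Return a new queue (list of signal plists, best rank first) with SIGNAL added.
A new heartbeat replaces any pending one; equal ranks keep arrival order."
  (let ((pending (if (eq (getf signal :type) :heartbeat)
                     (remove :heartbeat queue
                             :key (lambda (s) (getf s :type)))
                     queue)))
    ;; APPEND copies PENDING, so the destructive sort never touches QUEUE.
    (stable-sort (append pending (list signal)) #'< :key #'signal-rank)))
```

The main loop would then pop the first element. Using stable-sort keeps first-in, first-out order within a tier, so two chat messages are still answered in the order they arrived.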
TODO MVCC memory concurrency
- Replace *memory-store* (mutable global hash table) with a versioned Merkle-root pointer. The root is an (or null merkle-node) struct containing the tree and a monotonic version counter.
- Read threads snapshot the root before beginning their pipeline cycle. All object lookups dereference through the snapshot — they see a consistent view of memory regardless of concurrent writes. Reads never block.
- Write threads (ingest-ast, org-modify, snapshot-memory) build new object hashes, construct a new Merkle root, and CAS-replace the global root pointer. If another thread won the CAS race (root version changed), the loser re-reads the new root, replays its changes on the updated tree, and retries the CAS.
- Conflict probability is near-zero because concurrent signals almost never touch the same Org headline. The replay-on-conflict path exists for correctness but is rarely exercised. Lock contention is eliminated — the only atomic operation is the CAS on the root pointer.
- Remove the single-threaded pipeline assumption: previously, process-signal was safe because nothing else wrote to *memory-store* during its execution. With MVCC, multiple signals can process concurrently because each has its own snapshot. The *loop-interrupt-lock* becomes *signal-queue-lock* (protecting only the queue, not the memory).
- Test: concurrent ingest-ast from two threads writing to different memory objects, verify both commits succeed without corruption.
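The snapshot-and-retry commit loop can be illustrated with a deliberately simplified model. Everything here is an assumption: the mem-root struct stands in for the Merkle root, an immutable alist stands in for the tree, and cas-root is a portable placeholder that is not atomic. A real implementation would need a true atomic primitive, for example SBCL's sb-ext:compare-and-swap.

```lisp
(defstruct mem-root
  (version 0)
  (objects '()))   ; immutable alist of id -> value, standing in for the tree

(defvar *root* (make-mem-root))

(defun cas-root (expected new)
  "Placeholder for an atomic compare-and-swap. NOT thread-safe as written."
  (when (eq *root* expected)
    (setf *root* new)
    t))

(defun commit (change)
  "Apply CHANGE, a function from an objects alist to a new alist.
On a lost race the loop re-reads the root and replays CHANGE on it."
  (loop
    (let* ((snapshot *root*)               ; readers also start from a snapshot
           (new (make-mem-root
                 :version (1+ (mem-root-version snapshot))
                 :objects (funcall change (mem-root-objects snapshot)))))
      (when (cas-root snapshot new)
        (return new)))))
```

A reader that captured the root before a commit keeps seeing the old alist, because commits build a new root instead of mutating the old one. That is the property which lets reads proceed without blocking.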
TODO Structured output enforcement
- Add a plist validation step between markdown-strip and read-from-string in think(). Before attempting to parse, validate: (a) the output starts with ( or [, (b) it contains balanced delimiters (count opens vs closes), (c) it doesn't contain #. (redundant after v0.3.1 *read-eval* nil but defense-in-depth).
- On validation failure: construct a rejection trace (similar to the existing deterministic gate rejection feedback) and re-inject into the LLM prompt. The trace includes the raw output and a diagnostic ("Your response did not produce a valid plist. Ensure it starts with ( and has balanced parentheses.").
- Configurable LLM_OUTPUT_RETRIES (default 2). After exhausting retries, fall through with the raw text as a :MESSAGE action (current behavior).
- Track parse-failure rate per provider in telemetry. Use to guide provider cascade ordering: a provider with 20% parse-failure rate falls behind one with 2%.
- If retries are exhausted without a parseable plist, the TUI renders the raw LLM output in a dimmed, collapsible region labeled "Parse failure — could not interpret this response." The user can inspect what the model produced.
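The three pre-parse checks could be sketched as a single predicate. The name plausible-plist-p and the diagnostic strings are assumptions. The delimiter count is naive: it does not skip string literals, so a parenthesis inside a quoted string would be counted.

```lisp
(defun plausible-plist-p (text)
  "Cheap pre-parse checks on TEXT. Returns T, or NIL plus a diagnostic string."
  (let ((trimmed (string-trim '(#\Space #\Tab #\Newline #\Return) text))
        (depth 0))
    (cond ((zerop (length trimmed))
           (values nil "Empty response."))
          ((not (find (char trimmed 0) "(["))
           (values nil "Response must start with ( or [."))
          ((search "#." trimmed)
           (values nil "Read-time evaluation (#.) is not allowed."))
          (t
           ;; Count opens against closes; a negative depth means a close
           ;; appeared before its matching open.
           (loop for ch across trimmed
                 do (case ch
                      ((#\( #\[) (incf depth))
                      ((#\) #\]) (decf depth)))
                 when (minusp depth)
                   do (return-from plausible-plist-p
                        (values nil "Unbalanced closing delimiter.")))
           (if (zerop depth)
               t
               (values nil "Unbalanced delimiters."))))))
```

The second return value is the kind of diagnostic that could be placed in the rejection trace sent back to the model.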
TODO Doom-loop detection — 3 identical tool calls triggers HITL
OpenCode detects 3 consecutive identical tool calls and prompts the user. Without this, Passepartout could loop forever on a stuck tool — burning tokens and producing no progress.
- Track last 3 tool calls (name + args plist) in a ring buffer
- Before executing a tool, compare against the 3 previous calls
- If all 3 have the same name and equal args (using equalp), inject a HITL prompt: "The agent has attempted 'grep defun' 3 times without progress. Continue or abort?"
- Resets on any different tool call or successful output
~15 lines in core-loop-act.lisp
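One reading of the detection rule is sketched below: the buffer holds the last three executed calls, and the check fires when all three match the call about to run. The variable and function names are assumptions, and a plain list capped at three entries stands in for a ring buffer.

```lisp
(defvar *recent-tool-calls* '()
  "Most recent call first; never longer than three entries.")

(defun record-tool-call (name args)
  "Remember an executed call, dropping the oldest beyond three."
  (push (list name args) *recent-tool-calls*)
  (when (> (length *recent-tool-calls*) 3)
    (setf *recent-tool-calls* (subseq *recent-tool-calls* 0 3))))

(defun reset-tool-calls ()
  "Clear the history, e.g. after a successful output."
  (setf *recent-tool-calls* '()))

(defun doom-loop-p (name args)
  "True when the three previous calls all match the call about to run."
  (and (= 3 (length *recent-tool-calls*))
       (every (lambda (call) (equalp call (list name args)))
              *recent-tool-calls*)))
```

A different call pushed into the buffer breaks the run of identical entries, which gives the "resets on any different tool call" behavior without extra bookkeeping.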
TODO Busy-mode — queue on interrupt
When the agent is processing a turn and the user types a message, the current behavior is undefined. Hermes has interrupt/queue/steer. Passepartout should at minimum support queue mode.
- BUSY_INPUT_MODE env var: interrupt (default, stop current turn), queue (process after current turn)
- In queue mode: user messages arriving during an active turn are enqueued. When the current turn's tool chain completes, the queued message is injected as the next turn's user input — no HITL approval needed (it's user input).
- /busy interrupt / /busy queue TUI commands to toggle at runtime
- The priority queue (above) naturally supports this — user input queued during a turn has higher priority than heartbeats, lower than the active turn
~20 lines in core-pipeline.lisp
TODO CLI / non-interactive mode — passepartout ask
Claude Code supports claude -p "fix the failing test" --print. Hermes has hermes -c "command". Passepartout can only be used interactively via the TUI. A non-interactive single-shot mode enables CI/CD integration, cron jobs, and scripting.
- passepartout ask "what's the status of project X?" — sends a framed message to the daemon, waits for response, prints to stdout
- Daemon-side: process-one-shot handler — inject :user-input signal, run through full pipeline (perceive → reason → act → loop until stop), return final agent message
- --json flag outputs the full response plist for programmatic consumption
- --timeout N flag (default 120s) limits execution time
- Uses the existing wire protocol — no new protocol, just a CLI wrapper around the framed TCP message format
~80 lines in the passepartout bash script + ~50 lines daemon handler.
TODO Provider health tracking — success rate + latency
backend-cascade-call tries providers in order until one succeeds. On failure it moves to the next. But it has no memory of which providers failed or succeeded in the past. A degraded provider gets retried first on every call.
- *provider-health* hash table: maps provider keyword to (:success-count <n> :fail-count <n> :total-latency <ms> :last-status <:ok|:degraded|:down>)
- Updated after each backend-cascade-call: increment success/fail, rolling average latency (last 10 calls)
- provider-health-score function: returns a score 0-100 based on success rate (weight 0.6) and latency vs baseline (weight 0.4)
- /provider-status TUI command: displays a table of all providers with status indicators (● Up, ◐ Degraded, ○ Down) and recent history
- Telemetry: provider health data feeds the session telemetry system
~60 lines in neuro-provider.lisp + ~30 lines TUI.
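A score with the stated 0.6 and 0.4 weights might be computed as in this sketch. The 1000 ms baseline, the neutral score of 50 for a provider with no history, and the latency factor (baseline divided by mean latency, capped at 1) are all assumptions. For brevity it uses the mean over all recorded calls instead of a rolling window of the last 10.

```lisp
(defun provider-health-score (health &key (baseline-ms 1000))
  "HEALTH is a plist with :success-count, :fail-count and :total-latency (ms).
Returns an integer from 0 to 100."
  (let* ((ok (getf health :success-count 0))
         (failed (getf health :fail-count 0))
         (calls (+ ok failed)))
    (if (zerop calls)
        50                               ; no data yet: assumed neutral score
        (let* ((success-rate (/ ok calls))
               (mean-latency (/ (getf health :total-latency 0) calls))
               ;; 1 at or below baseline, falling toward 0 as latency grows.
               (latency-factor (if (<= mean-latency baseline-ms)
                                   1
                                   (/ baseline-ms mean-latency))))
          (round (* 100 (+ (* 0.6 success-rate)
                           (* 0.4 latency-factor))))))))
```

With nine successes, one failure, and 5000 ms of total latency, the success term contributes 54 and the latency term the full 40, for a score of 94.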
TODO Cost-based provider routing
backend-cascade-call currently tries providers in registration order. With cost tracking (v0.5.0) and provider health (above), the cascade can be sorted by cost-effectiveness.
- COST_ROUTING env var (default true): when enabled, sort the cascade by (provider-health-score * 0.3 + cost-savings-score * 0.7)
- cost-savings-score: cheap providers score high. Free providers (Ollama local) score 100. Expensive providers (GPT-4) score 10.
- Health override: a provider with score < 20 (degraded) is demoted below healthy providers regardless of cost
- /routing TUI command: displays current cascade order with scores and reasons
~40 lines in core-reason.lisp
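The ordering rule, including the health override, can be sketched as a two-group sort. The provider plist shape (:name, :health, :cost-savings) and the function names are assumptions, and the scores in the usage example are made-up values.

```lisp
(defun routing-score (provider)
  "Weighted score: 0.3 * health + 0.7 * cost savings."
  (+ (* 0.3 (getf provider :health))
     (* 0.7 (getf provider :cost-savings))))

(defun order-cascade (providers)
  "Healthy providers first, best combined score first; degraded ones last."
  (flet ((degraded-p (p) (< (getf p :health) 20)))
    (let ((healthy (remove-if #'degraded-p providers))
          (degraded (remove-if-not #'degraded-p providers)))
      ;; COPY-LIST because SORT is destructive.
      (append (sort (copy-list healthy) #'> :key #'routing-score)
              (sort (copy-list degraded) #'> :key #'routing-score)))))

;; Illustrative values only.
(defparameter *demo-providers*
  '((:name :ollama :health 15 :cost-savings 100)
    (:name :gpt-4 :health 95 :cost-savings 10)
    (:name :deepseek :health 80 :cost-savings 70)))
```

Here the free local provider would win on the weighted score alone, but its health of 15 places it in the degraded group, so it is tried last.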
TODO Intelligent provider fallback — per-task-type routing
Current fallback is "try the next provider." But different providers excel at different tasks. DeepSeek is strong at code generation. Groq is fast for simple queries. Claude is better at reasoning. The cascade should adapt to the task.
- *task-provider-scores* hash table: maps (task-type keyword) → (provider keyword → score)
- Task types: :chat (conversation), :code (code generation/editing), :plan (multi-step planning), :search (information retrieval), :summary (compaction), :reflex (deterministic lookup)
- Scores updated after each call: if the response was accepted (no rejection retry), increment that provider's score for that task type
- When the primary provider fails, the fallback picks the highest-scored provider for the current task type (not just the next in line)
- Bootstrap from defaults: GPT-4/Claude for reasoning, DeepSeek for code, Groq for chat, local Ollama for reflex
~60 lines in neuro-router.lisp
TODO Internal evaluation harness — 10 tasks, regression detection
Moved from v0.12.0: the internal eval harness must ship before v0.10.0 so it can validate the Signal Pipeline (v0.9.0) and catch regressions from MCP Tools (v0.10.0), Planning (v0.11.0), and beyond. The SWE-bench competitive scoring harness remains at v0.12.0; this is the lightweight internal suite.
- New skill: symbolic-evaluation.org → symbolic-evaluation.lisp
- deftask macro: define an eval task with :setup (create test environment), :prompt (what to ask the agent), :verify (function that checks the output), :teardown (cleanup)
- run-eval-suite: run all registered tasks, produce score (pass count / total), per-task diagnostics
- Initial 10 tasks: find TODOs, create Org note, search codebase, read file, query memory, list projects, run safe shell command, find definition, set TODO state, summarize session
- Regression mode: run after each version build. Fail CI if score drops.
- Task suite grows with codebase: every bug fix adds a regression task
~200 lines.
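A minimal sketch of the task registry and runner follows. The registry variable, the convention that :setup, :verify, and :teardown are function-valued forms, and passing the agent in as a function from prompt to response are all assumptions. The real harness would drive the daemon rather than a stub.

```lisp
(defvar *eval-tasks* '() "Registered tasks, as (name . plist) entries.")

(defmacro deftask (name &key setup prompt verify teardown)
  "Register an eval task. SETUP, VERIFY and TEARDOWN are function forms."
  `(progn
     ;; Re-evaluating a deftask replaces the earlier definition.
     (setf *eval-tasks* (remove ',name *eval-tasks* :key #'car))
     (push (cons ',name (list :setup ,setup :prompt ,prompt
                              :verify ,verify :teardown ,teardown))
           *eval-tasks*)
     ',name))

(defun run-eval-suite (agent)
  "AGENT is a function from prompt string to response string.
Returns the pass ratio and a list of (name . passed-p) diagnostics."
  (let ((results '()))
    (dolist (entry (reverse *eval-tasks*))      ; run in registration order
      (destructuring-bind (name . task) entry
        (when (getf task :setup) (funcall (getf task :setup)))
        (let ((passed (unwind-protect
                           (handler-case
                               (and (funcall (getf task :verify)
                                             (funcall agent (getf task :prompt)))
                                    t)
                             (error () nil))    ; an error counts as a failure
                        (when (getf task :teardown)
                          (funcall (getf task :teardown))))))
          (push (cons name passed) results))))
    (values (if results
                (/ (count t results :key #'cdr) (length results))
                0)
            (nreverse results))))
```

Regression mode then reduces to comparing the returned ratio against the previous build's ratio and failing CI when it drops.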
TODO Autonomous certification badge
After N HITL approvals of the same pattern, the dispatcher auto-approves it. But unlike Claude Code's "auto mode," this is deterministic — no probability, no model hallucination granting permission. The certification is a logical certainty.
- When a pattern crosses DISPATCHER_RULE_THRESHOLD, the dispatcher writes the rule to rules.org AND grants a certification entry: "Certified: shell commands targeting ~/memex/projects/* with git status are deterministically safe. 47 approvals, 0 denials."
- The sidebar Rules panel shows: [Rules: 47 | Certified: 12] — learned rules vs certified patterns
- /certifications TUI command: lists all certified patterns with approval counts, last-used timestamps, and the gate vector that checks them
- Certification downgrade: if a certified pattern is later denied by the user, the certification is revoked and the pattern returns to HITL
- This is the operational realization of "the more you use it, the cheaper it gets" — each certification represents a category of actions that will never cost another HITL prompt
~60 lines in security-dispatcher.lisp + sidebar rendering reuse.
v0.10.0: Tool Ecosystem (MCP-Native) + Voice Gateway
(Renumbered from old v0.8.0.)
The original roadmap placed MCP at v0.9.0 and planned "10+ cognitive tools" built from scratch for v1.0.0. This is inverted: the ecosystem already provides 50+ tools (filesystem, git, postgres, slack, github, web search, memory servers). Building bespoke tools from scratch duplicates work the community has already done and tested. Passepartout's advantage is not in tool implementation but in tool orchestration — the deterministic gate stack that verifies every tool invocation before execution.
Why MCP matters for competitive positioning: Claude Code's native tools (Read, Write, Edit, Bash, Grep, Glob, WebSearch) are implemented in TypeScript within the Claude Code runtime. They are not extensible — you cannot add a tool without modifying the runtime. OpenClaw's tools are similarly baked into the Node.js process. By building a native MCP client, Passepartout gains tool breadth that exceeds both competitors (50+ tools via the MCP ecosystem versus ~10 native tools) without building a single tool implementation. The tool quality is maintained by the ecosystem; the safety verification is maintained by Passepartout's gate stack. This division of labor is the right architecture for a small team building a competitor to well-funded commercial agents.
TODO MCP native client
- Pure Common Lisp MCP client: parse JSON-RPC messages from MCP servers over stdio or SSE. No Python bridge, no Node.js subprocess. The client runs in the same Lisp image as the agent — zero serialization overhead between the agent and the MCP layer.
- Implement the MCP protocol lifecycle: initialize handshake, list tools, call tool, handle notifications. Each MCP server registers its tools as entries in Passepartout's *cognitive-tool-registry* at connection time — the LLM's tool belt prompt automatically expands to include them.
- MCP_SERVERS env var: comma-separated paths to MCP server config files (JSON). Each config specifies the server command, args, and env vars. Example: MCP_SERVERS=~/.config/passepartout/mcp/filesystem.json,~/.config/passepartout/mcp/git.json.
- Tool invocation route: LLM proposes a tool call → Dispatcher verifies against permission table → MCP client serializes call as JSON-RPC → server executes → result deserialized back to plist → returned to LLM as tool output. The Dispatcher does not distinguish between native tools and MCP tools — the gate stack is uniform.
- Register the MCP client as a skill (defskill :passepartout-mcp-client) so it can be hot-reloaded. The MCP client is not core infrastructure — it is a skill that extends the tool ecosystem.
TODO Core MCP tools (from existing roadmap items)
- Git Steward: status, diff, commit, push, branch via the MCP Git server. Policy gate enforces commit-before-modify: any file write to a git-tracked directory must be preceded by a diff review.
- Web Research: headless browser via Puppeteer/Playwright MCP server. Text extraction, screenshot capture, page interaction.
- Interactive PTY: stream long-running process output to context window, async interrupt control.
TODO TUI tool visualization
- Already implemented in v0.8.1 (tool execution visualization). This TODO confirms the rendering path works for MCP tools as well as native tools — no distinction at the TUI level.
TODO Environment Steward
- Detect "command not found" in shell actuator output.
- Search system PATH and package manager registries for the missing command.
- Propose installation command and retry the failed action on user approval.
- Cache resolved dependency paths to avoid repeated searches.
v0.10.3 — TODO Voice Gateway
Rationale: OpenClaw ships voice wake words and talk mode on macOS/iOS/Android via ElevenLabs. Hermes Agent has voice memo transcription. Both treat voice as a first-class channel. Passepartout's daemon already handles text — voice is an I/O format conversion. Speech-to-text turns audio into :user-input signals. Text-to-speech turns agent responses into audio. The architecture requires no changes; the voice gateway is a skill that wraps existing REST APIs.
- Speech-to-text: POST audio to OpenAI Whisper API (/v1/audio/transcriptions) or local Whisper via Ollama. Receive text. Inject as a :user-input signal into the pipeline. The daemon processes it identically to a typed message.
- Text-to-speech: POST text to ElevenLabs REST API (/v1/text-to-speech/{voice-id}) with stream response. Also support system say (macOS) / espeak (Linux) as zero-dependency fallbacks.
- TUI voice toggle: /voice on enables voice capture, shows a 🎤 (listening) indicator in the status bar. /voice off returns to text-only. The microphone capture runs in a dedicated thread that feeds audio chunks to the speech-to-text backend.
- Voice mode in messaging gateways: on Telegram and Discord, the voice gateway transcribes voice messages into text and injects them as :user-input signals. Agent responses can be optionally spoken back via text-to-speech if the user's message included a voice note (reply in kind).
- The voice gateway is a skill (defskill :passepartout-gateway-voice). No core daemon changes required. The daemon receives text signals whether they originated from a keyboard, a messaging app, or a microphone.
TODO Web search + web fetch tools — search-web, fetch-web
Claude Code has WebSearchTool + WebFetchTool. Hermes has firecrawl-py + exa-py. Passepartout's agent cannot answer questions about the world, look up documentation, or research current events. Two new cognitive tools, no external dependencies:
- search-web — POST query to a search API (SearXNG public instance as default, configurable via WEB_SEARCH_URL env var). Returns title + URL + snippet for top 10 results. Dispatcher's network-exfiltration gate (vector 8) provides free safety — search queries are already vetted.
- fetch-web — GET a URL, extract text content via regex-based HTML stripping (no parser dependency — strip tags, keep whitespace). Returns plain text, truncated to 10,000 chars. Dispatcher's network-exfiltration gate checks the URL domain against the allowlist.
- Both register via def-cognitive-tool as read-only tools (auto-approve via v0.7.2 safe-tool allowlist)
~150 lines as a new skill programming-web.org. No external Python/Node.js process.
TODO LSP integration — language server protocol client
Claude Code uses LSP for code intelligence — find definitions, find references, diagnostics, hover types. Without LSP, Passepartout can grep patterns but cannot answer "where is this function defined?" or "what calls this?" — questions Claude Code answers instantly with zero LLM tokens.
- LSP client as a skill (lsp-client.org). Communicates with language servers via stdio JSON-RPC (same pattern as MCP client, different protocol).
- Three cognitive tools: lsp-definition (go to definition), lsp-references (find references), lsp-diagnostics (get errors/warnings for file)
- Read-only tools — auto-approve via v0.7.2 safe-tool allowlist
- Supported languages: any language with an LSP server (TypeScript, Python, Rust, Go, C/C++, Java, etc.) — not Lisp-specific
- LSP servers installed by the user (e.g., npm install -g typescript-language-server). Passepartout auto-discovers installed servers via PATH.
~200 lines. Register as read-only cognitive tools. No daemon protocol changes — LSP is a background process, not a rendering concern.
TODO Auto-saved session transcripts — /memex/system/sessions/
Passepartout has no session persistence beyond Merkle tree snapshots. Chat history lives in the TUI's in-memory vector and is lost on restart. Every competitor persists sessions: Claude Code uses JSONL, OpenCode uses SQLite, OpenClaw uses JSONL, Hermes uses SQLite+FTS5.
- Auto-save on every message (user and agent): append to ~/memex/system/sessions/<date>-<title>.org as an Org file
- Format: each message as an Org headline with role tag (:user:, :agent:, :system:), universal timestamp, content in body. Gate trace as a property drawer under the agent message headline.
- Session title derived from the first user message (first 60 chars, sanitized for filename). Override with /rename <title>
- Saving needs no user action: no /export required. The /export command delegates to the same function with format options (Org/Markdown/JSON)
- Location: /memex/system/sessions/ — under system/, not daily/, no clutter
- Survives daemon restarts. Resume via /resume <date-title> (existing session resume from v0.7.2)
~80 lines in core-transport.lisp (append on message send) + reuse existing Org rendering.
TODO Auto-memory extraction — learnings from sessions
Claude Code's extractMemories runs at the end of each query loop, scanning the conversation for durable learnings and writing them to memory files. Hermes's MemoryProvider.sync_turn does the same. Passepartout records everything in the Merkle tree but never extracts cross-session learnings.
- After each think() cycle that produces a final response (no tool calls pending), run extract-session-memory: a lightweight LLM call (50 tokens of prompt) that asks "What should I remember from this session?" and writes the result to ~/memex/system/memory/<project>/<date>.org
- The extraction uses a forked LLM call (separate from the main response) with the session transcript as context
- Auto-memory files are injected into the CONTEXT section of future think() calls as "Session memory: [learnings from prior sessions about this project]"
- Extracted memories include: decisions made, patterns observed, preferences expressed, errors encountered and fixed, codebase facts learned
- Opt-out via AUTO_MEMORY=false env var. Extraction frequency capped at one per minute to prevent runaway API costs.
~80 lines in core-reason.lisp + reuse session transcript for context.
TODO Universal cross-project Org query
Passepartout's entire memex is Org — one format for memory, tasks, documents, transcripts. No competitor has this. Claude Code queries CLAUDE.md (one file), SQLite (separate DB), and file tools (grep). Passepartout can query everything with one function.
- (org-query :tag "@urgent" :state "TODO" :since "-7d" :path "~/memex/projects/") — scans all projects in memex, returns matching Org headlines as memory objects. Zero LLM tokens, ~2ms execution.
- (org-query :property "DEADLINE" :before "-1d") — overdue items. Feeds /agenda command.
- (org-query :where "dispatch" :in-title-p t) — search headlines containing a term across all projects.
- (org-query :limit 20 :sort :priority) — sorted, capped results.
- This is the infrastructure that makes the GTD weekly review (v0.13.0) possible — pure Lisp tree traversal with no external database.
~150 lines in programming-org.lisp (extends existing Org manipulation primitives).
TODO debug-inspect cognitive tool — live state inspection
Lisp enables live state inspection that no TypeScript/Python agent can match. Claude Code has no REPL. Passepartout can inspect and modify its own running state.
- debug-inspect cognitive tool: evaluates a Lisp form in the running image and returns the result as a structured plist. Parameters: code (Lisp form string), package (optional).
- Read-only tool: auto-approve via v0.7.2 safe-tool allowlist. No side effects — inspection only.
- Use cases: (hash-table-count *memory-store*), (inspect memory-object-by-id "node-42"), (map 'list #'car *skill-registry*)
- The agent can introspect its own state to answer meta-questions: "How many objects are in memory?" "What skills are loaded?" "What was the last HITL decision?"
~30 lines in programming-repl.lisp (extends existing repl-eval with safety guard).
Competitive Advantage Analysis — v0.10.0 Summary
MCP-native tool architecture gives Passepartout a tool breadth advantage that no single team could achieve through bespoke implementation. The MCP ecosystem is growing faster than any individual agent's tool set. By connecting to it rather than competing with it, Passepartout's tool count scales with the ecosystem — every new MCP server is a new Passepartout tool.
The Dispatcher's tool permission table (allow/ask/deny) applies uniformly to MCP tools, giving Passepartout tool-level security granularity that competitors lack. Claude Code's tools are binary: available or not. Passepartout can conditionally allow filesystem writes to /projects/* while requiring HITL for writes to ~/.config/* — per-path, per-tool, per-session. This is the deterministic gate stack's natural application domain.
The Git policy gate (commit-before-modify) is a safety feature no competitor provides. It prevents the most common agent failure mode: modifying files without preserving the prior state. Combined with memory snapshots (v0.2.0), this gives every action a dual audit trail: the git history and the memory object history.
The TUI tool visualization (v0.8.1) extends seamlessly to MCP tools — the rendering layer doesn't distinguish between native tools and MCP tools. The same colored backgrounds, collapsible outputs, and gate traces apply universally.
The voice gateway (v0.10.3) adds parity with OpenClaw's voice features without architectural changes — speech-to-text and text-to-speech are thin REST wrappers that feed text signals into the existing pipeline. Combined with the Emacs bridge (v0.4.0), messaging gateways (v0.4.0), and the now-SOTA TUI (v0.7.0–v0.8.3), Passepartout supports four interaction surfaces by v0.10.3: terminal (TUI), messaging apps, Emacs, and voice.
v0.11.0: Planning, Self-Modification & Deterministic Routing
(Renumbered from old v0.9.0.)
Design insight: the inverted tier classifier. The current tier classifier routes "rm", "write-file", and "shell" to :REFLEX (no LLM). This routes the most dangerous operations to the path with the least oversight. It should be inverted: :REFLEX handles deterministic lookups (list TODOs, check file existence, query memory), :COGNITION handles text processing and summarization, :REASONING handles planning and code generation. Dangerous operations should always route through :REASONING where the full LLM cycle and Dispatcher gate stack apply. v0.11.1 fixes this.
TODO Long-horizon planning (task tree DAG)
- Decompose complex tasks into Org-mode headline trees. Each task node is a memory-object with terminal states: :todo → :next-action → :in-progress → :done / :blocked / :stuck.
- The LLM generates the initial task tree from the user's request. The REASONING tier processes each leaf task sequentially, updating node states as it progresses.
- Parent nodes summarise child results: when all children of a node reach :done, the parent is promoted to :done with a synthesised summary. When any child reaches :stuck, the parent is promoted to :blocked with the blocking child's diagnostic.
- Branch pruning: if a child is :stuck after three retries with different LLM providers, the parent re-plans the branch — the LLM generates alternative decomposition paths for the blocked sub-task.
- Task trees persist as Org headlines in /memex/system/tasks/. Survive restarts. Visible to the user as editable Org files.
- TUI task tree visualization: a collapsible Org headline tree rendered in the chat area. Each node shows its terminal state with a colored indicator (○ todo, ▶ next-action, ◉ in-progress, ✓ done, ✗ blocked, ⏸ stuck). Nodes expand/collapse on Enter. The tree updates in real time as the agent progresses through subtasks.
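The parent promotion rule can be sketched over a small in-memory tree. The task-node struct and promote-parent are assumptions, and the summary strings are simple placeholders: in the design above the :done summary is synthesised by the LLM, not joined from child titles.

```lisp
(defstruct task-node
  title
  (state :todo)
  (summary nil)
  (children '()))

(defun promote-parent (node)
  "Recompute NODE's state from its children, depth first. Returns NODE."
  (let ((children (task-node-children node)))
    (when children
      (mapc #'promote-parent children)
      (let ((stuck (find :stuck children :key #'task-node-state)))
        (cond (stuck
               ;; Any stuck child blocks the parent and supplies the diagnostic.
               (setf (task-node-state node) :blocked
                     (task-node-summary node)
                     (format nil "Blocked by: ~A" (task-node-title stuck))))
              ((every (lambda (c) (eq (task-node-state c) :done)) children)
               (setf (task-node-state node) :done
                     (task-node-summary node)
                     (format nil "~{~A~^; ~}"
                             (mapcar #'task-node-title children)))))))
    node))
```

Leaves are left untouched, and a parent whose children are neither all done nor stuck keeps its current state. Only a :stuck child triggers :blocked in this sketch, matching the rule as stated.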
TODO Tier classifier fix
- Invert the current classifier: :REFLEX = deterministic lookups only (memory query, file-exists-p, check time, list TODOs by tag). :COGNITION = text processing, summarization, simple Q&A, note formatting. :REASONING = planning, code generation, multi-step task execution, dangerous operations.
- Track classifier accuracy via telemetry: for each classified action, record whether the classification was appropriate.
- The classifier function is overridable via *tier-classifier*, allowing users or skills to customize routing.
- The classifier should be a skill, not core infrastructure: reloadable and replaceable without restart.
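A minimal sketch of the inverted classifier, assuming actions arrive as plists with an :op keyword. Only *tier-classifier* and the three tier names come from the roadmap; the op keywords, the helper names, and the choice to send every unrecognised op to :REASONING are assumptions made for illustration.

```lisp
;; Assumption: actions are plists with an :op keyword. Op names are illustrative.
(defparameter *reflex-ops*
  '(:memory-query :file-exists-p :check-time :list-todos))
(defparameter *cognition-ops*
  '(:summarize :format-note :simple-qa))

(defun default-tier-classifier (action)
  "Return :REFLEX, :COGNITION, or :REASONING for ACTION (a plist)."
  (let ((op (getf action :op)))
    (cond ((member op *reflex-ops*) :reflex)
          ((member op *cognition-ops*) :cognition)
          ;; Everything else, including :rm, :write-file, and :shell, takes
          ;; the full LLM cycle and the Dispatcher gate stack.
          (t :reasoning))))

(defvar *tier-classifier* #'default-tier-classifier
  "Overridable routing function; rebind or SETF to customize.")

(defun classify (action)
  "Route ACTION through whichever classifier is currently installed."
  (funcall *tier-classifier* action))
```

Defaulting unknown operations to :REASONING means a newly added tool cannot silently land on the no-LLM path; it has to be listed explicitly to get there.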
TODO Skill Creator
- LLM drafts complete skill org-file from natural language description.
- Mandatory pipeline: (a) syntax validation via lisp-syntax-validate, (b) sandbox-load in temporary jailed package (v0.3.2), (c) run registered trigger function against mock contexts, (d) run registered deterministic gate against mock proposals, (e) on pass, promote to live registry under passepartout.skills.<name>.
- Required :repl-verified flag on all defun forms: the existing Dispatcher lint check warns on writes without verification. The Skill Creator enforces this at creation time.
- Skills are the primary extension mechanism for users. The Skill Creator makes skill authoring accessible to non-Lisp-programmers: describe what you want in English, the LLM drafts the Org file, the system verifies it, and the skill is live.
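The mandatory pipeline is an ordered list of checks that stops at the first failure. The sketch below shows that shape only. RUN-PIPELINE and SYNTAX-OK-P are hypothetical names; SYNTAX-OK-P is a stand-in for lisp-syntax-validate built on the standard reader, not the project's implementation, and the sandbox, trigger, and gate stages are left to the caller.

```lisp
(defun syntax-ok-p (source)
  "Stand-in for lisp-syntax-validate: true when SOURCE reads as a sequence
of complete forms. Reading may intern symbols but never evaluates code."
  (handler-case
      (let ((*read-eval* nil))          ; refuse #. read-time evaluation
        (with-input-from-string (in source)
          ;; The stream itself serves as a unique end-of-file marker.
          (loop for form = (read in nil in)
                until (eq form in)
                finally (return t))))
    (error () nil)))

(defun run-pipeline (stages candidate)
  "STAGES is a list of (name . predicate) pairs, applied in order.
Returns (values :promoted nil) when every predicate accepts CANDIDATE,
otherwise (values :rejected failing-stage-name)."
  (dolist (stage stages (values :promoted nil))
    (unless (funcall (cdr stage) candidate)
      (return (values :rejected (car stage))))))
```

Returning the failing stage name as a second value gives the Skill Creator something concrete to feed back to the LLM when it redrafts the skill.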
Competitive Advantage Analysis — v0.11.0 Summary
The task tree DAG with terminal states and branch pruning is Passepartout's planning primitive — analogous to Claude Code's TODO list but structural (Org headlines with parent-child relationships) rather than flat.
The tier classifier fix is a safety correctness issue. The current inverted classifier (dangerous ops → no-LLM path) is actively harmful — it reduces oversight on the operations that need it most.
The Skill Creator is the mechanism by which Passepartout escapes the "team of Lisp programmers" constraint. Most agent frameworks require Python/TypeScript to extend. Passepartout's extension language is English — the LLM writes the Lisp, the system verifies it.
v0.12.0: Evaluation & Vision
(Renumbered from old v0.10.0.)
With tools (v0.10.0) and planning (v0.11.0) in place, the agent can execute complex multi-step tasks. v0.12.0 answers two questions: (1) how do we prove it works? (SWE-bench evaluation harness), and (2) can the agent interact with visual interfaces? (computer use / vision).
TODO SWE-bench harness
- Automated pipeline: clone a repository from SWE-bench dataset, parse the GitHub issue, feed the issue description into Passepartout's cognitive loop, track the resolution trajectory as an Org headline tree, apply the generated patch, run the repository's test suite, score success (tests pass yes/no).
- Trajectory persistence: each benchmark run produces an Org file under /memex/system/benchmarks/ recording every think() call, every tool invocation, every Dispatcher decision, and the final test result.
- Regression mode: run the same benchmark after each version release. Track score trends. A version that regresses on SWE-bench does not ship.
- Target: competitive score with Claude Code and OpenClaw on SWE-bench-verified by v1.0.0.
TODO Computer Use / Vision
- Screenshot capture: X11 (xwd / import) and Wayland (grim) bridge.
- Vision model integration: send screenshot to a vision-capable model (GPT-4V, Claude 3.5, Gemini 2.0 Flash).
- Coordinate-based interaction: xdotool / ydotool for click and type commands. Dispatcher approval gate applies: screen interaction requires HITL by default.
- Use case: "open Firefox, search for the Passepartout GitHub repo, and star it."
TODO Telemetry / observability — structured event logging
Claude Code tracks everything via GrowthBook feature flags. OpenClaw has structured telemetry with trajectory sidecars. Hermes logs session metrics to SQLite. Passepartout has log-message — unstructured, no aggregation. Without telemetry, Passepartout cannot answer: "How many HITL prompts per session?" "What's the approval rate?" "Which gate blocks most often?" "What's the average context usage?" These are the metrics that would validate the README's "2-3x fewer tokens" claim.
- Structured event log as JSONL in ~/.local/share/passepartout/telemetry/ (one file per session + aggregate)
- Event types: :session-start, :think-call (tokens in/out, provider, model, duration), :tool-execution (name, duration, success/error), :gate-decision (gate name, result, pattern), :hitl-decision (approved/denied, pattern, session count), :context-snapshot (tokens used, foveal node, pruned count), :session-end (total tokens, total cost, tool calls, HITL count)
- Aggregate keys tracked as a hash table: HITL approval rate, average context usage, most-blocked gate, tokens saved by foveal pruning vs full context
- /telemetry TUI command: displays aggregate stats + per-session breakdown
- Feeds the evaluation harness (SWE-bench trajectory data comes from the same telemetry system)

~200 lines as a new skill, symbolic-telemetry.org. No daemon protocol changes.
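Two of the aggregates listed above can be computed with a single pass over the event log. The sketch assumes events have already been parsed from JSONL into plists; the field names :type, :approved, :gate, and :result, and both function names, are assumptions chosen to mirror the event descriptions, not a defined schema.

```lisp
(defun hitl-approval-rate (events)
  "Fraction of :hitl-decision events that were approved, or NIL if none."
  (let ((total 0) (approved 0))
    (dolist (e events)
      (when (eq (getf e :type) :hitl-decision)
        (incf total)
        (when (getf e :approved) (incf approved))))
    (if (zerop total) nil (/ approved total))))

(defun most-blocked-gate (events)
  "Name of the gate with the most :block results, or NIL if nothing was blocked."
  (let ((counts (make-hash-table :test #'equal))
        (best nil)
        (best-count 0))
    ;; Tally blocks per gate name.
    (dolist (e events)
      (when (and (eq (getf e :type) :gate-decision)
                 (eq (getf e :result) :block))
        (incf (gethash (getf e :gate) counts 0))))
    ;; Pick the gate with the highest tally.
    (maphash (lambda (gate n)
               (when (> n best-count)
                 (setf best gate best-count n)))
             counts)
    best))
```

The approval rate comes back as an exact rational, which keeps the aggregate lossless until the /telemetry display formats it.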
Competitive Advantage Analysis — v0.12.0 Summary
SWE-bench evaluation is the industry standard for coding agent capability claims. Passepartout's trajectory persistence is a differentiator: most harnesses produce a pass/fail score. Passepartout's produces a complete Org-mode audit trail showing exactly where the reasoning succeeded or failed.
Vision + screen interaction is table stakes for competing with Claude Code's computer use feature. The Passepartout advantage: every screen interaction passes through the Dispatcher gate stack.
v0.13.0: Consensus, GTD & Deep Emacs Integration
(Renumbered from old v0.11.0.)
Near-SOTA. The agent has tools, planning, evaluation, and streaming. v0.13.0 adds reliability (consensus), productivity methodology (GTD), and environment depth (Emacs integration).
TODO Consensus loop
- Multi-provider parallel inference for critical decisions. When the action's impact score exceeds a threshold, the system sends the same prompt to 2–3 independent providers.
- Disagreement detection: compare structured outputs. If all providers agree, proceed with highest-confidence result. If they disagree, flag for HITL approval.
- Cost-aware: consensus mode doubles/triples cost. Only trigger when impact exceeds cost threshold. Configurable via CONSENSUS_THRESHOLD.
- TUI consensus display: collapsible region listing each provider, its model, its proposal, and its confidence score. ✓ 3/3 providers agree in green; ✗ 2/3 agree in yellow.
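Disagreement detection over structured outputs can be sketched as an order-insensitive plist comparison. The proposal layout (:provider, :action, :confidence) and both function names are assumptions for illustration; the sketch also assumes a non-empty proposal list and treats any mismatch as a disagreement, with no partial-agreement scoring.

```lisp
(defun plist-equal-p (a b)
  "True when plists A and B hold the same keys with EQUAL values, in any order."
  (and (= (length a) (length b))
       (loop for (key value) on a by #'cddr
             ;; The uninterned default can never be EQUAL to a real value.
             always (equal value (getf b key '#:missing)))))

(defun consensus (proposals)
  "PROPOSALS is a non-empty list of plists (:provider :action :confidence).
Returns (values :proceed proposal) with the highest-confidence proposal when
all actions agree, otherwise (values :hitl nil)."
  (let ((first-action (getf (first proposals) :action)))
    (if (every (lambda (p) (plist-equal-p (getf p :action) first-action))
               proposals)
        (values :proceed
                (reduce (lambda (best p)
                          (if (> (getf p :confidence) (getf best :confidence))
                              p
                              best))
                        proposals))
        (values :hitl nil))))
```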
TODO GTD integration
- Full GTD cycle: capture → process → clarify → organize → reflect → engage.
- Org properties: :TRIGGER: (what context), :BLOCKER: (what must complete first).
- Weekly review: agent scans all projects and tasks, surfaces stalled items, suggests next actions. Produced deterministically, with zero LLM tokens.
- TUI agenda view: /agenda command renders Org-agenda as formatted scrollable region within the chat area.
TODO Deep Emacs integration
- Phase II — Interpreter: ELisp compatibility layer runs inside Passepartout's Common Lisp image. Key Emacs packages (Org-mode, Magit) run natively without an Emacs process.
- Org-agenda awareness: agent queries agenda view, incorporates agenda context into planning.
- Clock time tracking: agent starts/stops clocks on Org headlines, produces clock tables.
- Refile and archive: agent refiles headlines between Org files and archives completed items.
Competitive Advantage Analysis — v0.13.0 Summary
The consensus loop benefits from structured output enforcement (v0.9.0) — comparing plists for semantic equivalence is simpler than comparing free-text responses.
The GTD and Emacs integration are Passepartout's "unfair advantages" — no competitor has either. Claude Code and Copilot are development tools, not life management tools. Org-mode is the bridge: the same format that holds the agent's memory holds the user's tasks, calendar, and notes.
v0.14.0: Self-Configuring Setup Binary
Rationale: The current passepartout configure flow is a bash script that detects
Debian or Fedora, installs packages, installs Quicklisp, tangles Org sources, and
runs the setup wizard. It handles 2 distro families. A save-lisp-and-die binary
distributes Passepartout as a single executable with no SBCL or Quicklisp
prerequisite, and an optional small LLM fallback expands coverage to any distro
with a package manager.
Installation is handled by the bash script or this binary. Configuration is handled by the TUI setup wizard (the new decision from v0.8.0).
TODO Save-lisp-and-die executable
- The setup binary (passepartout-setup) is a save-lisp-and-die executable (~100MB: SBCL runtime + core Lisp code + native embedding inference from v0.4.0 + 23MB embedding model). No SBCL install required. No Quicklisp. No bash script. The user runs one file.
- Deterministic path (default, always runs first): the same distro detection, package installation, and configuration logic from today's bash script, reimplemented in Lisp. Handles Debian and Fedora families. Covers the common case without touching an LLM.
- LLM-assisted path (optional, activates on deterministic failure): downloads Qwen2.5-0.5B (~500MB GGUF, pinned by hash, cached to ~/.local/share/passepartout/models/). The model reads command output, classifies success/failure/recoverable-error from a finite set of outcomes, and selects the next corrective action from a constrained decision tree. On unrecognized failures, generates a diagnostic for the user.
- Model hash verification: the GGUF file is pinned by SHA-256 hash. If the hash doesn't match (wrong version, corrupted download), fall back to deterministic setup with a warning.
- After setup completes, the binary exits. The user runs passepartout daemon to start the full system (a live SBCL process, not a sealed binary: REPL, hot-reload, self-modification all available).
- Add FiveAM test: the deterministic path succeeds on a system with all dependencies pre-installed; the LLM-assisted path correctly classifies 10 common package-manager error messages.
v1.0.0: SOTA Parity (verified)
Feature-complete, benchmark-verified, production-hardened. All capabilities from v0.3.0 through v0.14.0 integrated and tested end-to-end.
v1.0.0 is not a feature release — it is a verification release. Every feature from the v0.x series is tested under concurrent load, resource starvation, adversarial input, and benchmark scoring. The evaluation harness (v0.12.0) provides the scoring apparatus; v1.0.0 is the scored release.
| Area | Parity Target | Verification Method |
|---|---|---|
| Self-improvement | Skill Creator + self-edit + hot-reload | Skill regression suite |
| Planning | Task tree DAG with terminal states | Multi-step integration tests |
| Tool ecosystem | 15+ MCP tools + native shell + git | MCP protocol compliance tests |
| Context window | Semantic search + foveal-peripheral + caching | Token budget vs competitor audit |
| Safety | 10-vector Dispatcher + policy + permissions | Chaos testing |
| Multi-step tasks | Task trees with terminal states | SWE-bench score (v0.12.0 harness) |
| Code editing | Full file read/write via MCP + Org | SWE-bench-verified subset |
| Memory | Vector recall + Merkle integrity + MVCC | Concurrency stress test (v0.9.0) |
| Emacs integration | Full org-mode control (exceeds Claude Code) | Org-agenda round-trip test |
| Streaming | Live text + interrupt-and-redirect (v0.7.1) | TUI UX latency benchmark |
| TUI | Streaming, markdown, gate trace, sidebar, theme system, adaptive layout, mouse, search | TUI integration test suite |
| Packaging | Source install + save-lisp-and-die binary | Install test matrix across distros |
| Offline | 100% local capable (7-13B model) | Air-gapped integration test |
| Cost | 2-3x fewer tokens than competitors | SWE-bench token audit |
| Concurrency | Priority queue + MVCC + parallel signals | Concurrent load test (3 users + bg) |
Performance projection at v1.0.0:
| Scenario | Passepartout v1.0.0 | Claude Code | OpenClaw |
|---|---|---|---|
| Single-turn chat (local 8B) | 2-4s, ~1,500 tok | N/A (cloud-only) | N/A (cloud-only) |
| Single-turn chat (cloud) | 1-3s, ~1,500 tok | 1-3s, ~3,000 tok | 1-3s, ~3,500 tok |
| Multi-step coding (5 files) | 15-30s, ~30,000 tok | 10-20s, ~65,000 tok | 20-40s, ~85,000 tok |
| Knowledge base query (500 nodes) | <1s (in-image vector), 0 LLM tok | 3-5s, ~5,000 tok (LLM-assisted) | 3-5s, ~5,000 tok (LLM-assisted) |
| Background maintenance | 0 LLM tok (deterministic cron) | Variable or skipped | Variable or skipped |
| Offline operation | Full capability | None | None |
| Cost per coding session | ~$0.15 (gpt-4o-mini) | ~$0.45 (gpt-4o-mini) | ~$0.55 (gpt-4o-mini) |
Passepartout wins on cost (2-3x savings from sparse trees + deterministic gates + caching), offline capability (unique), and knowledge management (10-40x savings from in-image vector lookup + Org-native format). It is competitive on single-turn latency and slightly behind on multi-step latency (the single-pipeline architecture adds ~5s overhead per tool execution versus competitors' parallel tool dispatch).
The TUI at v1.0.0 is a SOTA competitive agent interface: streaming responses, gate trace visualization, Information Radiator sidebar, skin system with 10+ presets, adaptive layout, full markdown, mouse support, and personality. The sidebar's gate trace, focus map, and rule counter are capabilities no competitor can replicate — Passepartout's permanent UX differentiator.
The key insight at v1.0.0: Passepartout does not beat competitors at everything. It wins decisively where the architecture's structural advantages apply (safety, cost, offline operation, knowledge management, TUI transparency) and is competitive where they don't (raw LLM inference speed, parallel tool dispatch). This is a defensible position — the niches Passepartout dominates are exactly the niches that matter for a sovereign, local-first AI assistant.
But it is still fundamentally probabilistic at its core. The symbolic engine verifies and constrains, but the generative engine is still the primary reasoning source. The architectural transition to symbolic-first reasoning happens in v3.0.0.
v2.0.0: Lisp Machine Emergence
v2.0.0 is where Passepartout stops being a daemon with clients and becomes the environment. The agent's cognitive loop, the user's editor, the user's shell, and the user's browser run in the same Common Lisp image. The Dispatcher gate stack verifies every action regardless of who initiated it — user or agent. The distinction between "tool" and "self" dissolves.
Why this version matters for UX parity. v0.4.0 through v1.0.0 give Passepartout four interaction surfaces (TUI, messaging apps, Emacs, voice). v2.0.0 inverts the problem: instead of building more clients, it builds a platform where the agent's environment and the user's environment are the same process, separated not by a sandbox but by the Dispatcher gate stack. The editor IS the agent's prompt. The shell IS the agent's actuator. The browser IS the agent's web research tool. There are no clients — there is one Lisp image, one address space, one Org-mode file system.
Architectural principle: Browser inside Lisp, not Lisp inside browser. Lisp is the parent process. It owns the window, the memory, and the input loop. The rendering engine (WebKit/Blink) is a library that paints pixels inside a Lisp buffer. The user can redefine functions while browsing without restarting. Keybinding lookups happen in microseconds (SBCL machine code) — the browser cannot "steal" shortcuts.
Qt/QML via EQL5 — the rendering surface
- Qt/QML (via EQL5) is the UI framework. EQL5 exposes the full Qt C++ API from Common Lisp. QML is declarative — it matches Lisp's generation model.
- Desktop: native look and feel on Linux, macOS, and Windows.
- Mobile: Qt runs natively on iOS and Android. Android uses F-Droid for the unrestricted version and Play Store for sandboxed. iOS uses Guideline 4.7 ("Educational/Developer Tool" loophole, no JIT compilation).
- Safety Bridge for mobile: Lisp code can manipulate browser/files but cannot touch hardware (GPS, camera, contacts) without standard permission pop-ups.
- The minibuffer: a universal command line at the bottom of the screen. Not an Emacs modeline. Not a VS Code command palette. A single command surface for every action: edit files, navigate web, run Lisp expressions, invoke agent commands. M-x for everything.
Lish — the Common Lisp editor
Not elisp. Not Emacs. A multi-threaded Common Lisp editor rendered via Qt/QML. The complete system prompt lives in an Org buffer — the agent's identity, its skill registry, its memory, and its reasoning are visible and editable as Org text. The user modifies the agent's prompt and the agent reflects the change immediately — the prompt is a file in memory, not a hidden string in a config.
Org-babel for interactive evaluation: source blocks in Org files are executable. The user evaluates a #+begin_src lisp block and the result appears inline. The agent evaluates blocks to verify code before writing. The REPL is not a separate window — it is the Org buffer in which the agent and user both work.
The editor and the agent share the same Lisp image. The editor is not a client that connects to a daemon — it IS the daemon process. The TUI from v0.3.6 (with word wrap, streaming, gate trace, focus map) is the editor's rendering surface.
Nyxt — the Common Lisp browser (three erosion stages)
The browser is not a one-time feature. It is a multi-year erosion of the rendering stack toward pure Lisp:
Stage 1 — Qt + WebKit. Qt provides window management and native widgets. WebKit renders web content inside a Lisp buffer. Network requests via dexador (pure Lisp). HTML parsed via Plump (pure Lisp). Layout via Yoga (C-based Flexbox, wrapped via FFI). JavaScript via embedded QuickJS. This stage delivers a working browser in months, not years.
Stage 2 — S-expression DOM. Lisp builds its own DOM representation as native S-expressions. WebKit is reduced to pixel painting only — it receives rendered layouts from Lisp, not raw HTML. The agent can traverse and manipulate the DOM as Lisp data structures without serialization. This makes web content natively queryable and modifiable by the agent's cognitive loop.
Stage 3 — Pure Lisp layout. WebKit turned off entirely. Lisp-native layout engine (12-18 months of focused development). CSS subset sufficient for the modern web's 95% use case. JavaScript via QuickJS remains for interactive content. The browser is now a Lisp application that happens to speak HTTP, not a web engine wrapped in a Lisp process.
Lish — the Lisp shell
Bash is a text-stream protocol. Passepartout speaks plists. The Lish shell replaces text streams with structured data — every command returns a plist, not a byte stream. Pipe becomes function composition. Scripts become Lisp functions that operate on memory objects directly.
The agent and the user share the same shell. The user types (list-todos :tag "@urgent"). The agent proposes (shell "npm run build"). The Dispatcher verifies both. The shell is not a separate process — it is a REPL connected to the same Lisp image as the agent's cognitive loop.
Org-mode buffers become the file system. The user's memex (~/memex/) is browsable as a tree of Org headlines. File operations (read, write, list, search) operate on Org AST nodes, not byte streams. A "directory listing" is a tree of headlines. A "file read" is a subtree rendered as text.
Bash remains available as a backend for running external commands, but it is not the primary interface.
Emacs migration — three phases
The Emacs bridge (v0.4.0) is Phase I. The deep integration is three phases, not one:
Phase I — Parasite (v0.4.0). Emacs is a client. The elisp TCP bridge sends text and receives responses. The agent does not control Emacs. Emacs users get a native chat experience alongside the TUI.
Phase II — Interpreter (v2.0.0). An ELisp compatibility layer runs inside Passepartout's Common Lisp image. Key Emacs packages (Org-mode, Magit) run natively without an Emacs process. The compatibility layer does not aim for 100% coverage — it targets the packages the agent's workflows depend on.
Phase III — Successor (v2.0.0 and beyond). Native Common Lisp implementations of Org-mode workflows and Git integration read/write the same file formats. Total independence from Emacs. Emacs users who prefer Emacs keep the bridge. New users get the native experience.
Strategic timeline
v0.4.0 Emacs bridge (Phase I Parasite) → v1.0.0 SOTA parity → v2.0.0 Lish editor + Nyxt browser (Stage 1) + Emacs Phase II/III + mobile. The Qt/QML surface enables gradual erosion of the rendering stack without rewriting the application logic. The three-phase Emacs migration ensures Emacs users are never abandoned: the bridge works from day one, and the native experience grows under it.
v3.0.0: Neurosymbolic Maturity
Deterministic planner takes the wheel. LLM relegated to semantic translation.
Architectural approach: Stitching, not building. The symbolic engine is not a from-scratch reasoner. It is an integration of existing Common Lisp libraries connected by macros and DSLs. The Lisp advantage is the macro system — it transforms human-readable rules into formal logic queries without requiring a new engine.
Open-source Lisp stack
- Knowledge Graph: VivaceGraph v3 — Lisp-native graph database with a Prolog-like query language built in. Stores facts, relationships, and rules as native Lisp objects in the same image as the agent.
- Constraint Solver: Screamer — non-deterministic backtracking. Given a set of constraints, finds all valid solutions or proves none exist. Used to verify that proposed actions do not violate invariants.
- Formal Verifier: ACL2 — a theorem prover for Common Lisp, BSD licensed. Proves properties about functions before they are committed to the running image. Used for skill verification and Dispatcher rule validation.
The 10-80-10 architecture
Ten percent neural for input translation, eighty percent symbolic for reasoning against a knowledge graph, ten percent neural for output formatting.
- 10% Input: The LLM translates natural language into structured queries (Prolog facts, knowledge graph lookups). The neural translator is trained via EGGROLL (low-rank evolution strategies) on the reward signal from the symbolic verifier — it learns to produce queries that the symbolic engine accepts.
- 80% Reasoning: Pure Lisp. Task graphs generated by the deterministic planner against the knowledge graph. Formal verification via ACL2. Constraint checking via Screamer. Fact retrieval via VivaceGraph. Zero LLM tokens. Zero hallucinations.
- 10% Output: The LLM formats symbolic results back into natural language. The neural formatter is structurally identical to the translator — same training loop, reversed direction.
The auto-formalizer bootstrap
The symbolic engine needs a populated knowledge graph. The auto-formalizer populates it:
- Feed unstructured data (documentation, manuals, logs, session histories) to the LLM in auto-formalizer mode.
- The LLM extracts facts, relationships, and rules as structured S-expressions.
- The symbolic verifier (Screamer + ACL2) checks each extracted fact for consistency with the existing knowledge graph.
- Consistent facts are added. Conflicting facts are flagged for human review.
- Over time, the knowledge graph grows without manual ontology engineering.
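The accept-or-flag step of the bootstrap can be illustrated with a toy consistency check. This is a sketch under strong assumptions: facts are (subject predicate object) triples, every predicate is treated as single-valued, and a plain EQUAL hash table stands in for the knowledge graph. It is not the Screamer + ACL2 verifier and uses neither library; INGEST-FACTS is a hypothetical name.

```lisp
(defun ingest-facts (graph facts)
  "Add each (subject predicate object) triple in FACTS to GRAPH unless it
contradicts a stored triple. Returns the conflicting facts, in input order,
so they can be queued for human review."
  (let ((conflicts '()))
    (dolist (fact facts (nreverse conflicts))
      (destructuring-bind (subject predicate object) fact
        (let ((key (list subject predicate)))
          (multiple-value-bind (known found) (gethash key graph)
            (cond ((not found) (setf (gethash key graph) object)) ; new fact
                  ((equal known object))                          ; already known
                  (t (push fact conflicts)))))))))                ; contradiction
```

A conflicting fact never overwrites the stored one, so the graph stays consistent even when the extraction step is wrong.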
DSL approach over engine building
Domain-specific languages, not general-purpose reasoners:
- Lisp macros transform human-readable rules into Prolog queries that run against VivaceGraph.
- (defrule check-privacy :when (contains-tag payload "@personal") :then :block) expands to a VivaceGraph query with Screamer constraint checking.
- Users write rules in a domain-specific DSL. The macros handle the translation to formal logic.
- The Skill Creator (v0.11.0) generates DSL rules from English descriptions. The auto-formalizer verifies them.
- (macroexpand-1 '(defrule ...)) shows exactly how the rule compiles: 100% auditable.
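To make the macro idea concrete, here is a deliberately simplified defrule that compiles a rule into a plain closure stored in a hash table. It does not generate a VivaceGraph query or Screamer constraints; *rules*, run-rule, the :allow default, and this contains-tag definition are all assumptions for illustration. The macro intentionally exposes the variable PAYLOAD to the :when form.

```lisp
(defvar *rules* (make-hash-table :test #'eq)
  "Rule name -> function of one argument (the payload).")

(defun contains-tag (payload tag)
  "Illustrative predicate: PAYLOAD is a plist with a :tags list of strings."
  (member tag (getf payload :tags) :test #'string=))

(defmacro defrule (name &key when then)
  "Register NAME as a rule that returns THEN when the WHEN form is true and
:ALLOW otherwise. The WHEN form may refer to the variable PAYLOAD."
  `(setf (gethash ',name *rules*)
         (lambda (payload)
           (declare (ignorable payload))
           (if ,when ,then :allow))))

(defrule check-privacy
  :when (contains-tag payload "@personal")
  :then :block)

(defun run-rule (name payload)
  "Apply the rule registered under NAME to PAYLOAD."
  (funcall (gethash name *rules*) payload))
```

Calling (macroexpand-1 '(defrule check-privacy :when (contains-tag payload "@personal") :then :block)) at the REPL prints the generated SETF form, which is the auditability property described above in miniature.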
Self-correcting gates
Gates learn from the full history of outcomes — did the plan succeed? Where did it fail? The symbolic engine updates its own rules based on results:
- Induced functions from v0.5.0 feed into the symbolic engine as candidate rules.
- The symbolic verifier checks each candidate against the knowledge graph for consistency.
- Rules that pass verification are promoted to the active gate stack.
- Rules that fail verification are discarded with a diagnostic — the agent learns why the pattern doesn't generalize.
Implications
Hallucination becomes structurally impossible because the symbolic engine will not accept a fact that contradicts its knowledge graph. Safety becomes provable because ACL2 can prove properties about the system's behavior. Self-improvement becomes stable because the agent modifies skills that are then verified before execution. The 80% of computation that happens in the symbolic middle layer costs zero LLM tokens.
v4.0.0: Native Inference
LLM inference moves in-process. No external servers. No API keys required for inference.
Lisp as Sovereign Governor, not as Math Engine. The weights themselves are not stored as Lisp objects — this would waste 50% memory on type tags and destroy cache locality through pointer-chasing. Instead, the entire tensor is tagged as a single Lisp object (macro-tag). The Lisp image holds a pointer to optimized flat binary (GPU-friendly, FPGA-compatible). The tag is checked once. After that, all math happens in the optimized backend.
Native inference (FFI binding to llama.cpp)
- FFI binding to llama.cpp via CFFI: load GGUF models, run inference, manage KV cache. Single SBCL image, zero process boundaries. The agent and the model share memory.
- Speculative safety: the Dispatcher gate stack intercepts token generation in real time. A token that would produce a blocked action is preemptively suppressed before generation. No external inference API supports this.
- Foveal-peripheral compute: the model skips pruned context nodes during attention computation. External APIs compute full attention regardless of what you send. In-process inference makes the sparse-tree rendering pay off at the compute level, not just the token level.
Live surgery on cognition
With in-process inference, the agent's internal state becomes inspectable:
- Pause inference mid-stream. Inspect hidden states and activations as Lisp variables.
- Modify a vector, change a sampling parameter, resume.
- Detect when the agent is likely to hallucinate by comparing current activation patterns against historical baselines.
- The REPL becomes a surgical instrument for the agent's own cognition — not just for verifying code, but for inspecting and correcting the neural process that generates it.
DSL-compiled model architectures
Model architectures are described as Lisp DSL:
- (defmodel passepartout-reasoning :type 'transformer :heads 32 :dim 4096 :layers 32)
- The DSL compiles to machine code for the target backend (GPU via CUDA, FPGA via VexRiscv, CPU via llama.cpp).
- Python interprets at runtime. Lisp compiles once. Model architecture changes are treated the same as code changes — edited, verified, hot-reloaded.
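A front-end for such a DSL might look like the sketch below, which only validates and records the architecture spec; no backend code generation is attempted. The *models* registry, the derived :head-dim field, and the divisibility check are assumptions added for illustration.

```lisp
(defvar *models* (make-hash-table :test #'eq)
  "Model name -> architecture spec (a plist).")

(defmacro defmodel (name &key type heads dim layers)
  "Record the architecture spec for NAME after a basic shape check.
A real implementation would hand this spec to a backend compiler."
  `(progn
     ;; Each attention head needs an equal slice of the model dimension.
     (assert (zerop (mod ,dim ,heads)) ()
             "dim must divide evenly across heads")
     (setf (gethash ',name *models*)
           (list :type ,type :heads ,heads :dim ,dim :layers ,layers
                 :head-dim (/ ,dim ,heads)))))

(defmodel passepartout-reasoning
  :type 'transformer :heads 32 :dim 4096 :layers 32)
```

Because the spec is ordinary Lisp data, the same verification and hot-reload machinery used for skills can be pointed at it.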
v5.0.0: Hardware — Tagged Lisp Architecture
The Lisp machine becomes physical. RISC-V with tagged architecture, hardware-enforced type checking, and FPGA prototype for the symbolic core.
Not a from-scratch processor. Use RISC-V as the skeleton, add custom Lisp extensions. RISC-V provides the carrier architecture (standard instruction set, existing toolchain, LLVM support). Lisp extensions provide tagged computation (type checking in hardware, parallel garbage collection, S-expression traversal as atomic operations).
The macro-tag approach
- Top 4–8 bits of every memory word = Type Tag. Hardware checks tags in parallel with ALU operations. Trap on type mismatch.
- A tensor (70B weights) is one macro-tagged Lisp object — a pointer to flat binary. The tag is checked once. Math happens at native speed. This replaces "weights as sexps" (which wastes 50% memory on per-weight tags and destroys cache locality).
- Custom instructions: TADD (tagged add), LISP.CAR, LISP.CDR — Lisp primitives as single-cycle hardware operations.
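The tag layout and the trap-on-mismatch behaviour of TADD can be simulated in software. The sketch assumes a 64-bit word with a 4-bit tag in the top bits and invents a small tag table; the tag codes, helper names, and the use of a Lisp error in place of a hardware trap are illustrative assumptions.

```lisp
(defparameter *tag-field* (byte 4 60) "Top 4 bits of a 64-bit word.")
(defparameter *value-field* (byte 60 0) "Low 60 bits: the payload.")
(defparameter *tags* '(:fixnum 1 :cons 2 :tensor 3) "Illustrative tag codes.")

(defun make-word (tag value)
  "Pack VALUE (truncated to 60 bits) under the numeric code for TAG."
  (dpb (getf *tags* tag) *tag-field* (ldb *value-field* value)))

(defun word-tag (word)
  "Extract the numeric type tag from WORD."
  (ldb *tag-field* word))

(defun word-value (word)
  "Extract the 60-bit payload from WORD."
  (ldb *value-field* word))

(defun tadd (a b)
  "Tagged add: both operands must carry the :fixnum tag. A mismatch signals
an error, standing in for the hardware trap. The sum wraps at 60 bits."
  (unless (= (word-tag a) (word-tag b) (getf *tags* :fixnum))
    (error "TADD trap: tag mismatch (~D, ~D)" (word-tag a) (word-tag b)))
  (make-word :fixnum (+ (word-value a) (word-value b))))
```

In hardware the tag comparison would run in parallel with the addition; here it is sequential, so the sketch shows the semantics rather than the timing.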
Phase migration: Host → Co-processor → Self-hosted
- Parasitic. Lisp card (FPGA) is a PCIe co-processor. Host CPU (Intel/AMD, Linux/Windows) handles "dirty" I/O — networking, display, file systems. Lisp card handles tagged computation and the agent's cognitive loop. If Lisp crashes, host survives. Reset card, reload. Memory mapping: the card can see the host's memory. The Lisp environment reaches out and inspects data.
- Functional Hijacking. Lisp UI runs on the card, displays through the PC's GPU. The agent indexes Linux files into Lisp objects. The host becomes an I/O server for the Lisp card.
- Driver Cannibalization. Point the agent at C drivers. Ask it to generate native Lisp drivers for the hardware the card controls directly. PCIe Passthrough for direct hardware access.
- Self-Hosting. Replace the Linux bootloader with Stage0 Lisp (a bootstrap from 500 bytes of hex to a self-hosting Lisp). Cut the umbilical cord. The Lisp machine runs on bare metal.
Concrete prototyping milestones
| Stage | Hardware | Cost | What it delivers |
|---|---|---|---|
| TinyTapeout | Custom silicon (130nm) | ~$500–1,000 | 8-bit tagged toy processor with Lisp primitives |
| Shuttle | Multi-project wafer | ~$10,000–20,000 | Tagged RISC-V core at 100–300MHz |
| FPGA | Terasic DE10-Nano / Xilinx KCU105 | ~$200–500 | VexRiscv with custom Lisp extensions, PCIe card form factor |
| Industrial | Commercial foundry (5nm) | ~$10M–100M+ | Competes with modern CPUs on tagged workloads |
Start at TinyTapeout. Validate the tagged architecture works. Move to FPGA. Validate at speed. Only then consider silicon.
Garbage collection in hardware
Dedicated bus master (Scavenger) runs background garbage collection while the main CPU executes code. No "GC pause." The scavenger traverses the heap in parallel with computation, freeing unreachable objects without stopping the agent.
Persistent single-address-space memory
NVRAM for the entire heap. Turn on the machine — state is exactly where you left it. No "booting." No "loading memory from disk." The agent's Merkle-tree memory, skill registry, knowledge graph, and induced functions survive restarts as a contiguous hardware state.
Why this is not "Lisp inside browser"
Most Lisp-on-hardware attempts fail because they try to compete with Intel on raw math. That's the wrong axis. The tagged architecture doesn't need to beat a GPU at matrix multiplication. It needs to beat a CPU at symbolic computation — graph traversal, constraint solving, theorem proving, garbage collection. These are the v3.0.0 symbolic engine's workload. Hardware that makes them single-cycle is the differentiator, not hardware that runs matrix math faster.
v6.0.0: True Agency
World models, temporal reasoning, goal persistence across restarts.
- World models: Predictive models of user behavior, project dynamics, system state.
- Temporal reasoning: Scheduling, deadlines, elapsed duration awareness.
- Goal persistence: Goals survive restarts. Long-term projects in memory-objects.