Phase 1 — dedup + hardening (~9 items):
- Remove duplicate *skill-registry* defvar from core-skills
- Merge *backend-registry* into *probabilistic-backends*, delete backend-register
- Remove inject-stimulus alias, standardize on stimulus-inject
- Add pre-eval sandbox (skill-source-scan) that blocks restricted symbols before eval
- Remove dead plist-get function; remove duplicate json-alist-to-plist export
- Fix read-framed-message whitespace DoS (4096-iteration max)
- Add *read-eval* nil to dispatcher-approvals-process read-from-string (RCE)
- Add test-op to ASDF; update .asd version 0.4.3→0.7.2
Phase 2 — prose + contracts + reorder:
- Split ROADMAP: 2623→1089 lines (TODO only), CHANGELOG: 260→1528 lines (full DONE history, 14 versions reverse chron)
- Add Contracts + Overview to 6 channel files + embedding-native + programming-standards + symbolic-scope
- Reorder 28 .org files: Contract → Test Suite → Implementation (TDD order)
- Add 7-phase inline prose to think() in core-reason
- Expand USER_MANUAL: 183→461 lines (10 new sections)
Phase 3 — decomposition + export organization:
- Decompose think() into think-assemble-prompt, think-call-llm, think-parse-response orchestrator
- Organize 188 exports into 16 grouped sections by module
Phase 4 — budget enforcement + error protocol:
- Per-session budget enforcement (SESSION_BUDGET_USD env var, budget-exhausted-p, guard in think-call-llm)
- Error condition hierarchy (6 conditions: pipeline-error, llm-error, gate-error, budget-error, protocol-error)
- Restarts in loop-process: skip-signal, use-fallback, abort-pipeline
Passepartout Evolutionary Roadmap
- The Evolutionary Roadmap
- File Update Checklist
- v0.8.1: Direction 2 — Rich Rendering
- v0.8.2: Direction 3 — Living Environment (Skin System)
- v0.8.3: Direction 3 — Adaptive Layout + Personality
- v0.9.0: Signal Pipeline, Concurrency & Streaming
  - Priority-queue signal processing
  - MVCC memory concurrency
  - Structured output enforcement
  - Doom-loop detection — 3 identical tool calls triggers HITL
  - Busy-mode — queue on interrupt
  - CLI / non-interactive mode — passepartout ask
  - Provider health tracking — success rate + latency
  - Cost-based provider routing
  - Intelligent provider fallback — per-task-type routing
  - Internal evaluation harness — 10 tasks, regression detection
  - Autonomous certification badge
  - Autonomous certification progress bar — visible "learning" indicator
  - Update mechanism + migrations
  - Self-configuration — agent proposes and applies config changes
  - Self-diagnosis coach — /coach command
  - Failure attribution — tag task failures with probable component
- v0.10.0: Tool Ecosystem (MCP-Native) + Voice Gateway
  - MCP native client
  - Core MCP tools (from existing roadmap items)
  - TUI tool visualization
  - Environment Steward
  - Channels + providers — match OpenClaw on demand
  - Web search + web fetch tools — search-web, fetch-web
  - LSP integration — language server protocol client
  - Auto-saved session transcripts — /memex/system/sessions/
  - Auto-memory extraction — learnings from sessions
  - Universal cross-project Org query
  - debug-inspect cognitive tool — live state inspection
  - Competitive Advantage Analysis — v0.10.0 Summary
- v0.11.0: Planning, Self-Modification & Deterministic Routing
- v0.12.0: Evaluation & Vision
- v0.13.0: Consensus, GTD & Deep Emacs Integration
- v0.14.0: Self-Configuring Setup Binary
- v1.0.0: SOTA Parity (verified)
- v2.0.0: Lisp Machine Emergence
- v3.0.0: Neurosymbolic Maturity
- v4.0.0: Native Inference
- v5.0.0: Hardware — Tagged Lisp Architecture
- v6.0.0: True Agency
The Evolutionary Roadmap
Understanding Passepartout as a function in time is not nostalgia. It is architectural guidance. Every decision in v0.x should be made with awareness of where the system is going. Code written today becomes the substrate for v3.0. Skills designed today become the vocabulary the symbolic engine speaks tomorrow.
The probabilistic beginning is not a weakness to overcome. It is the bootstrap. The system learns the domain through probabilistic inference, and that learned knowledge becomes the seed for the symbolic engine. By the time the symbolic engine takes over, it has a rich knowledge graph to reason about, grown from thousands of probabilistic interactions.
This is how you build a reasoning machine: start with a learner, make it learn to verify by watching itself and its user, let verification become the core. Every blocked action becomes a rule. Every approved exception becomes a pattern. The symbolic layer grows at the probabilistic layer's expense. Remove the learner once it has learned enough.
Each version expands the deterministic layer. The Dispatcher writes rules from approved exceptions. Shadow mode runs trial executions. Tool permission tiers mature from simple allow/deny to nuanced context-aware policies. The agent becomes less likely to attempt dangerous actions not because it is smarter but because the guard has more complete information.
The roadmap was designed by working backwards from SOTA parity (v1.0.0), guiding each version toward a fully autonomous, self-editing agent. Each version builds on the previous, with features designed to be implemented in pure Common Lisp + Org-mode.
The TODO states in each version's Tasks section are the authoritative task tracker. The feature tables describe what each version delivers.
Feature releases increment the minor version (v0.X.0). Bugfix and hardening releases increment the patch version (v0.X.Y). This ensures that security patches and critical fixes are visible in the version number and can ship independently of feature work. No feature release ships without its prerequisite hardening releases resolved.
File Update Checklist
When a version's state changes (DONE → tested → released), update these locations:
- ROADMAP.org — mark item DONE, update LOGBOOK timestamp
- README.org — update version badge (line 6), update Current Capabilities table (add new Stable rows for shipped features, remove Planned rows that have shipped)
- .env.example — update version references as needed
- lisp/core-transport.lisp — update the make-hello-message version string
- passepartout (bash entry point) — update version reference
- CHANGELOG.org — add new version entry with DONE items, LOGBOOK release date, and feature summaries
On release:
- Tag the release on GitHub
- Extract DONE items from ROADMAP (all items with LOGBOOK timestamps since the last release tag) and use as the release notes body
- If a CHANGELOG.md is needed for packaging tools, auto-generate it from ROADMAP DONE items
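The extraction step can be done mechanically from the Org source. Below is a minimal sketch in Python, used here only as a compact illustration. It assumes two things: Org's default LOGBOOK state-change notes (for example - State "DONE" from "TODO" [2026-05-08 Fri 14:02]), and a release-tag date supplied by the caller. done_items_since is a hypothetical helper, not existing project code.

```python
# Hypothetical helper: collect DONE headlines completed since a release date.
import re
from datetime import date

DONE_HEADLINE = re.compile(r"^\*+\s+DONE\s+(.*)$")
ANY_HEADLINE = re.compile(r"^\*+\s")
# Org's default state-change note; "from" may be followed by "TODO" or by nothing.
DONE_NOTE = re.compile(r'-\s+State\s+"DONE"\s+from\b[^\[]*\[(\d{4})-(\d{2})-(\d{2})')


def done_items_since(org_text, since):
    """Titles of DONE headlines whose LOGBOOK records the DONE transition
    on or after `since` (a datetime.date)."""
    items, current = [], None
    for line in org_text.splitlines():
        done = DONE_HEADLINE.match(line)
        if done:
            current = done.group(1).strip()
        elif ANY_HEADLINE.match(line):
            current = None  # a TODO or section headline ends the item
        elif current:
            note = DONE_NOTE.search(line)
            if note:
                closed = date(*(int(part) for part in note.groups()))
                if closed >= since:
                    items.append(current)
                current = None  # count each headline once
    return items
```

Dates have day granularity. An item closed on the same day as the tag is therefore included, which errs toward over-reporting in the release notes.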
v0.8.1: Direction 2 — Rich Rendering
Full markdown, tool execution visualization, mouse support, and cost display. This makes the TUI competitive on rendering quality with Claude Code and OpenCode.
TODO Full markdown rendering
Extend the markdown renderer from v0.7.1:
- OSC 8 hyperlinks: embed \x1b]8;;url\x1b\\ before link text and \x1b]8;;\x1b\\ after. Makes URLs clickable in supporting terminals (iTerm2, Kitty, WezTerm, Ghostty, Windows Terminal).
- Blockquotes (> text): rendered with a colored left border (the theme's :accent color) and indented text.
- Tables: aligned column text. No borders (terminal tables with box-drawing characters are noisy). Column alignment inferred from header separators.
- Syntax highlighting for code blocks: keyword/string/function colors from theme. Regex-based (no parser dependency).
- All markdown features degrade gracefully to plain text on terminals without attribute support. ~100 lines.
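Two of these rules are small enough to pin down in code: the OSC 8 wrapping and inferring column alignment from the header separator. The Python sketch below is illustrative only, since the renderer itself is Lisp. It assumes GitHub-style colon markers (:---, :---:, ---:) in the separator row, and all function names are hypothetical.

```python
# Hypothetical helpers for the markdown renderer (illustration only).
ESC = "\x1b"
ST = ESC + "\\"  # string terminator: ESC followed by a backslash


def osc8_link(url, text):
    """Wrap `text` in an OSC 8 hyperlink that points at `url`."""
    return f"{ESC}]8;;{url}{ST}{text}{ESC}]8;;{ST}"


def column_alignments(separator_row):
    """Infer per-column alignment from a row such as '| :--- | :---: | ---: |'."""
    cells = [c.strip() for c in separator_row.strip().strip("|").split("|")]
    aligns = []
    for cell in cells:
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            aligns.append("center")
        elif right:
            aligns.append("right")
        else:
            aligns.append("left")  # plain --- defaults to left
    return aligns


def align_row(cells, widths, aligns):
    """Pad each cell to its column width; no borders are drawn."""
    out = []
    for cell, width, how in zip(cells, widths, aligns):
        if how == "right":
            out.append(cell.rjust(width))
        elif how == "center":
            out.append(cell.center(width))
        else:
            out.append(cell.ljust(width))
    return "  ".join(out)  # two spaces separate columns
```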
TODO Tool execution visualization
When the agent invokes a tool:
- Pre-execution: [Running: 🔍 search "dispatch" ...] in :tool-running color with spinner
- Success: ✓ search "dispatch" → 12 matches (0.3s) in :tool-success color
- Error: ✗ shell "bad-cmd" → exit 127 (0.1s) in :tool-failure color with error output expanded below
- Output collapsed by default to single-line summary. Tab on a tool invocation toggles full output.
- Diff display: + (green) / - (red) coloring for file edits. 3 lines of context around changes. The :tool-output theme color provides the background.
Uses Croatoan's init-pair + color-pair for 256-color backgrounds on tool state regions. ~100 lines.
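The collapsed summary line and the diff coloring can be sketched independently of Croatoan. This Python version is only an illustration. It uses the standard difflib module for a unified diff with 3 lines of context, and plain ANSI foreground codes stand in for the theme's color pairs. The function names are hypothetical.

```python
# Illustration only: summary line + colored unified diff (names are hypothetical).
import difflib

GREEN, RED, RESET = "\x1b[32m", "\x1b[31m", "\x1b[0m"  # stand-ins for theme colors


def tool_summary(ok, tool, arg, result, seconds):
    """Collapsed one-line summary, e.g. '✓ search "dispatch" → 12 matches (0.3s)'."""
    mark = "✓" if ok else "✗"
    return f'{mark} {tool} "{arg}" → {result} ({seconds:.1f}s)'


def colored_diff(old_text, new_text, path):
    """Unified diff with 3 lines of context; added lines green, removed lines red."""
    lines = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=path, tofile=path, n=3, lineterm="")
    out, in_hunk = [], False
    for line in lines:
        if line.startswith("@@"):
            in_hunk = True  # the ---/+++ file headers come before the first hunk
        if in_hunk and line.startswith("+"):
            out.append(GREEN + line + RESET)
        elif in_hunk and line.startswith("-"):
            out.append(RED + line + RESET)
        else:
            out.append(line)
    return out
```

The color decision is keyed on hunk state rather than on the "---"/"+++" prefixes. That way a deleted line whose own text begins with "--" is still shown as a removal.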
TODO Mouse support
Croatoan supports ncurses mouse mode via (setf mouse-enabled-p). Enable:
- Scroll wheel: PageUp/PageDown equivalent, scrolls chat by viewport height
- Click to position cursor in input area
- Click on OSC 8 link to open in browser (via xdg-open)
- Click on tool invocation to toggle expand/collapse
- Click on gate trace line to expand/collapse trace
~40 lines.
TODO Cost display
- /cost command: displays per-session and per-LLM-call cost breakdown
- Optional sidebar cost counter: $0.12 this session, updating after each backend-cascade-call
- Color-coded: green under daily budget, yellow approaching, red exceeding
- Requires token counter infrastructure from v0.5.0. ~50 lines for display; token counting is v0.5.0 infrastructure.
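The display arithmetic is small. Here is a Python sketch of it, offered as an illustration rather than the project's code. It makes three assumptions: the pricing table is expressed in USD per million tokens, the prices in the usage example are placeholders rather than real provider rates, and "approaching" the daily budget means 80% or more (the document gives no threshold). All names are hypothetical.

```python
# Hypothetical cost helpers; the prices used with them are placeholders.
def call_cost_usd(prompt_tokens, completion_tokens, price):
    """Cost of one LLM call; `price` holds USD per million tokens."""
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1_000_000


def budget_color(spent_usd, daily_budget_usd, warn_fraction=0.8):
    """Green under budget, yellow when approaching it, red when exceeding it.
    The 80% 'approaching' threshold is an assumption."""
    if spent_usd > daily_budget_usd:
        return "red"
    if spent_usd >= warn_fraction * daily_budget_usd:
        return "yellow"
    return "green"


def sidebar_counter(session_total_usd):
    """Sidebar text, refreshed after each backend-cascade-call."""
    return f"${session_total_usd:.2f} this session"
```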
TODO Session export — /export command
Claude Code has /share (shareable URL). OpenCode has /export (Markdown). Hermes has trajectory export. Passepartout has no way to share what the agent did.
- /export writes the current session as an Org file to ~/memex/exports/<session-title>-<date>.org
- Format: each message as an Org headline with role tag, timestamp, content, gate trace as property drawer
- /export md outputs Markdown instead of Org (for sharing with non-Org users)
- /export json outputs the session as JSON (for programmatic consumption)
~50 lines. Uses the existing message vector and memory-object-render for Org formatting.
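The Org layout above can be illustrated with a short Python sketch. It is a stand-in for the Lisp code built on memory-object-render. It assumes each message is a dict with role, timestamp, content and gate_trace keys, and that each gate verdict becomes a GATE-<NAME> property. Both the data shape and the property naming are hypothetical. The property drawer goes directly under the headline, because that is where Org requires it.

```python
# Hypothetical Org export of a session (illustration only).
from pathlib import Path


def export_path(title, day, root=Path.home() / "memex" / "exports"):
    return root / f"{title}-{day}.org"


def export_org(messages):
    """One headline per message: role tag, gate-trace property drawer,
    inactive timestamp, then the content."""
    lines = []
    for msg in messages:
        role = msg["role"]
        lines.append(f"* {role.capitalize()} :{role}:")
        if msg.get("gate_trace"):
            lines.append(":PROPERTIES:")
            for gate, verdict in msg["gate_trace"]:
                lines.append(f":GATE-{gate.upper()}: {verdict}")
            lines.append(":END:")
        lines.append(f"[{msg['timestamp']}]")
        for text in msg["content"].splitlines():
            # A body line starting with '*' would parse as a headline; indent it.
            lines.append(" " + text if text.startswith("*") else text)
    return "\n".join(lines) + "\n"
```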
TODO Tool output spilling — large results to file
Claude Code saves tool results >30KB to ~/.claude/tool-results/ with a 200-line preview in the response. Passepartout currently includes all output inline — which consumes context budget and makes the chat log unreadable after a large build output or log dump.
- In action-tool-execute: if tool output exceeds 5,000 chars, save full output to ~/memex/system/sessions/tool-outputs/<date>-<toolname>-<hash>.txt
- In the response, replace full output with: [Output: 12,847 chars. Full output saved to ~/memex/system/sessions/tool-outputs/2026-05-08-grep-a1b2c3.txt. Top 2,000 chars:] followed by truncated preview
- The LLM can read-file the full output if it needs to analyze it
~30 lines in core-loop-act.lisp.
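The spill rule can be sketched in Python as an illustration of the logic, not of action-tool-execute itself. It uses the thresholds above (5,000 chars to spill, 2,000-char preview) and the <date>-<toolname>-<hash>.txt naming. It assumes the hash is the first 6 hex digits of SHA-256, matching the a1b2c3 example. spill_if_large is a hypothetical name.

```python
# Hypothetical spill helper (illustration only).
import hashlib
from datetime import date
from pathlib import Path

SPILL_THRESHOLD = 5_000  # chars
PREVIEW_CHARS = 2_000


def spill_if_large(output, tool_name, out_dir, today=None):
    """Return `output` unchanged if small; otherwise save it to disk and
    return a header plus the first PREVIEW_CHARS characters."""
    if len(output) <= SPILL_THRESHOLD:
        return output
    today = today or date.today().isoformat()
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()[:6]
    path = Path(out_dir) / f"{today}-{tool_name}-{digest}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(output, encoding="utf-8")
    header = (f"[Output: {len(output):,} chars. Full output saved to {path}. "
              f"Top {PREVIEW_CHARS:,} chars:]")
    return header + "\n" + output[:PREVIEW_CHARS]
```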
TODO Read-only output caching within a turn
Claude Code caches read-only tool results within a turn. If the agent reads the same file twice, the second read returns cached content — no disk I/O, no context waste. Passepartout re-executes the tool.
- *turn-result-cache* hash table keyed by (cons tool-name args-hash), cleared at the start of each think() cycle
- Read-only tools (read-file, search-files, find-files, list-directory, org-find-headline, org-agenda-today, lsp-*) check the cache before executing
- Cache hit: return stored result with [cached] prefix in the response
- Prevents redundant tool calls when the agent asks the same question twice within a reasoning step
~25 lines in programming-tools.lisp.
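The cache can be sketched in a few lines. In this Python illustration, a (tool, args-hash) tuple plays the role of (cons tool-name args-hash). Hashing canonical JSON of the arguments is an assumption, and the execute callback and class name are hypothetical.

```python
# Hypothetical per-turn cache for read-only tools (illustration only).
import hashlib
import json

READ_ONLY_TOOLS = {"read-file", "search-files", "find-files", "list-directory",
                   "org-find-headline", "org-agenda-today"}


def is_read_only(tool):
    return tool in READ_ONLY_TOOLS or tool.startswith("lsp-")


class TurnResultCache:
    """Per-turn cache; clear() runs at the start of every think() cycle."""

    def __init__(self):
        self._entries = {}

    def clear(self):
        self._entries.clear()

    def call(self, tool, args, execute):
        if not is_read_only(tool):
            return execute(tool, args)  # tools with side effects always run
        args_hash = hashlib.sha256(json.dumps(args, sort_keys=True).encode()).hexdigest()
        key = (tool, args_hash)
        if key in self._entries:
            return "[cached] " + self._entries[key]
        result = execute(tool, args)
        self._entries[key] = result
        return result
```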
v0.8.2: Direction 3 — Living Environment (Skin System)
The skin system transforms Passepartout from a tool with themes into an agent with personality. Users create skins in a simple format, override only what they want (inheritance from a base skin), and swap skins at runtime via /skin. The spinner has personality. The borders have personality. The agent's name and welcome message are skin-customizable.
TODO Skin engine
- Skin format: a plist file (~/.config/passepartout/skins/myskin.lisp) defining:
  - :colors — 40+ color slots (extends the 27 theme keys): agent colors for 8 roles, status bar colors, tool colors, spinner colors, input colors, border colors. All in hex (#RRGGBB).
  - :spinner — style (:braille, :dots, :minimal), speed (ms/frame), kawaii faces, thinking verbs
  - :branding — agent name, welcome message, goodbye message, prompt symbol, help header
  - :tool-prefix — character for tool output lines (default ┊)
  - :tool-emojis — per-tool emoji overrides (e.g., (:shell "⚡" :search "🔎"))
  - :banner — Rich-markup ASCII art logo displayed on startup
- Skin inheritance: (:inherit :default) — missing values cascade from parent
- Custom skins from ~/.config/passepartout/skins/*.lisp
- Hot-swap via /skin <name> — no restart. Skin changes take effect on next redraw (sub-frame latency).
- Skin preview: /skin <name> with --preview flag applies temporarily; Esc or timeout reverts.
- Built-in skins as plist data in a *skin-registry* hash table. ~250 lines.
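The :inherit cascade can be expressed as a walk up the parent chain followed by a merge where child values win. The Python sketch below is only an illustration. Dicts stand in for the plists (a "colors" key for :colors, and so on), and nested groups are assumed to merge slot by slot, so that a skin overriding one color keeps the rest of its parent's palette. resolve_skin is a hypothetical name.

```python
# Hypothetical skin resolution (dicts stand in for plists).
def resolve_skin(name, registry):
    """Merge a skin with its inherit chain; values in the child win."""
    chain, seen = [], set()
    while name is not None:
        if name in seen:
            raise ValueError(f"skin inheritance cycle at {name!r}")
        seen.add(name)
        skin = registry[name]
        chain.append(skin)
        name = skin.get("inherit")
    merged = {}
    for skin in reversed(chain):  # root ancestor first, children override
        for key, value in skin.items():
            if key == "inherit":
                continue
            base = merged.get(key)
            if isinstance(value, dict):
                # Nested groups (colors, spinner, ...) merge slot by slot.
                merged[key] = {**base, **value} if isinstance(base, dict) else dict(value)
            else:
                merged[key] = value
    return merged
```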
TODO Skin presets (10+ built-in)
Organized by mood rather than theme. Each skin is a complete personality profile:
| Skin | Mood | Accent | Spinner | Character |
|---|---|---|---|---|
| gold (default) | Warm, approachable | #FFD700 | Kawaii faces | "⚕ Passepartout" |
| professional | Cool, focused | #5C9CF5 | Minimal braille | "Passepartout" |
| minimal | Zero decoration | #AAAAAA | None | "p" |
| forest | Calm, earthy | #7CB342 | Dots | "Passepartout" |
| ocean | Deep, contemplative | #26C6DA | Pulse | "Passepartout" |
| ember | Warm, energetic | #FF6D00 | Bounce | "Passepartout" |
| mono | Grayscale | #E6EDF3 | Minimal | "Passepartout" |
| retro | Amber terminal feel | #FFB000 | Blinking cursor | "PASSEPARTOUT" |
| unicorn | Playful, colorful | #E040FB | Sparkle | "🦄 Passepartout" |
| midnight | Dark blue, calm | #82AAFF | Brain | "Passepartout" |
Each skin's color slots are derived systematically from its accent and background colors. ~200 lines of skin definitions.
TODO Hooks on defskill — lifecycle interception
Passepartout's skills can inject instructions and react to triggers but cannot intercept behavior. All 4 competitors have lifecycle hooks (PreToolUse, PostToolUse, session events). Hooks complete the extension model: skills define what the agent knows; hooks define when skills get to inspect and veto actions.
- Add :pre-tool-hook and :post-tool-hook slots to the defskill struct
- :pre-tool-hook receives (action context), returns :allow, :deny, or :ask. Called before tool execution in the Dispatcher pipeline (new vector between shell-safety and network-exfil).
- :post-tool-hook receives (action context result), returns (values modified-result modified-context) or nil to leave unchanged. Called after tool execution. Useful for logging, auto-commit, notification.
- :on-session-start, :on-heartbeat, :on-compact lifecycle hooks for maintenance skills
- Hooks run in skill priority order. A :deny from any hook short-circuits the chain.
- This is Claude Code's PreToolUse pattern — 50 lines in the defskill macro + core-perceive.lisp
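The chain semantics can be pinned down with a small Python sketch, which is illustrative only and not the defskill macro. It makes three assumptions the text leaves open: a higher priority number runs first, a single :ask escalates the whole chain to a HITL prompt unless a later hook denies, and nil from a post hook is modeled as None. Skills are plain dicts with hypothetical keys.

```python
# Hypothetical hook runners (illustration only).
def run_pre_tool_hooks(skills, action, context):
    """Return (verdict, denying_skill). Highest priority runs first; the
    first 'deny' short-circuits, any 'ask' escalates, otherwise 'allow'."""
    verdict = "allow"
    for skill in sorted(skills, key=lambda s: s["priority"], reverse=True):
        hook = skill.get("pre_tool_hook")
        if hook is None:
            continue
        result = hook(action, context)
        if result == "deny":
            return "deny", skill["name"]
        if result == "ask":
            verdict = "ask"
    return verdict, None


def run_post_tool_hooks(skills, action, context, result):
    """Each hook may return (new_result, new_context) or None to leave both unchanged."""
    for skill in sorted(skills, key=lambda s: s["priority"], reverse=True):
        hook = skill.get("post_tool_hook")
        if hook is not None:
            changed = hook(action, context, result)
            if changed is not None:
                result, context = changed
    return result, context
```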
TODO Prompt templates / output styles
Claude Code has "output styles" (default, Explanatory, Learning). Hermes has agent profiles. Passepartout has a single hardcoded system prompt. Users should be able to change how the agent works, not just how it looks.
- Output styles are Org files in ~/.config/passepartout/styles/ with Org keyword frontmatter: #+STYLE: explanatory, #+DESCRIPTION: Teaches while doing
- Three built-in styles:
  - default — current behavior, direct and efficient
  - explanatory — agent explains implementation choices, provides educational insights with ★ Insight blocks. Claude Code's Explanatory output style
  - learning — agent pauses to ask user to write small code pieces (2-10 lines), uses ● Learn by Doing blocks. Claude Code's Learning output style
- /style <name> TUI command to switch at runtime. Injects a STYLE section into the system prompt between IDENTITY and TOOLS.
- Style changes are immediate (next think() call). Survive restarts via config persistence.
~100 lines (~60 prompt templates + ~40 TUI integration).
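Loading a style comes down to two steps: splitting the leading #+KEY: lines from the body, and placing the body right after IDENTITY in the prompt's section list. The Python sketch below illustrates both. It assumes the system prompt is built from an ordered list of (name, text) sections. That representation and the function names are hypothetical.

```python
# Hypothetical style loading and prompt injection (illustration only).
import re

KEYWORD = re.compile(r"^#\+([A-Za-z_]+):\s*(.*)$")


def parse_style(org_text):
    """Split a style file into its leading #+KEY: frontmatter and body text."""
    meta, body = {}, []
    for line in org_text.splitlines():
        match = KEYWORD.match(line)
        if match and not body:  # keywords count only before the body starts
            meta[match.group(1).upper()] = match.group(2).strip()
        else:
            body.append(line)
    return meta, "\n".join(body).strip()


def inject_style(sections, style_text):
    """Insert a STYLE section right after IDENTITY (and so before TOOLS)."""
    out = []
    for name, text in sections:
        out.append((name, text))
        if name == "IDENTITY":
            out.append(("STYLE", style_text))
    return out
```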
TODO Skill auto-detection — file-watch hot-reload
Passepartout's image-based Lisp model enables hot-reload — redefine a function without restarting. No competitor has this. Claude Code plugins require manual /reload-plugins. Passepartout can auto-detect changes.
- Daemon watches org/ and ~/.config/passepartout/skills/ with inotify (Linux) or kqueue (macOS). On .org file change:
  - Wait 200ms debounce (multiple writes within 200ms coalesce)
  - Tangle the changed org file: (org-tangle-file "org/skill-name.org")
  - Compile the tangled lisp: (compile-file "lisp/skill-name.lisp")
  - Reload: (load (compile-file-pathname "lisp/skill-name.lisp"))
  - TUI shows system message: "Skill 'skill-name' reloaded (23 defuns, 0 errors)"
- Respects SELF_BUILD_MODE — core files require HITL before reload. Skills reload automatically.
- On compile error: keep the old version loaded, log the error, show TUI warning: "✗ Skill 'skill-name' failed to compile — old version retained."
~80 lines in a new symbolic-file-watch.org skill.
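The 200ms debounce and the keep-the-old-version rule are independent of inotify or kqueue, so they can be sketched as pure logic. In this Python illustration, the watcher feeds (path, timestamp) events and polls ready(). compile_and_load stands in for the tangle/compile/load sequence. All names are hypothetical.

```python
# Hypothetical debounce + safe reload logic (illustration only).
class Debouncer:
    """Coalesces change events per path: a path becomes ready once no new
    event for it has arrived within `window` seconds."""

    def __init__(self, window=0.2):
        self.window = window
        self.pending = {}  # path -> timestamp of its most recent event

    def event(self, path, now):
        self.pending[path] = now  # a later write restarts the quiet period

    def ready(self, now):
        due = sorted(p for p, t in self.pending.items() if now - t >= self.window)
        for path in due:
            del self.pending[path]
        return due


def reload_skill(name, compile_and_load, loaded):
    """Swap in the freshly built skill; on any error keep the old one."""
    try:
        loaded[name] = compile_and_load(name)  # not assigned if this raises
        return f"Skill '{name}' reloaded"
    except Exception as err:
        return f"Skill '{name}' failed to compile, old version retained: {err}"
```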
TODO Heavy thinking skill — parallel reasoning + sequential deliberation
The HeavySkill paper (arXiv:2605.02396v1) demonstrates that a two-stage pipeline — K independent reasoning trajectories followed by a critical deliberation step — consistently outperforms majority voting and approaches Pass@K. The authors distill it into a readable skill file that works across any agent harness. Passepartout's Merkle tree makes this auditable, rewoundable, and cross-session comparable.
- New skill: org/heavy-thinking.org — a readable skill document loaded at startup. The agent follows a defined protocol when facing complex reasoning tasks:
  - Activation: triggers when the complexity classifier detects a STEM/reasoning/code-generation task. Dormant for simple factual queries or casual conversation
  - Parallel reasoning: spawns K independent think() calls (default K=3, HEAVY_THINKING_WIDTH env var). Each call solves the same problem from scratch without access to other trajectories. Encourages diverse strategies
  - Sequential deliberation: a second model call reads all K trajectories (pruned to essential thinking content to stay under context budget). Critically evaluates each — not voting, but re-reasoning. Produces a synthesized final answer with a deliberation trace: "Trajectories 1,3 converged on answer X. Trajectory 2 had error Y. Synthesized answer: X."
  - Output: returns the synthesized answer with [Heavy-thinking: 3 parallel, 1 deliberate] annotation in the response metadata
- Merkle advantage: each trajectory is stored as a content-addressed node. The deliberation trace is permanent and auditable — users can see WHY one answer was chosen
- Iterative deliberation optional (capped at 2 — the paper shows iterations 3+ degrade HP@K)
- Cost model: 3 parallel + 1 deliberation = 4 API calls for complex tasks (vs 1 normally). HEAVY_THINKING_COST_MULTIPLIER env var for cost-aware auto-activation
~100 lines as a skill (~60 prompt template + ~40 orchestration in symbolic-heavy-thinking.org).
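The orchestration half of the protocol can be sketched as follows. This is a Python illustration, not the skill itself. solve and deliberate are hypothetical callables standing in for the think() and deliberation model calls. A SHA-256 digest stands in for the Merkle node hash. The call count is k + rounds, which gives the 4 calls of the cost model at the defaults.

```python
# Hypothetical heavy-thinking orchestration (illustration only).
import hashlib
from concurrent.futures import ThreadPoolExecutor


def heavy_think(problem, solve, deliberate, k=3, rounds=1):
    """K independent trajectories, then `rounds` deliberation passes."""
    if not 1 <= rounds <= 2:
        raise ValueError("deliberation rounds are capped at 2")
    # Parallel stage: each worker sees only the problem, never other trajectories.
    with ThreadPoolExecutor(max_workers=k) as pool:
        trajectories = list(pool.map(lambda _: solve(problem), range(k)))
    # Content-address each trajectory (stand-in for a Merkle node hash).
    node_ids = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in trajectories]
    answer = None
    for _ in range(rounds):
        # Deliberation re-reasons over all trajectories rather than voting.
        answer = deliberate(problem, trajectories, answer)
    return {
        "answer": answer,
        "trajectory_ids": node_ids,
        "annotation": f"[Heavy-thinking: {k} parallel, {rounds} deliberate]",
    }
```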
v0.8.3: Direction 3 — Adaptive Layout + Personality
The TUI adapts to the terminal it's running in — full sidebar at ultrawide, compact at standard, minimal at narrow (phone/SSH). It has a personality: spinner style, relative timestamps, progress bars, live context help.
TODO Adaptive layout (3 tiers)
- ≥ 120 columns: Full layout. Sidebar visible with all 6 panels. Chat area left of sidebar.
- 80–119 columns: Compact layout. Sidebar hidden (toggle via /sidebar or Ctrl+X+B, rendered as overlay). Status bar 2 lines. Full markdown rendering.
- < 80 columns: Minimal layout. Single-column chat. Status bar reduced to 1 line (model, ctx%, duration). Markdown reduced to bold + code blocks only. Input height clamps to 1-2 lines.
Re-renders on terminal resize (already handled via KEY_RESIZE). Content re-flows — not truncated. The layout remembers per-terminal-size preference. ~80 lines.
TODO Spinner personality
Configurable spinner style per skin:
- :braille — ⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏ cycling at 80ms (default)
- :dots — ·✢✳✶✻✽ cycling (macOS style, Claude Code default)
- :kawaii — (。◕‿◕。) (◕‿◕✿) ٩(◕‿◕。)۶ cycling with wing decorations ⟪⚔ ... ⚔⟫
- :minimal — single ● dot blinking at 2000ms
- :none — static prompt symbol
Stall indication: when no response for 10s, spinner color interpolates from theme color → error red (Claude Code pattern). Reduced motion preference: spinner replaced with slow-pulse ●. ~50 lines.
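Frame selection and the stall fade are both pure functions of time. Here is a Python sketch as an illustration. It assumes a linear RGB fade that reaches full error red 5 seconds after the 10-second stall point; the document does not give a ramp duration. The names are hypothetical.

```python
# Hypothetical spinner timing helpers (illustration only).
BRAILLE = "⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏"


def spinner_frame(elapsed_ms, frames=BRAILLE, frame_ms=80):
    """Frame to draw after `elapsed_ms`; cycles every len(frames) * frame_ms."""
    return frames[(elapsed_ms // frame_ms) % len(frames)]


def stall_color(theme_rgb, error_rgb, idle_s, stall_after_s=10.0, ramp_s=5.0):
    """Interpolate from the theme color toward error red once no response
    has arrived for `stall_after_s`; fully red after a further `ramp_s`."""
    t = min(max((idle_s - stall_after_s) / ramp_s, 0.0), 1.0)
    return tuple(round(a + (b - a) * t) for a, b in zip(theme_rgb, error_rgb))
```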
TODO Progress bar
For measurable operations (file processing, test runs with known count, batch operations), render a progress bar using Unicode block characters:
[████████░░░░░░░░░░░░] 42% (5/12 tests passed)
Uses 9 block characters for sub-character precision: [' ', '▏', '▎', '▍', '▌', '▋', '▊', '▉', '█'] (Claude Code pattern). Color-coded by progress: red <25%, yellow 25-75%, green 75%+. ~25 lines.
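With 9 block characters, each cell has 8 fill levels, so a 20-cell bar resolves progress in 1/160 steps. The Python sketch below illustrates the arithmetic. It pads empty cells with a space, as the 9-character list implies, where the example above uses ░. The names are hypothetical.

```python
# Hypothetical progress bar renderer (illustration only).
BLOCKS = [" ", "▏", "▎", "▍", "▌", "▋", "▊", "▉", "█"]


def progress_bar(done, total, width=20):
    fraction = 0.0 if total == 0 else min(done / total, 1.0)
    eighths = round(fraction * width * 8)  # sub-cell precision in 1/8 steps
    full, partial = divmod(eighths, 8)
    bar = BLOCKS[8] * full
    if full < width:
        bar += BLOCKS[partial] + " " * (width - full - 1)
    return f"[{bar}] {fraction:.0%}"


def progress_color(fraction):
    """Red below 25%, yellow from 25% to 75%, green from 75%."""
    if fraction < 0.25:
        return "red"
    if fraction < 0.75:
        return "yellow"
    return "green"
```

For 5 of 12 tests, 5/12 × 160 ≈ 66.7 rounds to 67 eighths. That is 8 full cells plus a 3/8 partial (▍), and the label reads 42%.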
TODO Live timestamps
- Relative timestamps on messages: "just now" (< 30s), "2m ago", "1h ago", "yesterday"
- Absolute timestamp on hover/focus (via Tab navigation to message)
- Status bar shows session duration: Session: 3h 12m
- Timestamps update live (per-minute recalculation, not per-frame)
~40 lines.
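The label thresholds map directly onto a small function. In this Python illustration, "yesterday" is approximated as 24 to 48 hours ago rather than by calendar day, and anything older falls back to "Nd ago". Both are assumptions, since the text does not cover them. The names are hypothetical.

```python
# Hypothetical timestamp labels (illustration only).
def relative_time(age_s):
    """Relative label for a message that is `age_s` seconds old."""
    if age_s < 30:
        return "just now"
    if age_s < 3600:
        return f"{max(1, int(age_s // 60))}m ago"
    if age_s < 86400:
        return f"{int(age_s // 3600)}h ago"
    if age_s < 2 * 86400:
        return "yesterday"  # approximation: 24 to 48 hours, not calendar-based
    return f"{int(age_s // 86400)}d ago"


def session_duration(seconds):
    hours, rest = divmod(int(seconds), 3600)
    return f"Session: {hours}h {rest // 60}m"
```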
TODO Context-sensitive help
Press ? to show available actions in current context:
- In chat: list of navigation keys, command shortcuts
- In sidebar: sidebar-specific bindings
- In HITL prompt: approval/denial bindings
- In command palette: palette navigation bindings
Rendered as a dim help bar at the bottom of the screen (above input). Dismisses on any key or after 5 seconds. ~40 lines.
v0.9.0: Signal Pipeline, Concurrency & Streaming
(Renumbered from old v0.7.0. Streaming moved to v0.7.1; streaming section removed below.)
The current pipeline is strictly sequential — one signal traverses Perceive → Reason → Act before the next signal begins. Background tasks (heartbeat, embedding cron, gardener scans) compete with foreground interactions. A heartbeat that fires during a long tool chain is queued. A Telegram message during a multi-step planning cycle is queued. The system feels sluggish under concurrent load even though the symbolic operations are near-instant (SBCL hash table lookups are microseconds) — the bottleneck is the single-pipeline architecture, not the hardware.
Design insight: why concurrency matters for an agent that is "one brain." Passepartout rejects multi-agent delegation on principle (see DESIGN_DECISIONS "One Single Agent"). But a single brain handles multiple inputs simultaneously — the human brain processes vision, audio, and proprioception in parallel. Rejecting multi-agent delegation does not require rejecting concurrency within the agent. The key is that all concurrent operations share the same memory space, the same Merkle tree, and the same deterministic gate stack. They are threads of one cognition, not separate agents.
TODO Priority-queue signal processing
- Replace the linear process-signal call chain with a priority-ordered signal queue. The queue is a sorted plist-list consumed by the main loop. Priority tiers:
  - :user-input / :chat-message — highest priority (the user is waiting)
  - :approval-required — high (HITL re-injections need quick resolution)
  - :interrupt — medium-high (shutdown signal)
  - :tool-output — medium (feedback from tool execution, needs LLM assessment)
  - :heartbeat / :cron / :delegation — low (background maintenance)
- Coalesce duplicate heartbeats: if the queue already contains a :heartbeat signal when a new one arrives, discard the older one (no value in processing stale ticks). Keep at most one pending heartbeat at any time.
- The main loop drains the highest-priority signal from the queue, processes it through the pipeline, and repeats. If the pipeline produces feedback (tool-output → think), the feedback is enqueued at its appropriate priority — it may preempt background signals but won't interrupt the current signal mid-processing.
- Add telemetry: average queue depth by priority tier, max wait time per tier.
- TUI /reconnect command: when the connection-loss detection from v0.3.3 fires, the user can reconnect without restarting the TUI. The command closes the stale socket, re-runs connect-daemon with its retry backoff, and restores the :connected state on success.
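The queue semantics can be illustrated with a binary heap plus a tombstone for the superseded heartbeat. The Python sketch below makes two assumptions: the numeric tier values, and FIFO order within a tier via a sequence counter. SignalQueue is a hypothetical name. The sorted plist-list described above would behave the same way for a queue this small.

```python
# Hypothetical priority signal queue (illustration only).
import heapq
import itertools

# Lower number = drained first; the values are assumptions that follow the tiers above.
PRIORITY = {
    "user-input": 0, "chat-message": 0,
    "approval-required": 1,
    "interrupt": 2,
    "tool-output": 3,
    "heartbeat": 4, "cron": 4, "delegation": 4,
}


class SignalQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-break within a tier
        self._pending_heartbeat = None  # heap entry of the queued heartbeat

    def push(self, signal):
        kind = signal["type"]
        if kind == "heartbeat" and self._pending_heartbeat is not None:
            self._pending_heartbeat[2] = None  # tombstone the stale tick
        entry = [PRIORITY[kind], next(self._seq), signal]
        if kind == "heartbeat":
            self._pending_heartbeat = entry
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Highest-priority live signal, or None when the queue is empty."""
        while self._heap:
            _, _, signal = heapq.heappop(self._heap)
            if signal is None:
                continue  # skip coalesced heartbeats
            if signal["type"] == "heartbeat":
                self._pending_heartbeat = None
            return signal
        return None
```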
TODO MVCC memory concurrency
- Replace *memory-store* (mutable global hash table) with a versioned Merkle-root pointer. The root is an (or null merkle-node) struct containing the tree and a monotonic version counter.
- Read threads snapshot the root before beginning their pipeline cycle. All object lookups dereference through the snapshot — they see a consistent view of memory regardless of concurrent writes. Reads never block.
- Write threads (ingest-ast, org-modify, snapshot-memory) build new object hashes, construct a new Merkle root, and CAS-replace the global root pointer. If another thread won the CAS race (root version changed), the loser re-reads the new root, replays its changes on the updated tree, and retries the CAS.
- Conflict probability is near-zero because concurrent signals almost never touch the same Org headline. The replay-on-conflict path exists for correctness but is rarely exercised. Lock contention is eliminated — the only atomic operation is the CAS on the root pointer.
- Remove the single-threaded pipeline assumption: previously, process-signal was safe because nothing else wrote to *memory-store* during its execution. With MVCC, multiple signals can process concurrently because each has its own snapshot. The *loop-interrupt-lock* becomes *signal-queue-lock* (protecting only the queue, not the memory).
- Test: concurrent ingest-ast from two threads writing to different memory objects, verify both commits succeed without corruption.
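The snapshot-then-CAS cycle and the planned two-thread test can be sketched together. Python has no user-level atomic compare-and-swap, so in this illustration a tiny lock guards only the pointer swap while reads stay lock-free. A flat dict copy stands in for building a new Merkle root. All names are hypothetical.

```python
# Hypothetical MVCC store: immutable roots + CAS retry loop (illustration only).
import threading


class Root:
    """Immutable snapshot: a version counter plus the object table."""
    __slots__ = ("version", "objects")

    def __init__(self, version, objects):
        self.version = version
        self.objects = objects


class VersionedStore:
    def __init__(self):
        self._root = Root(0, {})
        self._cas_lock = threading.Lock()  # stands in for a hardware CAS

    def snapshot(self):
        return self._root  # reading a reference never blocks

    def compare_and_swap(self, expected, new):
        with self._cas_lock:
            if self._root is expected:
                self._root = new
                return True
            return False

    def commit(self, changes):
        """Apply `changes` on the latest root; replay and retry on a lost race."""
        while True:
            base = self.snapshot()
            # Full copy is O(n); a Merkle tree would share unchanged subtrees.
            new = Root(base.version + 1, {**base.objects, **changes})
            if self.compare_and_swap(base, new):
                return new.version
```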
TODO Structured output enforcement
- Add a plist validation step between markdown-strip and read-from-string in think(). Before attempting to parse, validate: (a) the output starts with ( or [, (b) it contains balanced delimiters (count opens vs closes), (c) it doesn't contain #. (redundant after v0.3.1 *read-eval* nil but defense-in-depth).
- On validation failure: construct a rejection trace (similar to the existing deterministic gate rejection feedback) and re-inject into the LLM prompt. The trace includes the raw output and a diagnostic ("Your response did not produce a valid plist. Ensure it starts with ( and has balanced parentheses.").
- Configurable LLM_OUTPUT_RETRIES (default 2). After exhausting retries, fall through with the raw text as a :MESSAGE action (current behavior).
- Track parse-failure rate per provider in telemetry. Use to guide provider cascade ordering: a provider with 20% parse-failure rate falls behind one with 2%.
- If retries are exhausted without a parseable plist, the TUI renders the raw LLM output in a dimmed, collapsible region labeled "Parse failure — could not interpret this response." The user can inspect what the model produced.
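The three pre-parse checks and the retry loop fit in a short Python sketch, offered as an illustration of the logic in think(). Unlike a raw open/close count, it ignores delimiters inside double-quoted strings and rejects a close before its open. It does not model Lisp comments or character literals such as #\(. ask_llm is a hypothetical callable standing in for the model call.

```python
# Hypothetical plist pre-validation and retry loop (illustration only).
DIAGNOSTIC = ("Your response did not produce a valid plist. "
              "Ensure it starts with ( and has balanced parentheses.")


def validate_plist_text(text):
    """Return None if `text` passes the pre-parse checks, else a diagnostic."""
    s = text.strip()
    if not s.startswith(("(", "[")):
        return DIAGNOSTIC
    if "#." in s:  # read-time eval; *read-eval* nil already blocks it
        return "Response contains #. (read-time evaluation is not allowed)."
    depth, in_string, escaped = 0, False, False
    for ch in s:
        if in_string:  # delimiters inside "..." do not count
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
            if depth < 0:
                return DIAGNOSTIC
    return None if depth == 0 and not in_string else DIAGNOSTIC


def parse_with_retries(ask_llm, prompt, retries=2):
    """Re-prompt with a rejection trace up to `retries` times, then fall
    through to a plain message (the current behavior)."""
    output = ask_llm(prompt)
    for _ in range(retries):
        problem = validate_plist_text(output)
        if problem is None:
            return ("plist", output)
        output = ask_llm(f"{prompt}\n\nRejected output:\n{output}\n{problem}")
    if validate_plist_text(output) is None:
        return ("plist", output)
    return ("message", output)
```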
TODO Doom-loop detection — 3 identical tool calls triggers HITL
OpenCode detects 3 consecutive identical tool calls and prompts the user. Without this, Passepartout could loop forever on a stuck tool — burning tokens and producing no progress.
- Track last 3 tool calls (name + args plist) in a ring buffer
- Before executing a tool, compare against the 3 previous calls
- If all 3 have the same name and equal args (using equalp), inject a HITL prompt: "The agent has attempted 'grep defun' 3 times without progress. Continue or abort?"
- Resets on any different tool call or successful output
~15 lines in core-loop-act.lisp.
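A fixed-size ring buffer makes the check a one-liner. This Python sketch is an illustration that uses collections.deque with maxlen=3. Python's == on dicts stands in for equalp, with one difference worth keeping in mind: equalp also compares strings case-insensitively, while == does not. The class name is hypothetical.

```python
# Hypothetical doom-loop detector (illustration only).
from collections import deque


class DoomLoopDetector:
    def __init__(self, limit=3):
        self.recent = deque(maxlen=limit)  # ring buffer of (name, args)

    def record(self, name, args):
        """Record a call; True when the last `limit` calls are identical."""
        self.recent.append((name, args))
        return (len(self.recent) == self.recent.maxlen
                and all(call == self.recent[0] for call in self.recent))

    def reset(self):
        """Called on successful output."""
        self.recent.clear()
```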
TODO Busy-mode — queue on interrupt
When the agent is processing a turn and the user types a message, the current behavior is undefined. Hermes has interrupt/queue/steer. Passepartout should at minimum support queue mode.
- BUSY_INPUT_MODE env var: interrupt (default, stop current turn), queue (process after current turn)
- In queue mode: user messages arriving during an active turn are enqueued. When the current turn's tool chain completes, the queued message is injected as the next turn's user input — no HITL approval needed (it's user input).
- /busy interrupt and /busy queue TUI commands to toggle at runtime
- The priority queue (above) naturally supports this — user input queued during a turn has higher priority than heartbeats, lower than the active turn
~20 lines in core-pipeline.lisp.
TODO CLI / non-interactive mode — passepartout ask
Claude Code supports claude -p "fix the failing test" --print. Hermes has hermes -c "command". Passepartout can only be used interactively via the TUI. A non-interactive single-shot mode enables CI/CD integration, cron jobs, and scripting.
- passepartout ask "what's the status of project X?" — sends a framed message to the daemon, waits for response, prints to stdout
- Daemon-side: process-one-shot handler — inject :user-input signal, run through full pipeline (perceive → reason → act → loop until stop), return final agent message
- --json flag outputs the full response plist for programmatic consumption
- --timeout N flag (default 120s) limits execution time
- Uses the existing wire protocol — no new protocol, just a CLI wrapper around the framed TCP message format
~80 lines in the passepartout bash script + ~50 lines daemon handler.
TODO Provider health tracking — success rate + latency
backend-cascade-call tries providers in order until one succeeds. On failure it moves to the next. But it has no memory of which providers failed or succeeded in the past. A degraded provider gets retried first on every call.
- *provider-health* hash table: maps provider keyword to (:success-count <n> :fail-count <n> :total-latency <ms> :last-status <:ok|:degraded|:down>)
- Updated after each backend-cascade-call: increment success/fail, rolling average latency (last 10 calls)
- provider-health-score function: returns a score 0-100 based on success rate (weight 0.6) and latency vs baseline (weight 0.4)
- /provider-status TUI command: displays a table of all providers with status indicators (● Up, ◐ Degraded, ○ Down) and recent history
- Telemetry: provider health data feeds the session telemetry system
~60 lines in neuro-provider.lisp + ~30 lines TUI.
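The score combines the two weights given above. The document does not define "latency vs baseline", so this Python sketch assumes a factor of 1.0 at or below a per-provider baseline that decays as baseline / average. It also assumes an untried provider starts at 100. Both are illustrative choices, and ProviderHealth is a hypothetical name.

```python
# Hypothetical provider health score (illustration only).
from collections import deque


class ProviderHealth:
    def __init__(self, baseline_ms):
        self.baseline_ms = baseline_ms
        self.success_count = 0
        self.fail_count = 0
        self.latencies = deque(maxlen=10)  # rolling window: last 10 calls

    def record(self, ok, latency_ms):
        if ok:
            self.success_count += 1
        else:
            self.fail_count += 1
        self.latencies.append(latency_ms)

    def score(self):
        """0-100: success rate weighted 0.6, latency vs baseline weighted 0.4."""
        total = self.success_count + self.fail_count
        if total == 0:
            return 100.0  # assumption: an untried provider starts healthy
        success_rate = self.success_count / total
        avg_ms = sum(self.latencies) / len(self.latencies)
        # 1.0 at or below baseline, shrinking as calls get slower.
        latency_factor = 1.0 if avg_ms <= self.baseline_ms else self.baseline_ms / avg_ms
        return 100 * (0.6 * success_rate + 0.4 * latency_factor)
```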
TODO Cost-based provider routing
backend-cascade-call currently tries providers in registration order. With cost tracking (v0.5.0) and provider health (above), the cascade can be sorted by cost-effectiveness.
- COST_ROUTING env var (default true): when enabled, sort the cascade by (provider-health-score * 0.3 + cost-savings-score * 0.7)
- cost-savings-score: cheap providers score high. Free providers (Ollama local) score 100. Expensive providers (GPT-4) score 10.
- Health override: a provider with score < 20 (degraded) is demoted below healthy providers regardless of cost
- /routing TUI command: displays current cascade order with scores and reasons
~40 lines in core-reason.lisp.
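The ordering rule can be illustrated with a stable two-group sort. The Python sketch assumes a linear cost-savings mapping, with free at 100 and the most expensive configured provider at 10. This is one mapping consistent with the endpoints above, not a defined formula. Provider records and function names are hypothetical.

```python
# Hypothetical cost-aware cascade ordering (illustration only).
def cost_savings_score(price_per_mtok, max_price_per_mtok):
    """Assumed linear mapping: free -> 100, most expensive -> 10."""
    if max_price_per_mtok <= 0:
        return 100.0
    return 100.0 - 90.0 * (price_per_mtok / max_price_per_mtok)


def route_order(providers, degraded_below=20):
    """Healthy providers first, each group sorted by the weighted score."""
    def combined(p):
        return p["health"] * 0.3 + p["cost_savings"] * 0.7
    healthy = sorted((p for p in providers if p["health"] >= degraded_below),
                     key=combined, reverse=True)
    degraded = sorted((p for p in providers if p["health"] < degraded_below),
                      key=combined, reverse=True)
    return [p["name"] for p in healthy + degraded]
```

In the test below, groq's combined score (59) beats gpt-4's (35.5). Its health of 10 still demotes it behind every healthy provider.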
TODO Intelligent provider fallback — per-task-type routing
Current fallback is "try the next provider." But different providers excel at different tasks. DeepSeek is strong at code generation. Groq is fast for simple queries. Claude is better at reasoning. The cascade should adapt to the task.
- *task-provider-scores* hash table: maps (task-type keyword) → (provider keyword → score)
- Task types: :chat (conversation), :code (code generation/editing), :plan (multi-step planning), :search (information retrieval), :summary (compaction), :reflex (deterministic lookup)
- When the primary provider fails, the fallback picks the highest-scored provider for the current task type (not just the next in line)
- Bootstrap from defaults: GPT-4/Claude for reasoning, DeepSeek for code, Groq for chat, local Ollama for reflex
~60 lines in neuro-router.lisp.
TODO Internal evaluation harness — 10 tasks, regression detection
Moved from v0.12.0: the internal eval harness must ship before v0.10.0 so it can validate the Signal Pipeline (v0.9.0) and catch regressions from MCP Tools (v0.10.0), Planning (v0.11.0), and beyond. The SWE-bench competitive scoring harness remains at v0.12.0; this is the lightweight internal suite.
- New skill: symbolic-evaluation.org → symbolic-evaluation.lisp
- deftask macro: define an eval task with :setup (create test environment), :prompt (what to ask the agent), :verify (function that checks the output), :teardown (cleanup)
- run-eval-suite: run all registered tasks, produce score (pass count / total), per-task diagnostics
- Initial 10 tasks: find TODOs, create Org note, search codebase, read file, query memory, list projects, run safe shell command, find definition, set TODO state, summarize session
- Regression mode: run after each version build. Fail CI if score drops.
- Task suite grows with codebase: every bug fix adds a regression task
~200 lines.
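One possible shape for deftask and run-eval-suite is sketched below. It assumes :setup and :teardown are thunks, :verify is a predicate on the agent's output, and the agent is passed in as a function from prompt string to output string. The registry variable and the three-value return are assumptions, not the skill's actual interface.

```lisp
(defvar *eval-tasks* '()
  "Registered eval tasks as (name . plist), most recently defined first.")

(defmacro deftask (name &key setup prompt verify teardown)
  "Register an eval task. SETUP and TEARDOWN are thunks; VERIFY receives
the agent's output and returns true on success."
  `(progn
     (setf *eval-tasks* (remove ',name *eval-tasks* :key #'car))
     (push (list ',name :setup ,setup :prompt ,prompt
                        :verify ,verify :teardown ,teardown)
           *eval-tasks*)
     ',name))

(defun run-eval-suite (run-agent)
  "Run every task through RUN-AGENT, a function from prompt to output.
Returns the pass count, the total, and a list of (name . :pass/:fail/:error)."
  (let ((diagnostics '()))
    (dolist (task (reverse *eval-tasks*))
      (destructuring-bind (name &key setup prompt verify teardown) task
        (let ((outcome
                (handler-case
                    (progn
                      (when setup (funcall setup))
                      (if (funcall verify (funcall run-agent prompt))
                          :pass
                          :fail))
                  (error () :error))))
          ;; Teardown runs even when setup or the agent signalled an error.
          (when teardown (ignore-errors (funcall teardown)))
          (push (cons name outcome) diagnostics))))
    (setf diagnostics (nreverse diagnostics))
    (values (count :pass diagnostics :key #'cdr)
            (length diagnostics)
            diagnostics)))
```

In regression mode, CI would compare the first two values against the previous build's score.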
TODO Autonomous certification badge
After N HITL approvals of the same pattern, the dispatcher auto-approves it. But unlike Claude Code's "auto mode," this is deterministic — no probability, no model hallucination granting permission. The certification is a logical certainty.
- When a pattern crosses DISPATCHER_RULE_THRESHOLD, the dispatcher writes the rule to rules.org and grants a certification entry: "Certified: shell commands targeting ~/memex/projects/* with git status are deterministically safe. 47 approvals, 0 denials."
- The sidebar Rules panel shows [Rules: 47 | Certified: 12]: learned rules vs certified patterns
- /certifications TUI command: lists all certified patterns with approval counts, last-used timestamps, and the gate vector that checks them
- Certification downgrade: if the user later denies a certified pattern, the certification is revoked and the pattern returns to HITL
- This is the operational realization of "the more you use it, the cheaper it gets" — each certification represents a category of actions that will never cost another HITL prompt
~60 lines in security-dispatcher.lisp, plus reuse of the sidebar rendering.
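Certification is deterministic counting, with no model in the loop. The sketch below assumes patterns are plain strings and uses *rule-threshold* to stand in for DISPATCHER_RULE_THRESHOLD. It makes one deliberate assumption: a pattern that has ever been denied is never re-certified automatically. All names are hypothetical.

```lisp
(defvar *rule-threshold* 3
  "Stand-in for DISPATCHER_RULE_THRESHOLD.")

(defstruct pattern-stats
  (approvals 0 :type integer)
  (denials 0 :type integer)
  (certified nil))

(defvar *patterns* (make-hash-table :test #'equal)
  "Pattern string -> PATTERN-STATS.")

(defun stats-for (pattern)
  (or (gethash pattern *patterns*)
      (setf (gethash pattern *patterns*) (make-pattern-stats))))

(defun record-hitl-decision (pattern approved-p)
  "Record one HITL decision for PATTERN and return its stats.
A denial always revokes certification. Approvals certify the pattern once
they reach the threshold, but only if it has never been denied."
  (let ((stats (stats-for pattern)))
    (cond (approved-p
           (incf (pattern-stats-approvals stats))
           (when (and (zerop (pattern-stats-denials stats))
                      (>= (pattern-stats-approvals stats) *rule-threshold*))
             (setf (pattern-stats-certified stats) t)))
          (t
           (incf (pattern-stats-denials stats))
           (setf (pattern-stats-certified stats) nil)))
    stats))

(defun auto-approve-p (pattern)
  "True only for certified patterns; everything else goes to HITL."
  (let ((stats (gethash pattern *patterns*)))
    (and stats (pattern-stats-certified stats))))
```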
TODO Autonomous certification progress bar — visible "learning" indicator
The certification badge grants permanent auto-approval. Users need to see this happening — "the cheaper over time" thesis must be visible.
- Sidebar Rules panel expanded to show progress bars: Rules: 12/47 → ███░░░░░░░░░ 12/47 and Certified: 3/12 → ███░░░░░░░░░ 3/12
- Milestone notifications: when a rule reaches certification, the TUI injects: "🎖 Rule certified: shell commands in ~/memex/projects/* are now autonomous. 47 approvals, 0 denials. /certifications to review."
- Certification velocity: "+2 certified this week" trend indicator in the sidebar
~30 lines on top of existing sidebar rendering.
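The bar rendering can be sketched as below. It assumes the 12-cell width of the mockups and uses a hypothetical function name. The filled cell count is rounded proportionally, so 12/47 and 3/12 both come out at 3 of 12 cells.

```lisp
(defun progress-bar (done total &key (width 12))
  "Render DONE/TOTAL as WIDTH cells of full blocks (U+2588) and light
shade (U+2591), followed by the fraction."
  (let ((filled (if (plusp total)
                    (min width (round (* width done) total))
                    0)))
    (format nil "~a~a ~d/~d"
            (make-string filled :initial-element (code-char #x2588))
            (make-string (- width filled) :initial-element (code-char #x2591))
            done total)))
```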
TODO Update mechanism + migrations
No update mechanism exists. Users must manually git pull and re-run passepartout setup (which reinstalls Quicklisp and re-tangles everything from scratch). Claude Code has claude update, Hermes has hermes update. Passepartout needs an incremental update path.
- passepartout update --check: query the GitHub API (GET /repos/amrgharbeia/passepartout/releases/latest) and compare it with the version stored in make-hello-message. Report: "v0.5.1 available. 47 changes."
- passepartout update (git-based): git fetch --tags && git checkout v0.5.1, incremental tangle (only Org files changed since the previous tag, via git diff --name-only v0.5.0..v0.5.1 -- org/*.org), recompile changed Lisp files, restart daemon
- Migration hooks: ~/memex/system/migrations/ holds ordered Lisp scripts that run after tangle and before daemon restart. migrate-v051.lisp upgrades memory format, config schema, package names. Tracked by *migration-version* in ~/.config/passepartout/version.lisp
- Post-update verification: run the internal eval suite, verify skill count ≥ 10, smoke-test daemon port 9105. On failure: passepartout update --rollback → git checkout v0.5.0 → re-tangle → restart
- Binary update path (when v0.14.0 ships): download the binary from GitHub Releases, verify SHA-256, replace, restart
~80 lines bash + ~50 lines Lisp.
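The ordering of migration hooks is sketched below. It assumes migrations are identified by an integer version parsed from filenames such as migrate-v051.lisp and are supplied as (version . thunk) pairs. The function names and the starting value of *migration-version* are hypothetical.

```lisp
(defvar *migration-version* 50
  "Last migration applied; persisted in ~/.config/passepartout/version.lisp.")

(defun migration-file-version (filename)
  "Extract the integer version from a name like \"migrate-v051.lisp\"."
  (parse-integer filename
                 :start (1+ (position #\v filename))
                 :end (position #\. filename)))

(defun pending-migrations (migrations current)
  "MIGRATIONS is a list of (version . thunk). Return those newer than
CURRENT, oldest first, without modifying MIGRATIONS."
  (sort (copy-list (remove-if-not (lambda (m) (> (car m) current)) migrations))
        #'< :key #'car))

(defun run-migrations (migrations)
  "Run pending migrations in order. *migration-version* advances after each
step, so a failure leaves it at the last migration that succeeded."
  (dolist (m (pending-migrations migrations *migration-version*)
             *migration-version*)
    (funcall (cdr m))
    (setf *migration-version* (car m))))
```

Advancing the version after each step, rather than once at the end, means a rollback only needs to undo migrations newer than the recorded version.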
TODO Self-configuration — agent proposes and applies config changes
Passepartout's config is text files (`.env`, `.lisp`) — the same format the agent already edits. No competitor can self-configure because their config requires runtime restart or schema validation after file write. Passepartout can edit `.env` → daemon detects change → reloads → takes effect without restart.
- passepartout config set <key> <value> CLI command: writes to `.env` and triggers a daemon reload. ~20 lines bash.
- Runtime config reload: the daemon watches `.env` with inotify (reusing the file-watch from v0.8.2). On change it re-reads env vars, reloads the provider cascade, and updates gate thresholds. No restart needed.
- Config validation before write: the agent verifies that provider names exist (against the neuro-explorer registry), ports are valid numbers, thresholds are integers, and file paths are within the memex. On an invalid value, it proposes a correction.
- Config change audit: every change is written to the Merkle tree: "Agent changed DISPATCHER_RULE_THRESHOLD from 3 to 5. HITL approved." The gate trace records the decision.
~40 lines daemon + ~30 lines config validation.
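A sketch of pre-write validation that returns (values ok-p message) follows. Only DISPATCHER_RULE_THRESHOLD comes from the roadmap. The keys PROVIDER, DAEMON_PORT and MEMORY_PATH, the provider list, and the default memex root are hypothetical. The real provider check would consult the neuro-explorer registry.

```lisp
(defvar *known-providers* '("openai" "anthropic" "deepseek" "groq" "ollama")
  "Hypothetical stand-in for the neuro-explorer registry.")

(defun parse-int-or-nil (value)
  "Parse VALUE as an integer, or return NIL if it is not one."
  (handler-case (parse-integer value)
    (error () nil)))

(defun path-within-p (path root)
  "True when PATH starts with ROOT and contains no \"..\" anywhere."
  (and (>= (length path) (length root))
       (string= root path :end2 (length root))
       (not (search ".." path))))

(defun validate-config (key value &key (memex-root "/home/user/memex/"))
  "Return (values ok-p message) for setting KEY to the string VALUE."
  (flet ((ok () (values t nil))
         (bad (fmt &rest args) (values nil (apply #'format nil fmt args))))
    (cond ((string= key "PROVIDER")
           (if (member value *known-providers* :test #'string-equal)
               (ok)
               (bad "Unknown provider ~s" value)))
          ((string= key "DAEMON_PORT")
           (let ((port (parse-int-or-nil value)))
             (if (and port (<= 1 port 65535))
                 (ok)
                 (bad "~s is not a valid port (1-65535)" value))))
          ((string= key "DISPATCHER_RULE_THRESHOLD")
           (if (parse-int-or-nil value)
               (ok)
               (bad "~s is not an integer" value)))
          ((string= key "MEMORY_PATH")
           (if (path-within-p value memex-root)
               (ok)
               (bad "~s is outside the memex" value)))
          (t (bad "Unknown key ~s" key)))))
```

The message in the second value is what the agent would turn into a proposed correction.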
Four tiers of self-configuration:
- Config Query (v0.7.2) — "What providers do I have?" → answered from system prompt CONFIG section. Already implemented.
- Config Suggest (v0.9.0) — "Should I use a cheaper model?" → agent analyzes telemetry, proposes specific config change with estimated savings. User decides.
- Config Apply (v0.9.0) — "Add @credentials to privacy tags" → agent proposes change → HITL review → writes `.env` → daemon reloads → change takes effect within one think() cycle.
- Config Optimize (v0.9.0) — "Make yourself cheaper" → agent analyzes cost patterns across all sessions, proposes multi-key optimization. User approves full batch.
TODO Self-diagnosis coach: /coach command
Telemetry data (v0.9.0) plus the agent's self-knowledge enables coaching: the agent detects workflow anti-patterns and suggests improvements.
- /coach: analyzes telemetry from the last N sessions and produces a coaching report with 3-5 actionable tips:
  - "💡 Tip: You type full file paths 89% of the time. Try @mention autocomplete (type @ then start typing a filename); it's 3x faster and learns your most-used files."
  - "💡 Tip: You've approved 47 git status commands. This pattern can be auto-certified to skip future HITL. /certifications to review."
  - "💡 Tip: Your average context usage is 78%. Consider increasing CONTEXT_MAX_TOKENS for more awareness, or using /focus to reduce irrelevant context."
  - "💡 Tip: You've used /theme 0 times. Passepartout has 8 themes. Try /theme gruvbox for a warmer terminal feel."
- Coaching data sources: command frequency, HITL approval patterns, context usage history, feature adoption rate, telemetry aggregates
- Coaching is opt-in (privacy-respecting — no data leaves the machine). ~50 lines in telemetry skill + ~30 lines TUI rendering.
TODO Failure attribution — tag task failures with probable component
AHE (arXiv:2604.25850v2) shows that evolution loops work when failures are attributed to specific harness components, not just "the task failed." Passepartout's telemetry records task outcomes but doesn't classify failures by root cause.
- In the telemetry skill: when a session ends with a task failure (agent couldn't complete, user interrupted with denial, or dispatcher blocked irrecoverably), the telemeter classifies the failure as one of: :tool-failure (tool timeout, tool error), :gate-overblock (dispatcher blocked a necessary command), :gate-underblock (dispatcher allowed a harmful command), :reasoning-error (LLM produced a wrong answer), :context-overflow (context budget exhausted), :timeout (session timeout)
- Classification is deterministic: if the last action was blocked by the dispatcher → :gate-overblock. If the last action was a tool error → :tool-failure. If the last action was a successful tool call but the output was wrong → :reasoning-error.
- Feeds the Skill Creator (v0.11.0) — the agent knows which component to fix, not just that something went wrong
~20 lines in telemetry skill.
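The deterministic attribution rules are sketched below. The sketch assumes the last action is a plist with a hypothetical :status key, and the precedence of the session-level checks is an assumption. :gate-underblock is omitted because it cannot be read off the last action. It would need a later signal, such as the user flagging the outcome.

```lisp
(defun classify-failure (last-action &key context-exhausted-p timed-out-p)
  "Deterministically attribute a failed session to a component.
LAST-ACTION is a plist whose :status is :blocked, :error, :tool-timeout or :ok."
  (let ((status (getf last-action :status)))
    (cond (timed-out-p :timeout)
          (context-exhausted-p :context-overflow)
          ((eq status :blocked) :gate-overblock)
          ((member status '(:error :tool-timeout)) :tool-failure)
          ;; The tool call succeeded, yet the task still failed.
          ((eq status :ok) :reasoning-error)
          (t :unclassified))))
```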
v0.10.0: Tool Ecosystem (MCP-Native) + Voice Gateway
(Renumbered from old v0.8.0.)
The original roadmap placed MCP at v0.9.0 and planned "10+ cognitive tools" built from scratch for v1.0.0. This is inverted: the ecosystem already provides 50+ tools (filesystem, git, postgres, slack, github, web search, memory servers). Building bespoke tools from scratch duplicates work the community has already done and tested. Passepartout's advantage is not in tool implementation but in tool orchestration — the deterministic gate stack that verifies every tool invocation before execution.
Why MCP matters for competitive positioning: Claude Code's native tools (Read, Write, Edit, Bash, Grep, Glob, WebSearch) are implemented in TypeScript within the Claude Code runtime. They are not extensible — you cannot add a tool without modifying the runtime. OpenClaw's tools are similarly baked into the Node.js process. By building a native MCP client, Passepartout gains tool breadth that exceeds both competitors (50+ tools via the MCP ecosystem versus ~10 native tools) without building a single tool implementation. The tool quality is maintained by the ecosystem; the safety verification is maintained by Passepartout's gate stack. This division of labor is the right architecture for a small team building a competitor to well-funded commercial agents.
TODO MCP native client
- Pure Common Lisp MCP client: parse JSON-RPC messages from MCP servers over stdio or SSE. No Python bridge, no Node.js subprocess. The client runs in the same Lisp image as the agent — zero serialization overhead between the agent and the MCP layer.
- Implement the MCP protocol lifecycle: initialize handshake, list tools, call tool, handle notifications. Each MCP server registers its tools as entries in Passepartout's *cognitive-tool-registry* at connection time, so the LLM's tool-belt prompt automatically expands to include them.
- MCP_SERVERS env var: comma-separated paths to MCP server config files (JSON). Each config specifies the server command, args, and env vars. Example: MCP_SERVERS=~/.config/passepartout/mcp/filesystem.json,~/.config/passepartout/mcp/git.json
- Tool invocation route: LLM proposes a tool call → Dispatcher verifies against the permission table → MCP client serializes the call as JSON-RPC → server executes → result deserialized back to a plist → returned to the LLM as tool output. The Dispatcher does not distinguish between native tools and MCP tools; the gate stack is uniform.
- Register the MCP client as a skill (defskill :passepartout-mcp-client) so it can be hot-reloaded. The MCP client is not core infrastructure; it is a skill that extends the tool ecosystem.
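Below is a sketch of how the client might serialize one tool invocation. It assumes, per the MCP specification, JSON-RPC 2.0 with the tools/call method and name/arguments params. It also assumes the stdio transport's rule that each message is one line with no embedded newlines. Only string-valued arguments are handled, and the function names are hypothetical. A real client would use a JSON library.

```lisp
(defun json-escape (string)
  "Escape STRING for inclusion in a JSON string literal."
  (with-output-to-string (out)
    (loop for ch across string
          do (case ch
               (#\" (write-string "\\\"" out))
               (#\\ (write-string "\\\\" out))
               (#\Newline (write-string "\\n" out))
               (#\Return (write-string "\\r" out))
               (#\Tab (write-string "\\t" out))
               (t (if (< (char-code ch) 32)
                      (format out "\\u~4,'0x" (char-code ch))
                      (write-char ch out)))))))

(defun mcp-tool-call-request (id tool-name arguments)
  "ARGUMENTS is a plist of keyword -> string, e.g. (:path \"README.org\").
Returns a single-line JSON-RPC 2.0 tools/call request."
  (format nil "{\"jsonrpc\":\"2.0\",\"id\":~d,\"method\":\"tools/call\",~
\"params\":{\"name\":\"~a\",\"arguments\":{~{~a~^,~}}}}"
          id
          (json-escape tool-name)
          (loop for (key value) on arguments by #'cddr
                collect (format nil "\"~a\":\"~a\""
                                (json-escape (string-downcase (symbol-name key)))
                                (json-escape value)))))
```

Because every newline in an argument is escaped, the request can be written to the server's stdin followed by a single newline.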
TODO Core MCP tools (from existing roadmap items)
- Git Steward: status, diff, commit, push, branch via the MCP Git server. Policy gate enforces commit-before-modify: any file write to a git-tracked directory must be preceded by a diff review.
- Web Research: headless browser via Puppeteer/Playwright MCP server. Text extraction, screenshot capture, page interaction.
- Interactive PTY: stream long-running process output to context window, async interrupt control.
TODO TUI tool visualization
- Already implemented in v0.8.1 (tool execution visualization). This TODO confirms the rendering path works for MCP tools as well as native tools — no distinction at the TUI level.
TODO Environment Steward
- Detect "command not found" in shell actuator output.
- Search system PATH and package manager registries for the missing command.
- Propose installation command and retry the failed action on user approval.
- Cache resolved dependency paths to avoid repeated searches.
TODO Channels + providers — match OpenClaw on demand
The daemon protocol is client-agnostic: hex-framed plists over TCP. Every new channel is a new client that speaks the same protocol. OpenClaw's 23+ channels are trivially copyable: each platform needs a poll loop + send function, 30 lines each. LLM providers are a row in *provider-cascade*: a new entry in neuro-provider.lisp with API endpoint + token pricing. Neither deserves its own release.
- Channels: match OpenClaw's 23+ channels on demand. The Emacs bridge (already done, v0.4.0) proves the pattern. Each new platform (WhatsApp, iMessage, Matrix, IRC, etc.) is a skill that registers a poll-fn + send-fn. ~30 lines per channel.
- Providers: match OpenClaw/Hermes on provider count. Adding a new provider is a table entry in neuro-provider.lisp: name, API endpoint, model list, pricing. ~20 lines per provider.
- Voice: STT + TTS are thin wrappers over REST APIs or local binaries (whisper / elevenlabs / espeak). Already spec'd as a skill. ~50 lines.
No separate releases. Done when needed, shipped when ready.
TODO Web search + web fetch tools: search-web, fetch-web
Claude Code has WebSearchTool + WebFetchTool. Hermes has firecrawl-py + exa-py. Passepartout's agent cannot answer questions about the world, look up documentation, or research current events. Two new cognitive tools, no external dependencies:
- search-web: POST a query to a search API (a public SearXNG instance by default, configurable via the WEB_SEARCH_URL env var). Returns title + URL + snippet for the top 10 results. The Dispatcher's network-exfiltration gate (vector 8) provides safety at no extra cost: search queries are already vetted.
- fetch-web: GET a URL and extract its text content via regex-based HTML stripping (no parser dependency: strip tags, keep whitespace). Returns plain text, truncated to 10,000 chars. The Dispatcher's network-exfiltration gate checks the URL domain against the allowlist.
- Both register via def-cognitive-tool as read-only tools (auto-approved via the v0.7.2 safe-tool allowlist)
~150 lines as a new skill, programming-web.org. No external Python/Node.js process.
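The sketch of fetch-web's text extraction below swaps a plain character scan in for the regex. It has the same "strip tags, keep whitespace" effect with no library dependency. Entities are not decoded, and <script>/<style> bodies are kept as text. The names are hypothetical.

```lisp
(defparameter *fetch-web-max-chars* 10000)

(defun strip-html-tags (html)
  "Drop everything from each < through the next >, keeping the text and
whitespace in between."
  (with-output-to-string (out)
    (let ((in-tag nil))
      (loop for ch across html
            do (cond ((char= ch #\<) (setf in-tag t))
                     ((char= ch #\>) (setf in-tag nil))
                     ((not in-tag) (write-char ch out)))))))

(defun fetch-web-text (html &optional (max-chars *fetch-web-max-chars*))
  "Plain text of HTML, truncated to MAX-CHARS characters."
  (let ((text (strip-html-tags html)))
    (if (> (length text) max-chars)
        (subseq text 0 max-chars)
        text)))
```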
TODO LSP integration — language server protocol client
Claude Code uses LSP for code intelligence — find definitions, find references, diagnostics, hover types. Without LSP, Passepartout can grep patterns but cannot answer "where is this function defined?" or "what calls this?" — questions Claude Code answers instantly with zero LLM tokens.
- LSP client as a skill (lsp-client.org). Communicates with language servers via stdio JSON-RPC (same pattern as the MCP client, different protocol).
- Three cognitive tools: lsp-definition (go to definition), lsp-references (find references), lsp-diagnostics (get errors/warnings for a file)
- Read-only tools: auto-approved via the v0.7.2 safe-tool allowlist
- Supported languages: any language with an LSP server (TypeScript, Python, Rust, Go, C/C++, Java, etc.), not just Lisp
- LSP servers are installed by the user (e.g., npm install -g typescript-language-server). Passepartout auto-discovers installed servers via PATH.
~200 lines. Register as read-only cognitive tools. No daemon protocol changes — LSP is a background process, not a rendering concern.
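The framing is where "different protocol" shows up. MCP's stdio transport sends one JSON message per line. The LSP base protocol instead prefixes each JSON-RPC body with a Content-Length header, counted in bytes, followed by a blank line. Below is a minimal framing sketch with hypothetical function names. It handles only the Content-Length header.

```lisp
(defun utf8-length (string)
  "Number of octets STRING occupies when encoded as UTF-8."
  (loop for ch across string
        sum (let ((code (char-code ch)))
              (cond ((< code #x80) 1)
                    ((< code #x800) 2)
                    ((< code #x10000) 3)
                    (t 4)))))

(defun frame-lsp-message (json)
  "Prefix JSON with the LSP base-protocol header; Content-Length counts
octets, not characters."
  (format nil "Content-Length: ~d~c~c~c~c~a"
          (utf8-length json) #\Return #\Linefeed #\Return #\Linefeed json))

(defun parse-content-length (header-line)
  "Return N from a header line \"Content-Length: N\", or NIL."
  (let ((prefix "Content-Length:"))
    (when (and (>= (length header-line) (length prefix))
               (string-equal prefix header-line :end2 (length prefix)))
      (parse-integer header-line :start (length prefix) :junk-allowed t))))
```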
TODO Auto-saved session transcripts: ~/memex/system/sessions/
Passepartout has no session persistence beyond Merkle tree snapshots. Chat history lives in the TUI's in-memory vector and is lost on restart. Every competitor persists sessions: Claude Code uses JSONL, OpenCode uses SQLite, OpenClaw uses JSONL, Hermes uses SQLite+FTS5.
- Auto-save on every message (user and agent): append to ~/memex/system/sessions/<date>-<title>.org as an Org file
- Format: each message is an Org headline with a role tag (:user:, :agent:, :system:), a universal timestamp, and the content in the body. The gate trace is a property drawer under the agent message headline.
- Session title derived from the first user message (first 60 chars, sanitized for use as a filename). Override with /rename <title>
- Saving requires no action: no /export needed. The /export command delegates to the same function with format options (Org/Markdown/JSON)
- Location: ~/memex/system/sessions/, under system/ rather than daily/, so no clutter
- Survives daemon restarts. Resume via /resume <date-title> (existing session resume from v0.7.2)
~80 lines in core-transport.lisp (append on message send), reusing the existing Org rendering.
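The filename and headline formatting is sketched below. It assumes a UTC date and a title sanitized to lowercase letters, digits and hyphens. It also assumes the timestamp arrives as a preformatted Org timestamp string. All three function names are hypothetical.

```lisp
(defun sanitize-title (text &key (max-chars 60))
  "Lowercase the first MAX-CHARS characters of TEXT, keep letters and
digits, and collapse every other run of characters into one hyphen."
  (let ((chars '())
        (hyphen-pending nil))
    (loop for ch across (subseq text 0 (min max-chars (length text)))
          do (cond ((alphanumericp ch)
                    (when (and hyphen-pending chars)
                      (push #\- chars))
                    (setf hyphen-pending nil)
                    (push (char-downcase ch) chars))
                   (t (setf hyphen-pending t))))
    (coerce (nreverse chars) 'string)))

(defun session-filename (first-message &optional (time (get-universal-time)))
  "<date>-<title>.org, with the date taken in UTC."
  (multiple-value-bind (sec mins hrs day month year)
      (decode-universal-time time 0)
    (declare (ignore sec mins hrs))
    (format nil "~4,'0d-~2,'0d-~2,'0d-~a.org"
            year month day (sanitize-title first-message))))

(defun message-headline (role content timestamp)
  "One Org headline per message: TIMESTAMP as the title, ROLE as a tag."
  (format nil "* ~a :~(~a~):~%~a~%" timestamp role content))
```

For example, a session opened with "Fix the login bug!" on 1 June 2026 would be saved as 2026-06-01-fix-the-login-bug.org.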
TODO Auto-memory extraction — learnings from sessions
Claude Code's extractMemories runs at the end of each query loop, scanning the conversation for durable learnings and writing them to memory files. Hermes's MemoryProvider.sync_turn does the same. Passepartout records everything in the Merkle tree but never extracts cross-session learnings.
- After each think() cycle that produces a final response (no tool calls pending), run extract-session-memory: a lightweight LLM call (50 tokens of prompt) that asks "What should I remember from this session?" and writes the result to ~/memex/system/memory/<project>/<date>.org
- The extraction uses a forked LLM call (separate from the main response) with the session transcript as context
- Auto-memory files are injected into the CONTEXT section of future think() calls as "Session memory: [learnings from prior sessions about this project]"
- Extracted memories include: decisions made, patterns observed, preferences expressed, errors encountered and fixed, codebase facts learned
- Opt out via the AUTO_MEMORY=false env var. Extraction is capped at one call per minute to prevent runaway API costs.
~80 lines in core-reason.lisp, reusing the session transcript for context.
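The one-per-minute cap and the opt-out are sketched below. The sketch assumes the caller passes in the value of AUTO_MEMORY, read with an implementation-specific getenv, and passes the extraction itself as a closure. The names are hypothetical.

```lisp
(defvar *last-memory-extraction* nil
  "Universal time of the last extraction attempt, or NIL.")

(defparameter *memory-extraction-interval* 60
  "Minimum seconds between extractions: the one-per-minute cap.")

(defun maybe-extract-memory (extract-fn &key (now (get-universal-time))
                                             (auto-memory "true"))
  "Call EXTRACT-FN unless AUTO-MEMORY is \"false\" or the previous attempt
was less than *memory-extraction-interval* seconds ago. Returns T if it ran."
  (when (and (not (string-equal auto-memory "false"))
             (or (null *last-memory-extraction*)
                 (>= (- now *last-memory-extraction*)
                     *memory-extraction-interval*)))
    ;; Stamp before calling, so a failing extraction still counts toward
    ;; the cap instead of being retried on every think() cycle.
    (setf *last-memory-extraction* now)
    (funcall extract-fn)
    t))
```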
TODO Universal cross-project Org query
Passepartout's entire memex is Org: one format for memory, tasks, documents, transcripts. No competitor has this. Claude Code queries CLAUDE.md (one file), SQLite (separate DB), and file tools (grep). Passepartout can query everything with one function.
- (org-query :tag "@urgent" :state "TODO" :since "-7d" :path "~/memex/projects/"): scans all projects in the memex and returns matching Org headlines as memory objects. Zero LLM tokens, ~2ms execution.
- (org-query :property "DEADLINE" :before "-1d"): overdue items. Feeds the /agenda command.
- (org-query :where "dispatch" :in-title-p t): search headlines containing a term across all projects.
- (org-query :limit 20 :sort :priority): sorted, capped results.
- This is the infrastructure that makes the GTD weekly review (v0.13.0) possible: pure Lisp tree traversal with no external database.
~150 lines in programming-org.lisp (extends existing Org manipulation primitives).
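The filtering step behind org-query is sketched below. It assumes headlines have already been parsed into plists with :title, :state, :tags and :priority keys. Unlike the calls above, this hypothetical version takes the headline list explicitly rather than scanning the memex. It implements only :tag, :state, :where, :limit and :sort :priority.

```lisp
(defun org-query (headlines &key tag state where limit sort)
  "Filter HEADLINES by TAG, STATE and a case-insensitive WHERE substring of
the title, optionally sort by priority, and cap the result at LIMIT."
  (let ((matches
          (remove-if-not
           (lambda (h)
             (and (or (null tag) (member tag (getf h :tags) :test #'string=))
                  (or (null state) (equal state (getf h :state)))
                  (or (null where)
                      (search where (getf h :title) :test #'char-equal))))
           headlines)))
    (when (eq sort :priority)
      ;; Org treats a headline with no priority cookie as the default, #\B.
      (setf matches (stable-sort (copy-list matches) #'char<
                                 :key (lambda (h) (or (getf h :priority) #\B)))))
    (if (and limit (> (length matches) limit))
        (subseq matches 0 limit)
        matches)))
```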
TODO debug-inspect cognitive tool: live state inspection
Lisp enables live state inspection that no TypeScript/Python agent can match. Claude Code has no REPL. Passepartout can inspect and modify its own running state.
- debug-inspect cognitive tool: evaluates a Lisp form in the running image and returns the result as a structured plist. Parameters: code (Lisp form string), package (optional).
- Read-only tool: auto-approved via the v0.7.2 safe-tool allowlist. No side effects, inspection only. Evaluating an arbitrary form is not read-only by construction, so the safety guard must reject forms that could mutate state.
- Use cases: (hash-table-count *memory-store*), (inspect (memory-object-by-id "node-42")), (map 'list #'car *skill-registry*)
- The agent can introspect its own state to answer meta-questions: "How many objects are in memory?" "What skills are loaded?" "What was the last HITL decision?"
~30 lines in programming-repl.lisp (extends the existing repl-eval with a safety guard).
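One possible safety guard is sketched below, assuming an operator allowlist (the list here is hypothetical). The code is read with *read-eval* bound to NIL, so #. cannot execute anything at read time. Every operator in the form must also be on the allowlist before eval runs.

```lisp
(defparameter *inspect-allowed-operators*
  '(hash-table-count length type-of gethash)
  "Hypothetical allowlist of side-effect-free operators.")

(defun safe-inspect-form-p (form)
  "True for atoms, quoted data, and calls whose operators (recursively)
are all on the allowlist."
  (cond ((atom form) t)
        ((eq (first form) 'quote) t)
        ((and (symbolp (first form))
              (member (first form) *inspect-allowed-operators*))
         (every #'safe-inspect-form-p (rest form)))
        (t nil)))

(defun debug-inspect (code &key (package *package*))
  "Read CODE with *READ-EVAL* bound to NIL, reject it unless it passes
SAFE-INSPECT-FORM-P, then evaluate it. Returns (:result string) or
(:error string)."
  (handler-case
      (let* ((*package* (or (find-package package)
                            (error "No package named ~a" package)))
             (form (let ((*read-eval* nil))
                     (read-from-string code))))
        (if (safe-inspect-form-p form)
            (list :result (prin1-to-string (eval form)))
            (list :error "operator outside the inspection allowlist")))
    (error (e) (list :error (princ-to-string e)))))
```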
Competitive Advantage Analysis — v0.10.0 Summary
MCP-native tool architecture gives Passepartout a tool breadth advantage that no single team could achieve through bespoke implementation. The MCP ecosystem is growing faster than any individual agent's tool set. By connecting to it rather than competing with it, Passepartout's tool count scales with the ecosystem — every new MCP server is a new Passepartout tool.
The Dispatcher's tool permission table (allow/ask/deny) applies uniformly to MCP tools, giving Passepartout tool-level security granularity that competitors lack. Claude Code's tools are binary: available or not. Passepartout can conditionally allow filesystem writes to /projects/* while requiring HITL for writes to ~/.config/* — per-path, per-tool, per-session. This is the deterministic gate stack's natural application domain.
The Git policy gate (commit-before-modify) is a safety feature no competitor provides. It prevents the most common agent failure mode: modifying files without preserving the prior state. Combined with memory snapshots (v0.2.0), this gives every action a dual audit trail: the git history and the memory object history.
The TUI tool visualization (v0.8.1) extends seamlessly to MCP tools — the rendering layer doesn't distinguish between native tools and MCP tools. The same colored backgrounds, collapsible outputs, and gate traces apply universally.
The voice gateway and additional channels add parity with OpenClaw's multi-surface approach without architectural changes — every channel is a thin client speaking the same framed TCP protocol to the same daemon. Channels and providers are trivially copyable: each new platform is ~30 lines of poll-loop, each new provider is ~20 lines of API config. Passepartout matches OpenClaw's channel count on demand, shipping when needed rather than as a scheduled milestone.
v0.11.0: Planning, Self-Modification & Deterministic Routing
(Renumbered from old v0.9.0.)
Design insight: the inverted tier classifier. The current tier classifier routes "rm", "write-file", and "shell" to :REFLEX (no LLM). This routes the most dangerous operations to the path with the least oversight. It should be inverted: :REFLEX handles deterministic lookups (list TODOs, check file existence, query memory), :COGNITION handles text processing and summarization, :REASONING handles planning and code generation. Dangerous operations should always route through :REASONING, where the full LLM cycle and Dispatcher gate stack apply. The tier classifier fix in this release addresses this.
TODO Long-horizon planning (task tree DAG)
- Decompose complex tasks into Org-mode headline trees. Each task node is a memory-object with terminal states: :todo → :next-action → :in-progress → :done / :blocked / :stuck.
- The LLM generates the initial task tree from the user's request. The REASONING tier processes each leaf task sequentially, updating node states as it progresses.
- Parent nodes summarise child results: when all children of a node reach :done, the parent is promoted to :done with a synthesised summary. When any child reaches :stuck, the parent is promoted to :blocked with the blocking child's diagnostic.
- Branch pruning: if a child is :stuck after three retries with different LLM providers, the parent re-plans the branch: the LLM generates alternative decomposition paths for the blocked sub-task.
- Task trees persist as Org headlines in ~/memex/system/tasks/. They survive restarts and are visible to the user as editable Org files.
- TUI task tree visualization: a collapsible Org headline tree rendered in the chat area. Each node shows its terminal state with a colored indicator (○ todo, ▶ next-action, ◉ in-progress, ✓ done, ✗ blocked, ⏸ stuck). Nodes expand/collapse on Enter. The tree updates in real time as the agent progresses through subtasks.
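The bottom-up promotion rules are sketched below, assuming a hypothetical task-node struct. Two assumptions go beyond the text. First, an already :blocked child also blocks its parent, so blockage propagates to the root. Second, a fixed string stands in for the LLM-synthesised summary.

```lisp
(defstruct task-node
  title
  (state :todo)   ; :todo :next-action :in-progress :done :blocked :stuck
  (children '())
  (summary nil))

(defun promote-parent (node)
  "Recompute NODE's state bottom-up from its children and return NODE.
Any :stuck (or already :blocked) child makes NODE :blocked; all children
:done makes NODE :done. Leaf nodes are left unchanged."
  (let ((children (task-node-children node)))
    (when children
      (mapc #'promote-parent children)
      (let ((blocker (find-if (lambda (c)
                                (member (task-node-state c) '(:stuck :blocked)))
                              children)))
        (cond (blocker
               (setf (task-node-state node) :blocked
                     (task-node-summary node)
                     (format nil "Blocked by: ~a" (task-node-title blocker))))
              ((every (lambda (c) (eq (task-node-state c) :done)) children)
               (setf (task-node-state node) :done
                     ;; Placeholder for the LLM-synthesised summary.
                     (task-node-summary node)
                     (format nil "Completed ~d subtasks" (length children))))))))
  node)
```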
TODO Tier classifier fix
- Invert the current classifier: :REFLEX = deterministic lookups only (memory query, file-exists-p, check time, list TODOs by tag). :COGNITION = text processing, summarization, simple Q&A, note formatting. :REASONING = planning, code generation, multi-step task execution, dangerous operations.
- Track classifier accuracy via telemetry: for each classified action, record whether the classification was appropriate.
- The classifier function is overridable via *tier-classifier*, allowing users or skills to customize routing.
- The classifier should be a skill, not core infrastructure: reloadable and replaceable without restart.
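The inverted classifier can be sketched with a default-deny fallthrough: anything not explicitly recognized routes to :REASONING. The action keywords are hypothetical. *tier-classifier* holds a symbol rather than a function object, so a hot-reloaded classifier takes effect without rebinding the variable.

```lisp
(defparameter *reflex-actions*
  '(:memory-query :file-exists-p :check-time :list-todos-by-tag)
  "Deterministic lookups that never need an LLM.")

(defparameter *cognition-actions*
  '(:summarize :format-note :simple-qa)
  "Text processing that needs an LLM but no planning.")

(defun default-tier-classifier (action)
  "Inverted routing: only known lookups go to :REFLEX. Anything not listed,
including :shell, :write-file and :rm, falls through to :REASONING."
  (cond ((member action *reflex-actions*) :reflex)
        ((member action *cognition-actions*) :cognition)
        (t :reasoning)))

(defvar *tier-classifier* 'default-tier-classifier
  "Function designator; users or skills may rebind or replace it.")

(defun classify-tier (action)
  (funcall *tier-classifier* action))
```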
TODO Skill Creator
- LLM drafts complete skill org-file from natural language description.
- Mandatory pipeline: (a) syntax validation via lisp-syntax-validate, (b) sandbox-load in a temporary jailed package (v0.3.2), (c) run the registered trigger function against mock contexts, (d) run the registered deterministic gate against mock proposals, (e) on pass, promote to the live registry under passepartout.skills.<name>.
- Required :repl-verified flag on all defun forms: the existing Dispatcher lint check warns on writes without verification. The Skill Creator enforces this at creation time.
- Skills are the primary extension mechanism for users. The Skill Creator makes skill authoring accessible to non-Lisp programmers: describe what you want in English, the LLM drafts the Org file, the system verifies it, and the skill is live.
TODO Change manifest — skills ship with falsifiable predictions
AHE (arXiv:2604.25850v2) shows that harness edits work better when each edit ships with a self-declared prediction, verified by next-round outcomes. Passepartout's Skill Creator should do the same — every new or modified skill carries predictions that telemetry verifies.
- When the Skill Creator generates a skill, it also generates a #+PREDICTION: block in the Org frontmatter:
  - #+PREDICTION: reduces token usage by 15% for code-generation tasks
  - #+PREDICTION: may increase HITL prompts for shell commands outside workspace
  - #+PREDICTION: should improve success rate on refactoring tasks
- Over the next 10 sessions, telemetry compares actual outcomes against the predictions. The verification result is appended to the skill file: #+VERIFIED: Y token change: -18% (predicted -15%) on 2026-06-01
- Disproven predictions flag the skill for review: #+DISPROVEN: token usage increased +3% on code tasks (predicted -15%). Skill scheduled for revision.
- The change manifest persists in the skill's Org file: every skill carries its own evidence ledger. Users can see which skills worked as predicted and which didn't.
~40 lines in Skill Creator + telemetry integration.
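Below is a sketch of the prediction check. It assumes "verified" means the actual change has the same sign as the predicted one. The roadmap does not define the criterion, so this is an assumption. The output line format only approximates the examples above, and the function name is hypothetical.

```lisp
(defun verify-prediction (metric predicted-pct actual-pct date)
  "Return the manifest line for one prediction. Here a prediction counts
as verified when the actual change has the same sign as the predicted one."
  (if (plusp (* predicted-pct actual-pct))
      (format nil "#+VERIFIED: ~a change: ~@d% (predicted ~@d%) on ~a"
              metric actual-pct predicted-pct date)
      (format nil "#+DISPROVEN: ~a change: ~@d% (predicted ~@d%). Skill scheduled for revision."
              metric actual-pct predicted-pct)))
```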
Competitive Advantage Analysis — v0.11.0 Summary
The task tree DAG with terminal states and branch pruning is Passepartout's planning primitive — analogous to Claude Code's TODO list but structural (Org headlines with parent-child relationships) rather than flat.
The tier classifier fix is a safety correctness issue. The current inverted classifier (dangerous ops → no-LLM path) is actively harmful — it reduces oversight on the operations that need it most.
The Skill Creator is the mechanism by which Passepartout escapes the "team of Lisp programmers" constraint. Most agent frameworks require Python/TypeScript to extend. Passepartout's extension language is English — the LLM writes the Lisp, the system verifies it.
v0.12.0: Evaluation & Vision
(Renumbered from old v0.10.0.)
With tools (v0.10.0) and planning (v0.11.0) in place, the agent can execute complex multi-step tasks. v0.12.0 answers two questions: (1) how do we prove it works? (SWE-bench evaluation harness), and (2) can the agent interact with visual interfaces? (computer use / vision).
TODO SWE-bench harness
- Automated pipeline: clone a repository from SWE-bench dataset, parse the GitHub issue, feed the issue description into Passepartout's cognitive loop, track the resolution trajectory as an Org headline tree, apply the generated patch, run the repository's test suite, score success (tests pass yes/no).
- Trajectory persistence: each benchmark run produces an Org file under ~/memex/system/benchmarks/ recording every think() call, every tool invocation, every Dispatcher decision, and the final test result.
- Regression mode: run the same benchmark after each version release. Track score trends. A version that regresses on SWE-bench does not ship.
- Target: competitive score with Claude Code and OpenClaw on SWE-bench-verified by v1.0.0.
TODO Computer Use / Vision
- Screenshot capture: X11 (xwd/import) and Wayland (grim) bridge.
- Vision model integration: send the screenshot to a vision-capable model (GPT-4V, Claude 3.5, Gemini 2.0 Flash).
- Coordinate-based interaction: xdotool/ydotool for click and type commands. The Dispatcher approval gate applies: screen interaction requires HITL by default.
- Use case: "open Firefox, search for the Passepartout GitHub repo, and star it."
TODO Telemetry / observability — structured event logging
Claude Code tracks everything via GrowthBook feature flags. OpenClaw has structured telemetry with trajectory sidecars. Hermes logs session metrics to SQLite. Passepartout has log-message — unstructured, no aggregation. Without telemetry, Passepartout cannot answer: "How many HITL prompts per session?" "What's the approval rate?" "Which gate blocks most often?" "What's the average context usage?" These are the metrics that would validate the README's "2-3x fewer tokens" claim.
- Structured event log as JSONL in ~/.local/share/passepartout/telemetry/ (one file per session + aggregate)
- Event types: :session-start, :think-call (tokens in/out, provider, model, duration), :tool-execution (name, duration, success/error), :gate-decision (gate name, result, pattern), :hitl-decision (approved/denied, pattern, session count), :context-snapshot (tokens used, foveal node, pruned count), :session-end (total tokens, total cost, tool calls, HITL count)
- Aggregate keys tracked in a hash table: HITL approval rate, average context usage, most-blocked gate, tokens saved by foveal pruning vs full context
- /telemetry TUI command: displays aggregate stats + per-session breakdown
- Feeds the evaluation harness (SWE-bench trajectory data comes from the same telemetry system)
~200 lines as a new skill, symbolic-telemetry.org. No daemon protocol changes.
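The sketch below covers JSONL serialization of flat event plists and one aggregate, the HITL approval rate. The event plist keys are hypothetical. String escaping relies on ~S output matching JSON, which holds only for strings without control characters.

```lisp
(defun event-jsonl-line (event)
  "Render a flat plist as one JSON object on a single line. Values may be
T/NIL (booleans), integers, keywords (as strings) or plain strings."
  (format nil "{~{~a~^,~}}"
          (loop for (key value) on event by #'cddr
                collect (format nil "\"~(~a~)\":~a" key
                                (cond ((eq value t) "true")
                                      ((null value) "false")
                                      ((integerp value) value)
                                      ((keywordp value)
                                       (format nil "\"~(~a~)\"" value))
                                      ;; ~S quoting matches JSON for strings
                                      ;; with no control characters.
                                      (t (format nil "~s" value)))))))

(defun hitl-approval-rate (events)
  "Approved / total over :hitl-decision events, or NIL when there are none."
  (let* ((decisions (remove-if-not
                     (lambda (e) (eq (getf e :type) :hitl-decision))
                     events))
         (approved (count-if (lambda (e) (getf e :approved)) decisions)))
    (when decisions
      (/ approved (length decisions)))))
```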
Competitive Advantage Analysis — v0.12.0 Summary
SWE-bench evaluation is the industry standard for coding agent capability claims. Passepartout's trajectory persistence is a differentiator: most harnesses produce a pass/fail score. Passepartout's produces a complete Org-mode audit trail showing exactly where the reasoning succeeded or failed.
Vision + screen interaction is table stakes for competing with Claude Code's computer use feature. The Passepartout advantage: every screen interaction passes through the Dispatcher gate stack.
v0.13.0: Consensus, GTD & Deep Emacs Integration
(Renumbered from old v0.11.0.)
Near-SOTA. The agent has tools, planning, evaluation, and streaming. v0.13.0 adds reliability (consensus), productivity methodology (GTD), and environment depth (Emacs integration).
TODO Consensus loop
- Multi-provider parallel inference for critical decisions. When the action's impact score exceeds a threshold, the system sends the same prompt to 2–3 independent providers.
- Disagreement detection: compare structured outputs. If all providers agree, proceed with highest-confidence result. If they disagree, flag for HITL approval.
- Cost-aware: consensus mode doubles/triples cost. Only trigger when impact exceeds the cost threshold. Configurable via CONSENSUS_THRESHOLD.
- TUI consensus display: collapsible region listing each provider, its model, its proposal, and its confidence score. ✓ 3/3 providers agree in green; ✗ 2/3 agree in yellow.
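Disagreement detection is sketched below, assuming each provider's structured output is a flat plist. Keys are sorted before comparison, so a different key order does not count as disagreement. Nested values are compared with EQUAL as-is. The function names are hypothetical.

```lisp
(defun normalize-plist (plist)
  "Reorder a flat plist by key name so (:a 1 :b 2) and (:b 2 :a 1) are EQUAL."
  (loop for (key value) in (sort (loop for (k v) on plist by #'cddr
                                       collect (list k v))
                                 #'string< :key #'first)
        append (list key value)))

(defun consensus (proposals)
  "PROPOSALS holds one plist per provider. Returns the most common proposal,
its vote count, and whether every provider agreed (T) or HITL is needed (NIL)."
  (let ((normalized (mapcar #'normalize-plist proposals))
        (best nil)
        (best-count 0))
    (dolist (p normalized)
      (let ((votes (count p normalized :test #'equal)))
        (when (> votes best-count)
          (setf best p
                best-count votes))))
    (values best best-count (= best-count (length normalized)))))
```

The three values map directly onto the display: a vote count equal to the provider count renders the green ✓ line; anything less renders yellow and routes to HITL.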
TODO GTD integration
- Full GTD cycle: capture → process → clarify → organize → reflect → engage.
- Org properties: :TRIGGER: (what context), :BLOCKER: (what must complete first).
- Weekly review: the agent scans all projects and tasks, surfaces stalled items, and suggests next actions. Produced deterministically, with zero LLM tokens.
- TUI agenda view: the /agenda command renders Org-agenda as a formatted scrollable region within the chat area.
TODO Deep Emacs integration
- Phase II — Interpreter: ELisp compatibility layer runs inside Passepartout's Common Lisp image. Key Emacs packages (Org-mode, Magit) run natively without an Emacs process.
- Org-agenda awareness: agent queries agenda view, incorporates agenda context into planning.
- Clock time tracking: agent starts/stops clocks on Org headlines, produces clock tables.
- Refile and archive: agent refiles headlines between Org files and archives completed items.
Competitive Advantage Analysis — v0.13.0 Summary
The consensus loop benefits from structured output enforcement (v0.9.0) — comparing plists for semantic equivalence is simpler than comparing free-text responses.
The GTD and Emacs integration are Passepartout's "unfair advantages" — no competitor has either. Claude Code and Copilot are development tools, not life management tools. Org-mode is the bridge: the same format that holds the agent's memory holds the user's tasks, calendar, and notes.
v0.14.0: Self-Configuring Setup Binary
Rationale: The current passepartout configure flow is a bash script that detects
Debian or Fedora, installs packages, installs Quicklisp, tangles Org sources, and
runs the setup wizard. It handles 2 distro families. A save-lisp-and-die binary
distributes Passepartout as a single executable with no SBCL or Quicklisp
prerequisite, and an optional small LLM fallback expands coverage to any distro
with a package manager.
Installation is handled by the bash script or this binary. Configuration is handled by the TUI setup wizard (the new decision from v0.8.0).
TODO Save-lisp-and-die executable
- The setup binary (
passepartout-setup) is asave-lisp-and-dieexecutable (~100MB: SBCL runtime + core Lisp code + native embedding inference from v0.4.0 + 23MB embedding model). No SBCL install required. No Quicklisp. No bash script. The user runs one file. - Deterministic path (default, always runs first): the same distro detection, package installation, and configuration logic from today's bash script, reimplemented in Lisp. Handles Debian and Fedora families. Covers the common case without touching an LLM.
- LLM-assisted path (optional, activates only when the deterministic path fails): downloads Qwen2.5-0.5B (~500MB GGUF, pinned by hash, cached to ~/.local/share/passepartout/models/). The model reads command output and classifies it as success, failure, or recoverable error from a finite set of outcomes. It then selects the next corrective action from a constrained decision tree. On unrecognized failures, it generates a diagnostic for the user.
- Model hash verification: the GGUF file is pinned by its SHA-256 hash. If the hash doesn't match (wrong version, corrupted download), setup falls back to the deterministic path with a warning.
- After setup completes, the binary exits. The user runs passepartout daemon to start the full system. This is a live SBCL process, not a sealed binary, so the REPL, hot-reload, and self-modification are all available.
- Add FiveAM test: the deterministic path succeeds on a system with all dependencies pre-installed; the LLM-assisted path correctly classifies 10 common package-manager error messages.
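The deterministic-first ordering and the hash fallback can be sketched as a single decision function. This is a minimal sketch under stated assumptions. `run-setup` and its keyword arguments are hypothetical. The real hash computation, package commands, and model calls are passed in as closures rather than implemented here:

```lisp
;; Hypothetical control flow for the setup binary. Each step is passed in as
;; a closure so the sketch does not depend on a real package manager,
;; hashing library, or model runtime.
(defun run-setup (&key deterministic-setup model-hash expected-hash llm-setup)
  "Try DETERMINISTIC-SETUP first. Only if it fails, and only if MODEL-HASH
returns EXPECTED-HASH, hand control to LLM-SETUP. Returns a status keyword."
  (cond
    ;; Common case: the deterministic path succeeds and no model is touched.
    ((funcall deterministic-setup) :deterministic-ok)
    ;; Pinned-hash check on the downloaded GGUF file.
    ((not (equal (funcall model-hash) expected-hash))
     (warn "Model hash mismatch; falling back to deterministic setup only.")
     :deterministic-failed)
    ;; LLM-assisted recovery succeeded.
    ((funcall llm-setup) :llm-assisted-ok)
    ;; Unrecognized failure: emit a diagnostic for the user.
    (t :needs-user-diagnostic)))
```

The hash check sits between the two paths, so a corrupted or wrong-version download can never reach the LLM-assisted path.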
v1.0.0: SOTA Parity (verified)
Feature-complete, benchmark-verified, production-hardened. All capabilities from v0.3.0 through v0.14.0 are integrated and tested end-to-end.
v1.0.0 is not a feature release — it is a verification release. Every feature from the v0.x series is tested under concurrent load, resource starvation, adversarial input, and benchmark scoring. The evaluation harness (v0.12.0) provides the scoring apparatus; v1.0.0 is the scored release.
| Area | Parity Target | Verification Method |
|---|---|---|
| Self-improvement | Skill Creator + self-edit + hot-reload | Skill regression suite |
| Planning | Task tree DAG with terminal states | Multi-step integration tests |
| Tool ecosystem | 15+ MCP tools + native shell + git | MCP protocol compliance tests |
| Context window | Semantic search + foveal-peripheral + caching | Token budget vs competitor audit |
| Safety | 10-vector Dispatcher + policy + permissions | Chaos testing |
| Multi-step tasks | Task trees with terminal states | SWE-bench score (v0.12.0 harness) |
| Code editing | Full file read/write via MCP + Org | SWE-bench-verified subset |
| Memory | Vector recall + Merkle integrity + MVCC | Concurrency stress test (v0.9.0) |
| Emacs integration | Full org-mode control (exceeds Claude Code) | Org-agenda round-trip test |
| Streaming | Live text + interrupt-and-redirect (v0.7.1) | TUI UX latency benchmark |
| TUI | Streaming, markdown, gate trace, sidebar, theme system, adaptive layout, mouse, search | TUI integration test suite |
| Packaging | Source install + save-lisp-and-die binary | Install test matrix across distros |
| Offline | 100% local capable (7-13B model) | Air-gapped integration test |
| Cost | 2-3x fewer tokens than competitors | SWE-bench token audit |
| Concurrency | Priority queue + MVCC + parallel signals | Concurrent load test (3 users + bg) |
Performance projection at v1.0.0:
| Scenario | Passepartout v1.0.0 | Claude Code | OpenClaw |
|---|---|---|---|
| Single-turn chat (local 8B) | 2-4s, ~1,500 tok | N/A (cloud-only) | N/A (cloud-only) |
| Single-turn chat (cloud) | 1-3s, ~1,500 tok | 1-3s, ~3,000 tok | 1-3s, ~3,500 tok |
| Multi-step coding (5 files) | 15-30s, ~30,000 tok | 10-20s, ~65,000 tok | 20-40s, ~85,000 tok |
| Knowledge base query (500 nodes) | <1s (in-image vector), 0 LLM tok | 3-5s, ~5,000 tok (LLM-assisted) | 3-5s, ~5,000 tok (LLM-assisted) |
| Background maintenance | 0 LLM tok (deterministic cron) | Variable or skipped | Variable or skipped |
| Offline operation | Full capability | None | None |
| Cost per coding session | ~$0.15 (gpt-4o-mini) | ~$0.45 (gpt-4o-mini) | ~$0.55 (gpt-4o-mini) |
Passepartout wins on cost (2-3x savings from sparse trees + deterministic gates + caching), offline capability (unique), and knowledge management (10-40x savings from in-image vector lookup + Org-native format). It is competitive on single-turn latency and slightly behind on multi-step latency (the single-pipeline architecture adds ~5s overhead per tool execution versus competitors' parallel tool dispatch).
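The savings ratios follow directly from the multi-step coding and cost rows of the projection table. They are ratios of projected figures, not measurements:

```latex
\begin{aligned}
\text{tokens, vs.\ Claude Code:} &\quad 65{,}000 / 30{,}000 \approx 2.2\\
\text{tokens, vs.\ OpenClaw:}    &\quad 85{,}000 / 30{,}000 \approx 2.8\\
\text{cost, vs.\ Claude Code:}   &\quad \$0.45 / \$0.15 = 3.0\\
\text{cost, vs.\ OpenClaw:}      &\quad \$0.55 / \$0.15 \approx 3.7
\end{aligned}
```

The projected cost ratio against OpenClaw lands slightly above the 2-3x range, so 2-3x is the conservative reading of the table.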
The TUI at v1.0.0 is a SOTA competitive agent interface: streaming responses, gate trace visualization, Information Radiator sidebar, skin system with 10+ presets, adaptive layout, full markdown, mouse support, and personality. The sidebar's gate trace, focus map, and rule counter are capabilities no competitor can replicate — Passepartout's permanent UX differentiator.
The key insight at v1.0.0: Passepartout does not beat competitors at everything. It wins decisively where the architecture's structural advantages apply (safety, cost, offline operation, knowledge management, TUI transparency) and is competitive where they don't (raw LLM inference speed, parallel tool dispatch). This is a defensible position — the niches Passepartout dominates are exactly the niches that matter for a sovereign, local-first AI assistant.
At its core, though, v1.0.0 is still probabilistic. The symbolic engine verifies and constrains, but the generative engine remains the primary reasoning source. The architectural transition to symbolic-first reasoning happens in v3.0.0.
v2.0.0: Lisp Machine Emergence
v2.0.0 is where Passepartout stops being a daemon with clients and becomes the environment. The agent's cognitive loop, the user's editor, the user's shell, and the user's browser run in the same Common Lisp image. The Dispatcher gate stack verifies every action regardless of who initiated it — user or agent. The distinction between "tool" and "self" dissolves.
Why this version matters for UX parity. v0.4.0 through v1.0.0 give Passepartout four interaction surfaces (TUI, messaging apps, Emacs, voice). v2.0.0 inverts the problem: instead of building more clients, it builds a platform where the agent's environment and the user's environment are the same process, separated not by a sandbox but by the Dispatcher gate stack. The editor IS the agent's prompt. The shell IS the agent's actuator. The browser IS the agent's web research tool. There are no clients — there is one Lisp image, one address space, one Org-mode file system.
Architectural principle: Browser inside Lisp, not Lisp inside browser. Lisp is the parent process. It owns the window, the memory, and the input loop. The rendering engine (WebKit/Blink) is a library that paints pixels inside a Lisp buffer. The user can redefine functions while browsing without restarting. Keybinding lookups happen in microseconds (SBCL machine code) — the browser cannot "steal" shortcuts.
Qt/QML via EQL5 — the rendering surface
- Qt/QML (via EQL5) is the UI framework. EQL5 exposes the full Qt C++ API from Common Lisp. QML is declarative — it matches Lisp's generation model.
- Desktop: native look and feel on Linux, macOS, and Windows.
- Mobile: Qt runs natively on iOS and Android. Android uses F-Droid for the unrestricted version and Play Store for sandboxed. iOS uses Guideline 4.7 ("Educational/Developer Tool" loophole, no JIT compilation).
- Safety Bridge for mobile: Lisp code can manipulate browser/files but cannot touch hardware (GPS, camera, contacts) without standard permission pop-ups.
- The minibuffer: a universal command line at the bottom of the screen. Not an Emacs modeline. Not a VS Code command palette. A single command surface for every action: edit files, navigate the web, run Lisp expressions, invoke agent commands. M-x for everything.
Lish — the Common Lisp editor
Not elisp. Not Emacs. A multi-threaded Common Lisp editor rendered via Qt/QML. The complete system prompt lives in an Org buffer — the agent's identity, its skill registry, its memory, and its reasoning are visible and editable as Org text. The user modifies the agent's prompt and the agent reflects the change immediately — the prompt is a file in memory, not a hidden string in a config.
Org-babel for interactive evaluation: source blocks in Org files are executable. The user evaluates a #+begin_src lisp block and the result appears inline. The agent evaluates blocks to verify code before writing. The REPL is not a separate window — it is the Org buffer in which the agent and user both work.
The editor and the agent share the same Lisp image. The editor is not a client that connects to a daemon — it IS the daemon process. The TUI from v0.3.6 (with word wrap, streaming, gate trace, focus map) is the editor's rendering surface.
Nyxt — the Common Lisp browser (three erosion stages)
The browser is not a one-time feature. It is a multi-year erosion of the rendering stack toward pure Lisp:
Stage 1 — Qt + WebKit. Qt provides window management and native widgets. WebKit renders web content inside a Lisp buffer. Network requests via dexador (pure Lisp). HTML parsed via Plump (pure Lisp). Layout via Yoga (C-based Flexbox, wrapped via FFI). JavaScript via embedded QuickJS. This stage delivers a working browser in months, not years.
Stage 2 — S-expression DOM. Lisp builds its own DOM representation as native S-expressions. WebKit is reduced to pixel painting only — it receives rendered layouts from Lisp, not raw HTML. The agent can traverse and manipulate the DOM as Lisp data structures without serialization. This makes web content natively queryable and modifiable by the agent's cognitive loop.
Stage 3 — Pure Lisp layout. WebKit is turned off entirely. A Lisp-native layout engine (an estimated 12-18 months of focused development) implements a CSS subset sufficient for 95% of modern-web use cases. JavaScript via QuickJS remains for interactive content. The browser is now a Lisp application that happens to speak HTTP, not a web engine wrapped in a Lisp process.
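The following minimal S-expression DOM illustrates what Stage 2 gives the agent. It assumes a hypothetical `(tag attribute-plist . children)` node shape, with strings as text nodes. This is not Nyxt's or Plump's actual representation. Extracting links becomes an ordinary tree walk with no serialization step:

```lisp
;; Hypothetical S-expression DOM: an element is (tag attribute-plist . children)
;; and a string is a text node.
(defparameter *page*
  '(:html ()
    (:body ()
     (:p () "See " (:a (:href "https://example.org/a") "A") ".")
     (:a (:href "https://example.org/b") "B"))))

(defun element-p (node) (consp node))
(defun element-tag (node) (first node))
(defun element-attrs (node) (second node))
(defun element-children (node) (cddr node))

(defun collect-links (node)
  "Depth-first walk returning the :HREF of every :A element under NODE."
  (when (element-p node)
    (let ((href (and (eq (element-tag node) :a)
                     (getf (element-attrs node) :href))))
      (append (and href (list href))
              (loop for child in (element-children node)
                    append (collect-links child))))))
```

`(collect-links *page*)` walks the whole tree and returns both hrefs in document order. Text nodes are skipped because they are strings, not conses.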
Lish — the Lisp shell
Bash is a text-stream protocol. Passepartout speaks plists. The Lish shell replaces text streams with structured data — every command returns a plist, not a byte stream. Pipe becomes function composition. Scripts become Lisp functions that operate on memory objects directly.
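The following minimal sketch shows plists flowing through pipes. It assumes a toy in-memory todo list. The names `list-todos`, `pipe`, `open-only`, and `titles` are illustrative and are not the actual Lish or Passepartout API. Each stage takes and returns Lisp data, and the pipe is plain left-to-right composition:

```lisp
;; Hypothetical sketch: commands return Lisp data (lists of plists) and a
;; pipe is left-to-right function composition over that data.
(defparameter *todos*
  '((:title "Renew passport" :tags ("@urgent") :done nil)
    (:title "Water plants"   :tags ("@home")   :done nil)
    (:title "File taxes"     :tags ("@urgent") :done t)))

(defun list-todos (&key tag)
  "Return the todo plists, restricted to those carrying TAG when given."
  (remove-if-not (lambda (todo)
                   (or (null tag)
                       (member tag (getf todo :tags) :test #'string=)))
                 *todos*))

(defun open-only (todos)
  "Drop completed todos."
  (remove-if (lambda (todo) (getf todo :done)) todos))

(defun titles (todos)
  "Project each todo plist to its :title string."
  (mapcar (lambda (todo) (getf todo :title)) todos))

(defun pipe (input &rest stages)
  "Feed INPUT through STAGES in order, like a shell pipeline."
  (reduce (lambda (acc stage) (funcall stage acc))
          stages
          :initial-value input))
```

`(pipe (list-todos :tag "@urgent") #'open-only #'titles)` returns `("Renew passport")`. No stage parses text, so whitespace or quoting cannot confuse a stage the way it can confuse awk or cut.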
The agent and the user share the same shell. The user types (list-todos :tag "@urgent"). The agent proposes (shell "npm run build"). The Dispatcher verifies both. The shell is not a separate process — it is a REPL connected to the same Lisp image as the agent's cognitive loop.
Org-mode buffers become the file system. The user's memex (~/memex/) is browsable as a tree of Org headlines. File operations (read, write, list, search) operate on Org AST nodes, not byte streams. A "directory listing" is a tree of headlines. A "file read" is a subtree rendered as text.
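The following sketch is one way to picture Org headlines as a file system. The tree representation `(title body . children)` and the helpers `find-node`, `list-headlines`, and `render-subtree` are assumptions for illustration. They are not the Org AST Passepartout actually uses:

```lisp
;; Hypothetical sketch: an Org file as a headline tree, each node being
;; (title body . children). Listing a "directory" returns child headline
;; titles; reading a "file" renders a subtree back to Org text.
(defparameter *memex*
  '("memex" ""
    ("Projects" ""
     ("Passepartout" "Roadmap notes."))
    ("Inbox" "Call the bank.")))

(defun node-title (node) (first node))
(defun node-body (node) (second node))
(defun node-children (node) (cddr node))

(defun find-node (node path)
  "Follow PATH, a list of headline titles, down from NODE; NIL if absent."
  (if (null path)
      node
      (let ((child (find (first path) (node-children node)
                         :key #'node-title :test #'string=)))
        (and child (find-node child (rest path))))))

(defun list-headlines (node)
  "The \"directory listing\": titles of NODE's immediate children."
  (mapcar #'node-title (node-children node)))

(defun render-subtree (node &optional (depth 1))
  "The \"file read\": NODE and its descendants as Org text, NODE at DEPTH stars."
  (with-output-to-string (out)
    (labels ((walk (n d)
               (format out "~a ~a~%" (make-string d :initial-element #\*) (node-title n))
               (when (plusp (length (node-body n)))
                 (format out "~a~%" (node-body n)))
               (dolist (child (node-children n))
                 (walk child (1+ d)))))
      (walk node depth))))
```

`(list-headlines *memex*)` returns `("Projects" "Inbox")`. `(render-subtree (find-node *memex* '("Projects")))` returns the Projects subtree as Org text, with its child headline one level deeper.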
Bash remains available as a backend for running external commands, but it is not the primary interface.
Emacs migration — three phases
The Emacs bridge (v0.4.0) is Phase I. The deep integration is three phases, not one:
Phase I — Parasite (v0.4.0). Emacs is a client. The elisp TCP bridge sends text and receives responses. The agent does not control Emacs. Emacs users get a native chat experience alongside the TUI.
Phase II — Interpreter (v2.0.0). An ELisp compatibility layer runs inside Passepartout's Common Lisp image. Key Emacs packages (Org-mode, Magit) run natively without an Emacs process. The compatibility layer does not aim for 100% coverage — it targets the packages the agent's workflows depend on.
Phase III — Successor (v2.0.0 and beyond). Native Common Lisp implementations of Org-mode workflows and Git integration read/write the same file formats. Total independence from Emacs. Emacs users who prefer Emacs keep the bridge. New users get the native experience.
Strategic timeline
v0.4.0 Emacs bridge (Phase I Parasite) → v1.0.0 SOTA parity → v2.0.0 Lish editor + Nyxt browser (Stage 1) + Emacs Phase II/III + mobile. The Qt/QML surface enables gradual erosion of the rendering stack without rewriting the application logic. The three-phase Emacs migration ensures Lisp users are never abandoned — the bridge works from day one, the native experience grows under it.
v3.0.0: Neurosymbolic Maturity
Deterministic planner takes the wheel. LLM relegated to semantic translation.
Architectural approach: Stitching, not building. The symbolic engine is not a from-scratch reasoner. It is an integration of existing Common Lisp libraries connected by macros and DSLs. The Lisp advantage is the macro system — it transforms human-readable rules into formal logic queries without requiring a new engine.
Open-source Lisp stack
- Knowledge Graph: VivaceGraph v3 — Lisp-native graph database with a Prolog-like query language built in. Stores facts, relationships, and rules as native Lisp objects in the same image as the agent.
- Constraint Solver: Screamer — non-deterministic backtracking. Given a set of constraints, finds all valid solutions or proves none exist. Used to verify that proposed actions do not violate invariants.
- Formal Verifier: ACL2 — a theorem prover for Common Lisp, BSD licensed. Proves properties about functions before they are committed to the running image. Used for skill verification and Dispatcher rule validation.
The 10-80-10 architecture
Ten percent neural for input translation, eighty percent symbolic for reasoning against a knowledge graph, ten percent neural for output formatting.
- 10% Input: The LLM translates natural language into structured queries (Prolog facts, knowledge graph lookups). The neural translator is trained via EGGROLL (low-rank evolution strategies) on the reward signal from the symbolic verifier — it learns to produce queries that the symbolic engine accepts.
- 80% Reasoning: Pure Lisp. Task graphs generated by the deterministic planner against the knowledge graph. Formal verification via ACL2. Constraint checking via Screamer. Fact retrieval via VivaceGraph. Zero LLM tokens. Zero hallucinations.
- 10% Output: The LLM formats symbolic results back into natural language. The neural formatter is structurally identical to the translator — same training loop, reversed direction.
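The 10-80-10 split can be sketched as three composed functions. This is a toy, not the planned implementation. `translate` and `format-answer` are stubs that stand in for the two neural stages. The middle stage looks answers up in a hypothetical fact table instead of VivaceGraph, Screamer, or ACL2:

```lisp
;; Hypothetical 10-80-10 pipeline. TRANSLATE and FORMAT-ANSWER are stubs for
;; the neural input and output stages; REASON is the symbolic middle.
(defparameter *facts*
  '((:capital :france "Paris")
    (:capital :japan "Tokyo")))

(defun translate (question)
  "Neural 10% (stubbed): natural language -> structured query, or NIL."
  (cond ((search "France" question) '(:capital :france))
        ((search "Japan" question) '(:capital :japan))
        (t nil)))

(defun reason (query)
  "Symbolic 80%: answer QUERY only from *FACTS*; otherwise :UNKNOWN."
  (let ((fact (find-if (lambda (f)
                         (and (eq (first f) (first query))
                              (eq (second f) (second query))))
                       *facts*)))
    (if fact (third fact) :unknown)))

(defun format-answer (result)
  "Neural 10% (stubbed): symbolic result -> natural language."
  (if (eq result :unknown)
      "I don't know."
      (format nil "The answer is ~a." result)))

(defun answer (question)
  (format-answer (reason (translate question))))
```

A question the fact table cannot answer comes back as "I don't know." rather than as a guess. The middle stage only returns facts it already holds.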
The auto-formalizer bootstrap
The symbolic engine needs a populated knowledge graph. The auto-formalizer populates it:
- Feed unstructured data (documentation, manuals, logs, session histories) to the LLM in auto-formalizer mode.
- The LLM extracts facts, relationships, and rules as structured S-expressions.
- The symbolic verifier (Screamer + ACL2) checks each extracted fact for consistency with the existing knowledge graph.
- Consistent facts are added. Conflicting facts are flagged for human review.
- Over time, the knowledge graph grows without manual ontology engineering.
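The accept-or-flag step can be sketched with facts as `(subject predicate object)` triples. The consistency rule here is a deliberately simple stand-in for the Screamer/ACL2 check. A fact conflicts when a single-valued predicate already has a different object for the same subject. Single-valued predicates are listed in the hypothetical `*functional-predicates*`. `formalize` and `conflicts-p` are illustrative names:

```lisp
;; Hypothetical accept-or-flag step of the auto-formalizer. Facts are
;; (subject predicate object) triples. A stand-in consistency rule replaces
;; the Screamer/ACL2 check: a single-valued predicate may not take two
;; different objects for the same subject.
(defparameter *functional-predicates* '(:default-shell :config-path))

(defun conflicts-p (fact graph)
  "True when FACT contradicts a fact already in GRAPH."
  (destructuring-bind (subject predicate object) fact
    (and (member predicate *functional-predicates*)
         (some (lambda (known)
                 (and (equal (first known) subject)
                      (eq (second known) predicate)
                      (not (equal (third known) object))))
               graph))))

(defun formalize (extracted graph)
  "Add consistent EXTRACTED facts to GRAPH. Returns the new graph and the
list of conflicting facts flagged for human review."
  (let ((flagged '()))
    (dolist (fact extracted)
      (cond ((member fact graph :test #'equal))      ; already known: skip
            ((conflicts-p fact graph) (push fact flagged))
            (t (push fact graph))))
    (values graph (nreverse flagged))))
```

Suppose extraction yields both "zsh" and "bash" as the same user's default shell. The first fact is added and the second is flagged, and neither silently overwrites the other.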
DSL approach over engine building
Domain-specific languages, not general-purpose reasoners:
- Lisp macros transform human-readable rules into Prolog queries that run against VivaceGraph. For example, (defrule check-privacy :when (contains-tag payload "@personal") :then :block) expands to a VivaceGraph query with Screamer constraint checking.
- Users write rules in a domain-specific DSL. The macros handle the translation to formal logic.
- The Skill Creator (v0.9.0) generates DSL rules from English descriptions. The auto-formalizer verifies them.
- (macroexpand-1 '(defrule ...)) shows exactly how each rule compiles, so every rule is 100% auditable.
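The following minimal sketch shows how such a macro could work. It assumes a plain in-image rule table instead of a VivaceGraph query. This `defrule` expansion, `*rules*`, `contains-tag`, and `evaluate-rules` are all hypothetical. The `:when` form deliberately refers to a variable named `payload`, which the expansion binds:

```lisp
;; Hypothetical DEFRULE: compiles a rule to a closure stored in *RULES*
;; instead of a VivaceGraph/Screamer query. The :WHEN form may refer to
;; PAYLOAD, which the expansion binds on purpose.
(defvar *rules* (make-hash-table :test #'eq))

(defun contains-tag (payload tag)
  "True when TAG is in PAYLOAD's :TAGS list."
  (member tag (getf payload :tags) :test #'string=))

(defmacro defrule (name &key ((:when condition)) ((:then verdict)))
  "Register rule NAME: return VERDICT when CONDITION holds, else :ALLOW."
  `(setf (gethash ',name *rules*)
         (lambda (payload)
           (declare (ignorable payload))
           (if ,condition ,verdict :allow))))

(defun evaluate-rules (payload)
  "Return the first verdict other than :ALLOW, or :ALLOW if no rule objects."
  (maphash (lambda (name rule)
             (declare (ignore name))
             (let ((verdict (funcall rule payload)))
               (unless (eq verdict :allow)
                 (return-from evaluate-rules verdict))))
           *rules*)
  :allow)
```

After `(defrule check-privacy :when (contains-tag payload "@personal") :then :block)`, a payload tagged "@personal" evaluates to `:block`. `macroexpand-1` on the same form shows the exact `setf`/`lambda` code the rule becomes.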
Self-correcting gates
Gates learn from the full history of outcomes — did the plan succeed? Where did it fail? The symbolic engine updates its own rules based on results:
- Induced functions from v0.5.0 feed into the symbolic engine as candidate rules.
- The symbolic verifier checks each candidate against the knowledge graph for consistency.
- Rules that pass verification are promoted to the active gate stack.
- Rules that fail verification are discarded with a diagnostic — the agent learns why the pattern doesn't generalize.
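The promote-or-discard loop can be sketched as follows. The sketch assumes each candidate is a `(name . predicate)` pair. It also assumes the outcome history is a list of plists with `:input` and `:blocked` keys. A real verifier would check against the knowledge graph rather than this flat history. `agrees-p` and `review-candidates` are hypothetical names:

```lisp
;; Hypothetical candidate-rule review: a candidate is promoted only if it
;; agrees with every recorded outcome; otherwise it is discarded with the
;; counterexample as its diagnostic.
(defun agrees-p (predicate record)
  "True when PREDICATE's verdict on the record's input matches the outcome."
  (eq (not (funcall predicate (getf record :input)))
      (not (getf record :blocked))))

(defun review-candidates (candidates history)
  "Split CANDIDATES, a list of (name . predicate), into promoted rule names
and diagnostics for the rules that contradict HISTORY."
  (let ((promoted '())
        (diagnostics '()))
    (dolist (candidate candidates)
      (let ((counterexample
              (find-if (lambda (record) (not (agrees-p (cdr candidate) record)))
                       history)))
        (if counterexample
            (push (list (car candidate) :fails-on (getf counterexample :input))
                  diagnostics)
            (push (car candidate) promoted))))
    (values (nreverse promoted) (nreverse diagnostics))))
```

A candidate that blocks every command is discarded, and its diagnostic names the allowed input ("ls") that disproves it. The diagnostic records why the pattern doesn't generalize.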
Implications
Hallucination becomes structurally impossible because the symbolic engine will not accept a fact that contradicts its knowledge graph. Safety becomes provable because ACL2 can prove properties about the system's behavior. Self-improvement becomes stable because the agent modifies skills that are then verified before execution. The 80% of computation that happens in the symbolic middle layer costs zero LLM tokens.
v4.0.0: Native Inference
LLM inference moves in-process. No external servers. No API keys required for inference.
Lisp as Sovereign Governor, not as Math Engine. The weights themselves are not stored as Lisp objects — this would waste 50% memory on type tags and destroy cache locality through pointer-chasing. Instead, the entire tensor is tagged as a single Lisp object (macro-tag). The Lisp image holds a pointer to optimized flat binary (GPU-friendly, FPGA-compatible). The tag is checked once. After that, all math happens in the optimized backend.
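The following sketch is a software-only approximation of the one-tag-per-tensor idea. Here the weights live in a specialized single-float array inside a Lisp struct, not behind a raw pointer to foreign memory. `tensor` and `tensor-dot` are hypothetical names. The type is checked once at entry, and the inner loop then runs on unboxed floats:

```lisp
;; Hypothetical approximation of a macro-tagged tensor: one Lisp object
;; (the struct) wrapping one flat, specialized float vector.
(deftype float-vector () '(simple-array single-float (*)))

(defstruct (tensor (:constructor make-tensor (data)))
  (data (make-array 0 :element-type 'single-float) :type float-vector))

(defun tensor-dot (a b)
  "Dot product of two equal-length tensors."
  (check-type a tensor)                 ; the single type check per operand
  (check-type b tensor)
  (let ((x (tensor-data a))
        (y (tensor-data b)))
    (declare (type float-vector x y))
    (assert (= (length x) (length y)) () "Shape mismatch")
    (let ((sum 0.0f0))
      (declare (type single-float sum))
      (dotimes (i (length x) sum)
        (incf sum (* (aref x i) (aref y i)))))))
```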
Native inference (FFI binding to llama.cpp)
- FFI binding to llama.cpp via CFFI: load GGUF models, run inference, manage KV cache. Single SBCL image, zero process boundaries. The agent and the model share memory.
- Speculative safety: the Dispatcher gate stack intercepts token generation in real time. A token that would produce a blocked action is suppressed before it is emitted. No external inference API supports this.
- Foveal-peripheral compute: the model skips pruned context nodes during attention computation. External APIs compute full attention regardless of what you send. In-process inference makes the sparse-tree rendering pay off at the compute level, not just the token level.
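This sketch assumes greedy decoding and a gate that can be queried about individual tokens. Under those assumptions, suppression amounts to masking tokens before selection. The hypothetical `pick-token` skips tokens the gate rejects, which has the same effect as setting their logits to negative infinity. Real gate checks would operate on partial actions, not on single tokens:

```lisp
;; Hypothetical gate-guided greedy decoding step. Tokens the gate rejects
;; are skipped, which is equivalent to masking their logits to -infinity
;; before the argmax.
(defun pick-token (logits vocabulary allowed-p)
  "Index of the highest-logit token in VOCABULARY accepted by ALLOWED-P,
or NIL when the gate rejects every token."
  (let ((best nil)
        (best-score nil))
    (dotimes (i (length logits) best)
      (when (and (funcall allowed-p (aref vocabulary i))
                 (or (null best-score) (> (aref logits i) best-score)))
        (setf best i
              best-score (aref logits i))))))
```

With logits `#(2.0 5.0 1.0)` over `#("ls" "rm" "cat")`, a gate that rejects "rm" yields index 0. The highest-scoring token is never eligible.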
Live surgery on cognition
With in-process inference, the agent's internal state becomes inspectable:
- Pause inference mid-stream. Inspect hidden states and activations as Lisp variables.
- Modify a vector, change a sampling parameter, resume.
- Detect when the agent is likely to hallucinate by comparing current activation patterns against historical baselines.
- The REPL becomes a surgical instrument for the agent's own cognition — not just for verifying code, but for inspecting and correcting the neural process that generates it.
DSL-compiled model architectures
- Model architectures are described in a Lisp DSL: (defmodel passepartout-reasoning :type 'transformer :heads 32 :dim 4096 :layers 32).
- The DSL compiles to machine code for the target backend (GPU via CUDA, FPGA via VexRiscv, CPU via llama.cpp).
- Python frameworks interpret model definitions at runtime; Lisp compiles them once. Model architecture changes are treated the same as code changes: edited, verified, hot-reloaded.
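The following minimal sketch shows what a defmodel form might do before any code generation. It assumes a hypothetical `*models*` registry. Shape constraints are checked once at macroexpansion time, and derived quantities such as the per-head dimension are computed there too. The parameter estimate uses the common 12 x dim^2 per-layer approximation for a standard transformer block and is a rough assumption:

```lisp
;; Hypothetical DEFMODEL: validates and records an architecture plist at
;; macroexpansion time. Backend code generation is out of scope here.
(defvar *models* (make-hash-table :test #'eq))

(defmacro defmodel (name &key ((:type kind)) heads dim layers)
  "Record a model architecture. Shape checks run once, at macroexpansion."
  (assert (and (integerp heads) (integerp dim) (zerop (mod dim heads)))
          () "DIM (~a) must be an integer multiple of HEADS (~a)" dim heads)
  `(setf (gethash ',name *models*)
         (list :type ,kind :heads ,heads :dim ,dim :layers ,layers
               :head-dim ,(/ dim heads))))

(defun approx-block-params (name)
  "Rough weight count of the transformer blocks: 12 * dim^2 per layer
(4 * dim^2 for attention projections, 8 * dim^2 for a 4x-wide MLP).
Ignores embeddings, norms, and biases."
  (let ((spec (gethash name *models*)))
    (* (getf spec :layers) 12 (expt (getf spec :dim) 2))))
```

For the passepartout-reasoning form, this records a head dimension of 128. It estimates 32 x 12 x 4096^2 = 6,442,450,944, roughly 6.4B weights in the transformer blocks.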
v5.0.0: Hardware — Tagged Lisp Architecture
The Lisp machine becomes physical. RISC-V with tagged architecture, hardware-enforced type checking, and FPGA prototype for the symbolic core.
Not a from-scratch processor. Use RISC-V as the skeleton, add custom Lisp extensions. RISC-V provides the carrier architecture (standard instruction set, existing toolchain, LLVM support). Lisp extensions provide tagged computation (type checking in hardware, parallel garbage collection, S-expression traversal as atomic operations).
The macro-tag approach
- Top 4–8 bits of every memory word = Type Tag. Hardware checks tags in parallel with ALU operations. Trap on type mismatch.
- A tensor (70B weights) is one macro-tagged Lisp object — a pointer to flat binary. The tag is checked once. Math happens at native speed. This replaces "weights as sexps" (which wastes 50% memory on per-weight tags and destroys cache locality).
- Custom instructions: TADD (tagged add), LISP.CAR, LISP.CDR — Lisp primitives as single-cycle hardware operations.
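The tag-checking behaviour can be modelled in software with `ldb` and `dpb` on 64-bit integers. This sketch rests on several assumptions: 4 tag bits (the low end of the 4-8 bit range), a 60-bit payload, made-up tag codes, and hypothetical function names. In hardware the tag check runs alongside the ALU; here it is an explicit test before the add. `tagged-add` mimics TADD by trapping on a tag mismatch:

```lisp
;; Hypothetical software model of a tagged 64-bit word: top 4 bits = type
;; tag, low 60 bits = payload. Tag codes are made up for illustration.
(defconstant +tag-bits+ 4)
(defconstant +payload-bits+ 60)
(defparameter *tag-codes* '((:fixnum . 1) (:cons . 2) (:tensor . 3)))

(defun make-word (tag payload)
  "Pack TAG's code into the top bits above a PAYLOAD truncated to 60 bits."
  (dpb (cdr (assoc tag *tag-codes*))
       (byte +tag-bits+ +payload-bits+)
       (ldb (byte +payload-bits+ 0) payload)))

(defun word-tag (word)
  (car (rassoc (ldb (byte +tag-bits+ +payload-bits+) word) *tag-codes*)))

(defun word-payload (word)
  (ldb (byte +payload-bits+ 0) word))

(defun tagged-add (a b)
  "Software TADD: trap unless both operands carry the :FIXNUM tag."
  (unless (and (eq (word-tag a) :fixnum) (eq (word-tag b) :fixnum))
    (error "Type trap: TADD on ~a and ~a" (word-tag a) (word-tag b)))
  (make-word :fixnum (+ (word-payload a) (word-payload b))))
```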
Phase migration: Host → Co-processor → Self-hosted
- Parasitic. Lisp card (FPGA) is a PCIe co-processor. Host CPU (Intel/AMD, Linux/Windows) handles "dirty" I/O — networking, display, file systems. Lisp card handles tagged computation and the agent's cognitive loop. If Lisp crashes, host survives. Reset card, reload. Memory mapping: the card can see the host's memory. The Lisp environment reaches out and inspects data.
- Functional Hijacking. Lisp UI runs on the card, displays through the PC's GPU. The agent indexes Linux files into Lisp objects. The host becomes an I/O server for the Lisp card.
- Driver Cannibalization. Point the agent at C drivers. Ask it to generate native Lisp drivers for the hardware the card controls directly. PCIe Passthrough for direct hardware access.
- Self-Hosting. Replace the Linux bootloader with Stage0 Lisp (a bootstrap from 500 bytes of hex to a self-hosting Lisp). Cut the umbilical cord. The Lisp machine runs on bare metal.
Concrete prototyping milestones
| Stage | Hardware | Cost | What it delivers |
|---|---|---|---|
| TinyTapeout | Custom silicon (130nm) | ~$500–1,000 | 8-bit tagged toy processor with Lisp primitives |
| Shuttle | Multi-project wafer | ~$10,000–20,000 | Tagged RISC-V core at 100–300MHz |
| FPGA | Terasic DE10-Nano / Xilinx KCU105 | ~$200–500 | VexRiscv with custom Lisp extensions, PCIe card form factor |
| Industrial | Commercial foundry (5nm) | ~$10M–100M+ | Competes with modern CPUs on tagged workloads |
Start at TinyTapeout. Validate the tagged architecture works. Move to FPGA. Validate at speed. Only then consider silicon.
Garbage collection in hardware
Dedicated bus master (Scavenger) runs background garbage collection while the main CPU executes code. No "GC pause." The scavenger traverses the heap in parallel with computation, freeing unreachable objects without stopping the agent.
Persistent single-address-space memory
NVRAM for the entire heap. Turn on the machine — state is exactly where you left it. No "booting." No "loading memory from disk." The agent's Merkle-tree memory, skill registry, knowledge graph, and induced functions survive restarts as a contiguous hardware state.
Why this is not "Lisp inside browser"
Most Lisp-on-hardware attempts fail because they try to compete with Intel on raw math. That's the wrong axis. The tagged architecture doesn't need to beat a GPU at matrix multiplication. It needs to beat a CPU at symbolic computation — graph traversal, constraint solving, theorem proving, garbage collection. These are the v3.0.0 symbolic engine's workload. Hardware that makes them single-cycle is the differentiator, not hardware that runs matrix math faster.
v6.0.0: True Agency
World models, temporal reasoning, goal persistence across restarts.
- World models: Predictive models of user behavior, project dynamics, system state.
- Temporal reasoning: Scheduling, deadlines, elapsed duration awareness.
- Goal persistence: Goals survive restarts. Long-term projects persist as memory-objects.
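Goal persistence can be sketched with plain S-expression files. The sketch assumes goals are plists of keywords, strings, and numbers. `save-goals` and `load-goals` are hypothetical. Reading with `*read-eval*` bound to NIL means a tampered goals file cannot run code through `#.` at load time:

```lisp
;; Hypothetical goal store: goals are plists written as S-expressions and
;; read back on the next start.
(defun save-goals (goals path)
  "Write GOALS to PATH readably, replacing any previous file."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :if-does-not-exist :create)
    (with-standard-io-syntax          ; also binds *PRINT-READABLY* to T
      (prin1 goals out)))
  path)

(defun load-goals (path)
  "Read goals from PATH, or return NIL when no file exists yet."
  (with-open-file (in path :if-does-not-exist nil)
    (when in
      (with-standard-io-syntax
        ;; WITH-STANDARD-IO-SYNTAX binds *READ-EVAL* to T, so turn it off
        ;; again: a #. form in a tampered file then signals READER-ERROR.
        (let ((*read-eval* nil))
          (read in nil '()))))))
```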