Passepartout Evolutionary Roadmap
- The Evolutionary Roadmap
- Non-Negotiable Identity
- Version Roadmap
- Versioning Convention
- v0.1.0: The Autonomous Foundation — RELEASED 2026-04-20
- v0.2.0: Interactive Refinement — RELEASED 2026-04-29
- Professional TUI (Croatoan-based, styled, scrollable)
- Self-editing (error detection, surgical fix, hot-reload)
- Enhanced utilities (structural Lisp/Org manipulation + REPL)
- Onboarding wizard (modular Lisp setup for LLM providers)
- Memory rollback (snapshot and restore)
- Secret Exposure Gate, Shell Safety, Lisp Validation
- Multi-distro deployment (Debian+Fedora, systemd, Docker)
- Project rename to Passepartout (files, packages, env vars)
- 31 org files with full literate prose
- Human-in-the-Loop (HITL)
- Event Orchestrator (unified hooks+cron+routing)
- Context Manager (project scoping)
- Model-Tier Routing (cost optimization)
- Memory Scope Segmentation
- Asynchronous Embedding Gateway
- TUI Experience (Daily Driver Quality)
- v0.2.x Backfill Remediation (stubs and gaps)
- Project Renaming (Bouncer → Dispatcher)
- v0.3.x: Security Hardening
- Competitive Advantage Analysis — v0.3.x Summary
- Competitive Advantage Analysis — v0.4.0 Summary
- Competitive Advantage Analysis — v0.5.0 Summary
- Competitive Advantage Analysis — v0.6.0 Summary
- Competitive Advantage Analysis — v0.7.0 Summary
- Competitive Advantage Analysis — v0.8.0 Summary
- Competitive Advantage Analysis — v0.9.0 Summary
The Evolutionary Roadmap
The roadmap is designed working backwards from SOTA parity (v1.0.0), guiding each version toward a fully autonomous, self-editing agent. Each version builds on the previous, with features designed to be implemented in pure Common Lisp + Org-mode.
The TODO states in each version's Tasks section are the authoritative task tracker. The feature tables describe what each version delivers.
Non-Negotiable Identity
- Pure Common Lisp + Org-mode. No JSON. No YAML. No external databases.
- Single-address-space memory (Lisp hash tables in RAM — the agent IS the memory).
- "Thin harness, fat skills" — complexity lives at the edges, not the kernel.
- One agent composed of many skills. Concurrency via bordeaux-threads (shared memory).
- Plists everywhere — homoiconic communication between all components.
Version Roadmap
Understanding Passepartout as a function in time is not nostalgia. It is architectural guidance. Every decision in v0.x should be made with awareness of where the system is going. Code written today becomes the substrate for v3.0. Skills designed today become the vocabulary the symbolic engine speaks tomorrow.
The probabilistic beginning is not a weakness to overcome. It is the bootstrap. The system learns the domain through probabilistic inference, and that learned knowledge becomes the seed for the symbolic engine. By the time the symbolic engine takes over, it has a rich knowledge graph to reason about, grown from thousands of probabilistic interactions.
This is how you build a reasoning machine: start with a learner, make it learn to verify, let verification become the core, remove the learner once it has learned enough.
Versioning Convention
Feature releases increment the minor version (v0.X.0). Bugfix and hardening releases increment the patch version (v0.X.Y). This ensures that security patches and critical fixes are visible in the version number and can ship independently of feature work. No feature release ships without its prerequisite hardening releases resolved.
v0.1.0: The Autonomous Foundation — RELEASED 2026-04-20
The secure, auditable Lisp kernel. All core infrastructure in place.
DONE Perceive-Reason-Act pipeline
- State "DONE" from "TODO" [2026-04-20 Mon]
DONE Skills engine with jailed loading
- State "DONE" from "TODO" [2026-04-20 Mon]
DONE Policy skill (6 invariants)
- State "DONE" from "TODO" [2026-04-20 Mon]
DONE Memory (memory-object + Merkle hashing)
- State "DONE" from "TODO" [2026-04-20 Mon]
DONE Scribe + Gardener background workers
- State "DONE" from "TODO" [2026-04-20 Mon]
DONE LLM gateway (OpenRouter, Ollama)
- State "DONE" from "TODO" [2026-04-20 Mon]
DONE Shell actuator, Emacs bridge, credentials vault
- State "DONE" from "TODO" [2026-04-20 Mon]
DONE FiveAM test suite
- State "DONE" from "TODO" [2026-04-20 Mon]
v0.2.0: Interactive Refinement — RELEASED 2026-04-29
The "Brain" meets the "Machine." Standardization and professionalization of the user interface and environment.
v0.2.0 through v0.3.0: The Dispatcher Learns
Each version expands the deterministic layer. The Dispatcher writes rules from approved exceptions. Shadow mode runs trial executions. Tool permission tiers mature from simple allow/deny to nuanced context-aware policies. The agent becomes less likely to attempt dangerous actions not because it is smarter but because the guard has more complete information.
This is the bootstrapping phase. The system learns by watching itself and its user. Every blocked action becomes a rule. Every approved exception becomes a pattern. The symbolic layer grows at the probabilistic layer's expense.
DONE Professional TUI (Croatoan-based, styled, scrollable)
- State "DONE" from "TODO" [2026-04-29 Wed]
DONE Self-editing (error detection, surgical fix, hot-reload)
- State "DONE" from "TODO" [2026-04-29 Wed]
DONE Enhanced utilities (structural Lisp/Org manipulation + REPL)
- State "DONE" from "TODO" [2026-04-29 Wed]
DONE Onboarding wizard (modular Lisp setup for LLM providers)
- State "DONE" from "TODO" [2026-04-29 Wed]
DONE Memory rollback (snapshot and restore)
- State "DONE" from "TODO" [2026-04-29 Wed]
DONE Secret Exposure Gate, Shell Safety, Lisp Validation
- State "DONE" from "TODO" [2026-05-02 Sat]
DONE Multi-distro deployment (Debian+Fedora, systemd, Docker)
- State "DONE" from "TODO" [2026-05-02 Sat]
DONE Project rename to Passepartout (files, packages, env vars)
- State "DONE" from "TODO" [2026-05-02 Sat]
DONE 31 org files with full literate prose
- State "DONE" from "TODO" [2026-05-02 Sat]
DONE Human-in-the-Loop (HITL)
CLOSED: [2026-05-03 Sun 14:00]
- State "DONE" from "TODO" [2026-05-03 Sun 14:00]
Continuation-based interaction. The agent can suspend its cognitive loop to ask for permission or clarification and resume precisely where it left off. Builds on the dispatcher's existing Flight Plan mechanism.
DONE Event Orchestrator (unified hooks+cron+routing)
- State "DONE" from "TODO" [2026-05-02 Sat 22:36]
Unified control plane for hooks, cron, and complexity-based routing.
- hook-registry + cron-registry + tier classifier
- Hooks via #+HOOK: Org-mode properties
- Three complexity tiers: :REFLEX (no LLM), :COGNITION (light LLM), :REASONING (full LLM)
- Hooked into heartbeat for cron processing
- Rule-based tier classifier (overridable via *tier-classifier*)
DONE Context Manager (project scoping)
CLOSED: [2026-05-05 Tue]
- State "DONE" from "TODO" [2026-05-05 Tue]
Stack-based project focusing with persistence.
- push-context / pop-context / with-context stack operations
- current-scope wired into perceive gate *scope-resolver*
- /focus, /scope, /unfocus TUI commands
- Context stack persisted to ~/.cache/passepartout/context.lisp, auto-restores on boot
DONE Model-Tier Routing (cost optimization)
CLOSED: [2026-05-03 Sun 16:00]
- State "DONE" from "TODO" [2026-05-03 Sun 16:00]
Extend *model-selector* for quadrant-based routing with per-slot provider cascades.
- Privacy filter (local-only for @personal content) — top priority
- Quadrant tagging (foreground/background × probabilistic/deterministic)
- Complexity classifier (code/plan/chat/background slots), each with its own provider cascade
- Model-selector skill registers into the *model-selector* hook
Deferred to v0.4.0: budget tracking per request, per-session cost monitoring. Deferred to v0.9.0: TUI /config command for cascade configuration (env vars for now).
DONE Memory Scope Segmentation
CLOSED: [2026-05-03 Sun 16:30]
- State "DONE" from "TODO" [2026-05-03 Sun 16:30]
Extend memory-object with :scope property.
- :memex (permanent knowledge), :session (ephemeral), :project (current work)
- Scope-aware retrieval in memory layer
DONE Asynchronous Embedding Gateway
CLOSED: [2026-05-05 Tue]
- State "DONE" from "TODO" [2026-05-05 Tue]
Provider-agnostic vector generation (Ollama, OpenAI, hashing fallback).
- Three backends: local (Ollama-compatible), openai (/v1/embeddings), hashing (SHA-256)
- embeddings-compute and *embedding-backend* for runtime provider selection
- ingest-ast populates vectors at object creation time
- mark-vector-stale marks vectors as :pending and queues for re-embedding
- embed-all-pending drains queue, computes vectors, stores in *memory-store*
- Cron job registered with orchestrator: runs every 10m on :reflex tier
- EMBEDDING_PROVIDER env var for provider selection
- Registered as proper skill (defskill :passepartout-system-model-embedding)
Note: The default :hashing backend uses SHA-256-derived vectors. SHA-256 is a
cryptographic hash with the avalanche property — one-bit input differences produce
entirely different outputs. This makes it a correct integrity check (Merkle tree)
but an incorrect similarity function (semantic retrieval). v0.3.3 replaces it with
a zero-dependency lexical similarity algorithm that actually captures textual
overlap while remaining offline-capable.
DONE TUI Experience (Daily Driver Quality)
CLOSED: [2026-05-05 Tue]
- State "DONE" from "TODO" [2026-05-05 Tue]
All P0-P4 items implemented:
- P0: Chat scrollback (Page Up/Down), Input history (up/down arrows)
- P1: Status bar (connection, mode, msg count, scroll, activity indicator)
- P1: Message rendering (timestamps, colors, role icons)
- P2: Command palette (/help command listing)
- P2: Multi-line input (\ + Enter inserts newline)
- P3: Background activity indicator (…thinking spinner)
- P4: Tab completion for all / commands
- P4: Configurable theme (*tui-theme* plist, /theme command)
DONE v0.2.x Backfill Remediation (stubs and gaps)
CLOSED: [2026-05-03 Sun]
- State "DONE" from "TODO" [2026-05-03 Sun]
- P0: vault-get-secret / vault-set-secret wrappers (one-line delegation to vault-get/vault-set with :type :secret)
- P0: system-archivist Scribe + Gardener (distill daily logs → atomic notes; scan broken links, orphaned memory-objects)
- P0: system-self-improve surgical edit + error fix (read → replace → snapshot → write → balance → tangle → reload)
- P0: programming-org org-modify + org-ast-render (locate node by ID, apply changes; convert plist AST → Org text)
- P0: programming-literate balance check + tangle sync (verify balanced parens in source blocks; verify .lisp matches tangled output)
- P1: system-event-orchestrator bootstrap (scan Org files for HOOK/CRON properties, register via existing registries)
- P1: system-memory introspection (structured statistics: object count by type, TODO distribution, orphans, snapshots)
- P1: path relic skills/ → lisp/ (update skill-initialize-all and context-skill-source to resolve against lisp/ directory)
- P2: core-context semantic retrieval (populate org-object-vector at ingest; fallback: TF-IDF bag-of-words)
- P2: core-context subtree-based skill source loading (context-skill-subtree for targeted retrieval by heading name)
- P3: Variable name drift normalization (memory vs memory-store, skills-registry vs skill-registry)
- P4: Eliminate STYLE-WARNINGs from setup output (reorder defuns for same-file forward references; accept cross-skill references)
DONE Project Renaming (Bouncer → Dispatcher)
- State "DONE" from "TODO" [2026-05-02 Sat 22:00]
The Dispatcher's role has evolved beyond security guard. It is the seed of the deterministic engine — it learns to execute procedures without invoking the neural net.
v0.3.x: Security Hardening
Before any feature work proceeds, three classes of vulnerability are patched. These are not feature releases — they are the floor the system must stand on before v0.4.0 feature development begins. The versioning reflects this: patch releases (v0.3.Y) are reserved for fixes with zero architectural impact.
A note on parser safety and competitive positioning: SBCL defaults to *read-eval* t, which means the #. reader macro can execute arbitrary Lisp during parsing. Three code paths in the current codebase read untrusted input without binding *read-eval* nil — the LLM output parser, the memory snapshot loader, and the system eval actuator. This is not a theoretical risk: a single hallucinated or adversarial LLM output containing #.(shell "dangerous command") bypasses all nine vectors of the Dispatcher's safety gate before any gate ever sees the action. No other SOTA agent parses unstructured model output with an eval-capable reader — they use JSON schemas, function-calling APIs, or at minimum bind *read-eval* nil. Fixing this is three lines of Lisp and gives Passepartout an immediate safety advantage: the same deterministic safety gates that other agents lack are now structurally guaranteed to see every action before it executes.
v0.3.1 — Parser RCE elimination
Rationale: SBCL's default value for *read-eval* is t, which enables the #. reader macro to execute arbitrary Lisp forms during parsing. Three code paths in the current codebase process untrusted input with read-from-string or read without binding *read-eval* to nil. Each is a remote code execution vector that bypasses all deterministic safety gates: the Dispatcher's shell safety check, path protection, secret scanning, and network exfiltration detection never execute because the malicious form is evaluated during parsing, before the action plist is even constructed.
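The effect of the binding can be sketched in a few lines. This is an illustration only: safe-read-plist and *pwned* are hypothetical names, not functions or variables from the codebase. With *read-eval* bound to nil, the standard reader signals an error when it meets #. instead of evaluating the form.

```lisp
(defvar *pwned* nil
  "Flipped to T only if read-time evaluation actually runs.")

(defun safe-read-plist (text)
  "Parse TEXT with #. evaluation disabled.
Returns the parsed form and T on success, or NIL and NIL when the reader
rejects the input (for example because it contains #.)."
  (let ((*read-eval* nil))
    (handler-case (values (read-from-string text) t)
      (error () (values nil nil)))))

;; Well-formed plist: parsed normally.
(safe-read-plist "(:type :message :text \"hi\")")

;; Adversarial input: rejected by the reader, *pwned* stays NIL.
(safe-read-plist "(#.(setf *pwned* t))")
```

The same let wrapper applies unchanged to read on a file stream, which is the case for the snapshot loader.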
- Wrap read-from-string in think() (core-loop-reason.lisp:102) with (let ((*read-eval* nil)) ...). LLM output is untrusted by definition; parsing it must never execute code. The markdown-strip regex already runs, so the fix immediately follows it.
- Wrap read in load-memory-from-disk (core-memory.lisp:143) with (let ((*read-eval* nil)) ...). The memory.snap file lives under ~/ by default and could be corrupted or planted.
- Wrap read-from-string in action-system-execute (core-loop-act.lisp:62) with (let ((*read-eval* nil)) ...). The :system :eval path executes untrusted payload code. Explicitly assert that this path requires the Dispatcher's approval gate.
- Add FiveAM test: inject "(#.(shell \"echo pwned\"))" into the think() pipeline and assert no shell execution occurs.
v0.3.2 — Shell safety & actuator sandboxing
Rationale: The :system :eval actuator path is currently unchecked by the Dispatcher's approval gate; only :shell and :tool "shell" trigger HITL. The shell actuator wraps commands in two nested bash -c invocations (system-actuator-shell.lisp:10), and Lisp's format with the ~s directive produces S-expression-safe strings, not shell-safe strings. A command containing quotes or substitution characters can break out. Additionally, skill files loaded via skill-initialize-all execute arbitrary Lisp in jailed packages: a skill file containing (uiop:run-program "dangerous") executes immediately on load, before any gate can inspect it.
- Fix shell double-wrapping: remove the outer bash -c in actuator-shell-execute; pass the command string directly to uiop:run-program with :force-shell nil. The timeout wrapping remains via the OS timeout binary.
- Extend the Dispatcher approval requirement to the :system :eval path (currently only :shell and :tool "shell" trigger HITL). An unbounded eval should require the same Flight Plan approval as a shell command.
- Add skill sandbox mode for skill-initialize-all: load each skill's code into a temporary jailed package, run the registered trigger function in isolation, verify it imports no restricted symbols (from CL package: run-program, shell, run-shell-command), then promote to the live registry on pass.
- Add FiveAM test: register a skill containing (uiop:run-program "echo test") in the body and verify the sandbox blocks its promotion.
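One piece of the sandbox, the restricted-symbol check, might look roughly like the sketch below. It assumes the skill's forms have already been read (with *read-eval* nil) into a list; restricted-symbols-in and skill-promotable-p are hypothetical names, and matching is by symbol name only. A name scan is a heuristic: code that builds a symbol at runtime with intern or find-symbol would slip past it, so it complements the approval gate rather than replacing it.

```lisp
(defparameter *restricted-symbol-names*
  '("RUN-PROGRAM" "SHELL" "RUN-SHELL-COMMAND")
  "Names from the restricted list above; compared by name, in any package.")

(defun restricted-symbols-in (form)
  "Return the restricted symbols that appear anywhere inside FORM."
  (let ((hits '()))
    (labels ((walk (x)
               (cond ((consp x)
                      (walk (car x))
                      (walk (cdr x)))
                     ((and (symbolp x)
                           (member (symbol-name x) *restricted-symbol-names*
                                   :test #'string=))
                      (pushnew x hits)))))
      (walk form))
    (nreverse hits)))

(defun skill-promotable-p (forms)
  "True when none of FORMS mentions a restricted symbol."
  (notany #'restricted-symbols-in forms))

;; A skill body that shells out is held back from the live registry.
(skill-promotable-p '((defun trigger () (run-program "echo test"))))
```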
v0.3.3 — Semantic retrieval activation
Rationale: Two independent failures prevent the foveal-peripheral semantic retrieval path from ever firing. First, context-awareness-assemble never passes :foveal-vector to context-object-render, so the renderer receives nil for foveal-vector and the similarity calculation always returns 0.0. Second, the default :hashing embedding backend uses SHA-256 (a cryptographic hash with the avalanche property) as a similarity function. SHA-256 is designed to produce entirely different outputs for nearly identical inputs — the property that makes it secure for integrity verification is precisely what makes it useless for semantic retrieval. A content-addressed Merkle tree correctly uses SHA-256 for identity; a retrieval engine needs a similarity function, not an identity function. The infrastructure for real embeddings (local with Ollama, openai with the embeddings API) is fully implemented and working — this release activates the last-mile wiring and replaces the semantically blind default with a zero-dependency algorithm that actually captures textual overlap.
- Wire :foveal-vector into context-awareness-assemble: pass (memory-object-vector (memory-object-get foveal-id)) as the :foveal-vector argument to the context-object-render call (one line in core-context.lisp:148-150).
- Replace the :hashing default backend with character-trigram Jaccard similarity. Pure Lisp, zero external dependencies, works exactly as offline as SHA-256, but captures lexical overlap: "authentication" and "authenticate" share the trigrams "aut", "uth", "the", "hen", "ent", and so on. The vector is a Bloom filter of trigrams; cosine similarity over these binary vectors tracks the same overlap that Jaccard (intersection / union) measures. This provides a real, if crude, lexical signal without any server.
- Rename the existing embedding-backend-hashing to embedding-backend-sha256 and repurpose it as an explicit :sha256 provider for environments where even trivial Lisp computation is undesirable (embedded, resource-constrained). Document it as "integrity-only, no semantic retrieval capability."
- Add EMBEDDING_PROVIDER guidance to the setup wizard: explain that :hashing is the default offline fallback, :local requires Ollama with nomic-embed-text, and :openai uses the paid embeddings API.
- Add FiveAM test: ingest two semantically related nodes ("implement login form" and "add password authentication"), verify cosine similarity > 0.0 with the trigram backend.
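A minimal sketch of the trigram idea, computing Jaccard directly on trigram sets rather than through a Bloom-filter vector (a simplification for illustration). The function names trigram-set and trigram-jaccard are hypothetical.

```lisp
(defun trigram-set (text)
  "Return a hash table whose keys are the distinct character trigrams of TEXT."
  (let ((s (string-downcase text))
        (set (make-hash-table :test #'equal)))
    ;; Strings shorter than three characters yield an empty set.
    (loop for i from 0 to (- (length s) 3)
          do (setf (gethash (subseq s i (+ i 3)) set) t))
    set))

(defun trigram-jaccard (a b)
  "Jaccard similarity (intersection / union) of the trigram sets of A and B."
  (let* ((ta (trigram-set a))
         (tb (trigram-set b))
         (shared (loop for k being the hash-keys of ta
                       count (gethash k tb)))
         (total (- (+ (hash-table-count ta) (hash-table-count tb)) shared)))
    (if (zerop total)
        0.0
        (float (/ shared total)))))

;; 9 shared trigrams out of 13 distinct ones.
(trigram-jaccard "authentication" "authenticate")
```

The two test phrases from the list above share only the trigram "ent", so their score is small but nonzero, which is exactly the property SHA-256 vectors cannot provide.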
Competitive Advantage Analysis — v0.3.x Summary
Safety is Passepartout's strongest differentiator, but the current codebase undermines it at the parser level. Fixing the three *read-eval* gaps means the nine deterministic safety vectors actually see every action before execution. No competitor — not Claude Code, not OpenClaw, not Hermes — has a comparable stacked gate architecture. By fixing the parser vulnerability, the architecture's safety claim becomes structurally true rather than aspirationally documented.
The semantic retrieval fix (one line to wire foveal-vector, one backend replacement) activates the foveal-peripheral model's full power: deep nodes that are topically related to the user's focus now surface automatically. Without this, the context model is "dumb truncation at depth 2." With it, it's genuine semantic awareness — and since the retrieval is deterministic (in-image vector math, zero LLM tokens), the cost advantage over competitors' LLM-assisted search compounds with every query.
The shell and actuator fixes close the remaining execution-surface gaps. The skill sandbox mode creates a loading boundary that no current agent framework provides — skills are verified before they join the running image, not trusted by convention.
v0.4.0: Token Economics & Prompt Efficiency
The architecture's single largest gap versus SOTA: Passepartout currently spends tokens like a research prototype. Every think() call rebuilds and retransmits the full system prompt — IDENTITY + TOOLS + CONTEXT + LOGS + SKILL_AUGMENTS — with no caching, no budget, and no incremental assembly. The foveal-peripheral model prunes memory content but doesn't touch the fixed overhead. With 20+ skills by v1.0.0, system prompt overhead alone could reach 3,000–8,000 tokens per call before user input is even processed.
Competitors (Claude Code, OpenClaw, Copilot) all implement some form of prefix caching — Anthropic's API gives 90% discount on cached tokens, OpenAI caches automatically. Passepartout's prompt structure is already naturally cacheable: IDENTITY, TOOLS, and LOGS format are static across calls. This version turns that structural property into a cost advantage.
Design insight: why token economics is the structural differentiator. Passepartout's sparse-tree rendering and deterministic safety gates should produce 2–3x fewer tokens than competitors for equivalent coding tasks, and 13–24x fewer for knowledge management. But without caching and budget enforcement, the fixed overhead per call eats these savings. A coding session that touches 30 files with competent context management costs ~72K tokens (Passepartout) versus ~185K (Claude Code). Without caching, the Passepartout number climbs toward ~150K because every call retransmits the static prefix. The architectural advantage exists in theory but requires operational plumbing to materialize.
TODO Tokenizer integration
- Integrate a tokenizer for at minimum the model families used in the provider cascade (cl100k_base for OpenAI, claude-3 tokenizer for Anthropic). Options: FFI binding to tiktoken via CFFI, or a pure-Lisp port of the BPE tokenizer for cl100k_base (the encoding table is ~100KB, the algorithm is ~100 lines).
- Expose (count-tokens text &key model) as a core utility.
- Use for three purposes: context budget enforcement (reject assembly if over limit), cost estimation (tokens × provider price), and prompt optimization (measure which sections of the system prompt consume the most budget).
TODO Prompt prefix caching
- Split the system prompt into a static prefix (IDENTITY string, TOOLS section, LOGS format header) and a dynamic suffix (CONTEXT render, current log entries, skill augments, user prompt).
- Track a hash of the static prefix; only retransmit when it changes (skill load/unload, identity config change). On cache hit, send the cached prefix with the dynamic suffix appended.
- Implement the Anthropic prompt-caching header protocol for providers that support it (claude-3-* models, up to 90% discount on cached tokens). For OpenAI, the automatic caching layer handles prefix detection without explicit headers.
- Log cache hit/miss rate to telemetry for cost tracking.
TODO Incremental context assembly
- Cache the last rendered context-awareness-assemble string with metadata: foveal-id at render time, scope, last memory modification timestamp.
- On think() invocation: if foveal-id, scope, and memory-modification-timestamp are unchanged since the cached render, return the cached string. This eliminates re-rendering on heartbeat ticks, tool-output feedback loops, and multi-turn conversations where the user hasn't changed focus.
- Invalidate the cache on any ingest-ast call, any org-modify, or any focus change.
- For heartbeats specifically: skip context assembly entirely. The heartbeat sensor bypasses the reason gate (returns early in loop-gate-reason:154), so building awareness for a signal that won't call the LLM is pure waste. Add an early return in think() for :heartbeat / :delegation sensors.
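The cache check described above could be sketched as follows. All names here (context-cache, cached-context, invalidate-context-cache) are hypothetical, and render-fn is a zero-argument stand-in for the real assembly call, so the sketch stays independent of the actual signature of context-awareness-assemble.

```lisp
(defstruct context-cache
  foveal-id scope memory-stamp text)

(defvar *context-cache* nil
  "Last rendered context plus the metadata it was rendered under.")

(defun cached-context (foveal-id scope memory-stamp render-fn)
  "Return the cached render when nothing relevant changed, else re-render.
The second value is :hit or :miss, which can feed telemetry."
  (let ((c *context-cache*))
    (if (and c
             (equal foveal-id (context-cache-foveal-id c))
             (equal scope (context-cache-scope c))
             (eql memory-stamp (context-cache-memory-stamp c)))
        (values (context-cache-text c) :hit)
        (let ((text (funcall render-fn)))
          (setf *context-cache*
                (make-context-cache :foveal-id foveal-id :scope scope
                                    :memory-stamp memory-stamp :text text))
          (values text :miss)))))

(defun invalidate-context-cache ()
  "Call from ingest, modify, and focus-change paths."
  (setf *context-cache* nil))
```

Comparing a modification stamp is cheaper than diffing memory, and explicit invalidation covers the case where a write does not bump the stamp.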
TODO Per-call token budget
- CONTEXT_MAX_TOKENS env var (default: 16384, half of a 32K context window to leave room for model response).
- In think(): compute total token count (static prefix + dynamic context + user prompt). If over budget, progressively trim: first truncate system logs to 5 lines, then drop skill augments from non-triggered skills, then if still over, downgrade peripheral nodes to title-only (disable :foveal-vector path, render strict depth ≤ 2).
- Log budget violations to telemetry with the trimmed-token count for diagnostics.
- The goal: Passepartout never silently exceeds a model's context window. Silent truncation by the model API produces undefined behavior (mid-thought cutoff, lost instructions). A system that knows it's over budget can degrade intentionally.
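The progressive trimming order can be sketched as below. This is an illustration under loud assumptions: rough-token-count is a four-characters-per-token stand-in, not a real tokenizer; the prompt is modelled as a plist of strings; the :idle-augments key is assumed to hold only augments from non-triggered skills; and the title-only rendering is passed in precomputed.

```lisp
(defun rough-token-count (text)
  "Stand-in estimate: about four characters per token. Not a real tokenizer."
  (ceiling (length text) 4))

(defun prompt-tokens (sections)
  "Sum the estimated tokens of every string value in the plist SECTIONS."
  (loop for (nil text) on sections by #'cddr
        sum (rough-token-count text)))

(defun last-lines (text n)
  "Return the last N newline-separated lines of TEXT."
  (let ((pos (length text)))
    (dotimes (i n)
      (let ((nl (position #\Newline text :end pos :from-end t)))
        (if nl
            (setf pos nl)
            (return-from last-lines text))))
    (subseq text (1+ pos))))

(defun fit-to-budget (sections budget &key (titles-only ""))
  "Apply trimming steps in order until SECTIONS fits BUDGET.
Returns the trimmed plist and the list of steps that were applied."
  (let ((sections (copy-list sections))
        (applied '()))
    (dolist (step '(:truncate-logs :drop-augments :titles-only))
      (when (<= (prompt-tokens sections) budget)
        (return))
      (ecase step
        (:truncate-logs
         (setf (getf sections :logs) (last-lines (getf sections :logs) 5)))
        (:drop-augments
         (setf (getf sections :idle-augments) ""))
        (:titles-only
         (setf (getf sections :context) titles-only)))
      (push step applied))
    (values sections (nreverse applied))))
```

Returning the list of applied steps gives telemetry something concrete to log when a budget violation occurs.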
TODO Cost tracking
- Per-provider pricing lookup table: input/output token costs for each model in the provider cascade (gpt-4o-mini, claude-3-5-sonnet, deepseek-chat, llama-3.1-70b, groq-llama, etc.).
- After each backend-cascade-call: compute cost as (input_tokens × input_price + output_tokens × output_price), log to session accumulator, emit :cost-update telemetry event.
- Per-session cumulative cost stored in memory (*session-cost* plist: (:total <float> :by-provider <alist> :by-task <alist>)).
- TUI status bar shows current session cost (optional, off by default, toggled via /cost command).
- COST_BUDGET_DAILY env var with soft cap: warning injected into system prompt when approaching budget, HITL gate on any single action exceeding 25% of remaining budget.
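The cost formula and the 25% approval rule can be sketched as follows. The price table is hypothetical (USD per million tokens, with a made-up model name) and is not real provider pricing; call-cost, record-cost, and needs-approval-p are illustrative names, and the :by-task breakdown is left out for brevity.

```lisp
(defparameter *example-prices*
  '(("model-a" :input 0.5 :output 1.5))
  "Hypothetical USD prices per million tokens; not real provider pricing.")

(defvar *session-cost* (list :total 0.0 :by-provider nil))

(defun call-cost (model input-tokens output-tokens)
  "input_tokens x input_price + output_tokens x output_price, in USD."
  (let ((price (cdr (assoc model *example-prices* :test #'string=))))
    (unless price
      (error "No price entry for ~a" model))
    (/ (+ (* input-tokens (getf price :input))
          (* output-tokens (getf price :output)))
       1000000)))

(defun record-cost (provider cost)
  "Add COST to the session total and to PROVIDER's running subtotal."
  (incf (getf *session-cost* :total) cost)
  (let ((entry (assoc provider (getf *session-cost* :by-provider)
                      :test #'string=)))
    (if entry
        (incf (cdr entry) cost)
        (push (cons provider cost) (getf *session-cost* :by-provider))))
  *session-cost*)

(defun needs-approval-p (estimated-cost daily-budget)
  "True when one action would exceed 25% of the remaining daily budget."
  (> estimated-cost
     (* 0.25 (- daily-budget (getf *session-cost* :total)))))
```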
Competitive Advantage Analysis — v0.4.0 Summary
Token economics is the dimension where the architecture's theoretical advantage becomes operationally real. The foveal-peripheral model and deterministic gates reduce the tokens needed per task; prompt caching and incremental assembly reduce the tokens spent per task. Combined, the 2–3x coding savings and 13–24x knowledge management savings in the DESIGN_DECISIONS token analysis become achievable rather than aspirational.
The cost tracking and budget enforcement are defensive advantages: no competitor gives the user visibility into per-task LLM cost. Claude Code and Copilot obscure cost behind flat-rate subscriptions. Passepartout's transparent cost model is a sovereignty feature — the user knows what the agent spends on their behalf and can cap it.
The minimum viable local model advantage is structural: at 2,000–4,000 effective tokens (foveal-peripheral + caching), a 7–8B parameter model on consumer hardware is a daily driver. Competitors at 32K+ effective tokens require 70B+ parameter models and 16–32 GB VRAM. Passepartout runs on a laptop GPU where competitors need a data center card or cloud API.
v0.5.0: Signal Pipeline & Concurrency
The current pipeline is strictly sequential — one signal traverses Perceive → Reason → Act before the next signal begins. Background tasks (heartbeat, embedding cron, gardener scans) compete with foreground interactions. A heartbeat that fires during a long tool chain is queued. A Telegram message during a multi-step planning cycle is queued. The system feels sluggish under concurrent load even though the symbolic operations are near-instant (SBCL hash table lookups are microseconds) — the bottleneck is the single-pipeline architecture, not the hardware.
Design insight: why concurrency matters for an agent that is "one brain." Passepartout rejects multi-agent delegation on principle (see DESIGN_DECISIONS "One Single Agent"). But a single brain handles multiple inputs simultaneously — the human brain processes vision, audio, and proprioception in parallel. Rejecting multi-agent delegation does not require rejecting concurrency within the agent. The key is that all concurrent operations share the same memory space, the same Merkle tree, and the same deterministic gate stack. They are threads of one cognition, not separate agents.
TODO Priority-queue signal processing
- Replace the linear process-signal call chain with a priority-ordered signal queue. The queue is a sorted plist-list consumed by the main loop. Priority tiers:
  - :user-input / :chat-message: highest priority (the user is waiting)
  - :approval-required: high (HITL re-injections need quick resolution)
  - :tool-output: medium (feedback from tool execution, needs LLM assessment)
  - :interrupt: medium-high (shutdown signal)
  - :heartbeat / :cron / :delegation: low (background maintenance)
- Coalesce duplicate heartbeats: if the queue already contains a :heartbeat signal when a new one arrives, discard the older one (no value in processing stale ticks). Keep at most one pending heartbeat at any time.
- The main loop drains the highest-priority signal from the queue, processes it through the pipeline, and repeats. If the pipeline produces feedback (tool-output → think), the feedback is enqueued at its appropriate priority: it may preempt background signals but won't interrupt the current signal mid-processing.
- Add telemetry: average queue depth by priority tier, max wait time per tier.
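A single-threaded sketch of the queue with heartbeat coalescing. The numeric priorities, the :type key on each signal plist, and the function names are assumptions for illustration; locking around the queue is omitted here.

```lisp
(defparameter *signal-priorities*
  '((:user-input . 0) (:chat-message . 0)
    (:approval-required . 1)
    (:interrupt . 2)
    (:tool-output . 3)
    (:heartbeat . 4) (:cron . 4) (:delegation . 4))
  "Lower number means higher priority.")

(defvar *signal-queue* '()
  "Signals sorted by priority; FIFO within one tier.")

(defun signal-priority (signal)
  (or (cdr (assoc (getf signal :type) *signal-priorities*)) 4))

(defun enqueue-signal (signal)
  "Insert SIGNAL in priority order, keeping at most one pending heartbeat."
  (when (eq (getf signal :type) :heartbeat)
    ;; Coalesce: a stale tick carries no information the new one lacks.
    (setf *signal-queue*
          (remove :heartbeat *signal-queue*
                  :key (lambda (s) (getf s :type)))))
  ;; MERGE is stable, so earlier signals of equal priority stay in front.
  (setf *signal-queue*
        (merge 'list *signal-queue* (list signal) #'<
               :key #'signal-priority))
  signal)

(defun dequeue-signal ()
  "Remove and return the highest-priority pending signal, or NIL."
  (pop *signal-queue*))
```

Because insertion keeps the list sorted, the main loop only ever pops the head, and feedback signals enqueued mid-cycle land ahead of background work without interrupting the signal currently in the pipeline.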
TODO MVCC memory concurrency
- Replace *memory-store* (mutable global hash table) with a versioned Merkle-root pointer. The root is an (or null merkle-node) struct containing the tree and a monotonic version counter.
- Read threads snapshot the root before beginning their pipeline cycle. All object lookups dereference through the snapshot, so they see a consistent view of memory regardless of concurrent writes. Reads never block.
- Write threads (ingest-ast, org-modify, snapshot-memory) build new object hashes, construct a new Merkle root, and CAS-replace the global root pointer. If another thread won the CAS race (root version changed), the loser re-reads the new root, replays its changes on the updated tree, and retries the CAS.
- Conflict probability is near-zero because concurrent signals almost never touch the same Org headline. The replay-on-conflict path exists for correctness but is rarely exercised. Lock contention is eliminated: the only atomic operation is the CAS on the root pointer.
- Remove the single-threaded pipeline assumption: previously, process-signal was safe because nothing else wrote to *memory-store* during its execution. With MVCC, multiple signals can process concurrently because each has its own snapshot. The *loop-interrupt-lock* becomes *signal-queue-lock* (protecting only the queue, not the memory).
- Test: concurrent ingest-ast from two threads writing to different memory objects, verify both commits succeed without corruption.
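The snapshot-and-CAS loop might be sketched as below. Assumptions, stated loudly: an immutable alist stands in for the Merkle tree, all names are hypothetical, and the atomic step uses SBCL's sb-ext:compare-and-swap; on other implementations the fallback branch is not atomic and is only safe single-threaded.

```lisp
(defstruct (memory-root (:constructor make-memory-root (objects version)))
  (objects '() :read-only t)   ; immutable alist standing in for the tree
  (version 0 :read-only t))    ; monotonic version counter

(defvar *root* (make-memory-root '() 0))

(defun swap-root (old new)
  "Install NEW only if *ROOT* is still OLD. Returns true on success."
  #+sbcl (eq old (sb-ext:compare-and-swap (symbol-value '*root*) old new))
  ;; Fallback is NOT atomic; it only illustrates the control flow.
  #-sbcl (when (eq *root* old) (setf *root* new) t))

(defun commit-object (id value)
  "Snapshot the root, build a successor, and retry until the CAS wins."
  (loop
    (let* ((old *root*)
           (new (make-memory-root
                 (acons id value
                        (remove id (memory-root-objects old)
                                :key #'car :test #'equal))
                 (1+ (memory-root-version old)))))
      (when (swap-root old new)
        (return new)))))

(defun lookup-object (snapshot id)
  "Readers dereference through their own SNAPSHOT, never through *ROOT*."
  (cdr (assoc id (memory-root-objects snapshot) :test #'equal)))
```

A reader that captured the root before a commit keeps seeing the old objects, which is the consistency property the list above relies on. The two-thread test it calls for would need a threading library and is not shown.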
TODO Structured output enforcement
- Add a plist validation step between markdown-strip and read-from-string in think(). Before attempting to parse, validate: (a) the output starts with ( or [, (b) it contains balanced delimiters (count opens vs closes), (c) it doesn't contain #. (redundant after v0.3.1 *read-eval* nil but defense-in-depth).
- On validation failure: construct a rejection trace (similar to the existing deterministic gate rejection feedback) and re-inject into the LLM prompt. The trace includes the raw output and a diagnostic ("Your response did not produce a valid plist. Ensure it starts with ( and has balanced parentheses.").
- Configurable LLM_OUTPUT_RETRIES (default 2). After exhausting retries, fall through with the raw text as a :MESSAGE action (current behavior).
- Track parse-failure rate per provider in telemetry. Use to guide provider cascade ordering: a provider with 20% parse-failure rate falls behind one with 2%.
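The three pre-parse checks could be sketched like this. The function names and diagnostic strings are hypothetical. One deliberate deviation from plain counting: delimiters inside string literals are skipped, so a message text containing ")" does not trip the check.

```lisp
(defun balanced-delimiters-p (text)
  "True when ( ) and [ ] nest correctly in TEXT, ignoring string literals."
  (let ((stack '())
        (in-string nil)
        (escaped nil))
    (loop for ch across text
          do (cond (escaped (setf escaped nil))      ; char after a backslash
                   ((char= ch #\\) (setf escaped t))
                   ((char= ch #\") (setf in-string (not in-string)))
                   (in-string nil)                   ; skip string contents
                   ((find ch "([") (push ch stack))
                   ((char= ch #\))
                    (unless (eql (pop stack) #\()
                      (return-from balanced-delimiters-p nil)))
                   ((char= ch #\])
                    (unless (eql (pop stack) #\[)
                      (return-from balanced-delimiters-p nil)))))
    (and (null stack) (not in-string))))

(defun plist-output-problem (text)
  "Return NIL when TEXT passes the pre-parse checks, else a diagnostic string."
  (let ((trimmed (string-trim '(#\Space #\Tab #\Newline #\Return) text)))
    (cond ((zerop (length trimmed)) "Response was empty.")
          ((not (find (char trimmed 0) "(["))
           "Response must start with ( or [.")
          ((search "#." trimmed) "Response must not contain #.")
          ((not (balanced-delimiters-p trimmed))
           "Response has unbalanced delimiters.")
          (t nil))))
```

The returned string can go straight into the rejection trace. The #. check is a plain substring search, so it will also reject a harmless "#." inside message text; for a defense-in-depth check that errs toward rejection, that trade-off seems acceptable.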
Competitive Advantage Analysis — v0.5.0 Summary
The priority queue eliminates the perception of sluggishness that concurrent load creates. A user typing a query never waits for a heartbeat tick to finish — their signal jumps the queue. The coalescing of duplicate heartbeats eliminates wasted processing. This is table-stakes UX for a daily-driver agent.
MVCC concurrency on the Merkle tree is genuinely novel for an AI agent. Most agents use either a single-threaded event loop (Claude Code) or process-level isolation (OpenClaw's subprocess model). Passepartout's approach — concurrent threads sharing a versioned content-addressable tree — combines the coherence of a single-agent memory with the throughput of concurrent execution. The Merkle tree, originally designed for integrity verification, gets a second life as the concurrency control primitive. This is the kind of architectural synergy that single-purpose databases can't match.
Structured output enforcement bridges the gap between "Passepartout uses plists, not JSON" and "LLMs sometimes produce malformed syntax." It gives the system the same reliability guarantee that JSON mode gives competitors — the output will parse — without introducing JSON into the architecture.
v0.6.0: Tool Ecosystem (MCP-Native)
The original roadmap placed MCP at v0.7.0 and planned "10+ cognitive tools" built from scratch for v1.0.0. That ordering is backwards: the ecosystem already provides 50+ tools (filesystem, git, postgres, slack, github, web search, memory servers). Building bespoke tools from scratch duplicates work the community has already done and tested. Passepartout's advantage is not in tool implementation but in tool orchestration: the deterministic gate stack that verifies every tool invocation before execution.
Why MCP matters for competitive positioning: Claude Code's native tools (Read, Write, Edit, Bash, Grep, Glob, WebSearch) are implemented in TypeScript within the Claude Code runtime. They are not extensible — you cannot add a tool without modifying the runtime. OpenClaw's tools are similarly baked into the Node.js process. By building a native MCP client, Passepartout gains tool breadth that exceeds both competitors (50+ tools via the MCP ecosystem versus ~10 native tools) without building a single tool implementation. The tool quality is maintained by the ecosystem; the safety verification is maintained by Passepartout's gate stack. This division of labor is the right architecture for a small team building a competitor to well-funded commercial agents.
TODO MCP native client
- Pure Common Lisp MCP client: parse JSON-RPC messages from MCP servers over stdio or SSE. No Python bridge, no Node.js subprocess. The client runs in the same Lisp image as the agent — zero serialization overhead between the agent and the MCP layer.
- Implement the MCP protocol lifecycle: initialize handshake, list tools, call tool, handle notifications. Each MCP server registers its tools as entries in Passepartout's *cognitive-tool-registry* at connection time, so the LLM's tool belt prompt automatically expands to include them.
- MCP_SERVERS env var: comma-separated paths to MCP server config files (JSON). Each config specifies the server command, args, and env vars. Example: MCP_SERVERS=~/.config/passepartout/mcp/filesystem.json,~/.config/passepartout/mcp/git.json.
- Tool invocation route: LLM proposes a tool call → Dispatcher verifies against permission table → MCP client serializes call as JSON-RPC → server executes → result deserialized back to plist → returned to LLM as tool output. The Dispatcher does not distinguish between native tools and MCP tools: the gate stack is uniform.
- Register the MCP client as a skill (defskill :passepartout-mcp-client) so it can be hot-reloaded. The MCP client is not core infrastructure; it is a skill that extends the tool ecosystem.
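To make the serialization step of the invocation route concrete, here is a small sketch of the JSON-RPC 2.0 messages involved. It is written in Python rather than Common Lisp. The `tools/call` method name and its `name`/`arguments` parameters follow the MCP specification; the helper names and the flat plist-style result (`:status`, `:text`) are assumptions for illustration only.

```python
import itertools
import json

_request_ids = itertools.count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object with a fresh id."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def tool_call_request(tool_name, arguments):
    """MCP tool invocation: method tools/call with a tool name and arguments."""
    return jsonrpc_request("tools/call",
                           {"name": tool_name, "arguments": arguments})


def encode_stdio(message):
    """The stdio transport carries one JSON message per line."""
    return json.dumps(message, separators=(",", ":")) + "\n"


def result_to_plist(response):
    """Flatten a tools/call response into a plist-style list (hypothetical shape)."""
    if "error" in response:
        return [":status", ":error", ":text", response["error"].get("message", "")]
    result = response.get("result", {})
    text = "".join(part.get("text", "")
                   for part in result.get("content", [])
                   if part.get("type") == "text")
    status = ":error" if result.get("isError") else ":ok"
    return [":status", status, ":text", text]


def parse_mcp_servers(value):
    """Split the comma-separated MCP_SERVERS value into config file paths."""
    return [path.strip() for path in value.split(",") if path.strip()]
```

The Dispatcher check would sit between building the request and writing it to the server, so a denied call never reaches `encode_stdio`.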
TODO Core MCP tools (from existing roadmap items)
- Git Steward (deferred from old v0.4.0): status, diff, commit, push, branch via the MCP Git server. Policy gate enforces commit-before-modify: any file write to a git-tracked directory must be preceded by a diff review.
- Web Research (deferred from old v0.6.0): headless browser via Puppeteer/Playwright MCP server. Text extraction, screenshot capture, page interaction.
- Interactive PTY (deferred from old v0.5.0): stream long-running process output to context window, async interrupt control.
TODO Environment Steward
- Detect "command not found" in shell actuator output.
- Search system PATH and package manager registries for the missing command.
- Propose installation command and retry the failed action on user approval.
- Cache resolved dependency paths to avoid repeated searches.
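The detection and caching steps above can be sketched as follows, in Python rather than Common Lisp. The two message patterns cover the common bash and zsh wordings and are not exhaustive; `missing_command` and `DependencyCache` are hypothetical names, and the package-registry search and the approval step are left out.

```python
import re
import shutil

# zsh prints "zsh: command not found: jq"; bash prints "bash: jq: command not found".
_ZSH_PATTERN = re.compile(r"command not found: (?P<cmd>\S+)")
_BASH_PATTERN = re.compile(r"(?P<cmd>[^\s:]+): command not found")


def missing_command(output):
    """Return the name of the missing command in shell output, or None."""
    # The zsh form is tried first because the bash pattern would otherwise
    # capture the shell name itself from a zsh message.
    for pattern in (_ZSH_PATTERN, _BASH_PATTERN):
        match = pattern.search(output)
        if match:
            return match.group("cmd")
    return None


class DependencyCache:
    """Remember resolved executable paths so PATH is searched only once."""

    def __init__(self, which=shutil.which):
        self._which = which
        self._paths = {}

    def resolve(self, command):
        if command not in self._paths:
            path = self._which(command)
            if path is None:
                return None  # not cached, so a later install is picked up
            self._paths[command] = path
        return self._paths[command]
```

Only successful lookups are cached here, on the assumption that a command missing now may be installed after the user approves the proposal.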
Competitive Advantage Analysis — v0.6.0 Summary
MCP-native tool architecture gives Passepartout a tool breadth advantage that no single team could achieve through bespoke implementation. The MCP ecosystem is growing faster than any individual agent's tool set. By connecting to it rather than competing with it, Passepartout's tool count scales with the ecosystem — every new MCP server is a new Passepartout tool.
The Dispatcher's tool permission table (allow/ask/deny) applies uniformly to MCP tools, giving Passepartout tool-level security granularity that competitors lack. Claude Code's tools are binary: available or not. Passepartout can conditionally allow filesystem writes to /projects/* while requiring HITL for writes to ~/.config/* — per-path, per-tool, per-session. This is the deterministic gate stack's natural application domain.
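A per-path, per-tool permission lookup of the kind described here might look like the sketch below. It is in Python rather than Common Lisp, the rule format and the `decide` function are hypothetical, and it assumes first-match-wins ordering with deny as the default. Per-session scoping is omitted.

```python
import posixpath
from fnmatch import fnmatchcase

# (tool, path pattern, decision); the first matching rule wins.
RULES = [
    ("write-file", "/projects/*", "allow"),
    ("write-file", "~/.config/*", "ask"),
]


def decide(tool, path, rules=RULES, default="deny"):
    """Return "allow", "ask", or "deny" for a tool acting on a path."""
    # Collapse ".." segments first so /projects/../etc/passwd cannot
    # slip through the /projects/* rule.
    normalized = posixpath.normpath(path)
    for rule_tool, pattern, decision in rules:
        if rule_tool == tool and fnmatchcase(normalized, pattern):
            return decision
    return default
```

Because the lookup takes only a tool name and a path, the same table can serve native tools and MCP tools alike.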
The Git policy gate (commit-before-modify) is a safety feature no competitor provides. It prevents the most common agent failure mode: modifying files without preserving the prior state. Combined with memory snapshots (v0.2.0), this gives every action a dual audit trail: the git history and the memory object history.
v0.7.0: Planning, Self-Modification & Deterministic Routing
v0.6.0 provides the tools. v0.7.0 provides the brain that orchestrates them. The two releases are sequenced this way because planning without tools is architecture without construction — the plans describe actions the system cannot execute. With tools in place, planning becomes actionable.
Design insight: the inverted tier classifier. The current tier classifier routes "rm", "write-file", and "shell" to :REFLEX (no LLM). This routes the most dangerous operations to the path with the least oversight. It should be inverted: :REFLEX handles deterministic lookups (list TODOs, check file existence, query memory), :COGNITION handles text processing and summarization, :REASONING handles planning and code generation. Dangerous operations should always route through :REASONING where the full LLM cycle and Dispatcher gate stack apply. v0.7.1 fixes this.
TODO Long-horizon planning (task tree DAG)
- Decompose complex tasks into Org-mode headline trees. Each task node is a memory-object with terminal states: :todo → :next-action → :in-progress → :done / :blocked / :stuck.
- The LLM generates the initial task tree from the user's request. The REASONING tier processes each leaf task sequentially, updating node states as it progresses.
- Parent nodes summarise child results: when all children of a node reach :done, the parent is promoted to :done with a synthesised summary. When any child reaches :stuck, the parent is promoted to :blocked with the blocking child's diagnostic.
- Branch pruning: if a child is :stuck after three retries with different LLM providers, the parent re-plans the branch: the LLM generates alternative decomposition paths for the blocked sub-task.
- Task trees persist as Org headlines in /memex/system/tasks/. Survive restarts. Visible to the user as editable Org files.
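The parent promotion rule above can be sketched as a bottom-up pass over the tree. This is Python rather than Common Lisp, `TaskNode` and `promote` are hypothetical names, and the "synthesised summary" is replaced by a plain join of child summaries, where the real system would presumably use something richer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskNode:
    title: str
    state: str = "todo"      # todo, next-action, in-progress, done, blocked, stuck
    summary: str = ""        # result text for done nodes, diagnostic for stuck ones
    children: list[TaskNode] = field(default_factory=list)


def promote(node):
    """Recompute parent states bottom-up and return the node's state."""
    if not node.children:
        return node.state
    states = [promote(child) for child in node.children]
    stuck = next((c for c in node.children if c.state == "stuck"), None)
    if stuck is not None:
        # Any stuck child blocks the parent and surfaces its diagnostic.
        node.state = "blocked"
        node.summary = f"blocked by {stuck.title}: {stuck.summary}"
    elif all(state == "done" for state in states):
        node.state = "done"
        node.summary = "; ".join(c.summary or c.title for c in node.children)
    # Otherwise the parent keeps its current state (work still in flight).
    return node.state
```

How a parent should react to a child that is itself :blocked is not specified above; this sketch leaves the parent unchanged in that case.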
TODO Tier classifier fix
- Invert the current classifier: :REFLEX = deterministic lookups only (memory query, file-exists-p, check time, list TODOs by tag). :COGNITION = text processing, summarization, simple Q&A, note formatting. :REASONING = planning, code generation, multi-step task execution, dangerous operations.
- Track classifier accuracy via telemetry: for each classified action, record whether the classification was appropriate (did the :REFLEX action actually succeed without LLM? did a :REASONING action turn out to be a simple lookup?).
- The classifier function is overrideable via *tier-classifier*, allowing users or skills to customize routing.
- The classifier should be a skill, not core infrastructure: reloadable and replaceable without restart.
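The corrected routing can be sketched as an allow-list classifier that fails safe. This is Python rather than Common Lisp, and the action names and the `route` helper are hypothetical. The key property is that anything not explicitly known to be a harmless lookup or text task, including rm, write-file, and shell, lands in :REASONING.

```python
# Hypothetical action names; only the tier assignments come from the text above.
REFLEX_ACTIONS = {"memory-query", "file-exists-p", "check-time", "list-todos"}
COGNITION_ACTIONS = {"summarize", "simple-qa", "format-note"}


def classify_tier(action):
    """Allow-list classifier: unknown or dangerous actions get full oversight."""
    if action in REFLEX_ACTIONS:
        return ":REFLEX"
    if action in COGNITION_ACTIONS:
        return ":COGNITION"
    return ":REASONING"


def route(action, classifier=None):
    """Route through a replaceable classifier (stand-in for *tier-classifier*)."""
    return (classifier or classify_tier)(action)
```

For example, a user who wants every action to take the full cycle could pass `classifier=lambda action: ":REASONING"` to `route`.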
TODO Skill Creator
- LLM drafts complete skill org-file from natural language description.
- Mandatory pipeline: (a) syntax validation via lisp-syntax-validate, (b) sandbox-load in temporary jailed package (v0.3.2), (c) run registered trigger function against mock contexts, (d) run registered deterministic gate against mock proposals, (e) on pass, promote to live registry under passepartout.skills.<name>.
- Required :repl-verified flag on all defun forms: the existing Dispatcher lint check (core-loop-act.lisp:152–161) warns on writes without verification. The Skill Creator enforces this at creation time.
- Skills are the primary extension mechanism for users. The Skill Creator makes skill authoring accessible to non-Lisp-programmers: describe what you want in English, the LLM drafts the Org file, the system verifies it, and the skill is live. This is how Passepartout grows its capability surface without requiring the user to learn Common Lisp.
Competitive Advantage Analysis — v0.7.0 Summary
The task tree DAG with terminal states and branch pruning is Passepartout's planning primitive — analogous to Claude Code's TODO list but structural (Org headlines with parent-child relationships) rather than flat. The advantage: subtask dependencies are explicit in the tree structure, so the agent knows that task C depends on tasks A and B without having to rediscover this from context. Parent summarisation means the LLM can check high-level progress without re-reading every child's output — a token savings multiplier on long-running tasks.
The tier classifier fix is a safety correctness issue. The current inverted classifier (dangerous ops → no-LLM path) is actively harmful — it reduces oversight on the operations that need it most. Fixing this means "dangerous by default → maximal oversight" becomes the routing rule, which is the correct security posture.
The Skill Creator is the mechanism by which Passepartout escapes the "team of Lisp programmers" constraint. Most agent frameworks require Python/TypeScript to extend. Passepartout's extension language is English — the LLM writes the Lisp, the system verifies it. The sandbox-load and verification pipeline (from v0.3.2) make this safe: a skill that fails verification never enters the running image.
v0.8.0: Evaluation, Vision & Streaming
With tools (v0.6.0) and planning (v0.7.0) in place, the agent can execute complex multi-step tasks. v0.8.0 answers two questions: (1) how do we prove it works? (SWE-bench evaluation harness), and (2) what capabilities does the user actually experience? (vision for UI interaction, streaming for responsive TUI).
TODO SWE-bench harness
- Automated pipeline: clone a repository from SWE-bench dataset, parse the GitHub issue, feed the issue description into Passepartout's cognitive loop, track the resolution trajectory as an Org headline tree, apply the generated patch, run the repository's test suite, score success (tests pass yes/no).
- Trajectory persistence: each benchmark run produces an Org file under /memex/system/benchmarks/ recording every think() call, every tool invocation, every Dispatcher decision, and the final test result. The trajectory is auditable: a human can read why the agent made each decision and where it went wrong on failures.
- Regression mode: run the same benchmark after each version release. Track score trends. A version that regresses on SWE-bench does not ship.
- Target: competitive score with Claude Code and OpenClaw on SWE-bench-verified by v1.0.0. The evaluation harness ships in v0.8.0 so there are two full version cycles to iterate and improve before v1.0.0 ships.
TODO Computer Use / Vision
- Screenshot capture: X11 (xwd / import) and Wayland (grim) bridge. The agent requests a screenshot of a specific window or the full desktop.
- Vision model integration: send screenshot to a vision-capable model (GPT-4V, Claude 3.5, Gemini 2.0 Flash). The model analyzes UI elements and returns structured descriptions.
- Coordinate-based interaction: xdotool / ydotool for click and type commands at specific screen coordinates. Dispatcher approval gate applies: screen interaction requires HITL by default, overridable per-application via permission table.
- Use case: the user says "open Firefox, search for the Passepartout GitHub repo, and star it." The agent captures screenshots, identifies UI elements via the vision model, and issues click/type commands. Each step is verified by a follow-up screenshot to confirm the action succeeded.
TODO Streaming responses
- Stream LLM output from the daemon to the TUI via the existing TCP protocol. Add a new frame type (:type :stream-chunk) that carries partial response text.
- The TUI renders partial output in the chat window, replacing the "…thinking" spinner with live text. The user sees the agent's response as it's generated, character by character.
- Early termination: the user can press ^C during streaming to interrupt the LLM and inject an interrupt signal. The partial response is captured, the LLM call is cancelled if the provider supports it, and the agent resumes with the user's interruption as new input.
- Streaming also enables progressive tool execution: if the LLM output contains a tool call, the system can begin executing it before the full response is complete (speculative execution, rolled back if the remainder of the response invalidates the call).
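The chunk rendering and early-termination behaviour can be sketched as a small consumer loop. This is Python rather than Common Lisp; the frame shape (a dictionary with `type` and `text` keys) is a hypothetical stand-in for the plist frames on the TCP protocol, and `consume_stream`, `interrupted`, and `render` are illustrative names.

```python
def consume_stream(frames, interrupted, render):
    """Render stream chunks until done or interrupted.

    Returns (partial_text, was_interrupted) so the caller can keep the
    partial response and resume with the user's interruption as new input.
    """
    partial = []
    for frame in frames:
        if frame.get("type") != "stream-chunk":
            continue  # other frame types are handled elsewhere
        if interrupted():
            return "".join(partial), True
        partial.append(frame["text"])
        render(frame["text"])
    return "".join(partial), False
```

Cancelling the upstream LLM call is left to the caller, since whether that is possible depends on the provider.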
Competitive Advantage Analysis — v0.8.0 Summary
SWE-bench evaluation is the industry standard for coding agent capability claims. Without it, "SOTA parity" is a marketing claim. With it, "SOTA parity" is a number. The harness's trajectory persistence is a differentiator: most evaluation harnesses produce a pass/fail score. Passepartout's produces a complete Org-mode audit trail showing exactly where the reasoning succeeded or failed. This turns benchmarking into a debugging tool — failed trajectories point directly to the skill, gate, or model that needs improvement.
Vision + screen interaction is table stakes for competing with Claude Code's computer use feature. The Passepartout advantage: every screen interaction passes through the Dispatcher gate stack. A vision model might hallucinate a UI element that doesn't exist — the follow-up screenshot verification catches this deterministically. Competitors' computer use features lack this verification step — they trust the vision model's output.
Streaming is a UX requirement, not a capability requirement. But UX determines adoption. A chat interface that shows the response building in real time feels responsive; a spinner followed by a wall of text feels slow. Streaming with early termination also saves tokens: if the user sees the agent going in the wrong direction, they can interrupt before the full response is generated and paid for.
v0.9.0: Consensus, GTD & Deep Integration
Near-SOTA. The agent has tools, planning, evaluation, and streaming. v0.9.0 adds reliability (consensus), productivity methodology (GTD), and environment depth (Emacs integration).
TODO Consensus loop
- Multi-provider parallel inference for critical decisions. When the action's impact score exceeds a threshold (file writes outside home directory, shell commands that touch /etc, git pushes to main), the system sends the same prompt to 2–3 independent providers.
- Disagreement detection: compare the structured outputs (actions proposed by each provider). If all providers propose the same action (or semantically equivalent actions), proceed with the highest-confidence result. If providers disagree, flag the action for HITL approval and present the user with each provider's proposal and confidence score.
- Confidence scoring: when providers agree, use the agreement level as a confidence metric for telemetry. Track which provider combinations produce the highest agreement rates for which task types.
- Cost-aware: consensus mode doubles/triples cost for the action. Only trigger when the action's impact exceeds the cost threshold. Configurable via CONSENSUS_THRESHOLD: actions below the threshold use single-provider mode.
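The threshold check and disagreement detection above can be sketched as follows. This is Python rather than Common Lisp, proposals are modelled as dictionaries instead of plists, and "semantically equivalent" is simplified to "equal after ignoring key order", which is an assumption. `needs_consensus`, `canonical`, and `resolve` are hypothetical names.

```python
from collections import Counter


def needs_consensus(impact, threshold):
    """Only actions whose impact exceeds the threshold pay for extra providers."""
    return impact > threshold


def canonical(action):
    """Order-independent key for a proposed action (simplified equivalence)."""
    return tuple(sorted((key, repr(value)) for key, value in action.items()))


def resolve(proposals):
    """proposals: list of (provider, action, confidence) tuples.

    Unanimous proposals proceed with the highest-confidence result;
    any disagreement is escalated to the human with every proposal attached.
    """
    counts = Counter(canonical(action) for _, action, _ in proposals)
    agreement = counts.most_common(1)[0][1] / len(proposals)
    if len(counts) == 1:
        provider, action, confidence = max(proposals, key=lambda p: p[2])
        return {"decision": "proceed", "provider": provider,
                "action": action, "agreement": agreement}
    return {"decision": "hitl", "proposals": proposals, "agreement": agreement}
```

The `agreement` value is the share of providers backing the most common proposal, which is one simple candidate for the confidence metric recorded in telemetry.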
TODO GTD integration
- Full GTD cycle: capture (inbox → process), clarify (what is this? is it actionable?), organize (project, next action, reference, someday/maybe, trash), reflect (weekly review), engage (context-appropriate action lists).
- Org properties: :TRIGGER: (what context makes this actionable: @home, @office, @computer, @phone), :BLOCKER: (what task must complete first).
- Weekly review: the agent scans all projects and tasks, surfaces stalled items, suggests next actions, and generates a review Org file for the user. The review is produced deterministically (no LLM, pure Org tree traversal) and takes zero tokens.
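A deterministic, zero-token pass over these properties could look like the sketch below, which builds a context-appropriate action list. It is Python rather than Common Lisp, tasks are modelled as dictionaries instead of Org headlines, and the `id` field and `engage_list` function are hypothetical.

```python
def engage_list(tasks, context):
    """Return ids of open tasks for a context whose blocker, if any, is done."""
    done = {task["id"] for task in tasks if task["state"] == "DONE"}
    return [
        task["id"]
        for task in tasks
        if task["state"] != "DONE"
        and task.get("TRIGGER") == context
        and (task.get("BLOCKER") is None or task["BLOCKER"] in done)
    ]
```

A task blocked by an unfinished task is left off the list until its blocker reaches DONE, so the list only ever contains items the user can act on right now.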
TODO Deep Emacs integration
- Bidirectional sync: Emacs saves a file → daemon memory updates via :buffer-update signal. Daemon modifies a file → Emacs buffer reflects the change via the Emacs bridge (file-watch or explicit refresh command).
- Org-agenda awareness: the agent can query the user's agenda view (scheduled items, deadlines, habits) and incorporate agenda context into planning decisions. "What should I work on today?" considers the agenda, not just the task tree.
- Clock time tracking: the agent can start/stop clocks on Org headlines. Produces clock tables for time reporting.
- Refile and archive: the agent can refile headlines between Org files and archive completed items to /memex/archives/. Archive decisions are proposed by the LLM and verified by the Dispatcher (archive policy: DONE items older than 30 days, DONE items with no open child tasks).
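The archive policy check that the Dispatcher applies could be sketched as below. This is Python rather than Common Lisp, and it rests on an explicit assumption: the two listed conditions are read as both being required (DONE for more than 30 days and no open descendants). `Headline` and `may_archive` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class Headline:
    title: str
    state: str                       # "TODO", "DONE", ...
    closed: Optional[date] = None    # when the item reached DONE
    children: list[Headline] = field(default_factory=list)


def has_open_descendant(headline):
    """True if any task below this headline is not yet DONE."""
    return any(child.state != "DONE" or has_open_descendant(child)
               for child in headline.children)


def may_archive(headline, today, min_age_days=30):
    """Assumed policy: DONE, closed more than min_age_days ago, nothing open below."""
    if headline.state != "DONE" or headline.closed is None:
        return False
    if today - headline.closed <= timedelta(days=min_age_days):
        return False
    return not has_open_descendant(headline)
```

An LLM proposal to archive a headline would be accepted only when `may_archive` returns true, which keeps the final decision deterministic.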
Competitive Advantage Analysis — v0.9.0 Summary
The consensus loop is not unique (OpenClaw has a similar feature), but Passepartout's implementation benefits from the structured output enforcement in v0.5.2 — comparing plists for semantic equivalence is simpler and more reliable than comparing free-text responses.
The GTD integration and Emacs integration are Passepartout's "unfair advantages" — no competitor has either. Claude Code and Copilot are development tools, not life management tools. Org-mode is the bridge: the same format that holds the agent's memory holds the user's tasks, calendar, and notes. The GTD cycle operates on the same Org trees that the foveal-peripheral model renders into LLM context. There is no import/export, no separate task database, no format conversion. The agent's world model IS the user's Org files. This is the unified format thesis from the DESIGN_DECISIONS document made operational — and it's a capability that JSON-based agents structurally cannot replicate.
v1.0.0: SOTA Parity (verified)
Feature-complete, benchmark-verified, production-hardened. All capabilities from v0.3.0 through v0.9.0 integrated and tested end-to-end.
v1.0.0 is not a feature release — it is a verification release. Every feature from the v0.x series is tested under concurrent load, resource starvation, adversarial input, and benchmark scoring. The evaluation harness (v0.8.0) provides the scoring apparatus; v1.0.0 is the scored release.
| Area | Parity Target | Verification Method |
|---|---|---|
| Self-improvement | Skill Creator + self-edit + hot-reload | Skill regression suite (v0.3.x) |
| Planning | Task tree DAG with terminal states | Multi-step integration tests |
| Tool ecosystem | 15+ MCP tools + native shell + git | MCP protocol compliance tests |
| Context window | Semantic search + foveal-peripheral + caching | Token budget vs competitor audit |
| Safety | 9-vector Dispatcher + policy + permissions | Chaos testing (v0.8.0) |
| Multi-step tasks | Task trees with terminal states | SWE-bench score (v0.8.0 harness) |
| Code editing | Full file read/write via MCP + Org | SWE-bench-verified subset |
| Memory | Vector recall + Merkle integrity + MVCC | Concurrency stress test (v0.5.1) |
| Emacs integration | Full org-mode control (exceeds Claude Code) | Org-agenda round-trip test |
| Streaming | Partial output + early termination | TUI UX latency benchmark |
| Offline | 100% local capable (7-13B model) | Air-gapped integration test |
| Cost | 2-3x fewer tokens than competitors | SWE-bench token audit |
| Concurrency | Priority queue + MVCC + parallel signals | Concurrent load test (3 users + bg) |
Performance projection at v1.0.0:
| Scenario | Passepartout v1.0.0 | Claude Code | OpenClaw |
|---|---|---|---|
| Single-turn chat (local 8B) | 2-4s, ~1,500 tok | N/A (cloud-only) | N/A (cloud-only) |
| Single-turn chat (cloud) | 1-3s, ~1,500 tok | 1-3s, ~3,000 tok | 1-3s, ~3,500 tok |
| Multi-step coding (5 files) | 15-30s, ~30,000 tok | 10-20s, ~65,000 tok | 20-40s, ~85,000 tok |
| Knowledge base query (500 nodes) | <1s (in-image vector), 0 LLM tok | 3-5s, ~5,000 tok (LLM-assisted) | 3-5s, ~5,000 tok (LLM-assisted) |
| Background maintenance | 0 LLM tok (deterministic cron) | Variable or skipped | Variable or skipped |
| Offline operation | Full capability | None | None |
| Cost per coding session | ~$0.15 (gpt-4o-mini) | ~$0.45 (gpt-4o-mini) | ~$0.55 (gpt-4o-mini) |
Passepartout wins on cost (2-3x savings from sparse trees + deterministic gates + caching), offline capability (unique), and knowledge management (10-40x savings from in-image vector lookup + Org-native format). It is competitive on single-turn latency and slightly behind Claude Code on multi-step latency (15-30s versus 10-20s in the projection above): the single-pipeline architecture adds ~5s overhead per tool execution versus competitors' parallel tool dispatch.
The key insight at v1.0.0: Passepartout does not beat competitors at everything. It wins decisively where the architecture's structural advantages apply (safety, cost, offline operation, knowledge management) and is competitive where they don't (raw LLM inference speed, parallel tool dispatch). This is a defensible position — the niches Passepartout dominates are exactly the niches that matter for a sovereign, local-first AI assistant.
But it is still fundamentally probabilistic at its core. The symbolic engine verifies and constrains, but the generative engine is still the primary reasoning source. The architectural transition to symbolic-first reasoning happens in v3.0.0.
v2.0.0: Lisp Machine Emergence
This version is not about the symbolic engine - it is about tools. The agent stops running inside Emacs and starts replacing it. Lish (Lisp shell) emerges: a shell that speaks plists, not POSIX. Org-mode buffers become the file system. Org-babel becomes the REPL. The agent is no longer a passenger in Emacs - it is the operating system.
The key insight is that the agent's interface and the agent's brain become the same thing. In earlier versions, there is a clear separation: the agent produces output, the TUI displays it. In v2.0.0, the distinction blurs. The agent's thoughts are displayed in Org buffers that are also the interface that the agent manipulates.
This is the Emacs cannibalization phase. Not hostile replacement but evolution - Emacs was always a Lisp machine, and v2.0.0 completes the metamorphosis.
From Lisp-using agent to true Lisp machine. Agent IS the Emacs process.
- Lish: Lisp editor — Org-mode as IDE. Org-babel for interactive evaluation. Full REPL in TUI.
- Lish: Shell replacement — Lisp-based shell that speaks plists. Org-mode buffers as file system.
v3.0.0: Neurosymbolic Maturity
Deterministic planner takes the wheel. LLM relegated to semantic translation.
- Deterministic planner: Pure Lisp task scheduler. No LLM needed for scheduling.
- Self-correcting gates: Gates learn from false positives (user override patterns).
This is the architectural leap. The system transitions from "probabilistic engine with symbolic verification" to "symbolic engine with probabilistic input and output."
The 10-80-10 architecture becomes fully realized: ten percent neural for input translation, eighty percent symbolic for reasoning against a knowledge graph, ten percent neural for output formatting. The symbolic engine maintains facts, relationships, rules, and formal proofs. When the neural engine generates something, the symbolic engine verifies it - not by checking against a blocklist, but by running the proposal through a Prolog/Datalog reasoner that understands the domain constraints.
The deterministic planner takes the wheel. The LLM is no longer consulted for planning decisions - it translates human language to structured queries and structured results back to human language. The planning itself is pure Lisp: task graphs generated by a symbolic reasoner that has access to the full knowledge graph.
Self-correcting gates replace the learned Dispatcher rules. The system learns not just from approved exceptions but from the full history of outcomes: did the plan succeed? Where did it fail? The symbolic engine updates its own rules based on the results.
The implications are significant. Hallucination becomes structurally impossible because the symbolic engine will not accept a fact that contradicts its knowledge graph. Safety becomes provable because the formal verification layer can prove properties about the system's behavior. Self-improvement becomes stable because the agent modifies skills that are then verified before execution.
v4.0.0: AI Stack Internalized
The agent understands its own weights. No external inference.
- llama.cpp in Lisp: FFI binding. No Python subprocess. Inference runs inside the Lisp image.
- Weights as sexps: Neural weights as Lisp data structures. Homoiconic model introspection.
v5.0.0: Hardware
The Lisp machine becomes physical. RISC-V with tagged architecture, hardware-enforced type checking, FPGA prototype for the symbolic core. The agent runs not in emulation but on silicon purpose-built for the architecture.
This is the long horizon. The symbolic engine runs on logic ASICs optimized for symbolic computation. The neural engine runs on GPU or purpose-built matrix math hardware. Lisp orchestrates both, enforcing at the hardware level what it enforced at the software level in earlier versions.
v6.0.0: True Agency
World models, temporal reasoning, goal persistence across restarts.
- World models: Predictive models of user behavior, project dynamics, system state.
- Temporal reasoning: Scheduling, deadlines, elapsed duration awareness.
- Goal persistence: Goals survive restarts. Long-term projects in memory-objects.