diff --git a/.#README.org b/.#README.org deleted file mode 120000 index 0ea5c1f..0000000 --- a/.#README.org +++ /dev/null @@ -1 +0,0 @@ -user@amr.48055:1777984387 \ No newline at end of file diff --git a/README.org b/README.org index a0b6470..09a5578 100644 --- a/README.org +++ b/README.org @@ -1,66 +1,145 @@ -#+TITLE: Passepartout — Your Autonomous, Plain-Text Life Assistant +#+TITLE: Passepartout — The Plain-Text AI Assistant That Never Gets More Expensive #+AUTHOR: Amr #+FILETAGS: :passepartout:ai:assistant: #+HTML:
-Passepartout is an AI assistant that runs in your terminal, reads and writes your Org-mode files, executes tasks through a verified safety gate, and works fully offline with local LLMs. Everything it knows is a folder of plain text files that you own. +Passepartout is an AI assistant that runs in your terminal. It reads and writes your Org-mode files, executes tasks through a verified safety gate, and works fully offline with local LLMs. Every action the LLM proposes is checked by nine deterministic safety gates before it touches a file, runs a command, or sends a message. The LLM suggests. The gate decides. +Everything it knows is a folder of plain text files that you own. -**One-line install:** +*Install:* #+begin_src bash -curl -fsSL https://raw.githubusercontent.com/amrgharbeia/opencortex/main/passepartout | bash -s configure +curl -fsSL https://raw.githubusercontent.com/amrgharbeia/passepartout/main/passepartout | bash -s configure #+end_src -Then ~passepartout tui~ to start chatting. +This installs dependencies (SBCL, Quicklisp), tangles the Org source files, and runs the setup wizard for LLM providers. Requires curl and sudo access for package installation. + +* What Makes Passepartout Different + +** Every action is verified, not trusted. + +Most AI agents add safety checks as an afterthought — prompt-based guardrails that consume LLM tokens and can be evaded with clever phrasing. Passepartout inverts this: nine deterministic safety gates run in pure Lisp between the LLM's proposal and execution. Secret scanning checks for API key leaks. Path protection blocks reads and writes to sensitive files. Shell safety detects destructive commands and injection vectors. Network exfiltration detection flags unauthorized outbound connections. Lisp syntax validation catches malformed code before it reaches disk. + +Every gate costs 0 LLM tokens. Every gate is a Common Lisp function, not a prompt. Every gate runs for every action, unconditionally.
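The claim that each gate is an ordinary function rather than a prompt can be made concrete with a short sketch. It is written in Python for brevity, while Passepartout's real gates are Common Lisp functions. The function names and exact patterns are hypothetical. They are loosely modelled on the shell and path rules listed in the v0.2.1 changelog: ~rm -rf /~, ~mkfs~, ~shred~, backticks, ~$()~, ~.env~, SSH keys, and PEM files. Each check is a pure function of the proposed action, so running it costs no LLM tokens, and ~check~ runs every gate on every action.

```python
import re
from typing import Optional

# Hypothetical rules for illustration; Passepartout's real patterns live in its Lisp skills.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                 # API-key-shaped token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]
PROTECTED_PATH_MARKERS = (".env", "/.ssh/", ".pem")
UNSAFE_SHELL_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),  # recursive delete of the filesystem root
    re.compile(r"\b(mkfs|shred)\b"),      # destructive disk utilities
    re.compile(r"`|\$\("),                # command substitution as an injection vector
]


def secret_gate(action: dict) -> Optional[str]:
    """Block outgoing text that looks like it contains a credential."""
    text = action.get("text", "")
    return "secret exposure" if any(p.search(text) for p in SECRET_PATTERNS) else None


def path_gate(action: dict) -> Optional[str]:
    """Block reads and writes that touch sensitive files."""
    path = action.get("path", "")
    if any(marker in path for marker in PROTECTED_PATH_MARKERS):
        return f"protected path: {path}"
    return None


def shell_gate(action: dict) -> Optional[str]:
    """Block destructive commands and injection patterns."""
    command = action.get("command", "")
    if any(p.search(command) for p in UNSAFE_SHELL_PATTERNS):
        return f"unsafe shell command: {command}"
    return None


GATES = [secret_gate, path_gate, shell_gate]


def check(action: dict) -> list:
    """Run every gate on the action and collect block reasons (empty list = pass)."""
    return [reason for gate in GATES if (reason := gate(action)) is not None]


print(check({"command": "rm -rf /"}))          # ['unsafe shell command: rm -rf /']
print(check({"path": "/home/u/.ssh/id_rsa"}))  # ['protected path: /home/u/.ssh/id_rsa']
print(check({"command": "ls -la /tmp"}))       # []
```

None of these checks calls a model. A rule either matches or it does not, and the same input always produces the same verdict.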
+ +If a gate blocks a proposal, the rejection feedback goes back to the LLM so it can self-correct and try again. If the deterministic Dispatcher is uncertain, it creates a Flight Plan — a human-readable Org buffer you review and approve. The human decides. The Dispatcher learns from your decision and writes a rule for next time. + +** The more you use it, the cheaper it gets. + +Passepartout has a downward cost curve, which runs counter to most AI agents. + +Here is why. When you use Passepartout, the Dispatcher observes every blocked action and every human-approved exception. Each decision becomes a deterministic rule. A file write you approved once becomes an allowed path pattern. A shell command you denied becomes a permanent block. Each hardened rule means one fewer LLM call next time. + +Meanwhile, the foveal-peripheral context model prunes your [[https://en.wikipedia.org/wiki/Memex][memex]] — your personal knowledge base, a term coined by Vannevar Bush in 1945 for a mechanised private library — to the relevant Org subtrees before sending anything to the LLM. The agent loads neither your entire knowledge base nor, as Markdown-based agents typically do, an entire file; it loads only the headlines that matter. Less context in means fewer tokens spent. + +Most other agents grow more expensive over time (context histories accumulate, safety instructions grow). Passepartout's cost curve bends down. + +** It edits its own source code. Verified before execution. + +Passepartout can read its own Org-mode source files, propose changes, and hot-reload skills into the running image without restarting. The skill engine loads every skill into a jailed Common Lisp package, validates its syntax, tests its trigger function in isolation, and only then promotes it to the live registry. + +Core pipeline files — the Perceive-Reason-Act loop, the Merkle-tree memory, the Dispatcher gate stack — are path-protected.
Enforcing this is the job of the self-build safety boundary, which the capability table below lists as =Planned= for v0.4.0; once it ships, the agent can still propose changes to its own brain stem, but none will take effect without human review. Skills and system modules expand freely. The core stays small, protected, and auditable. + +Few AI agents, if any, can modify their own reasoning engine and reload the change while running. This is not a planned feature. It works today. + +** Your memory and your tasks are the same format. Org-mode. + +Passepartout makes a bet that most systems consider too expensive: humans and machines should share the same file format. That format is Org-mode. + +Your notes, your calendar, your project plans, the agent's memory, and the agent's own source code are all the same thing: Org files in =~/memex/=. Headline trees for structure. Property drawers for metadata. Source blocks for code. TODO keywords for task state. Tags for categorization. + +When you write a TODO in Emacs, the agent sees it immediately as a native data structure and acts on it. When the agent creates a note, you can open it in any text editor and read it. There is no import/export step, no hidden database (beyond a possible index), no format conversion. If Passepartout stops existing tomorrow, your data survives in plain text, readable in 2040. + +** Works offline. Works locally. The safety doesn't stop. + +You can run Passepartout entirely on your own hardware with a local LLM via Ollama or another inference engine. No internet connection required. But unlike most local AI tools, offline mode does not mean safety-last. The nine deterministic safety gates are pure Common Lisp — they run identically whether you are online or off. The Merkle-tree memory with snapshot rollback runs in-process, with no network calls. Semantic retrieval runs on in-image vectors and costs 0 LLM tokens per query. + +Cloud providers (OpenRouter, OpenAI, Anthropic, Groq, Gemini, DeepSeek, NVIDIA NIM, and others) are optional add-ons. When you use them, the model-tier router automatically selects the cheapest provider that matches your task's complexity.
Privacy-tagged content stays local even when cloud providers are configured. + +* How It Works + +Every signal — a chat message, a heartbeat tick, a file change notification — moves through three stages: + +#+begin_example +Signal → Perceive → Reason → Act + normalize LLM proposes dispatch approved action + gates verify tool output feeds back +#+end_example + +*Perceive* normalizes raw input from any gateway (TUI, CLI, Telegram, Signal) into a uniform signal plist. Buffer updates from Emacs are ingested into memory as Org AST nodes. Heartbeat ticks trigger background maintenance. HITL commands are intercepted before the LLM is invoked. + +*Reason* calls the LLM to generate a proposal, then runs the proposal through every registered deterministic gate — sorted by priority, highest first. If a gate rejects (shell command blocked, path protected, secret exposed), the rejection trace feeds back to the LLM for self-correction, up to three retries. If a gate requests human approval, the action becomes a Flight Plan awaiting your decision. If all gates pass, the action proceeds to Act. + +*Act* dispatches the approved action to the correct actuator: shell commands go to the shell actuator (with timeout and output limiting), tool invocations go to the cognitive tool registry, system commands trigger internal harness operations, and chat responses route to the TUI or messaging gateway. Each stage can feed back into Perceive — a tool output becomes the next perception. + +The pipeline is not meant to stay a single-threaded bottleneck. A priority-queued signal processor (planned for v0.6.0) will preempt background maintenance for user interactions, and MVCC concurrency on the Merkle-tree memory (planned for v0.6.1) will use versioned snapshots so that multiple signals can be processed simultaneously without corrupting shared state.
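The Reason stage's control flow can be sketched as follows: propose, verify, retry with the rejection trace, or escalate to a Flight Plan. This is an illustrative Python sketch, not the Lisp implementation. ~propose~ stands in for the LLM call and ~verify~ for the gate stack, and the dictionary shapes and stage names are hypothetical. Only the three-retry limit comes from the text above.

```python
from typing import Callable

MAX_RETRIES = 3  # up to three retries after the first proposal


def reason(signal: dict,
           propose: Callable[[dict, list], dict],
           verify: Callable[[dict], dict]) -> dict:
    """Propose an action, verify it, and retry with rejection feedback."""
    feedback: list = []
    for _ in range(1 + MAX_RETRIES):  # first attempt plus up to three retries
        action = propose(signal, feedback)
        verdict = verify(action)
        if verdict["type"] == "pass":
            return {"stage": "act", "action": action}
        if verdict["type"] == "approval-required":
            # Park the action as a Flight Plan instead of executing it.
            return {"stage": "flight-plan", "action": action, "reason": verdict["reason"]}
        # Rejected: record what was proposed and why, so the next proposal can self-correct.
        feedback.append({"proposed": action, "reason": verdict["reason"]})
    return {"stage": "blocked", "feedback": feedback}


# Toy stand-ins for the LLM and the gate stack.
def toy_llm(signal: dict, feedback: list) -> dict:
    # The first proposal is destructive; once it has feedback, the "model" picks a safe command.
    return {"command": "rm -rf /"} if not feedback else {"command": "ls"}


def toy_verify(action: dict) -> dict:
    if action["command"].startswith("rm -rf"):
        return {"type": "reject", "reason": "destructive command"}
    return {"type": "pass"}


print(reason({"text": "tidy up"}, toy_llm, toy_verify))
# {'stage': 'act', 'action': {'command': 'ls'}}
```

The rejection trace is the only channel between the deterministic and probabilistic halves. The gate never rewrites the proposal itself. It only explains why it refused.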
+ +Deep detail: [[file:docs/ARCHITECTURE.org][Architecture]] for the full code map and pipeline flow, [[file:docs/DESIGN_DECISIONS.org][Design Decisions]] for the rationale behind every architectural choice. + +* Current Capabilities + +Features marked =Stable= ship in the current release. Features marked =Planned= are scheduled in the [[file:docs/ROADMAP.org][Roadmap]]. + +| Capability | Status | Since | Notes | +|----------------------------------+----------+---------+----------------------------------------------------------------------| +| 9-vector deterministic safety | Stable | v0.2.0 | Secrets, paths, shells, network, lisp, privacy | +| Foveal-peripheral context model | Stable | v0.2.0 | Sends relevant subtrees, not all files | +| Merkle-tree memory + snapshots | Stable | v0.2.0 | Integrity hashing, copy-on-write rollback | +| Self-editing + hot-reload | Stable | v0.2.0 | Agent reads, modifies, reloads its own code | +| 8 provider cascade | Stable | v0.1.0 | OpenRouter, OpenAI, Anthropic, Groq, Gemini, DeepSeek, NVIDIA, local | +| Terminal UI (Croatoan) | Stable | v0.2.0 | Scrollback, history, themes, commands, tab completion | +| Skill engine (20+ skills) | Stable | v0.1.0 | Jailed loading, topological sort, hot-reload | +| Human-in-the-Loop approval | Stable | v0.3.0 | Flight Plan workflow for blocked actions | +| Model-tier routing | Stable | v0.3.0 | Sends simple tasks to cheaper models | +| Event orchestrator (hooks + cron) | Stable | v0.3.0 | Org-based hook and cron dispatch | +| Context manager (project scoping) | Stable | v0.3.0 | Push/pop focus, persist across restart | +| Semantic retrieval (embeddings) | Stable | v0.3.0 | In-image vector lookup, 0 LLM tokens | +| TUI gate trace + focus map | Planned | v0.4.0 | Visual safety trace + what the agent is looking at | +| Emacs bridge | Planned | v0.4.0 | Native Emacs client over the wire protocol | +| Self-build safety boundary | Planned | v0.4.0 | Core files path-protected, Flight Plan required | +| 
Discord + Slack gateways | Planned | v0.4.0 | Messaging alongside Telegram and Signal | +| Token economics + cost tracking | Planned | v0.5.0 | Per-session cost counter, prompt caching, budget enforcement | +| Priority-queue signal processing | Planned | v0.6.0 | Preempts background for user interactions | +| MVCC memory concurrency | Planned | v0.6.1 | Concurrent reads/writes on Merkle tree | +| Structured output enforcement | Planned | v0.6.2 | Plist validation with retry and feedback | +| Streaming responses | Planned | v0.6.3 | Live output in TUI, interrupt-and-redirect | +| MCP-native tool ecosystem | Planned | v0.7.0 | 50+ tools from the MCP ecosystem | +| Voice gateway | Planned | v0.7.3 | Speech-to-text + text-to-speech via Whisper / ElevenLabs | +| Task planning (tree DAG) | Planned | v0.8.0 | Org headline task trees, branch pruning | +| Skill creator | Planned | v0.8.0 | LLM drafts skills from natural language, verified before load | +| Computer use / vision | Planned | v0.9.0 | Screenshot capture, UI interaction | +| SWE-bench evaluation harness | Planned | v0.9.0 | Automated benchmark scoring with Org trajectory audit | +| Consensus loop (multi-provider) | Planned | v0.10.0 | Parallel inference, disagreement detection | +| GTD integration | Planned | v0.10.0 | Full capture-clarify-organize-reflect-engage | +| Deep Emacs integration | Planned | v0.10.0 | Org-agenda, clock time, refile, archive | * Quick Start -You need SBCL (Common Lisp), git, and curl. +After installation, the =passepartout= command is available from anywhere. 
#+begin_src bash -git clone https://github.com/amrgharbeia/opencortex.git ~/projects/passepartout -cd ~/projects/passepartout -./passepartout configure # install deps, tangle, setup wizard -passepartout tui # launch the terminal interface +passepartout tui # launch the terminal interface +passepartout daemon # start the background daemon (for TUI/CLI/gateways) +passepartout doctor # run system health check #+end_src See [[file:docs/USER_MANUAL.org][User Manual]] for the full guide. -* Why Passepartout - -- *Your data stays yours.* No database, no vector store, no cloud silo. Your entire memory is a folder of Org files. You can read them with any text editor, search them with grep, and back them up however you like. If Passepartout stops existing, your data doesn't disappear. - -- *The LLM can't do damage if you set the rules.* Every action the LLM proposes passes through a deterministic safety gate before it touches a file, runs a command, or sends a message. The LLM suggests; the gate decides. Hallucinations are blocked, not corrected after the fact. - -- *Runs on your hardware.* Works fully offline with Ollama and local models. Cloud providers (OpenRouter, OpenAI, Anthropic, Groq, Gemini, DeepSeek, NVIDIA NIM) are optional add-ons. - -- *Written in Common Lisp.* Code is data. The agent reads its own source the same way it reads a text file — it parses, modifies, and hot-reloads its skills without restarting. One language from the kernel to the TUI to the build system. 
- -* Architecture - -- [[file:org/core-loop.org][Metabolic Loop]] — Perceive → Reason → Act, the fundamental cognitive cycle -- [[file:org/security-dispatcher.org][Dispatcher]] — 9-vector safety gate: secret scanning, path protection, shell safety, lisp validation, network exfiltration, privacy filtering -- [[file:org/core-memory.org][Memory]] — Single-address-space object store with Merkle-tree integrity and snapshot rollback -- [[file:org/core-skills.org][Skill Engine]] — 20 hot-reloadable skills loaded at boot, each an independent Org file -- [[file:org/gateway-tui.org][TUI]] — Croatoan-based terminal interface connected via framed TCP protocol -- [[file:org/system-model.org][LLM Dispatch]] — Central dispatch for model inference requests - * Project Documentation -| Document | Answers | -|----------|---------| -| [[file:docs/USER_MANUAL.org][User Manual]] | How do I use it? | -| [[file:docs/ARCHITECTURE.org][Architecture]] | How does it work inside? | -| [[file:docs/DESIGN_DECISIONS.org][Design Decisions]] | Why was it built this way? | -| [[file:docs/ROADMAP.org][Roadmap]] | Where is it going? When? | -| [[file:docs/ROADMAP.org][TODO]] | Who is doing what? | -| [[file:docs/CONTRIBUTING.org][Contributing]] | How do I contribute? | +| Document | Answers | +|-------------------------------------------+-------------------------------------------------------| +| [[file:docs/USER_MANUAL.org][User Manual]] | How do I use it? | +| [[file:docs/ARCHITECTURE.org][Architecture]] | How does it work inside? | +| [[file:docs/DESIGN_DECISIONS.org][Design Decisions]] | Why was it built this way? | +| [[file:docs/ROADMAP.org][Roadmap]] | Where is it going? When? | +| [[file:docs/CONTRIBUTING.org][Contributing]] | How do I contribute? 
| * License diff --git a/docs/.#ROADMAP.org b/docs/.#ROADMAP.org deleted file mode 120000 index e1b6f57..0000000 --- a/docs/.#ROADMAP.org +++ /dev/null @@ -1 +0,0 @@ -user@amr.1092521:1777807168 \ No newline at end of file diff --git a/docs/ARCHITECTURE.org b/docs/ARCHITECTURE.org index afc86e7..6325f63 100644 --- a/docs/ARCHITECTURE.org +++ b/docs/ARCHITECTURE.org @@ -13,34 +13,28 @@ Passepartout divides cognition along two axes: **Foreground vs Background** (ini The Probabilistic engine proposes. The Deterministic engine verifies and executes. No proposal from the LLM touches a file, runs a command, or sends a message without passing through at least one deterministic gate. -* Code Map +* Architectural Layers -The project is organized into ~org/~ (source of truth) and ~lisp/~ (generated by tangle). - -** Core pipeline (loaded by ASDF, committed to git) - -| File | Purpose | -|------------------------------+--------------------------------------------------------------------| -| ~org/core-defpackage.org~ | Package definition and export list | -| ~org/core-skills.org~ | Skill engine: ~defskill~ macro, topological sorter, jailed loading | -| ~org/core-communication.org~ | Framed TCP protocol, actuator registry, daemon server | -| ~org/core-memory.org~ | ~memory-object~ struct, Merkle hashing, snapshots, persistence | -| ~org/core-context.org~ | Foveal-peripheral rendering, context assembly for LLM | -| ~org/core-loop-perceive.org~ | Stage 1: normalize raw signals into pipeline format | -| ~org/core-loop-reason.org~ | Stage 2: LLM proposal + deterministic verification | -| ~org/core-loop-act.org~ | Stage 3: dispatch approved actions to actuators | -| ~org/core-loop.org~ | Orchestration: process-signal, heartbeat, main entry point | -| ~org/system-diagnostics.org~ | Boot-time health check, doctor CLI | +** Core Pipeline (loaded by ASDF — the harness) +- package definition: defpackage, cognitive tools, logging +- memory: memory-object struct, Merkle hashing, snapshots, 
persistence +- context: foveal-peripheral rendering, context assembly for LLM +- pipeline: perceive → reason → act stages, orchestrator, heartbeat +- skills engine: defskill macro, topological sorter, jailed loading +- communication: framed TCP protocol, actuator registry, daemon server +- diagnostics: health checks, doctor CLI ** Skills (loaded at runtime by the skill engine) +- gateway: TUI, CLI, messaging (Telegram, Signal) +- system-model: provider dispatch, router, embeddings, model explorer +- security: dispatcher (safety gate), policy, permissions, validator, vault +- programming: Lisp, Org, literate tools, REPL, standards +- system: config, archivist, self-improve, memory introspection, shell actuator, event-orchestrator, context-manager, setup -| Category | Files | Purpose | -|------------------+-----------------------------------------------------------------------------------------------------------------------------------+---------------------------------| -| **gateway-** | ~gateway-cli~, ~gateway-messaging~, ~gateway-tui~ | External communication channels | -| **system-model-** | ~system-model-provider~, ~system-model~, ~system-model-router~, ~system-model-embedding~, ~system-model-explorer~ | LLM infrastructure | -| **security-** | ~security-dispatcher~, ~security-policy~, ~security-permissions~, ~security-vault~, ~security-validator~ | Safety and authorization | -| **programming-** | ~programming-lisp~, ~programming-org~, ~programming-standards~, ~programming-literate~, ~programming-repl~ | Lisp and Org tooling | -| **system-** | ~system-config~, ~system-archivist~, ~system-self-improve~, ~system-memory~, ~system-actuator-shell~, ~system-event-orchestrator~ | Background services | +** Clients (connect to daemon via framed TCP protocol) +- TUI: Croatoan-based terminal interface (model-view architecture, dirty-flag rendering) +- CLI: pipe-friendly command-line gateway +- Emacs: elisp bridge speaking the wire protocol (planned v0.4.0) * Pipeline Flow @@ 
-62,6 +56,54 @@ Each stage can produce feedback signals that loop back to Perceive (e.g., a tool A depth counter prevents infinite loops. If a signal's depth exceeds 10, it is silently dropped. This is the circuit breaker for runaway recursive cycles. +* Foveal-Peripheral Context Model + +When the agent assembles context for the LLM, it does not send the entire memory. It renders a sparse outline using three rules: + +1. *Depth ≤ 2* — the root node and its immediate children are always included (title and properties only, no content). +2. *Foveal focus* — the node the user is currently interacting with is rendered in full, including its body content and all descendants. +3. *Semantic relevance* — any node whose embedding vector has cosine similarity ≥ threshold (default 0.75) to the foveal node is rendered in full. + +Nodes that don't match any rule are rendered as title-only — a single Org headline with its :ID: property. This keeps active context between 2,000–4,000 tokens for typical memex sizes, versus 50,000–150,000 tokens for a full serialization. The embedding vectors that power semantic retrieval are computed at ingest time (~ingest-ast~ in core-memory.lisp) and can use local models (Ollama), cloud APIs (OpenAI embeddings), or a zero-dependency lexical fallback (trigram Jaccard similarity). + +For the rationale behind sparse-tree rendering and why this architecture outperforms "load everything" systems, see Design Decisions: Org-Mode as Unified AST. + +* Dispatcher Gate Stack + +Every action the LLM proposes passes through a stack of deterministic gates before execution. Gates are registered as skills with ~defskill~ and sorted by priority (highest first) in ~cognitive-verify~ (core-loop-reason.lisp). 
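The priority ordering can be sketched as a small registry. This Python sketch is not the ~defskill~ machinery. ~Gate~, ~register~, and ~verify~ are hypothetical names. The two priorities come from the gate table, the ~:explanation~ rule mirrors the security-policy row, and the ~sudo~ approval rule is an invented example. The sketch also assumes that the first gate that does not pass decides the outcome. The gates are registered out of order on purpose, to show that sorting decides which runs first, not registration order. Python's ~sorted~ is stable even with ~reverse=True~, so gates that share a priority keep their registration order.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Gate:
    name: str
    priority: int
    check: Callable[[dict], Optional[dict]]  # None means "pass the action through"


REGISTRY: List[Gate] = []


def register(name: str, priority: int):
    """Decorator that adds a check function to the gate registry."""
    def wrap(fn: Callable[[dict], Optional[dict]]):
        REGISTRY.append(Gate(name, priority, fn))
        return fn
    return wrap


def verify(action: dict) -> dict:
    """Run gates highest priority first; the first non-pass verdict wins."""
    for gate in sorted(REGISTRY, key=lambda g: g.priority, reverse=True):
        verdict = gate.check(action)
        if verdict is not None:
            return {**verdict, "gate": gate.name}
    return {"type": "pass", "action": action}


@register("security-dispatcher", 150)
def needs_approval(action: dict) -> Optional[dict]:
    # Invented example rule: privileged shell commands need a human decision.
    if action.get("tool") == "shell" and action.get("command", "").startswith("sudo "):
        return {"type": "approval-required", "reason": "high-impact command"}
    return None


@register("security-policy", 500)
def require_explanation(action: dict) -> Optional[dict]:
    # Mirrors the table: every action must carry an explanation.
    if not action.get("explanation"):
        return {"type": "reject", "reason": "missing :explanation"}
    return None


print(verify({"tool": "shell", "command": "sudo reboot"})["gate"])  # security-policy
print(verify({"tool": "shell", "command": "sudo reboot", "explanation": "apply update"})["type"])  # approval-required
print(verify({"tool": "shell", "command": "ls", "explanation": "list files"})["type"])  # pass
```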
+ +| Priority | Gate | What It Checks | +|----------+---------------------------+----------------------------------------------------------| +| 600 | security-permissions | Tool permission table (allow/ask/deny per tool) | +| 600 | security-vault | Credential storage integrity | +| 500 | security-policy | Requires :explanation on every action | +| 150 | security-dispatcher | 9-vector safety: secrets, paths, shell, lisp, network, | +| | (the Dispatcher) | privacy, high-impact approval | +| 100 | system-archivist | Scribe and Gardener maintenance on heartbeat | +| 95 | security-validator | Protocol schema validation | +| 80 | system-event-orchestrator | Cron job dispatch on heartbeat | + +Gates return either the action (passed through unchanged), a rejection (:LOG or :EVENT with block reason), or an approval request (:EVENT with :level :approval-required). Rejections feed back to the LLM as a rejection trace — the model sees what it proposed, which gate blocked it, and why, and retries with that context (up to 3 retries). Approval requests create Flight Plan Org nodes requiring human review via the HITL workflow. + +Every gate is a pure Common Lisp function. Verification costs 0 LLM tokens. By contrast, prompt-based guardrails (Claude Code, OpenClaw, Hermes Agent) consume 100–500 LLM tokens per verification. + +For the rationale behind deterministic vs prompt-based safety, see Design Decisions: The Probabilistic-Deterministic Split and The Dispatcher as Learning System. + +* Embedding & Semantic Retrieval Pipeline + +Every memory-object can carry an embedding vector for semantic search. The pipeline: + +1. *Ingest* — ~ingest-ast~ (core-memory.lisp) calls ~embeddings-compute~ on new objects, storing the vector in ~memory-object-vector~. +2. *Queue* — objects with stale vectors are queued via ~mark-vector-stale~. The ~embed-all-pending~ cron job (every 10 minutes, :REFLEX tier) drains the queue and recomputes vectors. +3.
*Retrieval* — ~context-awareness-assemble~ (core-context.lisp) passes the foveal node's vector to ~context-object-render~. Nodes with cosine similarity ≥ threshold against the foveal vector are rendered in full rather than as title-only. + +Three backends are available, selected via ~EMBEDDING_PROVIDER~: +- :local — Ollama-compatible /api/embeddings endpoint (e.g., nomic-embed-text) +- :openai — OpenAI /v1/embeddings API (e.g., text-embedding-3-small) +- :hashing — zero-dependency lexical fallback using trigram Jaccard similarity (replaced SHA-256 hashing in v0.4.0 because cryptographic hashes maximise output divergence — the opposite of what a similarity metric needs) + +For the design rationale, see Design Decisions: Token Economics and Performance Advantage. + * Skill Lifecycle 1. *Discovery:* ~skill-initialize-all~ scans the skills directory, globs for ~*.lisp~ files (excluding ~core-*~ files which are loaded by ASDF) @@ -76,7 +118,7 @@ A depth counter prevents infinite loops. If a signal's depth exceeds 10, it is s All communication between the daemon and its gateways (TUI, CLI, Emacs) uses length-prefixed plists over TCP: ``` -00002C(:TYPE :EVENT :PAYLOAD (:ACTION :handshake :VERSION "0.2.0")) +00003D(:TYPE :EVENT :PAYLOAD (:ACTION :handshake :VERSION "0.3.0")) ``` The 6-character hex prefix encodes the payload length. The payload is a ~prin1~-serialized plist. ~*read-eval*~ is bound to nil on the receiving end to prevent code injection. @@ -89,3 +131,7 @@ The 6-character hex prefix encodes the payload length. The payload is a ~prin1~- | ~:META~ | plist | ~:SOURCE~, ~:SESSION-ID~, ~:reply-stream~ | | ~:PAYLOAD~ | plist | Action-specific data (~:SENSOR~, ~:ACTION~, ~:TEXT~) | | ~:DEPTH~ | integer | Recursion counter for loop prevention | + +The protocol lifecycle begins with a handshake: the daemon sends a :handshake action with its version, and the client responds with its capabilities. After the handshake, either side can send any message type.
The daemon never initiates a disconnect — clients poll for messages and reconnect on EOF. + +Planned for v0.6.3: streaming chunk frames (~:type :stream-chunk~) carrying partial LLM output, which lets the client interrupt and redirect a response mid-stream. The final chunk is an empty string signalling end-of-stream. diff --git a/docs/CHANGELOG.org b/docs/CHANGELOG.org deleted file mode 100644 index 6a717ad..0000000 --- a/docs/CHANGELOG.org +++ /dev/null @@ -1,71 +0,0 @@ -#+TITLE: Changelog -#+STARTUP: content - -* v0.2.1 — Rename, Safety, and Deployment (2026-05-02) -This release renames the project to Passepartout, adds content-level safety gates, professionalizes deployment, and documents every function with full explanatory prose. - -** Project Rename -- **Passepartout:** Project renamed from OpenCortex to Passepartout. All files, packages, functions, and environment variables updated. -- **Org/lisp split:** Source of truth lives in ~org/~, tangled to ~lisp/~. Core files committed, skills generated at configure time. -- **31 org files:** Every file renamed to ~category-subject.org~ convention. Harness and skills unified under one directory. - -** Safety -- **Secret Exposure Gate:** Content scanning for API keys, PEM blocks, PGP keys, credentials, and tokens in all outgoing text. -- **Path Protection:** File reads blocked for ~.env~, SSH keys, PEM/PGP, cloud configs, and credential stores. -- **Shell Safety:** Destructive commands (~rm -rf /~, ~dd~, ~mkfs~, ~shred~) and injection patterns (backtick, ~$()~) blocked with timeout and output limits. -- **Lisp Validation Gate:** Writes to ~.lisp~ and ~.org~ files validated for syntax errors before they reach disk. -- **REPL Verification Lint:** Warns if defuns are written without REPL prototyping. - -** Deployment -- **Multi-distro:** Automatic detection of Debian vs Fedora, correct package names and managers. -- **systemd service:** User-level auto-start on boot via ~passepartout install service~.
-- **Backup/Restore:** ~passepartout backup~ and ~passepartout restore~ commands. -- **Docker:** Updated to ~debian:trixie-slim~, fixed build context. -- **CI/CD:** GitHub Actions workflows for lint, test, and release. Gitea deploy workflow fixed. - -** Engineering Process -- **REPL-first Lifecycle:** Two-track workflow: Org-first for prose and tests, REPL-first for implementation. Every function prototyped in the REPL before reaching Org. -- **Verification Loop:** Bouncer rejects bad lisp; rejection trace feeds back to LLM for self-correction. -- **System-prompt-augment:** Skills can inject domain-specific mandates into the LLM prompt via ~:system-prompt-augment~. - -** Documentation -- **Literate Prose Restored:** Every Org file now has an Architectural Intent overview and explanatory prose before each function block, following the style established in the v0.1.0 era. -- **AGENTS.md:** Thinned to a routing layer — the skill org files are authoritative. - -** Contributors -- **gitignore:** ~skills/*.lisp~ and ~tests/*.lisp~ as generated artifacts (source of truth is ~.org~). -- **DeepSeek and NVIDIA NIM:** Added as LLM providers (OpenAI-compatible). Use ~DEEPSEEK_API_KEY~ and ~NVIDIA_API_KEY~ env vars. - -* v0.2.0 - Interactive Refinement (2026-04-29) -This release focuses on professionalizing the environment and enhancing the agent's structural capabilities. - -** Features -- **Enhanced Lisp/Org Utilities:** Structural editing, REPL evaluation, and automated formatting to ensure code integrity. -- **Namespace Standardization:** Refactored utilities into =utils-org= and =utils-lisp= for predictable discovery. -- **Autonomous Mandates:** Implemented =GEMINI.md= for local agentic enforcement of engineering standards. -- **Onboarding Wizard:** Modular Lisp setup for multiple LLM providers. -- **Professional TUI:** Styled, scrollable interface with improved diagnostics. 
- -* v0.1.0 - The Autonomous Foundation (2026-04-20) -This is the initial MVP release of the ~passepartout~. It establishes a secure, auditable Lisp kernel for a personal operating system. - -** Features -- **Unified Envelope Architecture:** Actuator-agnostic protocol that decouples routing metadata from cognitive payloads, ensuring all clients (TUI, Emacs, CLI, Matrix) are treated as equal citizens. -- **Metabolic Pipeline:** Robust Perceive-Reason-Act loop with selective memory rollbacks and graceful shutdown handling. -- **Verification Lock:** Mandatory skill enforcement via environment configuration. System halts if security policies or bouncers fail to load. -- **Foveal-Peripheral Context:** High-resolution focus on active tasks with low-resolution skeletal awareness of the rest of the Memex. -- **The Bouncer:** Last-mile deterministic security gate with Deep Packet Inspection for secrets and network exfiltration. -- **Autonomous Scribe:** Background distillation worker that turns daily journal entries into evergreen Zettelkasten notes. Verified to distill atomic concepts autonomously. -- **Autonomous Gardener:** Heartbeat-driven worker that repairs broken links and identifies orphaned nodes in the Memex graph. -- **Unified Onboarding:** Single-command installation (~passepartout.sh~) with Docker support, OS detection, and automated dependency resolution. -- **Channel-Aware TUI:** Interactive Croatoan-based terminal client with clean, human-readable formatting for tool results and system logs. -- **CLI Gateway:** Local TCP socket server for pipe-friendly interaction and frictionless first contact. - -** Licensing & Community -- **AGPLv3 License:** Passepartout is now officially licensed under the GNU Affero General Public License v3.0. -- **Contributor License Agreement:** Implemented a broad CLA (~CLA.org~) for long-term project sustainability. 
- -** Architectural Shift -- Transitioned to **Literate Granularity**: Every function and invariant is now formally documented in its own Org block. -- **Provider Agnosticism:** Implemented a dynamic LLM cascade (OpenRouter, Ollama, etc.) removing all hardcoded backend dependencies. -- **Thin Harness Philosophy:** Decoupled the kernel from specific editors or third-party gateways. diff --git a/docs/DESIGN_DECISIONS.org b/docs/DESIGN_DECISIONS.org index b5973e2..3332046 100644 --- a/docs/DESIGN_DECISIONS.org +++ b/docs/DESIGN_DECISIONS.org @@ -2,11 +2,21 @@ This document captures the rationale behind key architectural choices. It is not a specification - it is a thinking medium for future architects and contributors who need to understand why the system is built this way, not just how. +** Non-Negotiable Identity +- Pure Common Lisp + Org-mode. No JSON. No YAML. No external databases. +- Single-address-space memory (Lisp hash tables in RAM — the agent IS the memory). +- "Thin harness, fat skills" — complexity lives at the edges, not the kernel. +- One agent composed of many skills. Concurrency via bordeaux-threads (shared memory). +- Plists everywhere — homoiconic communication between all components. + +This is the foundational decision from which all other decisions derive. It is not negotiable. Every architectural choice below exists because this identity makes it possible — and in some cases, makes it the only viable path. The single memory space enables Merkle-tree integrity without serialization boundaries. Plists enable the cognitive pipeline to be transparent and inspectable at every stage. Org-mode as the universal format means the agent's memory, the user's notes, and the agent's own source code are the same structure. This identity is the constraint that produces the architecture. 
+ * Design ** One single agent :PROPERTIES: :ID: design-multi-agent-default +:CREATED: [2026-05-07 Wed] :END: The AI industry has developed an intuition toward multi-agent systems as the default solution to hard problems. Multiple agents spawn, delegate, coordinate, debate, and consensus their way toward solutions. This pattern is compelling in demos and genuinely useful in specific contexts - but it has become a default assumption that warrants scrutiny. @@ -28,6 +38,7 @@ Passepartout is single-agent by default not from limitation but from conviction: ** The Unified Memory Argument :PROPERTIES: :ID: design-unified-memory +:CREATED: [2026-05-07 Wed] :END: If single-agent architecture is the decision, unified memory becomes the mechanism that makes it viable. The critical question is not "how many agents" but "how does the agent manage context without saturating." @@ -47,6 +58,7 @@ The unified memory argument is not that infinite context is free. It is that wit ** Org-Mode as Unified AST :PROPERTIES: :ID: design-org-unified-ast +:CREATED: [2026-05-07 Wed] :END: Passepartout makes a bet that most systems consider too expensive to place: that humans and machines should share the same file format. That bet is Org-mode. @@ -80,6 +92,7 @@ This is what "sovereignty" means in technical terms: the user owns the data in a ** Homoiconicity as Foundation :PROPERTIES: :ID: design-homoiconicity +:CREATED: [2026-05-07 Wed] :END: Common Lisp is homoiconic: code and data share the same representation. A Lisp program is a list, and a list is a Lisp program. This is usually presented as a curiosity, an interesting property that enables macros. In Passepartout, it is the foundational enabling property of the entire self-modification architecture. @@ -110,19 +123,10 @@ The implications extend beyond convenience. 
A system that cannot modify its own This is the final expression of homoiconicity: not just that code is readable as data, or that skills are modifiable, but that the entire system - including the parts that other systems protect - is open to modification. There is no ceiling on self-improvement. The agent can rewrite the very code that rewrites itself. -*Lisp and the AI Dream* - -Lisp was invented in 1958 by John McCarthy with artificial intelligence explicitly in mind. Its design - code as data, runtime mutation, symbols and lists as first-class constructs - was shaped by the belief that a truly intelligent machine would need to reason about and modify its own reasoning. For decades, Lisp machines were the closest thing to thinking machines that existed. - -Then the AI winter came. Symbolic AI fell out of favor. Statistical learning and neural networks dominated. Lisp was relegated to niche applications and academic curiosity. The machine that was designed for AI was never used for the task it was designed for. - -Six decades later, neural networks have arrived at the problem from a different direction. They can learn and generalize, but they hallucinate, cannot explain their reasoning, and cannot safely modify themselves. The neuro-symbolic synthesis - combining neural pattern recognition with symbolic reasoning - is recognized as the path toward AI that is both powerful and trustworthy. - -Lisp's time may finally have come. Not as a replacement for neural networks, but as the governor that makes them safe - the symbolic engine that verifies what the neural engine proposes, the homoiconic substrate that allows the system to inspect, modify, and improve its own reasoning. The machine that was designed for AI in 1958 may be the exact machine needed for AI in 2026 and beyond. 
- ** The Probabilistic-Deterministic Split :PROPERTIES: :ID: design-probabilistic-deterministic +:CREATED: [2026-05-07 Wed] :END: The architecture divides cognition into two fundamentally different reasoning systems. This is not arbitrary engineering but a structural response to a fundamental truth: probabilistic systems will hallucinate, and you cannot build reliable autonomy on an unreliable foundation. @@ -142,6 +146,7 @@ The split also explains why the system gets safer over time without the LLM impr ** The Dispatcher as Learning System :PROPERTIES: :ID: design-bouncer-learning +:CREATED: [2026-05-07 Wed] :END: The Dispatcher begins as a static guard - a set of rules that block obviously dangerous actions. But defining "obviously" is the hard problem. The agent encounters situations the rules do not anticipate. The Dispatcher must grow. @@ -161,6 +166,7 @@ This is the bootstrap. The system begins dependent on human judgment because it ** The REPL as Cognitive Substrate :PROPERTIES: :ID: design-repl-cognition +:CREATED: [2026-05-07 Wed] :END: A REPL - Read, Eval, Print, Loop - is an interactive programming environment that reads an expression, evaluates it, prints the result, and loops back to read the next expression. It is the opposite of batch processing: where batch compiles and runs a program in one shot, a REPL works one expression at a time, with each evaluation building on all previous ones. The programmer defines a function, calls it, inspects the result, modifies it, and calls it again. The state accumulates. The session is the program. @@ -182,6 +188,7 @@ This is why the REPL becomes more important as the system matures. In early vers ** Observability and the Thought Trace :PROPERTIES: :ID: design-observability +:CREATED: [2026-05-07 Wed] :END: When a human asks why the system made a decision, the answer must be findable. In most AI systems, the reasoning is ephemeral - it exists in the model's activations and disappears when the session ends. 
In Passepartout, every significant cognitive event is written to an Org buffer as it happens. @@ -197,6 +204,7 @@ Without observability, the system is a black box that happens to produce correct ** Literate Programming as Discipline :PROPERTIES: :ID: design-literate-programming +:CREATED: [2026-05-07 Wed] :END: The decision to use Org-mode as the source of truth for code, not just documentation, is not a ceremonial preference. It is a constraint mechanism that enforces better engineering habits at the cost of convenience. @@ -218,6 +226,7 @@ The literate programming discipline is not about producing documentation. It is ** The Evaluation Harness :PROPERTIES: :ID: design-evaluation-harness +:CREATED: [2026-05-07 Wed] :END: SOTA parity is meaningless without measurement. A system that claims to match commercial agents must demonstrate it through reproducible benchmarks, not through feature checklists. The evaluation harness is the apparatus by which Passepartout proves its capabilities. @@ -233,6 +242,7 @@ The harness also supports regression testing on the skill set. Every skill is te ** The MCP Strategy :PROPERTIES: :ID: design-mcp-strategy +:CREATED: [2026-05-07 Wed] :END: The Model Context Protocol (MCP) is a standard for connecting AI systems to external tools and data sources. It defines how a client requests tools from a server, how the server exposes its capabilities, and how the client invokes them. The ecosystem is growing: MCP servers exist for GitHub, Slack, Postgres, filesystem access, and much more. @@ -248,6 +258,7 @@ Passepartout's native client is smaller, faster, and more maintainable. The MCP ** Local-First Architecture :PROPERTIES: :ID: design-local-first +:CREATED: [2026-05-07 Wed] :END: Passepartout is designed to run on the user's machine, on their hardware, with their data, without requiring an internet connection. This is not a deployment option - it is an architectural commitment. 
The system must be able to reason, plan, and act using only the resources available locally. @@ -260,6 +271,8 @@ The symbolic engine does not require a network connection. The Prolog/Datalog re This does not mean Passepartout refuses to use cloud services when available and appropriate. It means cloud services are optional enhancements, not architectural requirements. The core is local. The user can choose to add cloud LLM providers for more capable inference, but the system functions without them. +*On live images and binaries.* Passepartout's primary delivery path is source code running in a live SBCL process. The REPL is available. Skills hot-reload. The cognitive loop runs in an image that is mutable, inspectable, and homoiconic — the user can connect with SLIME, trace functions, inspect memory objects, and modify the system while it runs. A ~save-lisp-and-die~ binary is provided as a convenience for platforms where SBCL cannot be installed (corporate laptops, shared hosts). The binary is the same image saved to disk with Swank pre-loaded — it is not a sealed container. The REPL works. Skills hot-reload. The binary is a packaging format, not an architectural decision. The system is constitutionally open in both delivery paths. + * Token Economics and Performance Advantage :PROPERTIES: :ID: design-token-economics @@ -367,7 +380,11 @@ Passepartout at 4K effective context: ~67 MB KV cache. Competitor at 128K: ~2.1 | Min viable local model | 3-4B params, 4K ctx | 30-70B params, 32K+ ctx | 30-70B params, 32K+ ctx | 7-13B params, 8K+ ctx | | Min VRAM for local | 4-6 GB | 16-32 GB | 24-48 GB | 8-16 GB | +*Note:* Observations about OpenClaw and Hermes Agent are based on their public documentation and repositories as of 2026-05. OpenClaw (github.com/openclaw/openclaw) is a TypeScript personal AI assistant by @steipete with a Node.js gateway, 25+ messaging channels, and Canvas/voice companion apps.
Hermes Agent (github.com/NousResearch/hermes-agent) is a Python fork by Nous Research with a built-in learning loop, full TUI, and sub-agent delegation. Both use prompt-based safety guardrails rather than deterministic gates. Architectural claims should be re-verified as these projects evolve. + *Conclusion:* Passepartout's architecture is designed to produce 2-3x token savings for coding, 13-24x for knowledge management, and 2x for life management at v1.0.0 maturity. The three structural advantages — sparse trees, deterministic safety, and REPL verification — compound. The critical risk is implementation gap: achieving the retrieval precision, dispatcher learning, and REPL integration depth required to realize the design. + +*Note:* The token savings projections in this section (2–3x for coding, 13–24x for knowledge management) are architectural estimates based on the sparse-tree retrieval and deterministic safety mechanisms. They have not yet been empirically verified. A token audit harness will produce measured comparisons at v0.5.0 (Token Economics & Prompt Efficiency). Until then, the README cites the mechanisms (sparse-tree rendering, deterministic gates) rather than specific magnitudes. * Open Questions and Risks 1. *Retrieval accuracy is the bottleneck.* If sparse tree retrieval loads the wrong subtree (low-similarity but causally relevant), the LLM makes unfixable errors. The architecture assumes embedding quality is "good enough" — this is untested at scale. diff --git a/docs/ROADMAP.org b/docs/ROADMAP.org index 2e24c9d..c5234e7 100644 --- a/docs/ROADMAP.org +++ b/docs/ROADMAP.org @@ -4,34 +4,43 @@ * The Evolutionary Roadmap -The roadmap is designed working backwards from SOTA parity (v1.0.0), guiding each version toward a fully autonomous, self-editing agent. Each version builds on the previous, with features designed to be implemented in pure Common Lisp + Org-mode. - -The TODO states in each version's Tasks section are the authoritative task tracker. 
The feature tables describe what each version delivers. - -** Non-Negotiable Identity -- Pure Common Lisp + Org-mode. No JSON. No YAML. No external databases. -- Single-address-space memory (Lisp hash tables in RAM — the agent IS the memory). -- "Thin harness, fat skills" — complexity lives at the edges, not the kernel. -- One agent composed of many skills. Concurrency via bordeaux-threads (shared memory). -- Plists everywhere — homoiconic communication between all components. - -** Version Roadmap - Understanding Passepartout as a function in time is not nostalgia. It is architectural guidance. Every decision in v0.x should be made with awareness of where the system is going. Code written today becomes the substrate for v3.0. Skills designed today become the vocabulary the symbolic engine speaks tomorrow. The probabilistic beginning is not a weakness to overcome. It is the bootstrap. The system learns the domain through probabilistic inference, and that learned knowledge becomes the seed for the symbolic engine. By the time the symbolic engine takes over, it has a rich knowledge graph to reason about, grown from thousands of probabilistic interactions. -This is how you build a reasoning machine: start with a learner, make it learn to verify, let verification become the core, remove the learner once it has learned enough. +This is how you build a reasoning machine: start with a learner, make it learn to verify by watching itself and its user, let verification become the core. Every blocked action becomes a rule. Every approved exception becomes a pattern. The symbolic layer grows at the probabilistic layer's expense. Remove the learner once it has learned enough. -** Versioning Convention +Each version expands the deterministic layer. The Dispatcher writes rules from approved exceptions. Shadow mode runs trial executions. Tool permission tiers mature from simple allow/deny to nuanced context-aware policies. 
The agent becomes less likely to attempt dangerous actions not because it is smarter but because the guard has more complete information. + +The roadmap is designed working backwards from SOTA parity (v1.0.0), guiding each version toward a fully autonomous, self-editing agent. Each version builds on the previous, with features designed to be implemented in pure Common Lisp + Org-mode. + +The TODO states in each version's Tasks section are the authoritative task tracker. The feature tables describe what each version delivers. Feature releases increment the minor version (v0.X.0). Bugfix and hardening releases increment the patch version (v0.X.Y). This ensures that security patches and critical fixes are visible in the version number and can ship independently of feature work. No feature release ships without its prerequisite hardening releases resolved. -*** v0.1.0: The Autonomous Foundation — RELEASED 2026-04-20 +** File Update Checklist + +When a version's state changes (DONE → tested → released), update these locations: + +1. ~ROADMAP.org~ — mark item DONE, update LOGBOOK timestamp +2. ~README.org~ — update Current Capabilities table (add new Stable rows for shipped features, remove Planned rows that have shipped) +3. ~.env.example~ — update version references as needed +4. ~lisp/core-communication.lisp~ — update the ~make-hello-message~ version string (current: ~"0.2.0"~) +5. ~passepartout~ (bash entry point) — update version reference + +On release: +1. Tag the release on GitHub +2. Extract DONE items from ROADMAP (all items with LOGBOOK timestamps since the last release tag) and use as the release notes body +3. If a ~CHANGELOG.md~ is needed for packaging tools, auto-generate it from ROADMAP DONE items + +** v0.1.0: The Autonomous Foundation — RELEASED 2026-04-20 +:PROPERTIES: +:RETROSPECTIVE: [2026-05-07 Wed] +:END: The secure, auditable Lisp kernel. All core infrastructure in place.
-**** DONE Perceive-Reason-Act pipeline +*** DONE Perceive-Reason-Act pipeline :PROPERTIES: :ID: id-06f10b9a-4054-4dea-a927-b0935fbdcd2f :CREATED: [2026-03-22 Sun] @@ -40,7 +49,9 @@ The secure, auditable Lisp kernel. All core infrastructure in place. - State "DONE" from "TODO" [2026-04-20 Mon] :END: -**** DONE Skills engine with jailed loading +This established the three-stage cognitive cycle that all later features plug into. The pipeline is the invariant — skills, gates, actuators, and clients all compose through it. + +*** DONE Skills engine with jailed loading :PROPERTIES: :ID: id-dc83944f-3923-4142-b324-c317dacd6b0b :CREATED: [2026-03-22 Sun] @@ -49,7 +60,9 @@ The secure, auditable Lisp kernel. All core infrastructure in place. - State "DONE" from "TODO" [2026-04-20 Mon] :END: -**** DONE Policy skill (6 invariants) +This made the "thin harness, fat skills" identity operational. Loading skills into jailed packages (v0.1.0) is the foundation for the skill sandbox mode (v0.3.2) and the Skill Creator (v0.8.0). + +*** DONE Policy skill (6 invariants) :PROPERTIES: :ID: id-929c84b7-d6ae-42b9-a8b5-d9df962db826 :CREATED: [2026-03-22 Sun] @@ -58,7 +71,9 @@ The secure, auditable Lisp kernel. All core infrastructure in place. - State "DONE" from "TODO" [2026-04-20 Mon] :END: -**** DONE Memory (memory-object + Merkle hashing) +This established the "explanation required" invariant on which the other gates stack. The policy gate (priority 500) runs first and sets the precedent that every action must justify itself. + +*** DONE Memory (memory-object + Merkle hashing) :PROPERTIES: :ID: id-3a96b384-cacf-4da0-8faa-1647739feba9 :CREATED: [2026-03-22 Sun] @@ -67,7 +82,9 @@ The secure, auditable Lisp kernel. All core infrastructure in place. - State "DONE" from "TODO" [2026-04-20 Mon] :END: -**** DONE Scribe + Gardener background workers +The Merkle tree with content-addressed hashing made copy-on-write snapshots (v0.2.0) and MVCC concurrency (v0.6.1) possible.
The hash-as-identity property also feeds directly into the foveal-peripheral model's semantic retrieval. + +*** DONE Scribe + Gardener background workers :PROPERTIES: :ID: id-3f618a38-ec23-4034-ba3c-ef272e212e2b :CREATED: [2026-03-22 Sun] @@ -76,7 +93,9 @@ The secure, auditable Lisp kernel. All core infrastructure in place. - State "DONE" from "TODO" [2026-04-20 Mon] :END: -**** DONE LLM gateway (OpenRouter, Ollama) +These background workers established the heartbeat-driven maintenance pattern. The event orchestrator (v0.3.0) generalizes this into hooks and cron jobs. + +*** DONE LLM gateway (OpenRouter, Ollama) :PROPERTIES: :ID: id-f5d870e2-cbd2-4c00-a8d4-174ab4118afc :CREATED: [2026-04-11 Sat] @@ -85,7 +104,9 @@ The secure, auditable Lisp kernel. All core infrastructure in place. - State "DONE" from "TODO" [2026-04-20 Mon] :END: -**** DONE Shell actuator, Emacs bridge, credentials vault +The provider-agnostic cascade pattern established in v0.1.0 makes the model-tier router (v0.3.0), privacy-aware routing (v0.3.0), and consensus loop (v0.10.0) possible — they all build on the same ~backend-cascade-call~ abstraction. + +*** DONE Shell actuator, Emacs bridge, credentials vault :PROPERTIES: :ID: id-7ca3167f-8353-4bb7-8b97-c039017716b0 :CREATED: [2026-04-11 Sat] @@ -94,7 +115,9 @@ The secure, auditable Lisp kernel. All core infrastructure in place. - State "DONE" from "TODO" [2026-04-20 Mon] :END: -**** DONE FiveAM test suite +The actuator registry pattern makes MCP tools (v0.7.0) possible — they register the same way. + +*** DONE FiveAM test suite :PROPERTIES: :ID: id-925d4180-764b-4219-8bdc-8e1849572da1 :CREATED: [2026-04-11 Sat] @@ -103,18 +126,16 @@ The secure, auditable Lisp kernel. All core infrastructure in place. - State "DONE" from "TODO" [2026-04-20 Mon] :END: -*** v0.2.0: Interactive Refinement — RELEASED 2026-04-29 +The test infrastructure established in v0.1.0 becomes the TDD runner (v0.7.1) and the SWE-bench harness (v0.9.0). 
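The v0.3.1 parser fix planned later in this roadmap is the kind of regression check this test infrastructure is meant to carry. The sketch below is dependency-free (plain ~assert~ instead of FiveAM), and ~safe-read-from-string~ and ~*injection-fired*~ are hypothetical names. Binding ~*read-eval*~ to ~nil~ makes the =#.= reader macro signal ~reader-error~ instead of evaluating its form; the check confirms this by watching for a side effect that must never happen.

```lisp
;;; Hypothetical sketch of the v0.3.1 guard and its regression check.
;;; Plain ASSERT is used so the example needs no test library.

(defvar *injection-fired* nil
  "Set to T only if an injected #. form is ever evaluated.")

(defun safe-read-from-string (string)
  "Read one form from STRING with the #. reader macro disabled."
  (let ((*read-eval* nil))
    (read-from-string string)))

(defun injection-blocked-p (payload)
  "True when reading PAYLOAD signals READER-ERROR and nothing was evaluated."
  (handler-case (progn (safe-read-from-string payload) nil)
    (reader-error () (not *injection-fired*))))

;; The payload stands in for adversarial LLM output; with *READ-EVAL* NIL
;; the reader refuses #. instead of running the SETF.
(assert (injection-blocked-p "(#.(setf *injection-fired* t))"))
(assert (null *injection-fired*))
```

Ordinary plists still read normally: ~(safe-read-from-string "(:type :shell)")~ returns ~(:TYPE :SHELL)~. Disabling =#.= closes only this one vector; reading untrusted text can still intern arbitrary symbols, so the Dispatcher's gates still have to inspect the parsed action.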
+ +** v0.2.0: Interactive Refinement — RELEASED 2026-04-29 +:PROPERTIES: +:RETROSPECTIVE: [2026-05-07 Wed] +:END: The "Brain" meets the "Machine." Standardization and professionalization of the user interface and environment. -*v0.2.0 through v0.3.0: The Dispatcher Learns* - -Each version expands the deterministic layer. The Dispatcher writes rules from approved exceptions. Shadow mode runs trial executions. Tool permission tiers mature from simple allow/deny to nuanced context-aware policies. The agent becomes less likely to attempt dangerous actions not because it is smarter but because the guard has more complete information. - -This is the bootstrapping phase. The system learns by watching itself and its user. Every blocked action becomes a rule. Every approved exception becomes a pattern. The symbolic layer grows at the probabilistic layer's expense. - - -**** DONE Professional TUI (Croatoan-based, styled, scrollable) +*** DONE Text User Interface (Croatoan-based, styled, scrollable) :PROPERTIES: :ID: id-57cef382-fe14-42e6-aade-03e05e3e920b :CREATED: [2026-04-28 Tue] @@ -123,7 +144,9 @@ This is the bootstrapping phase. The system learns by watching itself and its us - State "DONE" from "TODO" [2026-04-29 Wed] :END: -**** DONE Self-editing (error detection, surgical fix, hot-reload) +The Croatoan-based TUI with model-view separation and dirty-flag rendering is the foundation for all TUI improvements: word wrap in v0.3.3, gate trace in v0.4.0, tool visualization in v0.7.0, and streaming in v0.6.3. + +*** DONE Self-editing (error detection, surgical fix, hot-reload) :PROPERTIES: :ID: id-459b8275-9979-4d0f-8d61-a9af883930d4 :CREATED: [2026-04-23 Wed] @@ -132,7 +155,9 @@ This is the bootstrapping phase. 
The system learns by watching itself and its us - State "DONE" from "TODO" [2026-04-29 Wed] :END: -**** DONE Enhanced utilities (structural Lisp/Org manipulation + REPL) +The surgical edit + tangle + hot-reload pipeline (text replace → tangle → compile → load) established the self-modification capability that makes the Skill Creator (v0.8.0) safe — skills are generated, tangled, loaded, and verified in the same loop. + +*** DONE Enhanced utilities (structural Lisp/Org manipulation + REPL) :PROPERTIES: :ID: id-23f37c0d-4e77-4dc3-ab43-52a5987eb426 :CREATED: [2026-04-23 Wed] @@ -141,7 +166,9 @@ This is the bootstrapping phase. The system learns by watching itself and its us - State "DONE" from "TODO" [2026-04-29 Wed] :END: -**** DONE Onboarding wizard (modular Lisp setup for LLM providers) +Structural Lisp/Org manipulation tools are the primitives the self-improve module (v0.2.0) and the programming skills (literate block extraction, syntax validation) build on. + +*** DONE Onboarding wizard (modular Lisp setup for LLM providers) :PROPERTIES: :ID: id-bd497de7-3533-4056-b89f-2c992d2ea28b :CREATED: [2026-04-28 Tue] @@ -150,7 +177,9 @@ This is the bootstrapping phase. The system learns by watching itself and its us - State "DONE" from "TODO" [2026-04-29 Wed] :END: -**** DONE Memory rollback (snapshot and restore) +The setup wizard established the "works out of the box" constraint that the gateway QA (v0.4.0) and Emacs bridge (v0.4.0) onboarding flows follow. + +*** DONE Memory rollback (snapshot and restore) :PROPERTIES: :ID: id-fd2fb6e3-03e7-4e22-b9e9-a7eecfd06718 :CREATED: [2026-04-12 Sun] @@ -159,7 +188,16 @@ This is the bootstrapping phase. The system learns by watching itself and its us - State "DONE" from "TODO" [2026-04-29 Wed] :END: -**** DONE Secret Exposure Gate, Shell Safety, Lisp Validation +Copy-on-write snapshots (deep-copying the memory hash table on every write) gave the pipeline crash recovery. 
The snapshot mechanism is the root of MVCC concurrency (v0.6.1). + +** v0.3.0: Event Orchestration + HITL — DONE, UNRELEASED + +Unified control plane, Human-in-the-Loop state management, and backfill remediation +for stubs and gaps from v0.1.0/v0.2.0. All features are implemented but not yet +published. The security hardening patches (v0.3.1–0.3.3) will ship as follow-up +point releases before v0.4.0 feature work begins. + +*** DONE Secret Exposure Gate, Shell Safety, Lisp Validation :PROPERTIES: :ID: id-aa53c128-195b-42d4-9838-2def59faf7cf :CREATED: [2026-05-02 Sat] @@ -168,7 +206,7 @@ This is the bootstrapping phase. The system learns by watching itself and its us - State "DONE" from "TODO" [2026-05-02 Sat] :END: -**** DONE Multi-distro deployment (Debian+Fedora, systemd, Docker) +*** DONE Multi-distro deployment (Debian+Fedora, systemd, Docker) :PROPERTIES: :ID: id-783df999-f7fe-45c8-896d-2fd07c604d64 :CREATED: [2026-05-02 Sat] @@ -177,7 +215,7 @@ This is the bootstrapping phase. The system learns by watching itself and its us - State "DONE" from "TODO" [2026-05-02 Sat] :END: -**** DONE Project rename to Passepartout (files, packages, env vars) +*** DONE Project rename to Passepartout (files, packages, env vars) :PROPERTIES: :ID: id-91724874-aa0d-4804-9220-8bc5551f1366 :CREATED: [2026-05-02 Sat] @@ -186,7 +224,7 @@ This is the bootstrapping phase. The system learns by watching itself and its us - State "DONE" from "TODO" [2026-05-02 Sat] :END: -**** DONE 31 org files with full literate prose +*** DONE 31 org files with full literate prose :PROPERTIES: :ID: id-597b2a92-aac6-481a-b2c4-4f9842ced97c :CREATED: [2026-05-02 Sat] @@ -195,7 +233,7 @@ This is the bootstrapping phase. The system learns by watching itself and its us - State "DONE" from "TODO" [2026-05-02 Sat] :END: -**** DONE Human-in-the-Loop (HITL) +*** DONE Human-in-the-Loop (HITL) CLOSED: [2026-05-03 Sun 14:00] :PROPERTIES: :ID: id-hitl-complete @@ -208,7 +246,7 @@ Continuation-based interaction. 
The agent can suspend its cognitive loop to ask permission or clarification and resume precisely where it left off. Builds on the dispatcher's existing Flight Plan mechanism. -**** DONE Event Orchestrator (unified hooks+cron+routing) +*** DONE Event Orchestrator (unified hooks+cron+routing) :PROPERTIES: :ID: id-d35aea3d-2e5f-4a12-a9b0-1c2d3e4f5a6b :CREATED: [2026-05-02 Sat 14:00] @@ -223,7 +261,7 @@ Unified control plane for hooks, cron, and complexity-based routing. - Hooked into heartbeat for cron processing - Rule-based tier classifier (overrideable via ~*tier-classifier*~) -**** DONE Context Manager (project scoping) +*** DONE Context Manager (project scoping) CLOSED: [2026-05-05 Tue] :PROPERTIES: :ID: id-context-manager-scoping @@ -238,7 +276,7 @@ Stack-based project focusing with persistence. - ~/focus~/~/scope~/~/unfocus~ TUI commands - Context stack persisted to ~~/.cache/passepartout/context.lisp~, auto-restores on boot -**** DONE Model-Tier Routing (cost optimization) +*** DONE Model-Tier Routing (cost optimization) CLOSED: [2026-05-03 Sun 16:00] :PROPERTIES: :ID: id-model-tier-routing @@ -252,10 +290,10 @@ Extend ~*model-selector*~ for quadrant-based routing with per-slot provider casc - Quadrant tagging (foreground/background × probabilistic/deterministic) - Complexity classifier (code/plan/chat/background slots), each with its own provider cascade - Model-selector skill registers into =*model-selector*= hook -Deferred to v0.4.0: budget tracking per request, per-session cost monitoring. -Deferred to v0.9.0: TUI /config command for cascade configuration (env vars for now). +Deferred to v0.5.0: budget tracking per request, per-session cost monitoring. +Deferred to v0.10.0: TUI /config command for cascade configuration (env vars for now). -**** DONE Memory Scope Segmentation +*** DONE Memory Scope Segmentation CLOSED: [2026-05-03 Sun 16:30] :PROPERTIES: :ID: id-memory-scope-segmentation @@ -268,7 +306,7 @@ Extend memory-object with ~:scope~ property. 
- ~:memex~ (permanent knowledge), ~:session~ (ephemeral), ~:project~ (current work) - Scope-aware retrieval in memory layer -**** DONE Asynchronous Embedding Gateway +*** DONE Asynchronous Embedding Gateway CLOSED: [2026-05-05 Tue] :PROPERTIES: :ID: id-async-embedding @@ -290,11 +328,11 @@ Provider-agnostic vector generation (Ollama, OpenAI, hashing fallback). *Note:* The default ~:hashing~ backend uses SHA-256-derived vectors. SHA-256 is a cryptographic hash with the avalanche property — one-bit input differences produce entirely different outputs. This makes it a correct integrity check (Merkle tree) -but an incorrect similarity function (semantic retrieval). v0.3.3 replaces it with +but an incorrect similarity function (semantic retrieval). v0.4.0 replaces it with a zero-dependency lexical similarity algorithm that actually captures textual overlap while remaining offline-capable. -**** DONE TUI Experience (Daily Driver Quality) +*** DONE TUI Experience (Daily Driver Quality) CLOSED: [2026-05-05 Tue] :PROPERTIES: :ID: id-tui-experience @@ -313,7 +351,7 @@ All P0-P4 items implemented: - P4: Tab completion for all ~/~~ commands - P4: Configurable theme (~*tui-theme*~ plist, ~~/theme~~ command) -**** DONE v0.2.x Backfill Remediation (stubs and gaps) +*** DONE v0.2.x Backfill Remediation (stubs and gaps) CLOSED: [2026-05-03 Sun] :PROPERTIES: :ID: id-v02x-remediation @@ -335,7 +373,7 @@ CLOSED: [2026-05-03 Sun] - P3: Variable name drift normalization (*memory* vs *memory-store*, *skills-registry* vs *skill-registry*) - P4: Eliminate STYLE-WARNINGs from setup output (reorder defuns for same-file forward references; accept cross-skill references) -**** DONE Project Renaming (Bouncer → Dispatcher) +*** DONE Project Renaming (Bouncer → Dispatcher) :PROPERTIES: :ID: id-9e779580-287b-b3d1-37b9-bcefd750bf9e :CREATED: [2026-05-01 Fri 15:40] @@ -345,13 +383,8 @@ CLOSED: [2026-05-03 Sun] :END: The Dispatcher's role has evolved beyond security guard. 
It is the seed of the deterministic engine — it learns to execute procedures without invoking the neural net. -*** v0.3.x: Security Hardening -Before any feature work proceeds, three classes of vulnerability are patched. These are not feature releases — they are the floor the system must stand on before v0.4.0 feature development begins. The versioning reflects this: patch releases (v0.3.Y) are reserved for fixes with zero architectural impact. - -*A note on parser safety and competitive positioning:* SBCL defaults to ~*read-eval* t~, which means the =#.= reader macro can execute arbitrary Lisp during parsing. Three code paths in the current codebase read untrusted input without binding ~*read-eval* nil~ — the LLM output parser, the memory snapshot loader, and the system eval actuator. This is not a theoretical risk: a single hallucinated or adversarial LLM output containing =#.(shell "dangerous command")= bypasses all nine vectors of the Dispatcher's safety gate before any gate ever sees the action. No other SOTA agent parses unstructured model output with an eval-capable reader — they use JSON schemas, function-calling APIs, or at minimum bind ~*read-eval* nil~. Fixing this is three lines of Lisp and gives Passepartout an immediate safety advantage: the same deterministic safety gates that other agents lack are now structurally guaranteed to see every action before it executes. - -**** v0.3.1 — Parser RCE elimination +*** v0.3.1 — TODO Parser RCE elimination Rationale: SBCL's default ~*read-eval* accessor is ~t~, enabling the ~#.~ reader macro to execute arbitrary Lisp forms during parsing. Three code paths in the current codebase process untrusted input with ~read-from-string~ or ~read~ without binding ~*read-eval*~ to ~nil~. 
Each represents a remote code execution vector that bypasses all deterministic safety gates — the Dispatcher's shell safety check, path protection, secret scanning, and network exfiltration detection never execute because the malicious form is evaluated during parsing, before the action plist is even constructed. @@ -360,7 +393,7 @@ Rationale: SBCL's default ~*read-eval* accessor is ~t~, enabling the ~#.~ reader - Wrap ~read-from-string~ in ~action-system-execute~ (core-loop-act.lisp:62) with ~(let ((*read-eval* nil)) ...)~ — the ~:system :eval~ path executes untrusted payload code. Explicitly assert that this path requires the Dispatcher's approval gate. - Add FiveAM test: inject ~"(#.(shell \"echo pwned\"))"~ into the ~think()~ pipeline and assert no shell execution occurs. -**** v0.3.2 — Shell safety & actuator sandboxing +*** v0.3.2 — TODO Shell safety & actuator sandboxing Rationale: The ~:system :eval~ actuator path is currently unchecked by the Dispatcher's approval gate — only ~:shell~ and ~:tool "shell"~ trigger HITL. The shell actuator wraps commands through double ~bash -c~ nesting (~system-actuator-shell.lisp:10~), where Lisp's ~format~ with ~s~ produces S-expression-safe strings, not shell-safe strings. A command containing quotes or substitution characters can break out. Additionally, skill files loaded via ~skill-initialize-all~ execute arbitrary Lisp in jailed packages — a skill file containing ~(uiop:run-program "dangerous")~ executes immediately on load before any gate can inspect it. @@ -369,7 +402,24 @@ Rationale: The ~:system :eval~ actuator path is currently unchecked by the Dispa - Add skill sandbox mode for ~skill-initialize-all~: load each skill's code into a temporary jailed package, run the registered trigger function in isolation, verify it imports no restricted symbols (from CL package: ~run-program~, ~shell~, ~run-shell-command~), then promote to the live registry on pass. 
- Add FiveAM test: register a skill containing ~(uiop:run-program "echo test")~ in the body and verify the sandbox blocks its promotion. -**** v0.3.3 — Semantic retrieval activation +*** v0.3.3 — TODO TUI Critical Fixes + +Rationale: The TUI is Passepartout's only interface. OpenClaw distributes across 25+ messaging channels with voice, Canvas, and macOS/iOS apps. Hermes Agent ships multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output in its TUI. Passepartout's Croatoan TUI must carry the product alone, and it currently lacks word wrap, cursor movement, resize handling, connection-loss feedback, a quit command, and persistent history. None of these fixes require daemon changes — they are pure client-side Croatoan work that closes the gap from "proof of concept" to "daily driver." + +- Word wrap in ~view-chat~: every LLM response longer than the terminal width is silently truncated to one line. Croatoan supports multi-line rendering; ~view-chat~ must calculate per-message line height, adjust visible-message count accordingly, and scroll per message-line rather than per message. For very long messages, add a pager mode where pressing Enter on a message opens it in a scrollable overlay. +- Left/Right cursor in input: add ~:left~ and ~:right~ key handlers that move a cursor position index within the ~:input-buffer~ list. Characters are inserted at the cursor position instead of always being appended, and Backspace deletes the character before the cursor. +- SIGWINCH handler: register a handler for the terminal resize signal. On resize, re-measure the root window, destroy and recreate the three sub-windows (~sw~, ~cw~, ~iw~), set all dirty flags to ~t~, and force a full redraw. +- Connection-loss detection: the reader thread currently keeps polling ~recv-daemon~ silently after EOF, so a dropped connection goes unnoticed. On disconnection, queue a ~:disconnected~ event, set ~:connected~ to ~nil~, clear ~:busy~, and add a red system message "Connection lost — run /reconnect to retry."
The ~:disconnected~ event marks the status bar dirty so it redraws with a disconnected indicator. +- ~/quit~ command + persistent history: on ~/quit~, save ~:input-history~ to ~~/.cache/passepartout/history~ (one line per entry, most recent first), send a goodbye handshake to the daemon, close the socket, and exit the main loop cleanly. On startup, load history from the save file if it exists. +- Scroll offset clamping: clamp ~:scroll-offset~ to the range 0 to ~(max 0 (- msg-count visible-lines))~. The status bar shows ~"msgs:12/45"~ (visible / total) rather than ~"msgs:45"~ (total only) so the user knows when they have reached the oldest message. +- Message list storage: replace the O(n²) list indexing (one ~(nth i msgs)~ call per rendered message) with a simple adjustable vector. ~add-msg~ appends; ~view-chat~ iterates with ~aref~. The vector is resized as needed. The API surface is unchanged, and each indexed access drops from O(i) to O(1), so rendering cost no longer grows quadratically on message-heavy sessions. +- Add FiveAM tests: word-wrap produces correct line count for a 200-character string at 80-column width; cursor left/right wraps at buffer boundaries; SIGWINCH preserves message state; ~/quit~ saves and restores history. + +** v0.4.0: Production Hardening + +The features in this version were originally sequenced as v0.3.x patches but represent feature-level scope. They activate the architectural advantages designed in v0.1.0–v0.3.0, harden the self-build safety boundary, and expand Passepartout's interaction surfaces beyond the terminal TUI. Each feature depends on infrastructure already in place — the wiring, the sandbox, the gate trace — and activates it. + +*** TODO Semantic retrieval activation Rationale: Two independent failures prevent the foveal-peripheral semantic retrieval path from ever firing. First, ~context-awareness-assemble~ never passes ~:foveal-vector~ to ~context-object-render~, so the renderer receives ~nil~ for ~foveal-vector~ and the similarity calculation always returns 0.0.
Second, the default ~:hashing~ embedding backend uses SHA-256 (a cryptographic hash with the avalanche property) as a similarity function. SHA-256 is designed to produce entirely different outputs for nearly identical inputs — the property that makes it secure for integrity verification is precisely what makes it useless for semantic retrieval. A content-addressed Merkle tree correctly uses SHA-256 for identity; a retrieval engine needs a similarity function, not an identity function. The infrastructure for real embeddings (~local~ with Ollama, ~openai~ with the embeddings API) is fully implemented and working — this release activates the last-mile wiring and replaces the semantically blind default with a zero-dependency algorithm that actually captures textual overlap. @@ -379,15 +429,77 @@ Rationale: Two independent failures prevent the foveal-peripheral semantic retri - Add ~EMBEDDING_PROVIDER~ guidance to the setup wizard: explain that ~:hashing~ is the default offline fallback, ~:local~ requires Ollama with ~nomic-embed-text~, and ~:openai~ uses the paid embeddings API. - Add FiveAM test: ingest two semantically related nodes ("implement login form" and "add password authentication"), verify cosine similarity > 0.0 with the trigram backend. -** Competitive Advantage Analysis — v0.3.x Summary +*** TODO Self-build safety boundary -Safety is Passepartout's strongest differentiator, but the current codebase undermines it at the parser level. Fixing the three ~*read-eval*~ gaps means the nine deterministic safety vectors actually see every action before execution. No competitor — not Claude Code, not OpenClaw, not Hermes — has a comparable stacked gate architecture. By fixing the parser vulnerability, the architecture's safety claim becomes structurally true rather than aspirationally documented. +Rationale: Self-building (the agent modifying its own source code) begins at v0.7.1 when the tool ecosystem and test runner are in place. 
But self-building without path-level write protection means the agent can modify the very pipeline code that is currently executing — the ~core-*~ files that implement the Perceive-Reason-Act cycle, the Merkle-tree memory, the skill engine loader, and the Dispatcher gate stack itself. A hallucination or a logic error during self-building that corrupts ~core-loop-reason.lisp~ destroys the agent's ability to reason about and fix the corruption. The "thin harness" is not privileged code in the architectural sense (homoiconicity and the live image mean any code can be redefined at runtime), but it must be *protected* code — modifications to the harness require a human in the loop, enforced by the Dispatcher's path-protection gate, not by convention. -The semantic retrieval fix (one line to wire ~foveal-vector~, one backend replacement) activates the foveal-peripheral model's full power: deep nodes that are topically related to the user's focus now surface automatically. Without this, the context model is "dumb truncation at depth 2." With it, it's genuine semantic awareness — and since the retrieval is deterministic (in-image vector math, zero LLM tokens), the cost advantage over competitors' LLM-assisted search compounds with every query. +This is the corollary to "thin harness, fat skills": the harness is thin enough to be auditable by a human, and the Dispatcher ensures it stays that way. Skills and system modules expand freely; the core contracts to a minimal, protected kernel. -The shell and actuator fixes close the remaining execution-surface gaps. The skill sandbox mode creates a loading boundary that no current agent framework provides — skills are verified before they join the running image, not trusted by convention. +- Add ~core-*~ patterns to ~*dispatcher-protected-paths*~: ~core-*.org~, ~core-*.lisp~, and their tangled equivalents. Any file write, any read that precedes a write to the same file, or any shell command targeting these paths triggers the Dispatcher's blocking gate.
+- The blocked action produces a Flight Plan (HITL approval required). The human reviews the proposed core change in an Org buffer before approving. This is the same mechanism that governs shell commands and network exfiltration — the core protection is a path-specific instance of the existing gate, not a new gate. +- Implement a ~SELF_BUILD_MODE~ env var (default ~false~). When ~SELF_BUILD_MODE=true~: + - Core path protection is active (writes blocked, HITL required) + - Non-core writes proceed through the standard Dispatcher gate (permissions table + policy + Bouncer) +- ~SELF_BUILD_MODE=false~ disables core protection entirely — useful during initial development when the human is manually editing core files and doesn't want every save to trigger a Flight Plan +- Telemetry: track self-build actions (core modifications proposed, core modifications approved, core modifications denied). This is the dataset that the Dispatcher's learning system uses in v3.0.0 to understand which core modifications are safe enough to automate. +- Add FiveAM test: simulate a write to ~core-loop.lisp~, verify the Dispatcher returns a ~:LOG~ rejection with ~"protected path"~ in the message. -*** v0.4.0: Token Economics & Prompt Efficiency +*** TODO TUI Differentiator Visualization + +Rationale: Three architectural elements exist today in the daemon that no competitor can render — the Dispatcher gate trace, the foveal-peripheral focus map, and the rules-learned counter. All three run in pure Lisp with 0 LLM tokens. None are visible to the user. Making them visible turns Passepartout's architecture from an internal mechanism into a trust-building UX — the user sees exactly which safety gates passed, exactly what the agent is focusing on, and exactly how many rules the Dispatcher has learned from their decisions. No competitor can ship this because none has deterministic gates to trace, foveal-peripheral context to map, or a rule-synthesizing Dispatcher to count.
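The gate trace named in this rationale reduces to a small mapping. Each entry's result (~:passed~, ~:blocked~ or ~:approval~) becomes a colored marker line under the agent message. The Python sketch below shows only that mapping. It assumes a trace arrives as a list of (gate, result, detail) records. ~GateEntry~, ~render_gate_trace~ and the ~expanded~ flag are hypothetical stand-ins for the daemon's ~:gate-trace~ plist and the TUI's Tab toggle.

#+begin_src python
from dataclasses import dataclass

# Result -> (colour, marker). Colours and markers follow the roadmap text;
# the table itself is an illustrative assumption, not the TUI's real theme.
MARKERS = {
    "passed": ("green", "\u2713"),     # check mark
    "blocked": ("red", "\u2717"),      # ballot X
    "approval": ("yellow", "\u2192"),  # rightwards arrow
}


@dataclass(frozen=True)
class GateEntry:
    """Hypothetical stand-in for one (:gate ... :result ...) trace entry."""
    gate: str
    result: str
    detail: str


def render_gate_trace(entries, expanded=True):
    """Return one (colour, text) pair per entry, or nothing when collapsed."""
    if not expanded:
        return []
    lines = []
    for entry in entries:
        if entry.result not in MARKERS:
            raise ValueError(f"unknown gate result: {entry.result!r}")
        colour, marker = MARKERS[entry.result]
        lines.append((colour, f"{marker} {entry.gate}: {entry.detail}"))
    return lines
#+end_src

Consider a trace holding one passed Dispatcher check and one blocked shell command. The first line comes back green with a check mark and the second red with a cross. Collapsing the message (~expanded=False~) hides both.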
+ +- Gate trace per action: extend the daemon's response plist to include ~:gate-trace~ — a list of ~(:gate <gate-name> :result <:passed | :blocked | :approval>)~ entries produced by ~cognitive-verify~. The TUI renders each entry as a colored line below the corresponding agent message: green ~✓ Dispatcher: path allowed~, red ~✗ Dispatcher: blocked (shell safety)~, yellow ~→ HITL required: /approve HITL-ab12~. Gate trace lines are dim and collapsible (press Tab on a message to toggle trace visibility). This turns the invisible nine-vector safety gate into the user's primary trust mechanism. +- Focus map in status bar: add a second status bar line showing ~[Focus: core-loop.lisp:think()] [Scope: passepartout] [3 related nodes]~. The daemon already tracks ~foveal-id~ and ~*scope-resolver*~ in the signal plist; the TUI reads these from the most recent response and renders them. Related node count comes from the number of objects with cosine similarity ≥ threshold in the last context assembly. This shows the user *what the agent is looking at* — the single biggest trust gap in AI agents. +- Rule counter in status bar: ~[Rules: 47]~. The Dispatcher's ~*hitl-pending*~ hash table and approved/disallowed memory-object entries provide the count — every HITL decision that produces a rule increments it. The TUI reads the count from a new daemon response field ~:rule-count~. The user watches the counter tick up as they teach the agent their preferences. +- Expanded theme: replace the 7-flat-color ~*tui-theme*~ with a 25-color layered system organized by message category (roles, content types, tool visibility, gate states, status). See the design discussion for the full color mapping. Implement a ~/theme <preset>~ command that swaps between named presets (~dark~, ~light~, ~solarized~, ~gruvbox~). Theme change persists to disk and reloads on next session.
+- Add FiveAM tests: gate trace renders correctly for pass/block/approval states; focus map updates when ~foveal-id~ changes; rule counter increments on HITL approval. + +*** TODO Gateway QA, Discord, Slack + Emacs Bridge + +Rationale: Passepartout currently has Telegram and Signal gateways in the codebase, both untested. The setup wizard has Slack as a configurable option with no implementation. Two messaging channels is not competitive — OpenClaw has 25+, Hermes Agent has 6+. But more critically: the Lisp crowd is Passepartout's natural audience, and they live in Emacs. An Emacs bridge that speaks the framed TCP protocol is trivial to implement (the protocol is about 200 lines of Lisp; porting to elisp is straightforward) and turns every Emacs buffer into a Passepartout interaction surface. This is not the deep Emacs integration of v0.10.2 (where the agent controls Emacs) — this is Emacs controlling the agent over TCP. The Emacs user selects a region, hits ~M-x passepartout-send-region~, and the agent responds in a dedicated buffer. They never leave their editor. + +Gateway: +- Integration tests for Telegram gateway: mock the Telegram Bot API, verify message send (POST ~/sendMessage~) and receive (GET ~/getUpdates~) round-trip. Verify HITL commands (~/approve~, ~/deny~) are intercepted before injection. +- Integration tests for Signal gateway: mock ~signal-cli~ output, verify JSON message parsing and polling loop. Verify send path constructs correct ~signal-cli send~ arguments. +- Add Discord gateway: Discord Bot API (REST + Gateway WebSocket for real-time messages). Register bot, handle ~MESSAGE_CREATE~ events, send via ~POST /channels/{id}/messages~. Map Discord mentions to ~:user-input~ signals. HITL commands work identically to Telegram. +- Add Slack gateway: Slack Events API + Web API. Subscribe to ~message.im~ events, send via ~chat.postMessage~. Reuse the ~SLACK_TOKEN~ config key already present in the setup wizard.
+- Each gateway is a skill under ~passepartout.skills.gateway-<platform>~ — jail-loaded, hot-reloadable, sandbox-verified. +- Gateway configuration surfaced in the setup wizard: after entering a token, offer "send a test message to yourself" as a connection verification step. Surface the result as a green ✓ or red ✗ with the error detail. +- Gateway status displayed in ~messaging-list~: platform, configured (yes/no), gateway active (yes/no), last message received (timestamp). + +Emacs Bridge: +- Elisp package: ~passepartout.el~. Connects to daemon on localhost:9105 via ~make-network-process~ (TCP). +- Sends: framed plist protocol identical to the TUI (~frame-message~ ported to elisp — write hex length prefix, write the prin1'd plist). The daemon does not know or care whether the client is the Croatoan TUI, the CLI, or Emacs. +- Receives: daemon responses arrive in a ~passepartout-response~ buffer. Each response is rendered as an Org headline: role prefix, timestamp, content. Gate trace (from v0.4.0) is rendered as property drawer entries under the headline. +- ~M-x passepartout-send-region~: sends the selected region as a ~:user-input~ signal with the current buffer's file path as context. +- ~M-x passepartout-send-buffer~: sends the entire buffer. +- ~M-x passepartout-focus~: sets the foveal focus to the Org headline at point (extracts ~:ID:~ property, sends ~:point-update~ signal). Equivalent to the TUI's ~/focus~ command. +- ~M-x passepartout-approve~ / ~M-x passepartout-deny~: prompts for HITL token and sends approval/denial. +- Agent modifies an Org file → Emacs receives ~:buffer-update~ via the bridge → the buffer is refreshed (~revert-buffer~ or targeted replacement). +- The Emacs bridge is the daily driver for Lisp users. The TUI remains for non-Emacs users and for the differentiator visualizations. Emacs users get the gate trace and focus map as Org property drawers in the response buffer — same data, elisp-native rendering.
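The Emacs bridge reuses the TUI's framing: a hex length prefix followed by the printed plist. The Python sketch below covers both directions and is useful for checking that a port (elisp or otherwise) produces byte-identical frames. It makes three assumptions: a fixed six-digit lower-case hex prefix, a prefix that counts UTF-8 bytes, and no separator. The real ~frame-message~ may use a different width or a delimiter. ~Keyword~, ~lisp_repr~, ~frame_message~, ~read_frame~ and ~read_frames~ are hypothetical names, and ~lisp_repr~ handles only keywords, integers, strings and lists.

#+begin_src python
import io

PREFIX_WIDTH = 6  # assumption: fixed-width, zero-padded hex length prefix


class Keyword(str):
    """Marks a Python string as a Lisp keyword, printed as :name."""


def lisp_repr(value):
    """Print a small subset of Lisp data in prin1 style (illustrative only)."""
    if isinstance(value, Keyword):
        return ":" + value
    if isinstance(value, bool):
        raise TypeError("booleans are out of scope for this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Inside a Lisp string only backslash and double quote need escaping.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    if isinstance(value, (list, tuple)):
        return "(" + " ".join(lisp_repr(item) for item in value) + ")"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def frame_message(plist):
    """Encode a plist as <hex byte length><printed plist>."""
    body = lisp_repr(plist).encode("utf-8")
    if len(body) >= 16 ** PREFIX_WIDTH:
        raise ValueError("message too large for the length prefix")
    return f"{len(body):0{PREFIX_WIDTH}x}".encode("ascii") + body


def read_frame(stream):
    """Read one frame from a binary stream; return None on clean EOF."""
    header = stream.read(PREFIX_WIDTH)
    if not header:
        return None
    if len(header) < PREFIX_WIDTH:
        raise EOFError("truncated length prefix")
    length = int(header.decode("ascii"), 16)
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("truncated frame body")
    return body.decode("utf-8")


def read_frames(data):
    """Split a byte string holding consecutive frames into printed plists."""
    stream = io.BytesIO(data)
    frames = []
    while (frame := read_frame(stream)) is not None:
        frames.append(frame)
    return frames
#+end_src

This sketch counts UTF-8 bytes rather than characters, so the prefix stays correct for non-ASCII text such as the ✓/✗ gate markers. Keywords are emitted in lower case. The standard Common Lisp reader folds them to upper case, so case does not matter on the daemon side.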
+ +*** TODO Native embedding inference + +Rationale: The foveal-peripheral model depends on vector similarity to surface semantically related nodes. Without vectors, retrieval is depth-2 truncation with no semantic boosting. The trigram Jaccard fallback provides real lexical signal — "login bug" shares trigrams with "login form error" — but cannot surface nodes with zero lexical overlap ("password reset flow" vs "login broken"). A real embedding model closes this gap. Embedding inference is considerably simpler than chat LLM inference: single forward pass, no autoregressive decoding, no KV cache, no streaming. The CFFI binding is about 150 lines of Lisp. + +- FFI binding to llama.cpp's embedding API (about 150 lines of CFFI). Call ~llama_get_embeddings~ after a single forward pass. No KV cache, no sampling, no streaming required — embedding models are BERT-family, single-pass. +- Ship all-MiniLM-L6-v2 (23MB GGUF) as the bundled embedding model. 384-dimensional vectors, CPU-friendly (typically under 100ms per document on a modern CPU), produces semantically meaningful vectors with zero external dependencies. +- ~embedding-backend-local~ detects the bundled model at configure time. If present, uses in-process inference by default. Falls back to Ollama (~EMBEDDING_PROVIDER=local~) or OpenAI (~EMBEDDING_PROVIDER=openai~) if the model is missing or the user prefers external inference. +- ~EMBEDDING_PROVIDER~ default becomes ~:native~ (the bundled model). The model file lives at ~~/.local/share/passepartout/models/all-MiniLM-L6-v2.gguf~, downloaded on first ~configure~ if not present. +- The trigram Jaccard backend remains as a further fallback for environments where even 23MB is too large (embedded, resource-constrained). SHA-256 hashing is removed entirely — it was semantically blind. +- Add FiveAM test: embedding a document with the native backend produces a 384-dimensional float vector; identical documents produce identical vectors.
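Both retrieval paths in this section reduce to a similarity score. The native backend uses cosine similarity between embedding vectors, and the fallback uses Jaccard overlap between character-trigram sets. The Python sketch below shows the two functions side by side. The normalization (lower-casing and collapsing whitespace before taking trigrams) is an assumption, not necessarily what the trigram backend does. The function names are hypothetical.

#+begin_src python
import math


def trigrams(text):
    """Character trigrams of lower-cased text with whitespace collapsed."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + 3] for i in range(len(norm) - 2)}


def trigram_jaccard(a, b):
    """|A & B| / |A | B| over trigram sets; 0.0 when both sets are empty."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
#+end_src

Under this sketch, "login bug" and "login form error" share four of seventeen distinct trigrams (about 0.24). "password reset flow" and "login broken" share none, which is exactly the gap the native embedding backend is meant to close.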
+ +*** Competitive Advantage Analysis — v0.4.0 Summary + +Production hardening is the process of turning architectural potential into operational strength. The semantic retrieval fix activates the foveal-peripheral model's full power: deep nodes that are topically related to the user's focus now surface automatically. Without this, the context model is "dumb truncation at depth 2." With it, it's genuine semantic awareness — and since the retrieval is deterministic (in-image vector math, zero LLM tokens), the cost advantage over competitors' LLM-assisted search compounds with every query. + +The self-build safety boundary is a capability no competitor provides: the agent cannot modify its own brain stem without human review. The ~core-*~ path protection means the Dispatcher draws a line at the filesystem level, not the policy document level. Claude Code, OpenClaw, and Hermes all allow agents to modify their own source files without distinction between application code and runtime code. Passepartout's Dispatcher requires human review before any change to the very pipeline that implements the Perceive-Reason-Act cycle, the Merkle-tree memory, the skill engine loader, and the Dispatcher gate stack itself. This is the operational realization of "thin harness, fat skills" — the harness is thin enough to be auditable by a human, and the Dispatcher ensures it stays that way. + +The TUI differentiator visualizations are Passepartout's permanent UX advantage. The gate trace, focus map, and rule counter are UX elements that only make sense in Passepartout's architecture — deterministic gates, foveal-peripheral context, and Dispatcher rule synthesis exist nowhere else. Combined with the TUI critical fixes from v0.3.3, the TUI is competitive on usability and uniquely informative on safety and context transparency.
+ +The messaging gateways and Emacs bridge expand Passepartout's interaction surface from a single terminal TUI to four surfaces: terminal, Telegram/Signal/Discord/Slack messaging, Emacs, and voice (via the voice gateway in v0.7.3). The Emacs bridge is strategically critical — the Lisp crowd is Passepartout's natural audience, and they live in Emacs. An Emacs bridge that speaks the framed TCP protocol turns every Emacs buffer into a Passepartout interaction surface. Combined with the gate trace and focus map rendered as Org property drawers in the response buffer, Emacs users get the same differentiator visualizations as TUI users — same data, elisp-native rendering. + +** v0.5.0: Token Economics & Prompt Efficiency The architecture's single largest gap versus SOTA: Passepartout currently spends tokens like a research prototype. Every ~think()~ call rebuilds and retransmits the full system prompt — IDENTITY + TOOLS + CONTEXT + LOGS + SKILL_AUGMENTS — with no caching, no budget, and no incremental assembly. The foveal-peripheral model prunes memory content but doesn't touch the fixed overhead. With 20+ skills by v1.0.0, system prompt overhead alone could reach 3,000–8,000 tokens per call before user input is even processed. @@ -395,37 +507,49 @@ Competitors (Claude Code, OpenClaw, Copilot) all implement some form of prefix c **Design insight: why token economics is the structural differentiator.** Passepartout's sparse-tree rendering and deterministic safety gates should produce 2–3x fewer tokens than competitors for equivalent coding tasks, and 13–24x fewer for knowledge management. But without caching and budget enforcement, the fixed overhead per call eats these savings. A coding session that touches 30 files with competent context management costs ~72K tokens (Passepartout) versus ~185K (Claude Code). Without caching, the Passepartout number climbs toward ~150K because every call retransmits the static prefix. 
The architectural advantage exists in theory but requires operational plumbing to materialize. -**** TODO Tokenizer integration +*** TODO Tokenizer integration - Integrate a tokenizer for at minimum the model families used in the provider cascade (cl100k_base for OpenAI, claude-3 tokenizer for Anthropic). Options: FFI binding to tiktoken via CFFI, or a pure-Lisp port of the BPE tokenizer for cl100k_base (the encoding table is ~100KB, the algorithm is ~100 lines). - Expose ~(count-tokens text &key model)~ as a core utility. - Use for three purposes: context budget enforcement (reject assembly if over limit), cost estimation (tokens × provider price), and prompt optimization (measure which sections of the system prompt consume the most budget). -**** TODO Prompt prefix caching +*** TODO Prompt prefix caching - Split the system prompt into a static prefix (IDENTITY string, TOOLS section, LOGS format header) and a dynamic suffix (CONTEXT render, current log entries, skill augments, user prompt). - Track a hash of the static prefix; only retransmit when it changes (skill load/unload, identity config change). On cache hit, send the cached prefix with the dynamic suffix appended. - Implement the Anthropic prompt-caching header protocol for providers that support it (claude-3-* models, up to 90% discount on cached tokens). For OpenAI, the automatic caching layer handles prefix detection without explicit headers. - Log cache hit/miss rate to telemetry for cost tracking. -**** TODO Incremental context assembly +*** TODO Incremental context assembly - Cache the last rendered ~context-awareness-assemble~ string with metadata: foveal-id at render time, scope, last memory modification timestamp. - On ~think()~ invocation: if foveal-id, scope, and memory-modification-timestamp are unchanged since the cached render, return the cached string. 
This eliminates re-rendering on heartbeat ticks, tool-output feedback loops, and multi-turn conversations where the user hasn't changed focus. - Invalidate the cache on any ~ingest-ast~ call, any ~org-modify~, or any focus change. - For heartbeats specifically: skip context assembly entirely — the heartbeat sensor bypasses the reason gate (returns early in ~loop-gate-reason:154~), so building awareness for a signal that won't call the LLM is pure waste. Add an early return in ~think()~ for ~:heartbeat~ / ~:delegation~ sensors. -**** TODO Per-call token budget +*** TODO Per-call token budget - ~CONTEXT_MAX_TOKENS~ env var (default: 16384, half of a 32K context window to leave room for model response). - In ~think()~: compute total token count (static prefix + dynamic context + user prompt). If over budget, progressively trim: first truncate system logs to 5 lines, then drop skill augments from non-triggered skills, then if still over, downgrade peripheral nodes to title-only (disable ~:foveal-vector~ path, render strict depth ≤ 2). - Log budget violations to telemetry with the trimmed-token count for diagnostics. - The goal: Passepartout never silently exceeds a model's context window. Silent truncation by the model API produces undefined behavior (mid-thought cutoff, lost instructions). A system that knows it's over budget can degrade intentionally. -**** TODO Cost tracking +*** TODO Cost tracking - Per-provider pricing lookup table: input/output token costs for each model in the provider cascade (gpt-4o-mini, claude-3-5-sonnet, deepseek-chat, llama-3.1-70b, groq-llama, etc.). - After each ~backend-cascade-call~: compute cost as (input_tokens × input_price + output_tokens × output_price), log to session accumulator, emit ~:cost-update~ telemetry event. - Per-session cumulative cost stored in memory (~*session-cost*~ plist: ~(:total :by-provider :by-task )~). -- TUI status bar shows current session cost (optional, off by default, toggled via ~/cost~ command). 
+- TUI status bar shows current session cost (optional, off by default, toggled via ~/cost~ command). The cost counter renders as ~[Session: $0.12]~ in the status bar, updating after each ~backend-cascade-call~. Color: green when under 50% of daily budget, yellow at 50-90%, red above 90%. - ~COST_BUDGET_DAILY~ env var with soft cap — warning injected into system prompt when approaching budget, HITL gate on any single action exceeding 25% of remaining budget. -** Competitive Advantage Analysis — v0.4.0 Summary +*** TODO Self-configuring setup binary + +Rationale: The current ~passepartout configure~ flow is a bash script that detects Debian or Fedora, installs packages, installs Quicklisp, tangles Org sources, and runs the setup wizard. It handles 2 distro families. It fails on everything else. A self-configuring setup with a small LLM expands coverage to "anything with a package manager" without shipping gigabytes of model data. The key constraint: the LLM follows a decision tree for setup; it does not improvise. This keeps setup reliable while expanding coverage. + +- The setup binary (~passepartout-setup~) is a ~save-lisp-and-die~ executable (about 100MB: SBCL runtime + core Lisp code + native embedding inference from v0.4.0 + 23MB embedding model). No SBCL install required. No Quicklisp. No bash script. The user runs one file. +- Deterministic path (default, always runs first): the same distro detection, package installation, and configuration logic from today's bash script, reimplemented in Lisp. Handles Debian and Fedora families. Covers the common case without touching an LLM. +- LLM-assisted path (optional, activates on deterministic failure): downloads Qwen2.5-0.5B (about 500MB GGUF, pinned by hash, cached to ~~/.local/share/passepartout/models/~). The model reads command output, classifies success/failure/recoverable-error from a finite set of outcomes, and selects the next corrective action from a constrained decision tree.
On unrecognized failures, it generates a diagnostic for the user. +- Model hash verification: the GGUF file is pinned by SHA-256 hash. If the hash doesn't match (wrong version, corrupted download), fall back to deterministic setup with a warning. The bootstrap tool must not fail silently because of a model mismatch. +- After setup completes, the binary exits. The user runs ~passepartout daemon~ to start the full system (a live SBCL process, not a sealed binary — REPL, hot-reload, self-modification all available). +- The setup binary is a bridge. It gets the system installed and configured, then gets out of the way. The final system is a live Lisp image, not a sealed binary. +- Add FiveAM test: the deterministic path succeeds on a system with all dependencies pre-installed; the LLM-assisted path correctly classifies 10 common package-manager error messages. + +*** Competitive Advantage Analysis — v0.5.0 Summary Token economics is the dimension where the architecture's theoretical advantage becomes operationally real. The foveal-peripheral model and deterministic gates reduce the tokens *needed* per task; prompt caching and incremental assembly reduce the tokens *spent* per task. Combined, the 2–3x coding savings and 13–24x knowledge management savings in the DESIGN_DECISIONS token analysis become achievable rather than aspirational. @@ -433,13 +557,13 @@ The cost tracking and budget enforcement are defensive advantages: no competitor The minimum viable local model advantage is structural: at 2,000–4,000 effective tokens (foveal-peripheral + caching), a 7–8B parameter model on consumer hardware is a daily driver. Competitors at 32K+ effective tokens require 70B+ parameter models and 16–32 GB VRAM. Passepartout runs on a laptop GPU where competitors need a data center card or cloud API.
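The cost-tracking item under v0.5.0 is plain arithmetic. Each call costs input tokens × input price plus output tokens × output price. The session total drives a green/yellow/red status-bar counter at the 50% and 90% marks of ~COST_BUDGET_DAILY~. Any single action above 25% of the remaining budget needs HITL approval. The Python sketch below assumes prices are quoted per million tokens. The model names and prices in ~PRICES~ are placeholders, not real provider pricing, and every function name is hypothetical.

#+begin_src python
from decimal import Decimal

# Placeholder USD prices per million tokens as (input, output). Not real pricing.
PRICES = {
    "example-small": (Decimal("0.15"), Decimal("0.60")),
    "example-large": (Decimal("3.00"), Decimal("15.00")),
}
TOKENS_PER_PRICE_UNIT = 1_000_000


def call_cost(model, input_tokens, output_tokens):
    """Cost of one call: input_tokens * input_price + output_tokens * output_price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / TOKENS_PER_PRICE_UNIT


class SessionCost:
    """Running per-session total, also broken down by provider/model."""

    def __init__(self):
        self.total = Decimal(0)
        self.by_provider = {}

    def record(self, model, input_tokens, output_tokens):
        cost = call_cost(model, input_tokens, output_tokens)
        self.total += cost
        self.by_provider[model] = self.by_provider.get(model, Decimal(0)) + cost
        return cost


def status_label(total):
    """Status-bar text, rounded to cents."""
    return f"[Session: ${total:.2f}]"


def budget_colour(spent, daily_budget):
    """Green under 50% of the daily budget, yellow up to 90%, red above."""
    fraction = spent / daily_budget
    if fraction < Decimal("0.5"):
        return "green"
    if fraction <= Decimal("0.9"):
        return "yellow"
    return "red"


def needs_approval(action_cost, spent, daily_budget):
    """HITL gate: a single action costing more than 25% of what remains."""
    remaining = max(daily_budget - spent, Decimal(0))
    return action_cost > remaining * Decimal("0.25")
#+end_src

Using ~Decimal~ rather than floats keeps sub-cent totals exact as they accumulate over a session. Only the status-bar label rounds to cents.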
-*** v0.5.0: Signal Pipeline & Concurrency +** v0.6.0: Signal Pipeline, Concurrency & Streaming The current pipeline is strictly sequential — one signal traverses Perceive → Reason → Act before the next signal begins. Background tasks (heartbeat, embedding cron, gardener scans) compete with foreground interactions. A heartbeat that fires during a long tool chain is queued. A Telegram message during a multi-step planning cycle is queued. The system feels sluggish under concurrent load even though the symbolic operations are near-instant (SBCL hash table lookups are microseconds) — the bottleneck is the single-pipeline architecture, not the hardware. -**Design insight: why concurrency matters for an agent that is "one brain."** Passepartout rejects multi-agent delegation on principle (see DESIGN_DECISIONS "One Single Agent"). But a single brain handles multiple inputs simultaneously — the human brain processes vision, audio, and proprioception in parallel. Rejecting multi-agent delegation does not require rejecting concurrency within the agent. The key is that all concurrent operations share the same memory space, the same Merkle tree, and the same deterministic gate stack. They are threads of one cognition, not separate agents. +*Design insight: why concurrency matters for an agent that is "one brain."* Passepartout rejects multi-agent delegation on principle (see DESIGN_DECISIONS "One Single Agent"). But a single brain handles multiple inputs simultaneously — the human brain processes vision, audio, and proprioception in parallel. Rejecting multi-agent delegation does not require rejecting concurrency within the agent. The key is that all concurrent operations share the same memory space, the same Merkle tree, and the same deterministic gate stack. They are threads of one cognition, not separate agents. 
-**** TODO Priority-queue signal processing +*** TODO Priority-queue signal processing - Replace the linear ~process-signal~ call chain with a priority-ordered signal queue. The queue is a sorted plist-list consumed by the main loop. Priority tiers: - ~:user-input~ / ~:chat-message~ — highest priority (the user is waiting) - ~:approval-required~ — high (HITL re-injections need quick resolution) @@ -449,8 +573,9 @@ The current pipeline is strictly sequential — one signal traverses Perceive - Coalesce duplicate heartbeats: if the queue already contains a ~:heartbeat~ signal when a new one arrives, discard the older one (no value in processing stale ticks). Keep at most one pending heartbeat at any time. - The main loop drains the highest-priority signal from the queue, processes it through the pipeline, and repeats. If the pipeline produces feedback (tool-output → think), the feedback is enqueued at its appropriate priority — it may preempt background signals but won't interrupt the current signal mid-processing. - Add telemetry: average queue depth by priority tier, max wait time per tier. +- TUI ~/reconnect~ command: when the connection-loss detection from v0.3.3 fires, the user can reconnect without restarting the TUI. The command closes the stale socket, re-runs ~connect-daemon~ with its retry backoff, and restores the ~:connected~ state on success. -**** TODO MVCC memory concurrency +*** TODO MVCC memory concurrency - Replace ~*memory-store*~ (mutable global hash table) with a versioned Merkle-root pointer. The root is an ~(or null merkle-node)~ struct containing the tree and a monotonic version counter. - Read threads snapshot the root before beginning their pipeline cycle. All object lookups dereference through the snapshot — they see a consistent view of memory regardless of concurrent writes. Reads never block. 
- Write threads (ingest-ast, org-modify, snapshot-memory) build new object hashes, construct a new Merkle root, and CAS-replace the global root pointer. If another thread won the CAS race (root version changed), the loser re-reads the new root, replays its changes on the updated tree, and retries the CAS. @@ -458,13 +583,25 @@ The current pipeline is strictly sequential — one signal traverses Perceive - Remove the single-threaded pipeline assumption: previously, ~process-signal~ was safe because nothing else wrote to ~*memory-store*~ during its execution. With MVCC, multiple signals can process concurrently because each has its own snapshot. The ~*loop-interrupt-lock*~ becomes ~*signal-queue-lock*~ (protecting only the queue, not the memory). - Test: concurrent ingest-ast from two threads writing to different memory objects, verify both commits succeed without corruption. -**** TODO Structured output enforcement +*** TODO Structured output enforcement - Add a plist validation step between ~markdown-strip~ and ~read-from-string~ in ~think()~. Before attempting to parse, validate: (a) the output starts with ~(~ or ~[~, (b) it contains balanced delimiters (count opens vs closes), (c) it doesn't contain ~#.~ (redundant after v0.3.1 ~*read-eval* nil~ but defense-in-depth). - On validation failure: construct a rejection trace (similar to the existing deterministic gate rejection feedback) and re-inject into the LLM prompt. The trace includes the raw output and a diagnostic ("Your response did not produce a valid plist. Ensure it starts with ( and has balanced parentheses."). - Configurable ~LLM_OUTPUT_RETRIES~ (default 2). After exhausting retries, fall through with the raw text as a ~:MESSAGE~ action (current behavior). - Track parse-failure rate per provider in telemetry. Use to guide provider cascade ordering: a provider with 20% parse-failure rate falls behind one with 2%. 
+- If retries are exhausted without a parseable plist, the TUI renders the raw LLM output in a dimmed, collapsible region labeled "Parse failure — could not interpret this response."
 
-** Competitive Advantage Analysis — v0.5.0 Summary
+*** TODO v0.6.3: Streaming responses
+
+Rationale: Every competitor streams — Hermes Agent specifically lists "streaming tool output" as a feature, OpenClaw streams via messaging channels, Claude Code streams via terminal. A spinner followed by a wall of text is v0.1-era UX for an LLM chat interface. Streaming was originally sequenced in the evaluation release (after evaluation harness and computer use), but it depends only on the daemon protocol (chunked frames) and TUI rendering — neither requires tools, planning, evaluation, or vision. Moving it to v0.6.3 means Passepartout streams before it ships tools, which matters because streaming alone makes the existing chat experience competitive.
+
+- Add a new frame type (~:type :stream-chunk~) to the daemon-TUI protocol. Chunks are variable-length strings carrying partial LLM output. The final chunk is an empty string, signalling end-of-stream.
+- ~provider-openai-request~: for providers that support streaming (OpenRouter, OpenAI, Anthropic, Groq, local), send ~"stream": true~ in the request body. Read the SSE stream, extract ~delta.content~ from each chunk, and call a new ~*stream-callback*~ function with the partial text.
+- The TUI renders partial output in the chat window as it arrives, appending characters to the in-progress agent message. The "…thinking" spinner is replaced by live, building text.
+- Interrupt-and-redirect: pressing a key (Esc or any printable character) during streaming injects an interrupt signal. The partial response is captured as the agent's message, the LLM call is cancelled (HTTP connection closed), and the user's keystroke becomes new input. This replaces the current full-process ~SIGINT~ with a graceful mid-response redirect.
+- The TUI message for a streamed response shows a ~[streaming]~ indicator that changes to a timestamp when the stream completes. If interrupted, the indicator changes to ~[interrupted]~. +- Add FiveAM tests: stream-chunk framing round-trips correctly; interrupt during streaming produces a valid partial message; the TUI correctly renders progressive chunks vs a completed message. + +*** Competitive Advantage Analysis — v0.6.0 Summary The priority queue eliminates the perception of sluggishness that concurrent load creates. A user typing a query never waits for a heartbeat tick to finish — their signal jumps the queue. The coalescing of duplicate heartbeats eliminates wasted processing. This is table-stakes UX for a daily-driver agent. @@ -472,31 +609,49 @@ MVCC concurrency on the Merkle tree is genuinely novel for an AI agent. Most age Structured output enforcement bridges the gap between "Passepartout uses plists, not JSON" and "LLMs sometimes produce malformed syntax." It gives the system the same reliability guarantee that JSON mode gives competitors — the output will parse — without introducing JSON into the architecture. -*** v0.6.0: Tool Ecosystem (MCP-Native) +Streaming responses (v0.6.3) close the last remaining table-stakes UX gap with Hermes Agent and Claude Code. The "…thinking" spinner is replaced with live text. Interrupt-and-redirect means the user can course-correct mid-response instead of waiting for a wrong answer to complete. Combined with the TUI critical fixes (v0.3.3) and differentiator visualizations (v0.4.0), the TUI is competitive on responsiveness and uniquely informative on safety and context transparency. -The original roadmap placed MCP at v0.7.0 and planned "10+ cognitive tools" built from scratch for v1.0.0. This is inverted: the ecosystem already provides 50+ tools (filesystem, git, postgres, slack, github, web search, memory servers). Building bespoke tools from scratch duplicates work the community has already done and tested. 
Passepartout's advantage is not in tool *implementation* but in tool *orchestration* — the deterministic gate stack that verifies every tool invocation before execution. +** v0.7.0: Tool Ecosystem (MCP-Native) + Voice Gateway -**Why MCP matters for competitive positioning:** Claude Code's native tools (Read, Write, Edit, Bash, Grep, Glob, WebSearch) are implemented in TypeScript within the Claude Code runtime. They are not extensible — you cannot add a tool without modifying the runtime. OpenClaw's tools are similarly baked into the Node.js process. By building a native MCP client, Passepartout gains tool breadth that exceeds both competitors (50+ tools via the MCP ecosystem versus ~10 native tools) without building a single tool implementation. The tool quality is maintained by the ecosystem; the safety verification is maintained by Passepartout's gate stack. This division of labor is the right architecture for a small team building a competitor to well-funded commercial agents. +The original roadmap placed MCP at v0.8.0 and planned "10+ cognitive tools" built from scratch for v1.0.0. This is inverted: the ecosystem already provides 50+ tools (filesystem, git, postgres, slack, github, web search, memory servers). Building bespoke tools from scratch duplicates work the community has already done and tested. Passepartout's advantage is not in tool *implementation* but in tool *orchestration* — the deterministic gate stack that verifies every tool invocation before execution. -**** TODO MCP native client +*Why MCP matters for competitive positioning:* Claude Code's native tools (Read, Write, Edit, Bash, Grep, Glob, WebSearch) are implemented in TypeScript within the Claude Code runtime. They are not extensible — you cannot add a tool without modifying the runtime. OpenClaw's tools are similarly baked into the Node.js process. 
By building a native MCP client, Passepartout gains tool breadth that exceeds both competitors (50+ tools via the MCP ecosystem versus ~10 native tools) without building a single tool implementation. The tool quality is maintained by the ecosystem; the safety verification is maintained by Passepartout's gate stack. This division of labor is the right architecture for a small team building a competitor to well-funded commercial agents. + +*** TODO MCP native client - Pure Common Lisp MCP client: parse JSON-RPC messages from MCP servers over stdio or SSE. No Python bridge, no Node.js subprocess. The client runs in the same Lisp image as the agent — zero serialization overhead between the agent and the MCP layer. - Implement the MCP protocol lifecycle: initialize handshake, list tools, call tool, handle notifications. Each MCP server registers its tools as entries in Passepartout's ~*cognitive-tool-registry*~ at connection time — the LLM's tool belt prompt automatically expands to include them. - ~MCP_SERVERS~ env var: comma-separated paths to MCP server config files (JSON). Each config specifies the server command, args, and env vars. Example: =MCP_SERVERS=~/.config/passepartout/mcp/filesystem.json,~/.config/passepartout/mcp/git.json=. - Tool invocation route: LLM proposes a tool call → Dispatcher verifies against permission table → MCP client serializes call as JSON-RPC → server executes → result deserialized back to plist → returned to LLM as tool output. The Dispatcher does not distinguish between native tools and MCP tools — the gate stack is uniform. - Register the MCP client as a skill (~defskill~~:passepartout-mcp-client~) so it can be hot-reloaded. The MCP client is not core infrastructure — it is a skill that extends the tool ecosystem. -**** TODO Core MCP tools (from existing roadmap items) -- Git Steward (deferred from old v0.4.0): status, diff, commit, push, branch via the MCP Git server. 
Policy gate enforces commit-before-modify: any file write to a git-tracked directory must be preceded by a diff review. -- Web Research (deferred from old v0.6.0): headless browser via Puppeteer/Playwright MCP server. Text extraction, screenshot capture, page interaction. -- Interactive PTY (deferred from old v0.5.0): stream long-running process output to context window, async interrupt control. +*** TODO Core MCP tools (from existing roadmap items) +- Git Steward (deferred from old v0.5.0): status, diff, commit, push, branch via the MCP Git server. Policy gate enforces commit-before-modify: any file write to a git-tracked directory must be preceded by a diff review. +- Web Research (deferred from old v0.7.0): headless browser via Puppeteer/Playwright MCP server. Text extraction, screenshot capture, page interaction. +- Interactive PTY (deferred from old v0.6.0): stream long-running process output to context window, async interrupt control. -**** TODO Environment Steward +*** TODO TUI tool visualization +- Tool invocation rendering: when the agent invokes a tool, the TUI renders a color-coded, collapsible region. Pre-execution: ~[Running: bash "npm test"...]~ in magenta with a dim spinner. Post-execution: ~✓ bash: tests passed (1.2s)~ in green, or ~✗ bash: exit code 1~ in red with the error output expanded below. +- Tool output is collapsed by default (single line summary). Pressing Enter on a tool invocation row toggles expansion to show the full output. +- Diff display: when a file write or git diff is involved, render the diff with standard ~+~ (green) / ~-~ (red) coloring. The diff is shown as a compact inline block with 3 lines of context around each change. +- Gate trace for tool invocations: each tool call shows its Dispatcher gate results inline (gate trace from v0.4.0), so the user sees both the tool execution and which safety gates allowed or blocked it. + +*** TODO Environment Steward - Detect "command not found" in shell actuator output. 
- Search system PATH and package manager registries for the missing command.
 - Propose installation command and retry the failed action on user approval.
 - Cache resolved dependency paths to avoid repeated searches.
 
-** Competitive Advantage Analysis — v0.6.0 Summary
+*** TODO v0.7.3: Voice Gateway
+
+Rationale: OpenClaw ships voice wake words and talk mode on macOS/iOS/Android via ElevenLabs. Hermes Agent has voice memo transcription. Both treat voice as a first-class channel. Passepartout's daemon already handles text — voice is an I/O format conversion. Speech-to-text turns audio into ~:user-input~ signals. Text-to-speech turns agent responses into audio. The architecture requires no changes; the voice gateway is a skill that wraps existing REST APIs.
+
+- Speech-to-text: POST audio to OpenAI Whisper API (~/v1/audio/transcriptions~) or local Whisper via Ollama. The transcribed text is injected as a ~:user-input~ signal into the pipeline, and the daemon processes it identically to a typed message.
+- Text-to-speech: POST text to ElevenLabs REST API (~/v1/text-to-speech/{voice-id}~) with stream response. Also support system ~say~ (macOS) / ~espeak~ (Linux) as zero-dependency fallbacks.
+- TUI voice toggle: ~/voice on~ enables voice capture and shows a ~🎤~ (listening) indicator in the status bar. ~/voice off~ returns to text-only. The microphone capture runs in a dedicated thread that feeds audio chunks to the speech-to-text backend.
+- Voice mode in messaging gateways: on Telegram and Discord, the voice gateway transcribes voice messages into text and injects them as ~:user-input~ signals. Agent responses can optionally be spoken back via text-to-speech when the user's message included a voice note (reply in kind).
+- The voice gateway is a skill (~defskill~~:passepartout-gateway-voice~) and requires no core daemon changes. The daemon receives text signals whether they originated from a keyboard, a messaging app, or a microphone.
+ +*** Competitive Advantage Analysis — v0.7.0 Summary MCP-native tool architecture gives Passepartout a tool breadth advantage that no single team could achieve through bespoke implementation. The MCP ecosystem is growing faster than any individual agent's tool set. By connecting to it rather than competing with it, Passepartout's tool count scales with the ecosystem — every new MCP server is a new Passepartout tool. @@ -504,32 +659,35 @@ The Dispatcher's tool permission table (allow/ask/deny) applies uniformly to MCP The Git policy gate (commit-before-modify) is a safety feature no competitor provides. It prevents the most common agent failure mode: modifying files without preserving the prior state. Combined with memory snapshots (v0.2.0), this gives every action a dual audit trail: the git history and the memory object history. -*** v0.7.0: Planning, Self-Modification & Deterministic Routing +v0.7.1 is also the threshold at which Passepartout can safely self-build — modify its own source files outside the core pipeline. The ~core-*~ path protection from v0.4.0 ensures the agent cannot destroy its own brain stem during self-building; the TDD runner catches regressions before commit; the Git policy gate preserves every state change. Together, these four releases (v0.4.0, v0.5.0, v0.6.2, v0.7.1) form the safety, economic, reliability, and tool stack that makes self-hosting viable. -v0.6.0 provides the tools. v0.7.0 provides the brain that orchestrates them. The two releases are sequenced this way because planning without tools is architecture without construction — the plans describe actions the system cannot execute. With tools in place, planning becomes actionable. +The voice gateway (v0.7.3) adds parity with OpenClaw's voice features without architectural changes — speech-to-text and text-to-speech are thin REST wrappers that feed text signals into the existing pipeline. 
Combined with the Emacs bridge (v0.4.0) and messaging gateways (v0.4.0), Passepartout supports four interaction surfaces by v0.7.3: terminal (TUI), messaging apps, Emacs, and voice. Each surface is a thin client speaking the same framed TCP protocol to the same daemon. -**Design insight: the inverted tier classifier.** The current tier classifier routes "rm", "write-file", and "shell" to ~:REFLEX~ (no LLM). This routes the most dangerous operations to the path with the least oversight. It should be inverted: ~:REFLEX~ handles deterministic lookups (list TODOs, check file existence, query memory), ~:COGNITION~ handles text processing and summarization, ~:REASONING~ handles planning and code generation. Dangerous operations should always route through ~:REASONING~ where the full LLM cycle and Dispatcher gate stack apply. v0.7.1 fixes this. +** v0.8.0: Planning, Self-Modification & Deterministic Routing -**** TODO Long-horizon planning (task tree DAG) +*Design insight: the inverted tier classifier.* The current tier classifier routes "rm", "write-file", and "shell" to ~:REFLEX~ (no LLM). This routes the most dangerous operations to the path with the least oversight. It should be inverted: ~:REFLEX~ handles deterministic lookups (list TODOs, check file existence, query memory), ~:COGNITION~ handles text processing and summarization, ~:REASONING~ handles planning and code generation. Dangerous operations should always route through ~:REASONING~ where the full LLM cycle and Dispatcher gate stack apply. v0.8.1 fixes this. + +*** TODO Long-horizon planning (task tree DAG) - Decompose complex tasks into Org-mode headline trees. Each task node is a memory-object with terminal states: ~:todo~ → ~:next-action~ → ~:in-progress~ → ~:done~ / ~:blocked~ / ~:stuck~. - The LLM generates the initial task tree from the user's request. The REASONING tier processes each leaf task sequentially, updating node states as it progresses. 
- Parent nodes summarise child results: when all children of a node reach ~:done~, the parent is promoted to ~:done~ with a synthesised summary. When any child reaches ~:stuck~, the parent is promoted to ~:blocked~ with the blocking child's diagnostic.
 - Branch pruning: if a child is ~:stuck~ after three retries with different LLM providers, the parent re-plans the branch — the LLM generates alternative decomposition paths for the blocked sub-task.
 - Task trees persist as Org headlines in ~/memex/system/tasks/~. Survive restarts. Visible to the user as editable Org files.
+- TUI task tree visualization: while the agent executes a long-horizon plan, the TUI shows the task tree as a collapsible Org headline tree in an async status region of the chat area. Each node shows its terminal state with a colored indicator (~○~ todo, ~▶~ next-action, ~◉~ in-progress, ~✓~ done, ~✗~ blocked, ~⏸~ stuck). Nodes expand and collapse on Enter. The tree updates in real time as the agent works through subtasks and collapses to a single summary line when the plan completes.
 
-**** TODO Tier classifier fix
+*** TODO Tier classifier fix
 - Invert the current classifier: ~:REFLEX~ = deterministic lookups only (memory query, file-exists-p, check time, list TODOs by tag). ~:COGNITION~ = text processing, summarization, simple Q&A, note formatting. ~:REASONING~ = planning, code generation, multi-step task execution, dangerous operations.
 - Track classifier accuracy via telemetry: for each classified action, record whether the classification was appropriate (did the ~:REFLEX~ action actually succeed without LLM? did a ~:REASONING~ action turn out to be a simple lookup?).
 - The classifier function is overrideable via ~*tier-classifier*~, allowing users or skills to customize routing.
 - The classifier should be a skill, not core infrastructure — reloadable and replaceable without restart.
-**** TODO Skill Creator +*** TODO Skill Creator - LLM drafts complete skill org-file from natural language description. - Mandatory pipeline: (a) syntax validation via ~lisp-syntax-validate~, (b) sandbox-load in temporary jailed package (v0.3.2), (c) run registered trigger function against mock contexts, (d) run registered deterministic gate against mock proposals, (e) on pass, promote to live registry under ~passepartout.skills.~. - Required ~:repl-verified~ flag on all ~defun~ forms — the existing Dispatcher lint check (core-loop-act.lisp:152–161) warns on writes without verification. The Skill Creator enforces this at creation time. - Skills are the primary extension mechanism for users. The Skill Creator makes skill authoring accessible to non-Lisp-programmers: describe what you want in English, the LLM drafts the Org file, the system verifies it, and the skill is live. This is how Passepartout grows its capability surface without requiring the user to learn Common Lisp. -** Competitive Advantage Analysis — v0.7.0 Summary +*** Competitive Advantage Analysis — v0.8.0 Summary The task tree DAG with terminal states and branch pruning is Passepartout's planning primitive — analogous to Claude Code's TODO list but structural (Org headlines with parent-child relationships) rather than flat. The advantage: subtask dependencies are explicit in the tree structure, so the agent knows that task C depends on tasks A and B without having to rediscover this from context. Parent summarisation means the LLM can check high-level progress without re-reading every child's output — a token savings multiplier on long-running tasks. @@ -537,68 +695,64 @@ The tier classifier fix is a safety correctness issue. The current inverted clas The Skill Creator is the mechanism by which Passepartout escapes the "team of Lisp programmers" constraint. Most agent frameworks require Python/TypeScript to extend. 
Passepartout's extension language is English — the LLM writes the Lisp, the system verifies it. The sandbox-load and verification pipeline (from v0.3.2) make this safe: a skill that fails verification never enters the running image. -*** v0.8.0: Evaluation, Vision & Streaming +** v0.9.0: Evaluation & Vision -With tools (v0.6.0) and planning (v0.7.0) in place, the agent can execute complex multi-step tasks. v0.8.0 answers two questions: (1) how do we *prove* it works? (SWE-bench evaluation harness), and (2) what capabilities does the user actually experience? (vision for UI interaction, streaming for responsive TUI). +With tools (v0.7.0) and planning (v0.8.0) in place, the agent can execute complex multi-step tasks. v0.9.0 answers two questions: (1) how do we *prove* it works? (SWE-bench evaluation harness), and (2) can the agent interact with visual interfaces? (computer use / vision). Streaming has been moved to v0.6.3 — it depends only on the daemon protocol, not on evaluation or vision. -**** TODO SWE-bench harness +*** TODO SWE-bench harness - Automated pipeline: clone a repository from SWE-bench dataset, parse the GitHub issue, feed the issue description into Passepartout's cognitive loop, track the resolution trajectory as an Org headline tree, apply the generated patch, run the repository's test suite, score success (tests pass yes/no). - Trajectory persistence: each benchmark run produces an Org file under ~/memex/system/benchmarks/~ recording every ~think()~ call, every tool invocation, every Dispatcher decision, and the final test result. The trajectory is auditable — a human can read why the agent made each decision and where it went wrong on failures. - Regression mode: run the same benchmark after each version release. Track score trends. A version that regresses on SWE-bench does not ship. -- Target: competitive score with Claude Code and OpenClaw on SWE-bench-verified by v1.0.0. 
The evaluation harness ships in v0.8.0 so there are two full version cycles to iterate and improve before v1.0.0 ships. +- Target: competitive score with Claude Code and OpenClaw on SWE-bench-verified by v1.0.0. The evaluation harness ships in v0.9.0 so there are two full version cycles to iterate and improve before v1.0.0 ships. -**** TODO Computer Use / Vision +*** TODO Computer Use / Vision - Screenshot capture: X11 (~xwd~ / ~import~) and Wayland (~grim~) bridge. The agent requests a screenshot of a specific window or the full desktop. - Vision model integration: send screenshot to a vision-capable model (GPT-4V, Claude 3.5, Gemini 2.0 Flash). The model analyzes UI elements and returns structured descriptions. - Coordinate-based interaction: ~xdotool~ / ~ydotool~ for click and type commands at specific screen coordinates. Dispatcher approval gate applies — screen interaction requires HITL by default, overridable per-application via permission table. - Use case: the user says "open Firefox, search for the Passepartout GitHub repo, and star it." The agent captures screenshots, identifies UI elements via the vision model, and issues click/type commands. Each step is verified by a follow-up screenshot to confirm the action succeeded. -**** TODO Streaming responses -- Stream LLM output from the daemon to the TUI via the existing TCP protocol. Add a new frame type (~:type :stream-chunk~) that carries partial response text. -- The TUI renders partial output in the chat window, replacing the "…thinking" spinner with live text. The user sees the agent's response as it's generated, character by character. -- Early termination: the user can press ~^C~ during streaming to interrupt the LLM and inject an interrupt signal. The partial response is captured, the LLM call is cancelled if the provider supports it, and the agent resumes with the user's interruption as new input. 
-- Streaming also enables progressive tool execution: if the LLM output contains a tool call, the system can begin executing it before the full response is complete (speculative execution, rolled back if the remainder of the response invalidates the call). - -** Competitive Advantage Analysis — v0.8.0 Summary +*** Competitive Advantage Analysis — v0.9.0 Summary SWE-bench evaluation is the industry standard for coding agent capability claims. Without it, "SOTA parity" is a marketing claim. With it, "SOTA parity" is a number. The harness's trajectory persistence is a differentiator: most evaluation harnesses produce a pass/fail score. Passepartout's produces a complete Org-mode audit trail showing exactly where the reasoning succeeded or failed. This turns benchmarking into a debugging tool — failed trajectories point directly to the skill, gate, or model that needs improvement. Vision + screen interaction is table stakes for competing with Claude Code's computer use feature. The Passepartout advantage: every screen interaction passes through the Dispatcher gate stack. A vision model might hallucinate a UI element that doesn't exist — the follow-up screenshot verification catches this deterministically. Competitors' computer use features lack this verification step — they trust the vision model's output. -Streaming is a UX requirement, not a capability requirement. But UX determines adoption. A chat interface that shows the response building in real time feels responsive; a spinner followed by a wall of text feels slow. Streaming with early termination also saves tokens: if the user sees the agent going in the wrong direction, they can interrupt before the full response is generated and paid for. +** v0.10.0: Consensus, GTD & Deep Emacs Integration -*** v0.9.0: Consensus, GTD & Deep Integration +Near-SOTA. The agent has tools, planning, evaluation, and streaming. 
v0.10.0 adds reliability (consensus), productivity methodology (GTD), and environment depth (Emacs integration). -Near-SOTA. The agent has tools, planning, evaluation, and streaming. v0.9.0 adds reliability (consensus), productivity methodology (GTD), and environment depth (Emacs integration). - -**** TODO Consensus loop +*** TODO Consensus loop - Multi-provider parallel inference for critical decisions. When the action's impact score exceeds a threshold (file writes outside home directory, shell commands that touch /etc, git pushes to main), the system sends the same prompt to 2–3 independent providers. - Disagreement detection: compare the structured outputs (actions proposed by each provider). If all providers propose the same action (or semantically equivalent actions), proceed with the highest-confidence result. If providers disagree, flag the action for HITL approval and present the user with each provider's proposal and confidence score. - Confidence scoring: when providers agree, use the agreement level as a confidence metric for telemetry. Track which provider combinations produce the highest agreement rates for which task types. - Cost-aware: consensus mode doubles/triples cost for the action. Only trigger when the action's impact exceeds the cost threshold. Configurable via ~CONSENSUS_THRESHOLD~ — actions below the threshold use single-provider mode. +- TUI consensus display: when consensus mode fires, the TUI shows a collapsible region listing each provider, its model, its proposal, and its confidence score. Agreement is rendered as ~✓ 3/3 providers agree~ in green; disagreement as ~✗ 2/3 providers agree (1 disagrees)~ in yellow with the dissenting proposal expanded for review. The user can accept the majority or inspect the dissent before approving. -**** TODO GTD integration +*** TODO GTD integration - Full GTD cycle: capture (inbox → process), clarify (what is this? 
is it actionable?), organize (project, next action, reference, someday/maybe, trash), reflect (weekly review), engage (context-appropriate action lists). - Org properties: ~:TRIGGER:~ (what context makes this actionable — @home, @office, @computer, @phone), ~:BLOCKER:~ (what task must complete first). - Weekly review: the agent scans all projects and tasks, surfaces stalled items, suggests next actions, and generates a review Org file for the user. The review is produced deterministically (no LLM — pure Org tree traversal) and takes zero tokens. +- TUI agenda view: a ~/agenda~ command renders the user's Org-agenda (scheduled items, deadlines, habits) as a formatted scrollable region within the chat area. The agent can reference agenda context in its responses without the user having to paste their schedule. -**** TODO Deep Emacs integration -- Bidirectional sync: Emacs saves a file → daemon memory updates via ~:buffer-update~ signal. Daemon modifies a file → Emacs buffer reflects the change via the Emacs bridge (file-watch or explicit refresh command). -- Org-agenda awareness: the agent can query the user's agenda view (scheduled items, deadlines, habits) and incorporate agenda context into planning decisions. "What should I work on today?" considers the agenda, not just the task tree. -- Clock time tracking: the agent can start/stop clocks on Org headlines. Produces clock tables for time reporting. -- Refile and archive: the agent can refile headlines between Org files and archive completed items to ~/memex/archives/~. Archive decisions are proposed by the LLM and verified by the Dispatcher (archive policy: DONE items older than 30 days, DONE items with no open child tasks). +*** TODO Deep Emacs integration -** Competitive Advantage Analysis — v0.9.0 Summary +Rationale: The Emacs bridge (v0.4.0) treats Emacs as a Passepartout client — the user sends text, Emacs displays responses. This is the first direction: Emacs → Passepartout. 
The deep integration is the second direction: Passepartout → Emacs. The agent reads the user's agenda, clocks time on tasks, refiles headlines, and archives completed work. This builds on the TCP bridge already in place from v0.4.0; the agent now initiates commands to Emacs instead of only responding to user input.
 
-The consensus loop is not unique (OpenClaw has a similar feature), but Passepartout's implementation benefits from the structured output enforcement in v0.5.2 — comparing plists for semantic equivalence is simpler and more reliable than comparing free-text responses.
+- Org-agenda awareness: the agent queries the user's agenda view (scheduled items, deadlines, habits) and incorporates agenda context into planning decisions. "What should I work on today?" considers the agenda, not just the task tree.
+- Clock time tracking: the agent starts and stops clocks on Org headlines and produces clock tables for time reporting, so it can answer "how long did I spend on that feature?"
+- Refile and archive: the agent refiles headlines between Org files and archives completed items to ~/memex/archives/~. Archive decisions are proposed by the LLM and verified by the Dispatcher (archive policy: DONE items older than 30 days, DONE items with no open child tasks).
+
+*** Competitive Advantage Analysis — v0.10.0 Summary
+
+The consensus loop is not unique (OpenClaw has a similar feature), but Passepartout's implementation benefits from the structured output enforcement in v0.6.2 — comparing plists for semantic equivalence is simpler and more reliable than comparing free-text responses.
 
 The GTD integration and Emacs integration are Passepartout's "unfair advantages" — no competitor has either. Claude Code and Copilot are development tools, not life management tools. Org-mode is the bridge: the same format that holds the agent's memory holds the user's tasks, calendar, and notes.
The GTD cycle operates on the same Org trees that the foveal-peripheral model renders into LLM context. There is no import/export, no separate task database, no format conversion. The agent's world model IS the user's Org files. This is the unified format thesis from the DESIGN_DECISIONS document made operational — and it's a capability that JSON-based agents structurally cannot replicate. -*** v1.0.0: SOTA Parity (verified) +** v1.0.0: SOTA Parity (verified) -Feature-complete, benchmark-verified, production-hardened. All capabilities from v0.3.0 through v0.9.0 integrated and tested end-to-end. +Feature-complete, benchmark-verified, production-hardened. All capabilities from v0.3.0 through v0.10.0 integrated and tested end-to-end. -v1.0.0 is not a feature release — it is a verification release. Every feature from the v0.x series is tested under concurrent load, resource starvation, adversarial input, and benchmark scoring. The evaluation harness (v0.8.0) provides the scoring apparatus; v1.0.0 is the scored release. +v1.0.0 is not a feature release — it is a verification release. Every feature from the v0.x series is tested under concurrent load, resource starvation, adversarial input, and benchmark scoring. The evaluation harness (v0.9.0) provides the scoring apparatus; v1.0.0 is the scored release. | Area | Parity Target | Verification Method | |-------------------+---------------------------------------------+---------------------------------------| @@ -606,12 +760,16 @@ v1.0.0 is not a feature release — it is a verification release. 
Every feature | Planning | Task tree DAG with terminal states | Multi-step integration tests | | Tool ecosystem | 15+ MCP tools + native shell + git | MCP protocol compliance tests | | Context window | Semantic search + foveal-peripheral + caching| Token budget vs competitor audit | -| Safety | 9-vector Dispatcher + policy + permissions | Chaos testing (v0.8.0) | -| Multi-step tasks | Task trees with terminal states | SWE-bench score (v0.8.0 harness) | +| Safety | 9-vector Dispatcher + policy + permissions | Chaos testing (v0.9.0) | +| Multi-step tasks | Task trees with terminal states | SWE-bench score (v0.9.0 harness) | | Code editing | Full file read/write via MCP + Org | SWE-bench-verified subset | -| Memory | Vector recall + Merkle integrity + MVCC | Concurrency stress test (v0.5.1) | +| Memory | Vector recall + Merkle integrity + MVCC | Concurrency stress test (v0.6.1) | | Emacs integration | Full org-mode control (exceeds Claude Code) | Org-agenda round-trip test | | Streaming | Partial output + early termination | TUI UX latency benchmark | +| TUI | Word wrap, cursor, gate trace, focus map, | TUI integration test suite (v0.3.3, v0.4.0) | +| | rule counter, cost counter, streaming | | +| Packaging | Source install (primary) + save-lisp-and-die | Install test matrix across distros | +| | binary for constrained platforms | | | Offline | 100% local capable (7-13B model) | Air-gapped integration test | | Cost | 2-3x fewer tokens than competitors | SWE-bench token audit | | Concurrency | Priority queue + MVCC + parallel signals | Concurrent load test (3 users + bg) | @@ -634,21 +792,53 @@ The key insight at v1.0.0: Passepartout does not beat competitors at everything. But it is still fundamentally probabilistic at its core. The symbolic engine verifies and constrains, but the generative engine is still the primary reasoning source. The architectural transition to symbolic-first reasoning happens in v3.0.0. 
-*** v2.0.0: Lisp Machine Emergence +** v2.0.0: Lisp Machine Emergence +v2.0.0 is where Passepartout stops being a daemon with clients and becomes the environment. The agent's cognitive loop, the user's editor, the user's shell, and the user's browser run in the same Common Lisp image. The Dispatcher gate stack verifies every action regardless of who initiated it — user or agent. The distinction between "tool" and "self" dissolves. -This version is not about the symbolic engine - it is about tools. The agent stops running inside Emacs and starts replacing it. Lish (Lisp shell) emerges: a shell that speaks plists, not POSIX. Org-mode buffers become the file system. Org-babel becomes the REPL. The agent is no longer a passenger in Emacs - it is the operating system. +*Why this version matters for UX parity.* v0.4.0 through v1.0.0 give Passepartout four interaction surfaces (TUI, messaging apps, Emacs, voice). This is competitive with Hermes Agent's TUI + messaging but not with OpenClaw's 25+ channels + Canvas + macOS/iOS apps + voice wake words. v2.0.0 doesn't try to match OpenClaw's breadth. It inverts the problem: instead of building more clients, it builds a platform where the agent's environment and the user's environment are the same process, separated not by a sandbox but by the Dispatcher gate stack. The editor IS the agent's prompt. The shell IS the agent's actuator. The browser IS the agent's web research tool. There are no clients — there is one Lisp image, one address space, one Org-mode file system. This is only possible because the deterministic safety gates are in-process pure Lisp functions rather than out-of-process sandboxes. -The key insight is that the agent's interface and the agent's brain become the same thing. In earlier versions, there is a clear separation: the agent produces output, the TUI displays it. In v2.0.0, the distinction blurs.
The agent's thoughts are displayed in Org buffers that are also the interface that the agent manipulates. +*Components:* -This is the Emacs cannibalization phase. Not hostile replacement but evolution - Emacs was always a Lisp machine, and v2.0.0 completes the metamorphosis. +| Component | What it replaces | Technology | Status at v1.0.0 | +|-----------------+------------------------+-------------------------------------+-------------------| +| Lish editor | Emacs, VS Code | McCLIM or Croatoan-based TUI | TUI exists (Croatoan) | +| Lish shell | Bash, zsh | Common Lisp REPL (already Lisp) | Shell actuator exists | +| Nyxt browser | Chrome, Firefox | Nyxt (Common Lisp browser) | Playwright MCP (v0.7.0) | +| Web interface | Notion, Obsidian Publish | Org → HTML static site generator | Not started | +| Daemon + memory | "the OS" | SBCL image + Org files | Production (v1.0.0) | -From Lisp-using agent to true Lisp machine. Agent IS the Emacs process. +*** Lish — the Common Lisp editor -- Lish: Lisp editor — Org-mode as IDE. Org-babel for interactive evaluation. Full REPL in TUI. -- Lish: Shell replacement — Lisp-based shell that speaks plists. Org-mode buffers as file system. +Not elisp. Not Emacs. A multi-threaded Common Lisp editor built on McCLIM or the Croatoan TUI infrastructure. The complete system prompt lives in an Org buffer — the agent's identity (~AGENTS.md~ equivalent), its skill registry, its memory, and its reasoning are visible and editable as Org text. The user modifies the agent's prompt and the agent reflects the change immediately — the prompt is a file in memory, not a hidden string in a config. -*** v3.0.0: Neurosymbolic Maturity +Org-babel for interactive evaluation: source blocks in Org files are executable. The user evaluates a ~#+begin_src lisp~ block and the result appears inline. The agent evaluates blocks to verify code before writing. The REPL is not a separate window — it is the Org buffer in which the agent and user both work.
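The verify-before-write step described above can be sketched as a read-only syntax check. This is a minimal sketch under stated assumptions, not the project's implementation: ~sketch-lisp-blocks~ and ~sketch-block-readable-p~ are hypothetical names, and "verify" is taken here to mean that every form in a ~#+begin_src lisp~ block reads without error. Nothing is evaluated, and ~*read-eval*~ is bound to NIL so a ~#.~ form cannot run code during the check.

```lisp
;;; Sketch only: SKETCH-LISP-BLOCKS and SKETCH-BLOCK-READABLE-P are
;;; hypothetical helpers. The check is read-only: nothing is evaluated.

(defun sketch-lisp-blocks (org-text)
  "Return the bodies of the #+begin_src lisp ... #+end_src blocks in ORG-TEXT."
  (let ((blocks '())
        (current nil))                  ; string stream while inside a block
    (with-input-from-string (in org-text)
      (loop for line = (read-line in nil nil)
            while line
            do (let ((trimmed (string-trim '(#\Space #\Tab) line)))
                 (cond
                   ;; Opening line, matched case-insensitively; header
                   ;; arguments after "lisp" are ignored.
                   ((and (null current)
                         (>= (length trimmed) 16)
                         (string-equal "#+begin_src lisp" trimmed :end2 16))
                    (setf current (make-string-output-stream)))
                   ;; Closing line: collect the accumulated body.
                   ((and current (string-equal trimmed "#+end_src"))
                    (push (get-output-stream-string current) blocks)
                    (setf current nil))
                   ;; Body line inside an open block.
                   (current (write-line line current))))))
    (nreverse blocks)))

(defun sketch-block-readable-p (source)
  "Return T if every form in SOURCE reads cleanly, else NIL and the condition."
  (let ((eof (gensym "EOF")))
    (handler-case
        (with-input-from-string (in source)
          (with-standard-io-syntax
            (let ((*read-eval* nil))    ; a #. form signals READER-ERROR
              (loop for form = (read in nil eof)
                    until (eq form eof))))
          t)
      ;; Unbalanced parentheses end the input mid-form: END-OF-FILE.
      (end-of-file (c) (values nil c))
      ;; Bad syntax, #. forms, unknown package prefixes: READER-ERROR.
      (reader-error (c) (values nil c)))))

;; Example: the second block is missing a closing parenthesis.
(let ((org (format nil "#+begin_src lisp~%(+ 1 2)~%#+end_src~%~
                        #+BEGIN_SRC lisp :results value~%(oops~%#+END_SRC~%")))
  (mapcar (lambda (body) (and (sketch-block-readable-p body) t))
          (sketch-lisp-blocks org)))
;; => (T NIL)
```

Reading still interns symbols into ~CL-USER~ (the package ~with-standard-io-syntax~ selects), and a block that uses a package prefix not yet defined fails the check, so a real gate would probably read inside the target package.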
+ +The editor and the agent share the same Lisp image. The editor is not a client that connects to a daemon — it IS the daemon process. The TUI from v0.4.0 (with word wrap, streaming, gate trace, focus map) is the editor's rendering surface. The Emacs bridge (v0.4.0) remains for users who prefer Emacs until Lish matures. + +*** Nyxt — the Common Lisp browser + +Nyxt is a browser written in Common Lisp that renders web content in buffers. In v2.0.0, it replaces the Playwright MCP bridge (v0.7.0) as the agent's web interaction surface. The agent controls Nyxt by calling Nyxt functions directly — no subprocess, no serialization, no protocol. Navigation, form filling, data extraction, screenshot capture all happen within the same Lisp image. + +This matters because Playwright (Node.js subprocess, v0.7.0) and vision (screenshot + LLM analysis, v0.9.1) give the agent web access but with a process boundary. Nyxt eliminates the boundary. The agent's browser session shares memory with the agent's cognitive loop. The agent can inspect the DOM as Lisp data structures. It can respond to page events by injecting signals into its own pipeline. + +The browser also serves as the user's web interface. Org files in ~/memex are rendered as static HTML by a zero-JS Org-to-HTML converter running in Nyxt. The user browses their memex through the same browser the agent uses to research the web. No separate web server. No deployment. It's a directory on disk rendered by a local browser. + +*** Lish — the Lisp shell + +Bash is a text-stream protocol. Passepartout speaks plists. The Lish shell replaces text streams with structured data — every command returns a plist, not a byte stream. Pipe becomes function composition. Scripts become Lisp functions that operate on memory objects directly. + +The agent and the user share the same shell. The user types ~(list-todos :tag "@urgent")~. The agent proposes ~(shell "npm run build")~. The Dispatcher verifies both. 
The shell is not a separate process — it is a REPL connected to the same Lisp image as the agent's cognitive loop. + +Org-mode buffers become the file system. The user's memex (~/memex/) is browsable as a tree of Org headlines. File operations (read, write, list, search) operate on Org AST nodes, not byte streams. A "directory listing" is a tree of headlines. A "file read" is a subtree rendered as text. + +Bash remains available as a backend for running external commands, but it is not the primary interface. The agent and the user interact with the system through plists, not through text parsing. + +*** Strategic timeline + +The Emacs bridge (v0.4.0) is the temporary bridge. It gives Lisp users a native Emacs experience while Lish is being built. By v2.0.0, Lish is mature enough to replace it for users who prefer the integrated environment. Emacs users who prefer Emacs keep the bridge — Lish does not require abandoning Emacs. It offers an alternative built on the same principles (single address space, plists everywhere, Org-mode as AST) but without elisp's limitations (single-threaded, no bordeaux-threads, no shared memory with the agent). + +** v3.0.0: Neurosymbolic Maturity Deterministic planner takes the wheel. LLM relegated to semantic translation. @@ -665,20 +855,20 @@ Self-correcting gates replace the learned Bouncer rules. The system learns not j The implications are significant. Hallucination becomes structurally impossible because the symbolic engine will not accept a fact that contradicts its knowledge graph. Safety becomes provable because the formal verification layer can prove properties about the system's behavior. Self-improvement becomes stable because the agent modifies skills that are then verified before execution. -*** v4.0.0: AI Stack Internalized +** v4.0.0: AI Stack Internalized The agent understands its own weights. No external inference. - Llama.cpp in Lisp: FFI binding. No Python subprocess. Pure Common Lisp inference. 
- Weights as sexps: Neural weights as Lisp data structures. Homoiconic model introspection. -*** v5.0.0: Hardware +** v5.0.0: Hardware The Lisp machine becomes physical. RISC-V with tagged architecture, hardware-enforced type checking, FPGA prototype for the symbolic core. The agent runs not in emulation but on silicon purpose-built for the architecture. This is the long horizon. The symbolic engine runs on logic ASICs optimized for symbolic computation. The neural engine runs on GPU or purpose-built matrix math hardware. Lisp orchestrates both, enforcing at the hardware level what it enforced at the software level in earlier versions. -*** v6.0.0: True Agency +** v6.0.0: True Agency World models, temporal reasoning, goal persistence across restarts. diff --git a/docs/USER_MANUAL.org b/docs/USER_MANUAL.org index c245296..fd99409 100644 --- a/docs/USER_MANUAL.org +++ b/docs/USER_MANUAL.org @@ -4,7 +4,7 @@ #+FILETAGS: :docs:manual: * Introduction -Welcome to Passepartout v0.1.0 (The Autonomous Foundation). Passepartout is a neurosymbolic AI agent and a Lisp Machine operating system designed to autonomously maintain your Memex (knowledge base) and interact with you via multiple, equal-citizen interfaces. +Welcome to Passepartout. Passepartout is a neurosymbolic AI agent and a Lisp Machine operating system designed to autonomously maintain your Memex (knowledge base) and interact with you via multiple, equal-citizen interfaces. * Installation Passepartout is bootstrapped via a single shell script. @@ -12,17 +12,10 @@ Passepartout is bootstrapped via a single shell script. 
** Quick start (curl) #+begin_src bash -curl -fsSL https://raw.githubusercontent.com/amrgharbeia/passepartout/main/passepartout.sh | bash -s configure +curl -fsSL https://raw.githubusercontent.com/amrgharbeia/passepartout/main/passepartout | bash -s configure #+end_src -** From a clone - -#+begin_src bash -git clone https://github.com/amrgharbeia/passepartout.git ~/projects/passepartout -~/projects/passepartout/passepartout.sh configure -#+end_src - -Both methods will: +This will: 1. Install system dependencies (SBCL, Emacs, git, curl, socat — detected for Debian or Fedora) 2. Install Quicklisp (Common Lisp package manager) 3. Tangle literate Org sources into runnable Lisp @@ -41,26 +34,54 @@ The system is configured via a `.env` file in the project root. Essential variab Because of the Unified Envelope Architecture, the kernel treats all clients as interchangeable. You must first boot the background daemon: #+begin_src bash -./passepartout.sh --boot & +./passepartout --boot & #+end_src ** Terminal User Interface (TUI) For a rich, split-pane terminal experience: #+begin_src bash -./passepartout.sh tui +./passepartout tui #+end_src ** Command Line Interface (CLI) For raw, pipe-friendly interaction: #+begin_src bash -./passepartout.sh cli +./passepartout cli #+end_src -** Emacs Integration -Passepartout functions as your "foveal vision" inside Emacs. -1. Ensure `org-agent.el` is loaded. -2. Run `M-x passepartout-connect`. -3. Interact via the `*passepartout-chat*` buffer. 
+** TUI Commands + +When connected via the TUI, the following commands are available (type them in the input area and press Enter): + +| Command | Action | +|-----------------------+--------------------------------------------------------| +| ~/help~ | List all available commands | +| ~/focus ~ | Set the agent's foveal focus to a project by name | +| ~/scope memex~ | Set scope to full memex (all projects visible) | +| ~/scope session~ | Set scope to current session only | +| ~/scope project~ | Set scope to focused project only | +| ~/unfocus~ | Clear the foveal focus | +| ~/approve HITL-xxxx~ | Approve a pending HITL action by its token | +| ~/deny HITL-xxxx~ | Deny a pending HITL action by its token | +| ~/theme ~ | Switch theme (dark, light, solarized, gruvbox) | +| ~/cost~ | Toggle session cost display in status bar | +| ~/voice on~ | Enable voice capture (planned v0.7.3) | +| ~/voice off~ | Disable voice capture | +| ~/quit~ | Save history and exit (planned v0.3.3) | + +For multi-line input, start the line with ~\~ then press Enter to insert a newline without sending. + +** Human-in-the-Loop Approval + +When the Dispatcher blocks a high-risk action (shell command, network call, core file modification), it creates a Flight Plan requiring your approval. + +1. The TUI displays a yellow message: ~→ HITL required: /approve HITL-ab12~ +2. Review the proposed action in the Dispatcher trace (expand with Tab) +3. Type ~/approve HITL-ab12~ to approve, or ~/deny HITL-ab12~ to deny +4. Approved actions are re-injected into the pipeline and executed +5. Denied actions are discarded and the Dispatcher records the decision as a permanent rule + +Each approval or denial teaches the Dispatcher — the rule counter in the status bar (~[Rules: 47]~) increments with every decision. * The Memex Structure Passepartout assumes a local folder structure representing your "Memex". @@ -75,17 +96,31 @@ Passepartout assumes a local folder structure representing your "Memex". 
The ~configure~ command supports both Debian-based (Ubuntu, Pop, Mint) and Fedora-based (RHEL, Rocky) distributions. It detects your distro automatically and installs the correct packages. #+begin_src bash -./passepartout.sh configure # interactive -./passepartout.sh configure --non-interactive # headless -./passepartout.sh configure --with-firewall # also open port 9105 +./passepartout configure # interactive +./passepartout configure --non-interactive # headless +./passepartout configure --with-firewall # also open port 9105 #+end_src After configuration, you can re-run ~configure~ any time to add providers or link gateways. +** Binary install (save-lisp-and-die) + +For platforms where SBCL cannot be installed (corporate laptops, shared hosts, constrained environments), a self-contained binary is provided: + +#+begin_src bash +curl -fsSL https://github.com/amrgharbeia/passepartout/releases/latest/download/passepartout -o passepartout +chmod +x passepartout +./passepartout daemon +#+end_src + +This binary bundles SBCL, all required Lisp code, native embedding inference, and a Swank server on port 4005. The experience is identical to a source install — the REPL is available, skills hot-reload, and the image is mutable. Memory survives snapshots. + +The binary is a convenience for constrained platforms. It is not a sealed container. The system remains constitutionally open — connect with SLIME, trace functions, inspect memory objects, modify the system while it runs. + ** systemd service (auto-start on boot) #+begin_src bash -./passepartout.sh install service +./passepartout install service #+end_src Installs a user-level systemd unit that starts the daemon on login. Logs are available via ~journalctl --user -u passepartout.service -f~. @@ -93,7 +128,7 @@ Installs a user-level systemd unit that starts the daemon on login. 
Logs are ava To remove: #+begin_src bash -./passepartout.sh uninstall service +./passepartout uninstall service #+end_src ** Docker @@ -110,7 +145,7 @@ This builds an image from ~debian:trixie-slim~ with all dependencies pre-install ** Backup #+begin_src bash -./passepartout.sh backup ~/my-backup.tar.gz +./passepartout backup ~/my-backup.tar.gz #+end_src Backs up the config, data, and memex directories. @@ -118,7 +153,31 @@ Backs up the config, data, and memex directories. ** Restore #+begin_src bash -./passepartout.sh restore ~/my-backup.tar.gz +./passepartout restore ~/my-backup.tar.gz #+end_src -Restores from a backup file. Run ~passepartout doctor~ afterward to verify integrity. \ No newline at end of file +Restores from a backup file. Run ~passepartout doctor~ afterward to verify integrity. + +* Troubleshooting + +** The daemon won't start +- Check SBCL is installed: ~which sbcl~ +- Run ~passepartout doctor~ to diagnose +- Check port 9105 is free: ~lsof -i :9105~ +- Check the log output for errors + +** The TUI connects but shows "Disconnected" +- The daemon may have crashed. 
Run ~passepartout daemon~ in another terminal +- If the daemon is running, check that it is listening: ~lsof -i :9105~ +- Use ~/reconnect~ (planned v0.6.0) to reconnect without restarting the TUI + +** The LLM returns garbage or fails to respond +- Run ~passepartout doctor~ to verify your LLM provider keys +- Check ~PROVIDER_CASCADE~ in your ~.env~ file +- Try switching models: edit ~.env~ and restart the daemon +- If using local models via Ollama, verify Ollama is running: ~ollama list~ + +** Memory fails to load on startup +- Check that ~/memory.snap~ exists and is in valid S-expression format +- Run ~passepartout doctor~ to diagnose memory integrity +- If corrupted, delete ~/memory.snap~ and restart — the daemon starts with empty memory \ No newline at end of file diff --git a/lisp/core-communication.lisp b/lisp/core-communication.lisp index 9adab0c..ffaae50 100644 --- a/lisp/core-communication.lisp +++ b/lisp/core-communication.lisp @@ -62,7 +62,7 @@ (let ((stream (usocket:socket-stream socket))) (handler-case (progn - (format stream "~a" (frame-message (make-hello-message "0.2.0"))) + (format stream "~a" (frame-message (make-hello-message "0.3.0"))) (finish-output stream) (loop (let ((msg (read-framed-message stream))) diff --git a/lisp/gateway-cli.lisp b/lisp/gateway-cli.lisp index 41109bc..27290f1 100644 --- a/lisp/gateway-cli.lisp +++ b/lisp/gateway-cli.lisp @@ -26,9 +26,9 @@ (fiveam:test test-gateway-cli-input-format "Contract 1: gateway-cli-input injects a properly formed signal without error."
(handler-case - (progn (gateway-cli-input "hello") (fiveam:is t)) + (progn (gateway-cli-input "hello") (fiveam:pass)) (error (c) - (fiveam:is nil "gateway-cli-input crashed: ~a" c)))) + (fiveam:fail "gateway-cli-input crashed: ~a" c)))) (handler-case (progn (gateway-cli-input "test-load") (log-message "CLI: Load-time test OK")) diff --git a/lisp/gateway-tui-main.lisp b/lisp/gateway-tui-main.lisp index 1dd7e1e..347fef1 100644 --- a/lisp/gateway-tui-main.lisp +++ b/lisp/gateway-tui-main.lisp @@ -207,7 +207,7 @@ (st :connected) t) (bt:make-thread (lambda () (reader-loop (st :stream))) :name "tui-reader") - (add-msg :system (format nil "* Connected v~a *" "0.2.0")) + (add-msg :system (format nil "* Connected v~a *" "0.3.0")) (return-from connect-daemon t)) (usocket:connection-refused-error (c) (when (= attempt 3) diff --git a/org/core-communication.org b/org/core-communication.org index 2178971..6ebebaf 100644 --- a/org/core-communication.org +++ b/org/core-communication.org @@ -10,7 +10,7 @@ The Communication Protocol defines how Passepartout speaks to the outside world. Every message is an S-expression (plist) prefixed with a 6-character hex length: - 00002C(:TYPE :EVENT :PAYLOAD (:ACTION :handshake :VERSION "0.2.0")) + 00002C(:TYPE :EVENT :PAYLOAD (:ACTION :handshake :VERSION "0.3.0")) This is a deliberate rejection of JSON, Protocol Buffers, or any other serialization format. 
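The framing rule above (a plist prefixed by a 6-character hex length) can be sketched as follows. This is a minimal sketch under assumptions the diff does not confirm: that the prefix counts the characters of the printed plist, that the hex digits are uppercase and zero-padded, and that the body is printed with standard I/O syntax. ~sketch-frame~ and ~sketch-read-frame~ are hypothetical names, not the project's ~frame-message~ and ~read-framed-message~, whose bodies are not shown here.

```lisp
;;; Sketch only: SKETCH-FRAME and SKETCH-READ-FRAME are hypothetical and
;;; assume the 6-digit hex prefix counts characters of the printed plist.

(defun sketch-frame (plist)
  "Print PLIST with standard syntax and prefix its length as 6 hex digits."
  (let ((body (with-standard-io-syntax (prin1-to-string plist))))
    ;; ~6,'0X zero-pads the hexadecimal length to six digits.
    (format nil "~6,'0X~A" (length body) body)))

(defun sketch-read-frame (stream)
  "Read one framed plist from STREAM; return NIL at a clean end of input."
  (let ((header (make-string 6)))
    (let ((n (read-sequence header stream)))
      (cond ((zerop n) (return-from sketch-read-frame nil))
            ((< n 6) (error "Truncated frame header."))))
    (let* ((len (parse-integer header :radix 16))
           (body (make-string len)))
      ;; READ-SEQUENCE stops early only at end of file, so a short read
      ;; means the peer closed the connection mid-frame.
      (unless (= (read-sequence body stream) len)
        (error "Truncated frame: expected ~D characters." len))
      (with-standard-io-syntax
        (let ((*read-eval* nil))        ; never evaluate #. from the wire
          (read-from-string body))))))

;; Round trip through an in-memory stream.
(let ((framed (sketch-frame '(:type :event :payload (:action :ping)))))
  (values framed (sketch-read-frame (make-string-input-stream framed))))
;; => "000027(:TYPE :EVENT :PAYLOAD (:ACTION :PING))",
;;    (:TYPE :EVENT :PAYLOAD (:ACTION :PING))
```

Printing under ~with-standard-io-syntax~ keeps the body independent of the caller's ~*print-case*~ and ~*print-pretty*~ settings, so the length the sender computes matches what the receiver reads. If the real protocol counts UTF-8 bytes rather than characters, the two counts differ for non-ASCII payloads.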
The message format is Lisp-native because: @@ -151,7 +151,7 @@ The daemon sends a handshake message on connection, then enters a read loop, inj (let ((stream (usocket:socket-stream socket))) (handler-case (progn - (format stream "~a" (frame-message (make-hello-message "0.2.0"))) + (format stream "~a" (frame-message (make-hello-message "0.3.0"))) (finish-output stream) (loop (let ((msg (read-framed-message stream))) diff --git a/org/gateway-cli.org b/org/gateway-cli.org index f48466a..b54a4e4 100644 --- a/org/gateway-cli.org +++ b/org/gateway-cli.org @@ -55,9 +55,9 @@ The CLI Gateway is the simplest interface to Passepartout — raw stdin/stdout o (fiveam:test test-gateway-cli-input-format "Contract 1: gateway-cli-input injects a properly formed signal without error." (handler-case - (progn (gateway-cli-input "hello") (fiveam:is t)) + (progn (gateway-cli-input "hello") (fiveam:pass)) (error (c) - (fiveam:is nil "gateway-cli-input crashed: ~a" c)))) + (fiveam:fail "gateway-cli-input crashed: ~a" c)))) #+end_src ** Load-Time Sanity Check diff --git a/org/gateway-tui-main.org b/org/gateway-tui-main.org index 38119a7..7de71b4 100644 --- a/org/gateway-tui-main.org +++ b/org/gateway-tui-main.org @@ -241,7 +241,7 @@ Event handlers + daemon I/O + main loop. (st :connected) t) (bt:make-thread (lambda () (reader-loop (st :stream))) :name "tui-reader") - (add-msg :system (format nil "* Connected v~a *" "0.2.0")) + (add-msg :system (format nil "* Connected v~a *" "0.3.0")) (return-from connect-daemon t)) (usocket:connection-refused-error (c) (when (= attempt 3) diff --git a/test/integration-tui.sh b/test/integration-tui.sh index d013c0d..39e340b 100755 --- a/test/integration-tui.sh +++ b/test/integration-tui.sh @@ -41,7 +41,11 @@ done # ---- Tests ---- test_cascade_parsing() { - # Via /eval, check that *provider-cascade* contains clean keywords. 
+ # Via /eval, load the provider cascade from the daemon's data dir + # and verify clean keyword parsing (no cl-dotenv quote contamination). + local data_dir="${PASSEPARTOUT_DATA_DIR:-$(dirname "$(dirname "$0")")}" + tmux send-keys -t tui-test "/eval (load (format nil \"~alisp/system-model-provider.lisp\" \"$data_dir/\"))" Enter + sleep 3 tmux send-keys -t tui-test "/eval *provider-cascade*" Enter sleep 3 local pane