v0.8.2: cleanup + prose + structure + decomposition + budget + errors

Phase 1 — dedup + hardening (~9 items):
- Remove duplicate *skill-registry* defvar from core-skills
- Merge *backend-registry* into *probabilistic-backends*, delete backend-register
- Remove inject-stimulus alias, standardize on stimulus-inject
- Add pre-eval sandbox (skill-source-scan) that blocks restricted symbols before eval
- Remove dead plist-get function; remove duplicate json-alist-to-plist export
- Fix read-framed-message whitespace DoS (4096-iteration max)
- Add *read-eval* nil to dispatcher-approvals-process read-from-string (RCE)
- Add test-op to ASDF; update .asd version 0.4.3→0.7.2

Phase 2 — prose + contracts + reorder:
- Split ROADMAP: 2623→1089 lines (TODO only), CHANGELOG: 260→1528 lines (full DONE history, 14 versions reverse chron)
- Add Contracts + Overview to 6 channel files + embedding-native + programming-standards + symbolic-scope
- Reorder 28 .org files: Contract → Test Suite → Implementation (TDD order)
- Add 7-phase inline prose to think() in core-reason
- Expand USER_MANUAL: 183→461 lines (10 new sections)

Phase 3 — decomposition + export organization:
- Decompose think() into think-assemble-prompt, think-call-llm, think-parse-response + orchestrator
- Organize 188 exports into 16 grouped sections by module

Phase 4 — budget enforcement + error protocol:
- Per-session budget enforcement (SESSION_BUDGET_USD env var, budget-exhausted-p, guard in think-call-llm)
- Error condition hierarchy (6 conditions, including pipeline-error, llm-error, gate-error, budget-error, protocol-error)
- Restarts in loop-process: skip-signal, use-fallback, abort-pipeline
2026-05-10 09:07:44 -04:00
parent 27d203ad67
commit 8fd56dece3
68 changed files with 7014 additions and 6521 deletions


@@ -24,11 +24,11 @@ This will:
If you already have Emacs installed, the installer skips it and uses your existing installation.
* Configuration
The system is configured via a ~.env~ file in the project root. Essential variables include:
- ~OPENROUTER_API_KEY~: Your LLM provider key.
- ~PROVIDER_CASCADE~: The fallback order for LLM providers (e.g., ~openrouter,ollama,anthropic~).
- ~MEMEX_DIR~: The absolute path to your knowledge base (defaults to =~/memex=).
* Interacting with Passepartout
Because of the Unified Envelope Architecture, the kernel treats all clients as interchangeable. You must first boot the background daemon:
@@ -86,8 +86,286 @@ Each approval or denial teaches the Dispatcher — the rule counter in the statu
* The Memex Structure
Passepartout assumes a local folder structure representing your "Memex".
- Core memories and identities are mapped to Org-mode files.
- The ~Scribe~ background worker distills chronological logs into structured Zettelkasten notes.
- The ~Gardener~ continuously repairs broken links and flags orphaned nodes.
* How Safety Works
Passepartout enforces safety through ten deterministic gates. Every action the agent wants to take — reading a file, running a shell command, sending network traffic — passes through these gates before execution. Critically, all ten gates are pure Lisp functions: they cost zero LLM tokens to evaluate. Safety checking never touches your provider budget.
** The Ten Safety Gates
| Gate | What It Checks |
|------+----------------|
| Lisp syntax | Validates that any Lisp code is well-formed before evaluation |
| Secret file paths | Blocks reads from known secret directories (~.ssh~, ~.env~, ~.aws~, etc.) |
| Self-build core | Prevents modification of the agent's own source and build files |
| Secret content | Scans text output for API keys, tokens, or credential patterns |
| Vault secrets | Guards any secret stored in the encrypted vault |
| Privacy tags | Respects ~@privacy:~ annotations on memory objects and files |
| Privacy text leaks | Scans outgoing text for PII (emails, phone numbers, addresses) |
| Shell safety | Blocks destructive commands (~rm -rf~, fork bombs, ~mkfs~, ~dd~) |
| Network exfiltration | Blocks outbound traffic carrying private data to unknown hosts |
| High-impact actions | Catches system-level changes (package installs, service restarts, mount) |
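To make "deterministic gate" concrete, here is a toy version of the Shell safety gate. It is a sketch in Python for brevity (the real gates are Lisp functions); the name ~shell_safety_gate~ and the pattern list are assumptions made for this example, not Passepartout's actual rule set.
#+begin_src python
import re

# Hypothetical patterns for illustration; the real gate's rules are not listed in this manual.
DESTRUCTIVE_PATTERNS = [
    ("rm -rf", re.compile(r"\brm\s+-\w*(r\w*f|f\w*r)")),
    ("mkfs", re.compile(r"\bmkfs\b")),
    ("dd", re.compile(r"\bdd\s")),
    ("fork bomb", re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:")),
]

def shell_safety_gate(command):
    """Return (allowed, reason). Pure function: no I/O and no LLM call."""
    for label, pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(command):
            return (False, "blocked: " + label)
    return (True, "ok")
#+end_src
Because the check is plain pattern matching, the same command always gets the same verdict and no tokens are spent.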
** Severity Tiers
Each gate assigns a severity to the action it inspects:
| Severity | Behavior |
|------------+-------------------------------------------------------|
| Catastrophic | Always blocked. No approval possible. |
| Dangerous | Requires HITL approval. Generates a Flight Plan. |
| Moderate | Allowed, but logged. The agent learns from the outcome. |
| Harmless | Always allowed. No logging overhead. |
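The severity table can be read as a small decision function. The sketch below is illustrative Python, not the Lisp implementation; the names ~Severity~, ~decide~, and ~worst~ are hypothetical, and the rule that the most severe rating wins when several gates inspect one action is an assumption.
#+begin_src python
from enum import Enum

class Severity(Enum):
    HARMLESS = 0
    MODERATE = 1
    DANGEROUS = 2
    CATASTROPHIC = 3

def decide(severity):
    """Map a severity tier to the behavior listed in the table."""
    if severity is Severity.CATASTROPHIC:
        return "block"          # always blocked, no approval possible
    if severity is Severity.DANGEROUS:
        return "ask-human"      # HITL approval via a Flight Plan
    if severity is Severity.MODERATE:
        return "allow-and-log"  # allowed, but logged
    return "allow"              # harmless: always allowed

def worst(severities):
    """Assumption: when several gates rate one action, the highest tier wins."""
    return max(severities, key=lambda s: s.value, default=Severity.HARMLESS)
#+end_src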
** What Happens When an Action Is Blocked
When a gate holds an action for approval, the Dispatcher creates a Flight Plan: a structured record of what the agent wants to do, why it was stopped, and which gate triggered. The Flight Plan is presented to you for review. You can approve it (~/approve~), deny it (~/deny~), or ask the agent to clarify its intent (~/clarify~). If you approve, the action executes immediately. If you deny, the Dispatcher records the decision as a permanent rule and will never propose that action again.
* Understanding Context and Focus
Passepartout uses a foveal-peripheral context model, inspired by human vision. This is how the agent decides what to pay attention to in your Memex.
** The Three Levels of Attention
- ~/foveal/~ — What the agent reads deeply and reasons about right now. Anything you explicitly mention, plus the current focused project.
- ~/peripheral/~ — What the agent knows exists (titles, summaries, metadata) but does not read in detail. Everything in scope.
- ~/blind/~ — Outside scope. The agent cannot see or access it.
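One way to picture the three levels is as a classification of each path against the current scope and focus. This is a minimal sketch in Python under stated assumptions: scope and focus are modeled as directory roots, explicitly mentioned files are passed in as a set, and the function name ~attention_level~ is hypothetical.
#+begin_src python
from pathlib import PurePosixPath

def _inside(path, root):
    """True if path is root itself or lies beneath it."""
    try:
        PurePosixPath(path).relative_to(root)
        return True
    except ValueError:
        return False

def attention_level(path, scope_root, focus_root=None, mentioned=()):
    # Assumption: anything outside the scope is blind, even if mentioned.
    if not _inside(path, scope_root):
        return "blind"
    # Foveal: explicitly mentioned, or part of the focused project.
    if path in mentioned or (focus_root is not None and _inside(path, focus_root)):
        return "foveal"
    # Everything else in scope is known only by title, summary, and metadata.
    return "peripheral"
#+end_src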
** Focus Commands
| Command | Effect |
|---------------------+---------------------------------------------------------|
| ~/focus <project>~ | Set the agent's foveal attention to a project |
| ~/scope memex~ | Expand scope to everything in your Memex |
| ~/scope session~ | Narrow scope to just the current conversation |
| ~/scope project~ | Narrow scope to the focused project only |
| ~/unfocus~ | Clear the foveal focus; the agent sees everything at peripheral level |
** The Focus Map
The status bar displays a focus map — a compact representation of what the agent is "looking at." Projects in foveal view are highlighted; peripheral projects are dimmed. When you change focus, the map updates in real time so you always know the agent's current attention budget.
* Skills and What They Do
Skills are hot-reloadable modules that extend the agent's capabilities. Unlike a bug in a core system file, a bug in a skill degrades the agent but does not kill it, and the agent can repair a broken skill itself. Skills are organized into categories by function:
** Core Pipeline
The agent's cognitive loop: Perceive (consume input) → Reason (think with the LLM) → Act (execute tools). This is the central nervous system of the agent.
** Security
~Dispatcher~, ~Policy~, ~Permissions~, ~Validator~, ~Vault~. These skills enforce the safety gates, manage approval workflows, encrypt secrets, and verify that every action conforms to the rules you have set.
** Channels
~TUI~, ~CLI~, ~Telegram~, ~Signal~, ~Discord~, ~Slack~, ~Shell~. Each channel is a separate skill that handles I/O for a specific interface. All channels are equal citizens — the agent treats a message from Telegram identically to one typed in the TUI.
** Programming
~Lisp~, ~Org~, literate tools, ~REPL~, standards libraries. These skills allow the agent to write, evaluate, and reason about Lisp code, manage Org-mode documents, and tangle literate programs into runnable source.
** Symbolic
~Awareness~, ~Scope~, ~Events~, ~Config~, ~Memory~, ~Identity~, ~Time~. These skills manage the agent's internal state: what it knows about itself, what it remembers, how it configures its behavior, and how it tracks time and events.
** Neuro
~Provider~, ~Router~, ~Explorer~. These skills manage the LLM backends. The Provider skill abstracts each LLM API; the Router decides which provider to use based on cost, latency, and availability; the Explorer discovers new providers.
** Embedding
Backends for semantic search and native inference. These skills enable the agent to embed text, search your Memex by meaning rather than exact keyword, and run local inference without network calls.
** Economics
~Tokenizer~, ~Cost Tracker~, ~Token Economics~. These skills count tokens, estimate costs before making LLM calls, track spending across providers, and enforce budget limits.
* The Tool System
The agent has ten cognitive tools — discrete actions it can take to interact with your environment. Each tool maps to a specific capability.
** Read-Only Tools
| Tool | What It Does |
|-------------------+---------------------------------------------|
| ~search-files~ | Search file contents with regex patterns |
| ~find-files~ | Find files by name using glob patterns |
| ~read-file~ | Read the contents of a file on disk |
| ~list-directory~ | List the contents of a directory |
| ~org-find-headline~ | Find a headline in an Org-mode file |
** Write Tools
| Tool | What It Does |
|-------------------+---------------------------------------------|
| ~write-file~ | Create or overwrite a file on disk |
| ~org-modify-file~ | Modify an Org-mode file structurally |
| ~run-shell~ | Execute a shell command |
| ~eval-form~ | Evaluate a Lisp expression |
| ~run-tests~ | Execute a test suite |
** Auto-Approval
Write tools are subject to safety-gate inspection. Read-only tools are auto-approved by default (though the agent still checks for secret-file reads). You can configure per-tool auto-approval in your ~.env~ file with the ~AUTO_APPROVE_TOOLS~ variable:
#+begin_src bash
# Auto-approve the read-only tools listed here (default)
AUTO_APPROVE_TOOLS=read-file,find-files,list-directory,search-files
#+end_src
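How such a variable might be consumed can be sketched as follows. This is illustrative Python, not the actual Lisp code; the helper names are hypothetical, and the fallback value simply reuses the default shown above.
#+begin_src python
import os

# Assumption: the fallback mirrors the default value shown in the manual.
DEFAULT_AUTO_APPROVE = "read-file,find-files,list-directory,search-files"

def auto_approved_tools(env=None):
    """Parse AUTO_APPROVE_TOOLS into a set of tool names."""
    env = os.environ if env is None else env
    raw = env.get("AUTO_APPROVE_TOOLS", DEFAULT_AUTO_APPROVE)
    return {name.strip() for name in raw.split(",") if name.strip()}

def requires_review(tool, env=None):
    """A tool not on the auto-approve list goes through normal approval."""
    return tool not in auto_approved_tools(env)
#+end_src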
* Cost Tracking
Every LLM call costs tokens, and tokens cost money. Passepartout tracks this transparently.
** Token Budgets
Set ~CONTEXT_MAX_TOKENS~ in your ~.env~ file to cap the total context window the agent may use per interaction:
#+begin_src bash
CONTEXT_MAX_TOKENS=128000
#+end_src
The agent will truncate older context rather than exceed this limit.
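Truncating older context can be sketched as keeping the newest messages that still fit. The Python below is an illustration only: ~truncate_context~ is a hypothetical name, and a whitespace word count stands in for the real Tokenizer skill.
#+begin_src python
def truncate_context(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the most recent messages whose combined size fits max_tokens."""
    kept = []
    used = 0
    for message in reversed(messages):   # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break                        # everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()                       # restore chronological order
    return kept
#+end_src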
** Per-Call Cost Tracking
Before every LLM call, the Economics skill estimates the cost (prompt tokens + expected completion tokens) and checks it against your budget. After the call, it records actual usage. The status bar shows your session total.
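The estimate-then-record cycle might look roughly like this. It is a sketch in Python rather than the Lisp implementation; ~SessionBudget~ and ~estimate_cost_usd~ are hypothetical names, and the prices in any call are placeholders, not real provider rates.
#+begin_src python
from dataclasses import dataclass

@dataclass
class SessionBudget:
    limit_usd: float
    spent_usd: float = 0.0

    def can_afford(self, estimated_usd):
        """Pre-call check: would this call push the session over budget?"""
        return self.spent_usd + estimated_usd <= self.limit_usd

    def record(self, actual_usd):
        """Post-call bookkeeping with the usage the provider reported."""
        self.spent_usd += actual_usd

    def exhausted(self):
        return self.spent_usd >= self.limit_usd

def estimate_cost_usd(prompt_tokens, completion_tokens, usd_per_1k_prompt, usd_per_1k_completion):
    """Estimate = prompt tokens plus expected completion tokens, each at its own rate."""
    return (prompt_tokens / 1000) * usd_per_1k_prompt + (completion_tokens / 1000) * usd_per_1k_completion
#+end_src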
** The ~/cost~ Command
Toggle cost display in the status bar with ~/cost~. When enabled, you'll see a running total like ~[$0.047]~ showing the estimated cost of the current session.
** Per-Provider Pricing
Different providers charge different rates. The Router skill is aware of this and will choose the cheapest viable provider for each call unless you pin a specific provider:
#+begin_src bash
# Pin to a specific provider
PROVIDER_CASCADE=anthropic
#+end_src
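A "cheapest viable provider" choice can be sketched as a filter followed by a minimum. This Python is an assumption about how the Router could behave, not its actual logic: ~choose_provider~ is a hypothetical name, prices are placeholders, and ties fall back to cascade order.
#+begin_src python
def choose_provider(cascade, price_per_1k, available):
    """Pick the cheapest provider in the cascade that is currently reachable."""
    viable = [name for name in cascade if name in available and name in price_per_1k]
    if not viable:
        return None
    # min() returns the first minimum, so cascade order breaks ties.
    return min(viable, key=lambda name: price_per_1k[name])
#+end_src
With a single-entry cascade, the filter leaves at most one candidate, which is what pinning amounts to in this model.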
** Prompt Prefix Caching
Providers that support prefix caching (Claude via Anthropic, some OpenRouter models) automatically benefit from it. The agent reuses the system prompt prefix across calls, and the Economics skill tracks the cache-hit savings separately in the cost breakdown.
* Session Control
Passepartout maintains a session history with checkpointed memory snapshots. You can move backward and forward through your session state.
** Undo and Redo
| Command | Effect |
|--------------+----------------------------------------------------------|
| ~/undo~ | Restore the memory to the state before your last action |
| ~/redo~ | Re-apply the last undone action |
| ~/rewind <n>~ | Restore the memory to the state n actions ago |
** What Gets Restored
A session rewind restores three things: file changes (files written or modified are reverted), memory objects (the agent's internal knowledge), and TODO states (the roadmap and task tracking). This means you can safely let the agent explore and experiment — if it goes down a wrong path, rewind and redirect.
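Checkpointed undo and redo are commonly built from two stacks. The sketch below shows that general technique in Python; it is not taken from Passepartout's source, and ~SessionHistory~ and its opaque state values are hypothetical.
#+begin_src python
class SessionHistory:
    """Two-stack undo/redo over opaque state snapshots."""

    def __init__(self, initial_state):
        self._undo = [initial_state]  # last element is the current state
        self._redo = []

    @property
    def current(self):
        return self._undo[-1]

    def checkpoint(self, state):
        """Record a new state; a fresh action invalidates the redo stack."""
        self._undo.append(state)
        self._redo.clear()

    def undo(self):
        if len(self._undo) > 1:
            self._redo.append(self._undo.pop())
        return self.current

    def redo(self):
        if self._redo:
            self._undo.append(self._redo.pop())
        return self.current

    def rewind(self, n):
        """Undo n times, stopping early at the initial state."""
        for _ in range(n):
            self.undo()
        return self.current
#+end_src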
* Gate Trace Reference
Below every agent message in the TUI, you'll see colored lines representing the safety-gate trace for that message. These show you exactly which gates ran on the agent's actions and what happened.
| Symbol | Meaning |
|--------+------------------------------------------------------------|
| ~✓~ | Green — the gate passed. The action was allowed. |
| ~✗~ | Red — the gate blocked the action. The reason is shown. |
| ~→~ | Yellow — HITL approval required. A Flight Plan is pending. |
Press ~Ctrl+G~ to toggle gate trace visibility. The gate trace for your last interaction is always available via the ~/why~ command, which displays the full trace with explanations.
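For readers who want to picture the trace as data, the following Python sketch renders a list of gate results using the three symbols from the table. The tuple layout and the name ~format_gate_trace~ are assumptions for illustration, and color is omitted.
#+begin_src python
# Symbols taken from the table above; the outcome keys are hypothetical.
SYMBOLS = {"pass": "✓", "block": "✗", "hitl": "→"}

def format_gate_trace(results):
    """results: iterable of (gate_name, outcome, reason) tuples."""
    lines = []
    for gate, outcome, reason in results:
        line = SYMBOLS[outcome] + " " + gate
        if reason:
            line += ": " + reason
        lines.append(line)
    return "\n".join(lines)
#+end_src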
* Tag System
Passepartout uses an Org-mode tag system to annotate and control behavior. Tags are metadata appended to headlines and memory objects.
** Severity Tags
A severity tag of the form ~@tag:<severity>~ controls how strictly the safety system handles the tagged item:
| Tag | Behavior |
|------------------+--------------------------------------------------------------|
| ~@tag:block~ | The tagged item is treated as catastrophic — always blocked |
| ~@tag:warn~ | The tagged item triggers HITL approval when accessed |
| ~@tag:log~ | Access is allowed but logged for audit |
** Tag Categories
Configure which tags trigger which behavior with the ~TAG_CATEGORIES~ environment variable:
#+begin_src bash
TAG_CATEGORIES=block:warn:log
#+end_src
** The ~/tags~ Command
Type ~/tags~ to list all tags currently active in the agent's scope, along with their severity levels and the files or memory objects they apply to.
* HITL Deep Dive
When the Safety system blocks an action, a structured workflow begins. Understanding this workflow helps you make informed approval decisions quickly.
** The Flight Plan Lifecycle
1. /Trigger/: A gate rates an action Dangerous, or a ~@tag:warn~ tag is encountered. (Catastrophic actions are blocked outright and never reach approval.)
2. /Plan/: The Dispatcher serializes the proposed action into a Flight Plan: what tool, what arguments, what file or command, which gate triggered.
3. /Display/: The TUI shows a yellow prompt with the Flight Plan token (~HITL-ab12~).
4. /Review/: Press ~Tab~ to expand the gate trace and see the full Flight Plan details.
5. /Decision/: You type ~/approve HITL-ab12~ or ~/deny HITL-ab12~.
6. /Execute or Discard/: Approved plans execute immediately. Denied plans are discarded.
7. /Learn/: The Dispatcher increments its rule counter and records the decision as a permanent rule. If you denied an action, the Dispatcher will never propose it again.
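The lifecycle above can be sketched as a small state holder. This is illustrative Python, not the Dispatcher's Lisp code: the class layout is hypothetical, actions are modeled as plain strings, and only the deny rule changes future behavior, since the manual states that explicitly.
#+begin_src python
from dataclasses import dataclass, field

@dataclass
class Dispatcher:
    pending: dict = field(default_factory=dict)  # token -> proposed action
    rules: dict = field(default_factory=dict)    # action -> "approve" or "deny"

    def propose(self, token, action):
        """Steps 1-3: queue a Flight Plan unless a deny rule already covers it."""
        if self.rules.get(action) == "deny":
            return "suppressed"
        self.pending[token] = action
        return "pending"

    def decide(self, token, approve):
        """Steps 5-7: resolve the plan and record the decision as a rule."""
        action = self.pending.pop(token)
        self.rules[action] = "approve" if approve else "deny"
        return "executed" if approve else "discarded"

    @property
    def rule_count(self):
        return len(self.rules)
#+end_src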
** Clarifying Questions
If you are unsure why the agent wants to perform an action, you can ignore the Flight Plan prompt. After three retries without a decision, the agent escalates by injecting a ~/clarify~ message into the pipeline, which prompts it to explain its intent in plain language. You can then approve or deny with full context.
** The Rule Counter
The status bar shows ~[Rules: N]~ — the number of permanent rules the Dispatcher has learned from your decisions. Each approval or denial is a learning event. Over time, the Dispatcher builds a personalized safety profile that reflects your preferences: which actions you always approve, which you always deny, and which you want to review case by case.
* TUI Keybinding Reference
The TUI supports a rich set of keyboard shortcuts for efficient interaction.
** Editing Keys
| Combo | Action |
|-----------+-------------------------------------------|
| ~Ctrl+D~ | Quit the TUI |
| ~Ctrl+U~ | Clear the current input line |
| ~Ctrl+W~ | Delete the word before the cursor |
| ~Ctrl+A~ | Move cursor to beginning of line (Home) |
| ~Ctrl+E~ | Move cursor to end of line |
| ~Ctrl+K~ | Delete from cursor to end of line |
| ~Ctrl+L~ | Redraw the screen |
| ~Ctrl+X+E~ | Open the current input in your external editor (~$EDITOR~) |
| ~Tab~ | Autocomplete commands, themes, and file paths |
** Navigation and Control
| Combo | Action |
|------------------+--------------------------------------------------|
| ~Ctrl+C~ | Interrupt (cascade: stop streaming → stop thinking → quit) |
| ~Ctrl+F~ | Search through message history |
| ~Ctrl+P~ | Open the command palette |
| ~Ctrl+G~ | Toggle gate trace visibility |
| ~Ctrl+X+B~ | Toggle the sidebar (focus map, memory browser) |
| ~Page Up~ | Scroll chat up by 10 lines |
| ~Page Down~ | Scroll chat down by 10 lines |
| ~Up Arrow~ | Previous input in command history |
| ~Down Arrow~ | Next input in command history |
** The Status Bar
The status bar at the bottom of the TUI shows the agent's current state at a glance. Each indicator has a specific meaning:
| Indicator | Meaning |
|------------------+--------------------------------------------------------------------|
| ~[Connected]~ | Green — daemon is reachable on port 9105. Gray — disconnected. |
| ~[Mode: TUI]~ | The current interaction mode (TUI, CLI, Telegram, etc.) |
| ~[Msg: 142]~ | Total messages in the current session |
| ~[↑ 12]~ | Scroll indicator — you are scrolled up 12 lines from the bottom |
| ~[◉]~ | Activity spinner — spinning means the agent is working |
| ~[⟳]~ | Streaming indicator — shown while the agent is generating text |
| ~[$0.047]~ | Session cost (visible when ~/cost~ is toggled on) |
| ~[Rules: 52]~ | Number of permanent HITL rules learned from your decisions |
| ~[prj:my-proj]~ | Current focused project name |
* Deployment
@@ -180,4 +458,4 @@ Restores from a backup file. Run ~passepartout doctor~ afterward to verify integ
** Memory fails to load on startup
- Check that ~/memory.snap~ exists and contains a valid S-expression
- Run ~passepartout doctor~ to diagnose memory integrity
- If corrupted, delete ~/memory.snap~ and restart — the daemon starts with empty memory