passepartout/docs/ARCHITECTURE.org

#+TITLE: Passepartout Architecture
#+AUTHOR: Agent
#+STARTUP: content

* The Four Quadrants

Passepartout divides cognition along two axes: **Foreground vs Background** (initiated by the user vs running autonomously) and **Probabilistic vs Deterministic** (LLM-driven vs pure Lisp logic).

|                | Probabilistic (LLM)                                         | Deterministic (Lisp)                                       |
|----------------+-------------------------------------------------------------+------------------------------------------------------------|
| **Foreground** | Chat responses, task execution, code generation             | Shell execution, file I/O, safety gates, dispatcher checks |
| **Background** | Scribe distillation, vector embedding, autonomous decisions | Heartbeat, cron jobs, memory auto-save, gateway polling    |

The Probabilistic engine proposes. The Deterministic engine verifies and executes. No proposal from the LLM touches a file, runs a command, or sends a message without passing through at least one deterministic gate.

* Architectural Layers

** Core Pipeline (loaded by ASDF — the harness)
- package definition: defpackage, cognitive tools, logging
- memory: memory-object struct, Merkle hashing, snapshots, persistence
- context: foveal-peripheral rendering, context assembly for LLM
- pipeline: perceive → reason → act stages, orchestrator, heartbeat
- skills engine: defskill macro, topological sorter, jailed loading
- communication: framed TCP protocol, actuator registry, daemon server
- diagnostics: health checks, doctor CLI

** Skills (loaded at runtime by the skill engine)
- gateway: TUI, CLI, messaging (Telegram, Signal)
- system-model: provider dispatch, router, embeddings, model explorer
- security: dispatcher (safety gate), policy, permissions, validator, vault
- programming: Lisp, Org, literate tools, REPL, standards
- system: config, archivist, self-improve, memory introspection, shell actuator, event-orchestrator, context-manager, setup

** Clients (connect to daemon via framed TCP protocol)
- TUI: Croatoan-based terminal interface (model-view architecture, dirty-flag rendering)
- CLI: pipe-friendly command-line gateway
- Emacs: elisp bridge speaking the wire protocol (planned v0.4.0)

* Pipeline Flow

Every signal moves through three stages:

```
Signal → Perceive (normalize) → Reason (think + verify) → Act (dispatch)
```

The signal is a plist: ~(:TYPE :EVENT :META (...) :PAYLOAD (:SENSOR :user-input :TEXT "..."))~

1. **Perceive** normalizes raw input from any gateway into a uniform signal
2. **Reason** calls the LLM to generate a proposal, then runs the proposal through all registered deterministic gates (sorted by priority). If a gate rejects the proposal, the rejection trace feeds back to the LLM for self-correction (up to 3 retries)
3. **Act** dispatches the approved action to the registered actuator (~:cli~, ~:tool~, ~:system~, ~:shell~, ~:telegram~, ~:signal~)

Each stage can produce feedback signals that loop back to Perceive (e.g., a tool-execute action produces a ~:tool-output~ event that becomes the next perception).
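
The propose, verify, retry cycle in the Reason stage can be sketched as a small loop. This is a minimal illustration, not the actual orchestrator: ~reason-with-retries~ is a hypothetical name, and ~propose~ and ~verify~ are stand-in functions for the LLM call and the gate stack.

```lisp
;; Sketch only: PROPOSE and VERIFY are hypothetical stand-ins for the
;; LLM call and the deterministic gate stack.
(defun reason-with-retries (propose verify &key (max-retries 3))
  "Call PROPOSE until VERIFY approves or the retry budget is spent.
PROPOSE takes the list of rejection traces so far (oldest first) and
returns a proposal. VERIFY takes a proposal and returns two values:
approved-p and a rejection trace. Returns the approved proposal (or NIL)
and the number of rejections seen."
  (let ((traces '()))
    ;; One initial attempt plus MAX-RETRIES retries.
    (dotimes (attempt (1+ max-retries) (values nil (length traces)))
      (declare (ignorable attempt))
      (let ((proposal (funcall propose (reverse traces))))
        (multiple-value-bind (approved-p trace) (funcall verify proposal)
          (when approved-p
            (return (values proposal (length traces))))
          ;; Keep the trace so the next proposal can see why it failed.
          (push trace traces))))))
```

With the default budget, the proposer is called at most four times: the initial attempt plus three retries.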

** Depth limiting

A depth counter prevents infinite loops. If a signal's depth exceeds 10, it is silently dropped. This is the circuit breaker for runaway recursive cycles.
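
A minimal sketch of the depth check, assuming the counter lives under the ~:DEPTH~ key of the signal plist and that each feedback signal is one level deeper than its parent. The function and variable names here are hypothetical.

```lisp
;; Sketch only: these names are illustrative, not the pipeline's own.
(defparameter *max-signal-depth* 10
  "Signals deeper than this are dropped.")

(defun signal-depth (sig)
  "Return the :DEPTH of SIG, treating a missing key as 0."
  (getf sig :depth 0))

(defun derive-feedback-signal (parent payload)
  "Build a feedback event one level deeper than PARENT."
  (list :type :event
        :depth (1+ (signal-depth parent))
        :payload payload))

(defun admit-signal-p (sig)
  "True when SIG is shallow enough to enter the pipeline."
  (<= (signal-depth sig) *max-signal-depth*))
```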

* Foveal-Peripheral Context Model

When the agent assembles context for the LLM, it does not send the entire memory. It renders a sparse outline using four rules:

1. *Depth ≤ 2* — the root node and its immediate children are always included (title and properties only, no content).
2. *Foveal focus* — the node the user is currently interacting with is rendered in full, including its body content and all descendants.
3. *Semantic relevance* — any node whose embedding vector has cosine similarity ≥ threshold (default 0.75) to the foveal node is rendered in full.
4. *Temporal relevance* — nodes modified within a time window (current session, today) are rendered in full. Deadlines and scheduled items approaching within the warning window (default 60 minutes) are surfaced proactively in the awareness context. Nodes older than the window are title-only. This is the temporal dimension of the foveal-peripheral model: prune in time as well as in semantic space.

Nodes that don't match any rule are rendered as title-only — a single Org headline with its :ID: property. This keeps active context between 2,000–4,000 tokens for typical memex sizes, versus 50,000–150,000 tokens for a full serialization. The embedding vectors that power semantic retrieval are computed at ingest time (~ingest-ast~ in core-memory.lisp) and can use local models (Ollama), cloud APIs (OpenAI embeddings), or a zero-dependency lexical fallback (trigram Jaccard similarity).
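
The rendering rules can be sketched as a single decision function. This is an illustration under stated assumptions, not the code in core-context.lisp: nodes are modeled as plists with hypothetical keys ~:ID~, ~:DEPTH~, ~:VECTOR~, ~:MODIFIED~ (a universal time) and ~:ANCESTORS~ (a list of ancestor IDs), the root is taken to be depth 1, the time window is fixed at 24 hours, and deadline handling is left out.

```lisp
;; Sketch only: plist keys and function names are assumptions.
(defun cosine-similarity (a b)
  "Cosine similarity of two equal-length numeric vectors (0 if either is zero)."
  (let ((dot 0.0d0) (na 0.0d0) (nb 0.0d0))
    (map nil (lambda (x y)
               (incf dot (* x y))
               (incf na (* x x))
               (incf nb (* y y)))
         a b)
    (if (or (zerop na) (zerop nb))
        0.0d0
        (/ dot (sqrt (* na nb))))))

(defun render-mode (node foveal &key (threshold 0.75) (window 86400)
                                     (now (get-universal-time)))
  "Return :FULL, :OUTLINE or :TITLE-ONLY for NODE relative to FOVEAL."
  (let ((vec (getf node :vector))
        (fvec (getf foveal :vector))
        (modified (getf node :modified)))
    (cond
      ;; Foveal focus: the focused node itself or any of its descendants.
      ((or (equal (getf node :id) (getf foveal :id))
           (member (getf foveal :id) (getf node :ancestors) :test #'equal))
       :full)
      ;; Semantic relevance: close enough to the foveal vector.
      ((and vec fvec (>= (cosine-similarity vec fvec) threshold))
       :full)
      ;; Temporal relevance: modified inside the time window.
      ((and modified (<= (- now modified) window))
       :full)
      ;; Depth <= 2: title and properties only.
      ((<= (getf node :depth most-positive-fixnum) 2)
       :outline)
      (t :title-only))))
```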

For the rationale behind sparse-tree rendering and why this architecture outperforms "load everything" systems, see Design Decisions: Org-Mode as Unified AST.

* Dispatcher Gate Stack

Every action the LLM proposes passes through a stack of deterministic gates before execution. Gates are registered as skills with ~defskill~ and sorted by priority (highest first) in ~cognitive-verify~ (core-loop-reason.lisp).

| Priority | Gate                      | What It Checks                                           |
|----------+---------------------------+----------------------------------------------------------|
| 600      | security-permissions      | Tool permission table (allow/ask/deny per tool)          |
| 600      | security-vault            | Credential storage integrity                             |
| 500      | security-policy           | Requires :explanation on every action                    |
| 150      | security-dispatcher       | 11-check safety, including: lisp, secret path,           |
|          | (the Dispatcher)          | self-build, content exposure, vault, privacy tags,       |
|          |                           | privacy text, shell safety, network exfil,               |
|          |                           | high-impact approval                                     |
| 100      | system-archivist          | Scribe and Gardener maintenance on heartbeat             |
| 95       | security-validator        | Protocol schema validation                               |
| 80       | system-event-orchestrator | Cron job dispatch on heartbeat                           |

Gates return either the action (passed through unchanged), a rejection (:LOG or :EVENT with block reason), or an approval request (:EVENT with :level :approval-required). Rejections feed back to the LLM as a rejection trace — the model sees what it proposed, which gate blocked it, and why, and retries with that context (up to 3 retries). Approval requests create Flight Plan Org nodes requiring human review via the HITL workflow.
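
A minimal sketch of priority-ordered gate evaluation. It does not use the real ~defskill~ machinery: the ~gate~ struct, ~run-gates~, and the two example gates are hypothetical, and a gate is assumed to signal "pass" by returning the very same action object it received.

```lisp
;; Sketch only: GATE, RUN-GATES and *EXAMPLE-GATES* are illustrative names.
(defstruct gate name priority fn)

(defun run-gates (gates action)
  "Run ACTION through GATES, highest priority first.
Returns ACTION when every gate passes it through unchanged. Otherwise
returns NIL, the name of the blocking gate, and that gate's result."
  (dolist (gate (stable-sort (copy-list gates) #'> :key #'gate-priority)
                action)
    (let ((result (funcall (gate-fn gate) action)))
      ;; Anything other than the untouched action stops the stack.
      (unless (eq result action)
        (return (values nil (gate-name gate) result))))))

(defparameter *example-gates*
  (list (make-gate :name :policy :priority 500
                   :fn (lambda (action)
                         (if (getf action :explanation)
                             action
                             (list :type :log
                                   :reason "missing :explanation"))))
        (make-gate :name :permissions :priority 600
                   :fn (lambda (action)
                         (if (eq (getf action :tool) :forbidden-tool)
                             (list :type :log :reason "tool denied")
                             action)))))
```

Calling ~(run-gates *example-gates* (list :tool :shell))~ returns ~NIL~ as its first value and ~:POLICY~ as its second, because the action carries no ~:explanation~. The third value is the material a rejection trace could be built from.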

Every gate is a pure Common Lisp function, so verification costs 0 LLM tokens. By contrast, prompt-based guardrails (Claude Code, OpenClaw, Hermes Agent) consume 100–500 LLM tokens per verification.

For the rationale behind deterministic vs prompt-based safety, see Design Decisions: The Probabilistic-Deterministic Split and The Dispatcher as Learning System.

* Embedding & Semantic Retrieval Pipeline

Every memory-object can carry an embedding vector for semantic search. The pipeline has three steps:

1. *Ingest* — ~ingest-ast~ (core-memory.lisp) calls ~embeddings-compute~ on new objects, storing the vector in ~memory-object-vector~.
2. *Queue* — objects with stale vectors are queued via ~mark-vector-stale~. The ~embed-all-pending~ cron job (every 10 minutes, :REFLEX tier) drains the queue and recomputes vectors.
3. *Retrieval* — ~context-awareness-assemble~ (core-context.lisp) passes the foveal node's vector to ~context-object-render~. Nodes with cosine similarity ≥ threshold against the foveal vector are rendered in full rather than as title-only.

Three backends are available, selected via ~EMBEDDING_PROVIDER~:
- :local — Ollama-compatible /api/embeddings endpoint (e.g., nomic-embed-text)
- :openai — OpenAI /v1/embeddings API (e.g., text-embedding-3-small)
- :hashing — zero-dependency lexical fallback using trigram Jaccard similarity (replaced SHA-256 hashing in v0.4.0 because cryptographic hashes maximise output divergence — the opposite of what a similarity metric needs)
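
To illustrate the lexical fallback, here is a minimal sketch of trigram Jaccard similarity computed directly on two strings. The function names are hypothetical, and the real ~:hashing~ backend may tokenize, normalize, or store its trigrams differently.

```lisp
;; Sketch only: not the actual :hashing backend.
(defun trigrams (text)
  "Return the distinct lowercase character trigrams of TEXT as a list."
  (let ((s (string-downcase text))
        (seen (make-hash-table :test #'equal)))
    ;; Strings shorter than three characters yield no trigrams.
    (loop for i from 0 to (- (length s) 3)
          do (setf (gethash (subseq s i (+ i 3)) seen) t))
    (loop for k being the hash-keys of seen collect k)))

(defun trigram-jaccard (a b)
  "Jaccard similarity |A n B| / |A u B| over the trigram sets of A and B."
  (let* ((ta (trigrams a))
         (tb (trigrams b))
         (shared (length (intersection ta tb :test #'string=)))
         (total (length (union ta tb :test #'string=))))
    (if (zerop total)
        0
        (/ shared total))))
```

For example, ~(trigram-jaccard "abcd" "abce")~ evaluates to ~1/3~: the strings share the trigram "abc" out of three distinct trigrams overall. Similar strings score close to 1, which is the property a cryptographic hash cannot provide.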

For the design rationale, see Design Decisions: Token Economics and Performance Advantage.

* Skill Lifecycle

1. *Discovery:* ~skill-initialize-all~ scans the skills directory, globs for ~*.lisp~ files (excluding ~core-*~ files which are loaded by ASDF)
2. *Sorting:* ~skill-topological-sort~ orders skills by their ~#+DEPENDS_ON:~ declarations
3. *Loading:* Each skill is loaded into a jailed package (~passepartout.skills.<skill-name>~). The loader removes ~in-package~ forms, evaluates the remaining code in the jailed package, and exports symbols matching the skill's short name to ~passepartout~
4. *Registration:* The skill's ~defskill~ call creates a ~skill~ struct in ~*skill-registry*~, registering its trigger function, probabilistic prompt generator, deterministic gate, and system-prompt augment
5. *Triggering:* On each cognitive cycle, ~skill-triggered-find~ iterates the registry and returns the highest-priority skill whose trigger matches the context
6. *Hot-reload:* A skill can be replaced at runtime by loading a new version into its jailed package — no restart needed
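
The sorting step can be sketched as a depth-first topological sort over dependency declarations. This is not the body of ~skill-topological-sort~: the function name, the alist input format, and the dependency edges in the usage note are assumptions made for illustration.

```lisp
;; Sketch only: input format and names are hypothetical.
(defun topo-sort-skills (deps)
  "DEPS is an alist of (skill-name . dependency-names).
Returns skill names ordered so that every skill comes after its
dependencies. Signals an error if the declarations contain a cycle."
  (let ((state (make-hash-table :test #'equal))
        (order '()))
    (labels ((visit (name)
               (ecase (gethash name state :unvisited)
                 (:done nil)
                 (:visiting (error "Dependency cycle at skill ~a" name))
                 (:unvisited
                  (setf (gethash name state) :visiting)
                  ;; Load order requires dependencies first.
                  (dolist (dep (cdr (assoc name deps :test #'equal)))
                    (visit dep))
                  (setf (gethash name state) :done)
                  (push name order)))))
      (dolist (entry deps)
        (visit (car entry))))
    (nreverse order)))
```

Given the made-up declarations ~(("gateway" "system-model" "security") ("system-model") ("security" "system-model"))~, the result is ~("system-model" "security" "gateway")~.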

* Communication Protocol Format

All communication between the daemon and its gateways (TUI, CLI, Emacs) uses length-prefixed plists over TCP:

```
00003D(:TYPE :EVENT :PAYLOAD (:ACTION :handshake :VERSION "0.4.0"))
```

The 6-character hex prefix encodes the payload length. The payload is a ~prin1~-serialized plist. ~*read-eval*~ is bound to nil on the receiving end to prevent code injection.
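
A minimal sketch of the framing, working on strings rather than sockets. ~encode-frame~ and ~decode-frame~ are hypothetical names, and the length is assumed to count characters of the serialized payload. A real implementation might count bytes instead and would read from the TCP stream.

```lisp
;; Sketch only: string-based framing, hypothetical function names.
(defun encode-frame (message)
  "Serialize MESSAGE (a plist) as a 6-hex-digit length prefix plus prin1 text."
  (let ((payload (with-standard-io-syntax (prin1-to-string message))))
    (format nil "~6,'0X~A" (length payload) payload)))

(defun decode-frame (frame)
  "Parse one frame and return the plist it carries."
  (let* ((len (parse-integer frame :start 0 :end 6 :radix 16))
         (payload (subseq frame 6 (+ 6 len))))
    (with-standard-io-syntax
      ;; WITH-STANDARD-IO-SYNTAX rebinds *READ-EVAL* to T, so it must be
      ;; turned off again here to reject #. forms from the wire.
      (let ((*read-eval* nil))
        (values (read-from-string payload))))))
```

A round trip such as ~(decode-frame (encode-frame (list :type :event :payload (list :text "hi"))))~ returns a plist ~equal~ to the input, while a payload containing ~#.~ signals a reader error instead of evaluating code.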

** Standard message envelope

| Key        | Value                                                | Meaning                                              |
|------------+------------------------------------------------------+------------------------------------------------------|
| ~:TYPE~    | ~:REQUEST~, ~:EVENT~, ~:RESPONSE~, ~:LOG~, ~:STATUS~ | Message category                                     |
| ~:META~    | plist                                                | ~:SOURCE~, ~:SESSION-ID~, ~:reply-stream~            |
| ~:PAYLOAD~ | plist                                                | Action-specific data (~:SENSOR~, ~:ACTION~, ~:TEXT~) |
| ~:DEPTH~   | integer                                              | Recursion counter for loop prevention                |

The protocol lifecycle begins with a handshake: the daemon sends a :handshake action with its version, and the client responds with its capabilities. After handshake, either side can send any message type. The daemon never initiates a disconnect — clients poll for messages and reconnect on EOF.

Planned for v0.6.3: streaming chunk frames (~:type :stream-chunk~) carrying partial LLM output. A final chunk containing an empty string signals end-of-stream. Streaming output chunk by chunk lets the client interrupt and redirect a response while it is still being generated.