passepartout/docs/ROADMAP.org

#+TITLE: Passepartout Evolutionary Roadmap
#+STARTUP: content
#+FILETAGS: :docs:roadmap:

* The Evolutionary Roadmap

The roadmap is designed working backwards from SOTA parity (v1.0.0), guiding each version toward a fully autonomous, self-editing agent. Each version builds on the previous, with features designed to be implemented in pure Common Lisp + Org-mode.

The TODO states in each version's Tasks section are the authoritative task tracker. The feature tables describe what each version delivers.

** Non-Negotiable Identity
- Pure Common Lisp + Org-mode. No JSON. No YAML. No external databases.
- Single-address-space memory (Lisp hash tables in RAM — the agent IS the memory).
- "Thin harness, fat skills" — complexity lives at the edges, not the kernel.
- One agent composed of many skills. Concurrency via bordeaux-threads (shared memory).
- Plists everywhere — homoiconic communication between all components.

** Version Roadmap

*** v0.1.0: The Autonomous Foundation — RELEASED

The secure, auditable Lisp kernel. All core infrastructure in place.

- Perceive-Reason-Act pipeline (3-stage metabolic loop)
- Skills engine with jailed loading (defskill, topological sort, hot-reload)
- Policy skill (6 invariants)
- Memory (memory-object + Merkle hashing)
- Scribe + Gardener background workers
- LLM gateway (OpenRouter, Ollama)
- Shell actuator, Emacs bridge, credentials vault
- FiveAM test suite

*** v0.2.0: Interactive Refinement — RELEASED

The "Brain" meets the "Machine." Standardization and professionalization of the user interface and environment.

- Professional TUI (Croatoan-based, styled, scrollable)
- Self-editing (detects errors, applies fixes, learns from outcomes)
- Enhanced utilities (structural Lisp/Org manipulation + REPL)
- Onboarding wizard (modular Lisp setup for multiple LLM providers)
- Memory rollback (snap back to known-good state)
- Project renamed to Passepartout
- Secret Exposure Gate, Shell Safety, Lisp Validation Gate
- Multi-distro deployment (Debian + Fedora), systemd service, Docker
- 31 org files with full literate prose

*** v0.3.0: Event Orchestration + HITL

Unified control plane and Human-in-the-Loop state management.

** Tasks

*** DONE Project Renaming (Bouncer → Dispatcher) [2026-05-02 Sat]
The Dispatcher's role has evolved beyond security guard. It is the seed of the deterministic engine — it learns to execute procedures without invoking the neural net.

*** DONE Event Orchestrator (unified hooks+cron+routing)
Unified control plane for hooks, cron, and complexity-based routing.
- *hook-registry* + *cron-registry* + tier classifier
- Hooks via ~#+HOOK:~ Org-mode properties
- Three complexity tiers: ~:REFLEX~ (no LLM), ~:COGNITION~ (light LLM), ~:REASONING~ (full LLM)
- Hooked into heartbeat for cron processing
- Rule-based tier classifier (overrideable via ~*tier-classifier*~)

*** TODO Context Manager (project scoping)
Stack-based context with ~push-context~ / ~pop-context~.
Path resolution relative to current context.
Memory scope: ~:scope~ property on memory-objects (memex/session/project).
Implement lazy-loading proxies for large-scale memory traversal.

*** TODO Model-Tier Routing (cost optimization)
Extend ~*model-selector-fn*~ for complexity-based routing.
- Heartbeats → smallest model
- User input → medium model
- Complex reasoning → large model

*** TODO Memory Scope Segmentation
Extend memory-object with ~:scope~ property.
- ~:memex~ (permanent knowledge), ~:session~ (ephemeral), ~:project~ (current work)
- Scope-aware retrieval in memory layer

*** TODO Asynchronous Embedding Gateway
Provider-agnostic vector generation (Ollama, llama.cpp, OpenAI).
Edits mark nodes as ~:vector :pending~; background worker batches and updates Merkle tree.

*** TODO TUI Experience (Daily Driver Quality)
The TUI is a standalone Croatoan app in ~org/gateway-tui.org~.
None of these changes require daemon modifications — the protocol between TUI and
daemon (port 9105, framed plists) is stable.

- P0: Chat scrollback (Page Up/Down) — ~2h
- P0: Input history (up/down arrows) — ~1h
- P1: Status bar (daemon, model, time) — ~3h
- P1: Message rendering (timestamps, colors, wrapping) — ~2h
- P2: Command palette (/help redesign) — ~4h
- P2: Multi-line input (Shift+Enter) — ~3h
- P3: Background activity indicator — ~2h
- P4: Tab completion for / commands — ~3h
- P4: Configurable theme — ~4h

*** TODO Human-in-the-Loop (HITL)
Continuation-based interaction. The agent can suspend its cognitive loop to ask for
permission or clarification and resume precisely where it left off. Builds on the
dispatcher's existing Flight Plan mechanism.

*** v0.4.0: Long-Horizon Planning + Git Workflows

Structured tracking, failure handling, and course correction for multi-step engineering work.

** Tasks

*** TODO Long-Horizon Planning (task tree DAG)
Decompose complex tasks into Org-mode headline trees.
Terminal states: ~:todo~ → ~:next-action~ → ~:in-progress~ → ~:done~ / ~:blocked~ / ~:stuck~.
Parent summarises child results.
Branch pruning when paths fail.

*** TODO Git Steward (version control integration)
Status, diff, commit, push, branch operations.
Policy enforces commit-before-modify gate.
Log commits to memory.

*** TODO TDD Runner Integration
Run FiveAM tests on file save.
Inject ~:test-failure~ event on red.
Hook into self-fix for auto-repair proposals.

*** TODO Deep Emacs Integration
Full org-agenda awareness: navigate, clock time, refile, archive.
Uses org-element + org-id.

*** v0.5.0: Interactive Actuation & Environment Stewardship

Interactive terminal sessions and autonomous dependency management.

** Tasks

*** TODO Interactive PTY Actuator
Stream long-running process output to the context window (e.g., ~npm run dev~, REPLs).
Async interrupt control (Ctrl+C emulation).

*** TODO The Environment Steward
Autonomously detect missing dependencies ("Command not found").
Propose installation command and retry the failed action.

*** v0.6.0: Concurrency + Creator + GTD

The agent bootstraps itself and manages parallel workstreams.

** Tasks

*** TODO Skill Creator (autonomous skill generation)
LLM drafts complete skill org-file from natural language.
Mandatory: syntax validation → jail-load → test → register.

*** TODO Architect Agent (PRD → PROTOCOL)
Scan ~:STATUS: FROZEN~ PRDs. Generate Phase B PROTOCOL from Phase A.

*** TODO GTD Integration (project tracking)
Full GTD cycle: capture, clarify, organize, reflect, engage.
org-gtd v4.0 DAG (~:TRIGGER:~, ~:BLOCKER:~).

*** TODO Consensus Loop (multi-model agreement)
Run multiple providers for critical decisions.
Compare results, detect disagreements.
Confidence scoring.

*** TODO Web Research (Playwright browsing)
Headless Chromium via Python bridge.
Text extraction, screenshots, Gemini Web UI automation.

*** TODO Memex Management (PARA lifecycle)
Archive DONE tasks, suggest refiling.
Detect orphaned nodes.
PARA/Zettelkasten maintenance.

*** v0.7.0: Visual Grounding & MCP Bridge

Multimodal visual interaction and ecosystem-wide tool compatibility.

** Tasks

*** TODO Computer Use / Vision
Allow the agent to request host OS or browser screenshots.
Analyze UI and issue precise X/Y coordinate click/type commands via X11/Wayland bridge.

*** TODO MCP Gateway Bridge
Lisp-native client for the Model Context Protocol.
Connect Passepartout to external tools and data sources.

*** v0.8.0: The Evaluation Harness

Automated benchmarking to mathematically prove the agent's reasoning capabilities.

** Tasks

*** TODO SWE-Bench Harness
Automated pipeline that clones repositories and feeds GitHub issues.
Track multi-step resolution trajectory, run tests, and score success.

*** v1.0.0: SOTA Parity

Feature-complete agent competitive with commercial agents. All features from v0.2.0 through v0.8.0 combined, verified, and tested end-to-end.

| Area | Parity Target |
|------|--------------|
| Self-improvement | Claude Code self-debug |
| Planning | ULTRAPLAN equivalent |
| Tool ecosystem | 10+ cognitive tools |
| Context window | Semantic search + scope segmentation |
| Safety | 6 Policy invariants + formal verification |
| Multi-step tasks | Task trees with terminal states |
| Code editing | Full file read/write via org manipulation |
| Memory | Vector recall in memory-object |
| Emacs integration | Full org-mode control (exceeds Claude Code) |
| Autonomy | 100% local capable (exceeds Claude Code) |

*** v2.0.0: Lisp Machine Emergence

From Lisp-using agent to true Lisp machine. Agent IS the Emacs process.

- Lish: Lisp editor — Org-mode as IDE. Org-babel for interactive evaluation. Full REPL in TUI.
- Lish: Shell replacement — Lisp-based shell that speaks plists. Org-mode buffers as file system.

*** v3.0.0: Neurosymbolic Maturity

Deterministic planner takes the wheel. LLM relegated to semantic translation.

- Deterministic planner: Pure Lisp task scheduler. No LLM needed for scheduling.
- Self-correcting gates: Gates learn from false positives (user override patterns).

*** v4.0.0: AI Stack Internalized

The agent understands its own weights. No external inference.

- Llama.cpp in Lisp: FFI binding. No Python subprocess. Pure Common Lisp inference.
- Weights as sexps: Neural weights as Lisp data structures. Homoiconic model introspection.

*** v5.0.0: True Agency

World models, temporal reasoning, goal persistence across restarts.

- World models: Predictive models of user behavior, project dynamics, system state.
- Temporal reasoning: Scheduling, deadlines, elapsed duration awareness.
- Goal persistence: Goals survive restarts. Long-term projects in memory-objects.