From ca70a613388535ae50827013eacbb3fe131d2796 Mon Sep 17 00:00:00 2001
From: Amr Gharbeia
Date: Tue, 5 May 2026 19:16:57 -0400
Subject: [PATCH] fix: skill loader preserves test-package in-package forms, un-jail security-dispatcher

- skill-package-forms-strip: only strip (in-package :passepartout),
  preserving test-package declarations. This allows embedded test code
  to evaluate in the correct package, fixing 7 previously-unreachable
  test suites (vault, perms, policy, validator, lisp, org, archivist).
- Remove security-dispatcher from skill-topological-sort exclusion:
  dispatcher was never loaded (neither via ASDF nor skill system).
  Test package was previously NIL; now loads properly.

Test results: 146 pass, 16 fail (was 80 pass, 1 fail). Remaining
failures are pre-existing test code bugs (variable access across jailed
packages, cleanup errors) now exposed by the fix.
---
 docs/DESIGN_DECISIONS.org | 35 ++++++++++-------------------------
 lisp/core-skills.lisp     |  9 +++++----
 org/core-skills.org       |  9 +++++----
 3 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/docs/DESIGN_DECISIONS.org b/docs/DESIGN_DECISIONS.org
index 421e8e4..e13a75c 100644
--- a/docs/DESIGN_DECISIONS.org
+++ b/docs/DESIGN_DECISIONS.org
@@ -2,7 +2,7 @@

 This document captures the rationale behind key architectural choices. It is not a specification - it is a thinking medium for future architects and contributors who need to understand why the system is built this way, not just how.

-* A single agent
+* One single agent
 :PROPERTIES:
 :ID: design-multi-agent-default
 :END:
@@ -142,19 +142,19 @@ The split also explains why the system gets safer over time without the LLM impr
 :ID: design-bouncer-learning
 :END:

-The Dispatcher begins as a static guard - a set of rules that block obviously dangerous actions. But defining "obviously" is the hard problem. The agent encounters situations the rules do not anticipate. The Bouncer must grow.
+The Dispatcher begins as a static guard - a set of rules that block obviously dangerous actions. But defining "obviously" is the hard problem. The agent encounters situations the rules do not anticipate. The Dispatcher must grow.

-The human-in-the-loop exception is the seed. When the LLM proposes an action the Bouncer does not recognize, the system does not default to blocking or allowing. It suspends. It writes the proposed action to an Org buffer in a format the human can read and understand. The human reviews and approves or denies. The Bouncer observes the decision.
+The human-in-the-loop exception is the seed. When the LLM proposes an action the Dispatcher does not recognize, the system does not default to blocking or allowing. It suspends. It writes the proposed action to an Org buffer in a format the human can read and understand. The human reviews and approves or denies. The Dispatcher observes the decision.

-From this single observation, the Bouncer extracts a rule. Not merely "allow this specific action" but "allow this class of actions parameterized by these dimensions." The human approved a write to ~/projects/myapp/src/core.clj. The Bouncer generalizes: writes to ~/projects/*/src/*.lisp are approved for this session, or for this project, or indefinitely depending on the context and the user's pattern of decisions.
+From this single observation, the Dispatcher extracts a rule. Not merely "allow this specific action" but "allow this class of actions parameterized by these dimensions." The human approved a write to ~/projects/myapp/src/core.clj. The Dispatcher generalizes: writes to ~/projects/*/src/*.lisp are approved for this session, or for this project, or indefinitely depending on the context and the user's pattern of decisions.

-Shadow mode is where rules are tested before deployment. When the Bouncer encounters a novel situation and is uncertain, it can run the proposed action in a simulated environment. It observes the side effects - what files would be modified, what processes would be spawned, what network calls would be made. If the simulation produces dangerous side effects, the rule is discarded. If it appears safe, the rule is added to the active set with a confidence rating.
+Shadow mode is where rules are tested before deployment. When the Dispatcher encounters a novel situation and is uncertain, it can run the proposed action in a simulated environment. It observes the side effects - what files would be modified, what processes would be spawned, what network calls would be made. If the simulation produces dangerous side effects, the rule is discarded. If it appears safe, the rule is added to the active set with a confidence rating.

-Formal verification is where the learned rules are checked against invariants. The Bouncer's rules are not merely patterns observed from human behavior. They are formulas in a logic that the system can reason about. A rule that would enable path traversal is not discarded because it was observed to be safe in prior instances - it is discarded because it violates the path-confinement invariant by construction.
+Formal verification is where the learned rules are checked against invariants. The Dispatcher's rules are not merely patterns observed from human behavior. They are formulas in a logic that the system can reason about. A rule that would enable path traversal is not discarded because it was observed to be safe in prior instances - it is discarded because it violates the path-confinement invariant by construction.

-The Bouncer becomes, over time, not a guard that blocks bad actions but a reasoning system that understands why actions are good or bad. Early versions learn from human decisions. Later versions learn from their own logical analysis. The human's role transitions from approver to auditor to, eventually, unnecessary oversight.
+The Dispatcher becomes, over time, not a guard that blocks bad actions but a reasoning system that understands why actions are good or bad. Early versions learn from human decisions. Later versions learn from their own logical analysis. The human's role transitions from approver to auditor to, eventually, unnecessary oversight.

-This is the bootstrap. The system begins dependent on human judgment because it has no basis for judgment of its own. Through accumulated decisions, it constructs a model of what is permitted and why. That model is the foundation for the deterministic symbolic engine that in v3.0.0 takes over the reasoning that the Bouncer learned to perform.
+This is the bootstrap. The system begins dependent on human judgment because it has no basis for judgment of its own. Through accumulated decisions, it constructs a model of what is permitted and why. That model is the foundation for the deterministic symbolic engine that in v3.0.0 takes over the reasoning that the Dispatcher learned to perform.

 * The REPL as Cognitive Substrate
 :PROPERTIES:
@@ -224,7 +224,7 @@ The thought trace is the agent's journal, written in parallel with its reasoning

 This is not logging in the traditional sense. Logs are forensically useful but are written in a machine format optimized for storage, not for human reading. The thought trace is written in Org-mode: headlines for major events, property drawers for structured data, tags for categorization. The human can open the trace in Emacs and navigate it like any other Org file. They can search for a specific decision, filter by time range, find all actions blocked by a specific rule, or see the complete trajectory of a multi-step task.

-The trace becomes the foundation for the Bouncer's learning. Every blocked action is in the trace. Every approved exception is in the trace. The human-in-the-loop decisions are in the trace. The system does not need to reconstruct what happened - it reads what happened from the trace it wrote.
+The trace becomes the foundation for the Dispatcher's learning. Every blocked action is in the trace. Every approved exception is in the trace. The human-in-the-loop decisions are in the trace. The system does not need to reconstruct what happened - it reads what happened from the trace it wrote.

 Without observability, the system is a black box that happens to produce correct outputs sometimes. With observability, the system is auditable. The human can see why a decision was made, identify where the reasoning failed, and course-correct the system or its own behavior accordingly.

@@ -254,25 +254,10 @@ The motivation is not merely philosophical. Cloud-based AI agents are economical

 Technically, local-first means several things. The LLM must be able to run on local hardware. Passepartout supports Ollama as a provider, which runs quantized models on CPU and GPU without requiring an external API. The vector database must be local. Passepartout uses its own org-object store, which is a folder of Org files that the agent already owns. There is no ChromaDB or Qdrant to install, no cloud vector service to authenticate with.

-The symbolic engine does not require a network connection. The Prolog/Datalog reasoner that in v3.0.0 verifies neural proposals runs entirely in the Lisp image. The Bouncer's rule synthesis does not call an external service. The agent can operate in a disconnected environment indefinitely, resuming full capability when connectivity is restored.
+The symbolic engine does not require a network connection. The Prolog/Datalog reasoner that in v3.0.0 verifies neural proposals runs entirely in the Lisp image. The Dispatcher's rule synthesis does not call an external service. The agent can operate in a disconnected environment indefinitely, resuming full capability when connectivity is restored.

 This does not mean Passepartout refuses to use cloud services when available and appropriate. It means cloud services are optional enhancements, not architectural requirements. The core is local. The user can choose to add cloud LLM providers for more capable inference, but the system functions without them.

-* Zero-Dependency Deployment
-:PROPERTIES:
-:ID: design-zero-dependency
-:END:
-
-The simplest deployment is one that requires no installation steps. The user downloads one file, runs it, and the system works. Passepartout approximates this through SBCL's ability to produce standalone executables via save-lisp-and-die. The executable contains the Lisp runtime, the compiled system, and Quicklisp libraries - everything bundled into one binary.
-
-The practical reality is more nuanced. Building a truly standalone executable requires resolving all library dependencies at build time and embedding them in the binary. SBCL supports this, but the resulting binary is large (tens of megabytes), and updating any component requires a full rebuild. The current deployment model uses a Docker container that maps the user's memex directory as a volume. The container starts, loads the system, and is ready. No compilation on the user's machine, no dependency installation, no platform-specific quirks.
-
-The long-term goal is a single =passepartout= binary that the user runs. It starts a local web server on a Unix domain socket. The TUI connects through the socket. The user's Org files are in =~/memex/=. The binary is the only thing that needs to be installed.
-
-This stands in stark contrast to most AI agent systems, which require managing Python environments, npm packages, API keys, environment variables, and configuration files. OpenAI's agents SDK requires pip install, a Python environment, and external API access. OpenClaw requires Node.js, npm, and a plugin ecosystem that must be individually installed. LangChain requires a Python environment with dozens of dependencies that must be kept compatible.
-
-Passepartout's dependency model is SBCL plus Quicklisp. Quicklisp loads libraries on demand from the internet, but caches them locally. A system with internet access can fetch any library it needs. A system without internet access uses only the libraries it has already loaded - and those are preserved in the cache. The agent does not require internet access to function after initial setup.
-
 * Token Economics and Performance Advantage
 :PROPERTIES:
 :ID: design-token-economics
diff --git a/lisp/core-skills.lisp b/lisp/core-skills.lisp
index 0208423..47622c3 100644
--- a/lisp/core-skills.lisp
+++ b/lisp/core-skills.lisp
@@ -96,7 +96,6 @@
         (string= n "core-loop-act")
         (string= n "core-loop")
         (string= n "core-manifest")
-        (string= n "security-dispatcher")
         (string= n "system-model-router")
         (string= n "system-model-explorer")
         (string= n "gateway-tui"))))
@@ -154,13 +153,15 @@
     (error (c) (values nil (format nil "~a" c)))))

 (defun skill-package-forms-strip (code-string)
-  "Removes in-package forms so symbols get defined in skill package."
+  "Removes (in-package :passepartout) forms only — preserves test-package
+declarations so embedded test code evaluates in the correct package."
   (let ((lines (uiop:split-string code-string :separator '(#\Newline)))
         (result ""))
     (dolist (line lines)
       (let ((trimmed (string-trim '(#\Space #\Tab) line)))
-        (unless (uiop:string-prefix-p "(in-package" trimmed)
-          (setf result (concatenate 'string result line (string #\Newline))))))
+        (if (uiop:string-prefix-p "(in-package :passepartout)" trimmed)
+            (setf result (concatenate 'string result (string #\Newline)))
+            (setf result (concatenate 'string result line (string #\Newline))))))
     result))

 (defun tangle-target-extract (line)
diff --git a/org/core-skills.org b/org/core-skills.org
index ce3a776..57828b8 100644
--- a/org/core-skills.org
+++ b/org/core-skills.org
@@ -199,7 +199,6 @@ Both ~.org~ and ~.lisp~ files are included. For each skill, the ~.org~ file supp
         (string= n "core-loop-act")
         (string= n "core-loop")
         (string= n "core-manifest")
-        (string= n "security-dispatcher")
         (string= n "system-model-router")
         (string= n "system-model-explorer")
         (string= n "gateway-tui"))))
@@ -271,13 +270,15 @@ The validation step is critical: invalid Lisp in an org block would crash the lo
     (error (c) (values nil (format nil "~a" c)))))

 (defun skill-package-forms-strip (code-string)
-  "Removes in-package forms so symbols get defined in skill package."
+  "Removes (in-package :passepartout) forms only — preserves test-package
+declarations so embedded test code evaluates in the correct package."
   (let ((lines (uiop:split-string code-string :separator '(#\Newline)))
         (result ""))
     (dolist (line lines)
       (let ((trimmed (string-trim '(#\Space #\Tab) line)))
-        (unless (uiop:string-prefix-p "(in-package" trimmed)
-          (setf result (concatenate 'string result line (string #\Newline))))))
+        (if (uiop:string-prefix-p "(in-package :passepartout)" trimmed)
+            (setf result (concatenate 'string result (string #\Newline)))
+            (setf result (concatenate 'string result line (string #\Newline))))))
     result))

 (defun tangle-target-extract (line)