tests: TUI integration + cascade parsing — precise LLM diagnostics

- TUI agent-responds: hardened to detect and FAIL on cascade/exhausted
  responses (previously a separate WARN-only test that let real
  cascade failures slip through)
- New TUI cascade-parsing test: /eval *provider-cascade* on screen,
  checks for clean keywords (no cl-dotenv quote artifacts)
- Pre-warm step: sbcl --eval '(ql:quickload :passepartout/tui)'
  before launching tmux, cuts TUI startup from ~120s to ~10s
- Removed test_agent_not_cascade_failure (absorbed into agent-responds)
- New integration test: test-provider-cascade-parsing verifies
  PROVIDER_CASCADE entries are keywords without quotes, matching
  registered backends — catches the exact cl-dotenv quote bug
- Fixed stop-daemon ghost symbol (removed export) and paren bug
- Contract section updated with numbered Phase 2/3 items
2026-05-06 08:56:07 -04:00
parent 9362c56678
commit 750918527d
4 changed files with 180 additions and 107 deletions
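
The quote bug mentioned in the commit message can be illustrated outside Lisp. The following is a minimal shell sketch, assuming the loader hands back a value such as `"openrouter,deepseek"` with its surrounding quotes intact; `strip_quotes` and `cascade_keywords` are hypothetical names used only for illustration and are not part of Passepartout or cl-dotenv.

```shell
# Hypothetical helpers illustrating the quote-contamination failure mode.
# Neither function exists in Passepartout; they only mirror the idea.

# Remove one pair of matching surrounding quotes, if present.
strip_quotes() {
  local v=$1
  case $v in
    \"*\") v=${v#\"}; v=${v%\"} ;;
    \'*\') v=${v#\'}; v=${v%\'} ;;
  esac
  printf '%s\n' "$v"
}

# Turn a comma-separated provider list into upper-case keyword names.
cascade_keywords() {
  local raw out= item
  raw=$(strip_quotes "$1")
  local IFS=,
  for item in $raw; do
    out="$out :$(printf '%s' "$item" | tr '[:lower:]' '[:upper:]')"
  done
  # Drop the leading separator space before printing.
  printf '%s\n' "${out# }"
}
```

In this sketch, skipping the `strip_quotes` step would leave the first entry as `:"OPENROUTER`, which is the kind of contaminated keyword the new tests are meant to catch.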


@@ -2,7 +2,9 @@
This document captures the rationale behind key architectural choices. It is not a specification - it is a thinking medium for future architects and contributors who need to understand why the system is built this way, not just how.
* One single agent
* Design
** One single agent
:PROPERTIES:
:ID: design-multi-agent-default
:END:
@@ -23,7 +25,7 @@ But the default assumption that complex reasoning tasks are best solved by multi
Passepartout is single-agent by default not from limitation but from conviction: for reasoning-heavy work where coherence matters, a unified memory space and single decision-making locus are architectural assets, not constraints.
* The Unified Memory Argument
** The Unified Memory Argument
:PROPERTIES:
:ID: design-unified-memory
:END:
@@ -42,7 +44,7 @@ Context window limits are largely a symptom of lazy architecture. The default ap
The unified memory argument is not that infinite context is free. It is that with proper architecture, effective infinite context is achievable without the synchronization and fragmentation costs of multi-agent systems.
* Org-Mode as Unified AST
** Org-Mode as Unified AST
:PROPERTIES:
:ID: design-org-unified-ast
:END:
@@ -75,7 +77,7 @@ The unified format is what makes the memory architecture work. The agent's memor
This is what "sovereignty" means in technical terms: the user owns the data in a format they can access, and the agent operates on the data in the same format they own.
* Homoiconicity as Foundation
** Homoiconicity as Foundation
:PROPERTIES:
:ID: design-homoiconicity
:END:
@@ -118,7 +120,7 @@ Six decades later, neural networks have arrived at the problem from a different
Lisp's time may finally have come. Not as a replacement for neural networks, but as the governor that makes them safe - the symbolic engine that verifies what the neural engine proposes, the homoiconic substrate that allows the system to inspect, modify, and improve its own reasoning. The machine that was designed for AI in 1958 may be the exact machine needed for AI in 2026 and beyond.
* The Probabilistic-Deterministic Split
** The Probabilistic-Deterministic Split
:PROPERTIES:
:ID: design-probabilistic-deterministic
:END:
@@ -137,7 +139,7 @@ This separation is the source of Passepartout's safety guarantee. Other agents a
The split also explains why the system gets safer over time without the LLM improving. The deterministic engine accumulates rules. The LLM proposes actions, the engine evaluates them against a growing rule set. Early versions block obvious dangers. Later versions block sophisticated attacks that were previously unknown. The safety grows logarithmically with the number of interactions, not linearly with model capability.
* The Dispatcher as Learning System
** The Dispatcher as Learning System
:PROPERTIES:
:ID: design-bouncer-learning
:END:
@@ -156,7 +158,7 @@ The Dispatcher becomes, over time, not a guard that blocks bad actions but a rea
This is the bootstrap. The system begins dependent on human judgment because it has no basis for judgment of its own. Through accumulated decisions, it constructs a model of what is permitted and why. That model is the foundation for the deterministic symbolic engine that in v3.0.0 takes over the reasoning that the Dispatcher learned to perform.
* The REPL as Cognitive Substrate
** The REPL as Cognitive Substrate
:PROPERTIES:
:ID: design-repl-cognition
:END:
@@ -177,7 +179,22 @@ Third, the REPL is a shared substrate. When the agent evaluates code, that code
This is why the REPL becomes more important as the system matures. In early versions, it is a development tool. In v0.6.0 and beyond, it becomes a cognitive tool: the agent explores hypotheses by evaluating them, verifies the output of sub-agents by inspecting live state, and tests modifications before committing them to the knowledge graph.
* Literate Programming as Discipline
** Observability and the Thought Trace
:PROPERTIES:
:ID: design-observability
:END:
When a human asks why the system made a decision, the answer must be findable. In most AI systems, the reasoning is ephemeral - it exists in the model's activations and disappears when the session ends. In Passepartout, every significant cognitive event is written to an Org buffer as it happens.
The thought trace is the agent's journal, written in parallel with its reasoning. When the probabilistic engine generates a proposal, the trace records the input, the prompt, and the raw output. When the deterministic engine evaluates it, the trace records which rules were checked, which passed, which failed, and why. When an action is executed, the trace records the timestamp, the user who approved it (if human-in-the-loop), and the outcome.
This is not logging in the traditional sense. Logs are forensically useful but are written in a machine format optimized for storage, not for human reading. The thought trace is written in Org-mode: headlines for major events, property drawers for structured data, tags for categorization. The human can open the trace in a text editor and navigate it like any other Org file. They can search for a specific decision, filter by time range, find all actions blocked by a specific rule, or see the complete trajectory of a multi-step task.
The trace becomes the foundation for the Dispatcher's learning. Every blocked action is in the trace. Every approved exception is in the trace. The human-in-the-loop decisions are in the trace. The system does not need to reconstruct what happened - it reads what happened from the trace it wrote.
Without observability, the system is a black box that happens to produce correct outputs sometimes. With observability, the system is auditable. The human can see why a decision was made, identify where the reasoning failed, and course-correct the system or its own behavior accordingly.
** Literate Programming as Discipline
:PROPERTIES:
:ID: design-literate-programming
:END:
@@ -198,7 +215,7 @@ Together, these constraints create a development experience that is slower in th
The literate programming discipline is not about producing documentation. It is about producing code whose correctness has been verified by the act of explaining it.
* The Evaluation Harness
** The Evaluation Harness
:PROPERTIES:
:ID: design-evaluation-harness
:END:
@@ -213,22 +230,7 @@ Beyond SWE-bench, the harness includes chaos testing. The system is subjected to
The harness also supports regression testing on the skill set. Every skill is tested against a suite of known inputs and expected outputs. When a modification is proposed to any skill - whether through manual editing or the agent's own self-modification - the test suite runs first. A skill that fails its tests is rejected before it can propagate to the running image. This is not a convenience - it is the mechanism by which self-modification remains safe. The agent can propose changes, but the harness verifies them before the changes take effect.
* Observability and the Thought Trace
:PROPERTIES:
:ID: design-observability
:END:
When a human asks why the system made a decision, the answer must be findable. In most AI systems, the reasoning is ephemeral - it exists in the model's activations and disappears when the session ends. In Passepartout, every significant cognitive event is written to an Org buffer as it happens.
The thought trace is the agent's journal, written in parallel with its reasoning. When the probabilistic engine generates a proposal, the trace records the input, the prompt, and the raw output. When the deterministic engine evaluates it, the trace records which rules were checked, which passed, which failed, and why. When an action is executed, the trace records the timestamp, the user who approved it (if human-in-the-loop), and the outcome.
This is not logging in the traditional sense. Logs are forensically useful but are written in a machine format optimized for storage, not for human reading. The thought trace is written in Org-mode: headlines for major events, property drawers for structured data, tags for categorization. The human can open the trace in Emacs and navigate it like any other Org file. They can search for a specific decision, filter by time range, find all actions blocked by a specific rule, or see the complete trajectory of a multi-step task.
The trace becomes the foundation for the Dispatcher's learning. Every blocked action is in the trace. Every approved exception is in the trace. The human-in-the-loop decisions are in the trace. The system does not need to reconstruct what happened - it reads what happened from the trace it wrote.
Without observability, the system is a black box that happens to produce correct outputs sometimes. With observability, the system is auditable. The human can see why a decision was made, identify where the reasoning failed, and course-correct the system or its own behavior accordingly.
* The MCP Strategy
** The MCP Strategy
:PROPERTIES:
:ID: design-mcp-strategy
:END:
@@ -243,7 +245,7 @@ The alternative is to build MCP wrappers in Python or TypeScript and bridge to L
Passepartout's native client is smaller, faster, and more maintainable. The MCP client is a skill, not a core component. It can be reloaded, replaced, or removed without restarting the agent. The agent can add new MCP tool integrations by loading new skills, not by deploying new infrastructure.
* Local-First Architecture
** Local-First Architecture
:PROPERTIES:
:ID: design-local-first
:END:
@@ -277,7 +279,7 @@ The three structural multipliers are:
These compound. A coding session touching 20 files, performing 10 actions, and triggering 3 errors saves ~50,000-100,000 tokens compared to the same session with Claude Code.
** Per-Task Type Analysis
** Per-Task Type Guesstimates
*** Coding (debugging, refactoring, PR review)

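The trace format described in the Observability section of the diff above (headline, property drawer, tag) can be sketched in shell. This is a minimal illustration, not the project's trace writer: the function name `trace_entry` and the property names `TIMESTAMP`, `RULE`, and `OUTCOME` are assumptions.

```shell
# Hypothetical sketch: format one thought-trace event as an Org entry.
# The property names (TIMESTAMP, RULE, OUTCOME) are illustrative assumptions.
# usage: trace_entry TITLE TAG TIMESTAMP RULE OUTCOME
trace_entry() {
  local title=$1 tag=$2 timestamp=$3 rule=$4 outcome=$5
  printf '* %s :%s:\n' "$title" "$tag"      # headline with one tag
  printf ':PROPERTIES:\n'                   # property drawer opens
  printf ':TIMESTAMP: %s\n' "$timestamp"
  printf ':RULE: %s\n' "$rule"
  printf ':OUTCOME: %s\n' "$outcome"
  printf ':END:\n'                          # property drawer closes
}
```

Appending such entries to a file (a hypothetical `trace.org`, say) keeps the trace searchable with ordinary tools, for example `grep -c ':blocked:' trace.org` to count blocked actions.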

@@ -3,13 +3,13 @@
(ql:quickload :usocket :silent t))
(defpackage :passepartout-integration-tests
(:use :cl :fiveam :passepartout)
(:use :cl :passepartout)
(:export #:integration-suite))
(in-package :passepartout-integration-tests)
(def-suite integration-suite :description "Integration tests across process boundaries")
(in-suite integration-suite)
(fiveam:def-suite integration-suite :description "Integration tests across process boundaries")
(fiveam:in-suite integration-suite)
(defvar *daemon-port* nil)
@@ -27,7 +27,7 @@
(passepartout:start-daemon :port *daemon-port*)
(sleep 2)
,@body)
(handler-case (passepartout:stop-daemon) (error ())))))
(values)))
(defun daemon-connect ()
(let* ((sock (usocket:socket-connect "127.0.0.1" *daemon-port*))
@@ -47,14 +47,14 @@
(when (> (get-universal-time) deadline) (return nil))
(sleep 0.1))))
(test test-daemon-starts
(fiveam:test test-daemon-starts
"Contract 1: daemon binds port and sends valid handshake."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
(is (open-stream-p stream))
(usocket:socket-close sock))))
(test test-pipeline-user-input
(fiveam:test test-pipeline-user-input
"Contract 2: :user-input traverses pipeline and produces a response."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -66,7 +66,7 @@
(is (not (null resp)) "Expected a response")))
(usocket:socket-close sock)))))
(test test-pipeline-heartbeat
(fiveam:test test-pipeline-heartbeat
"Contract 2: heartbeat signals do not crash the daemon."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -76,7 +76,7 @@
(usocket:socket-close sock))
(pass))))
(test test-tcp-round-trip
(fiveam:test test-tcp-round-trip
"Contract 3: framed health-check survives TCP round-trip."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -88,7 +88,7 @@
(is (member (getf resp :type) '(:HEALTH-RESPONSE)))))
(usocket:socket-close sock)))))
(test test-daemon-survives-junk
(fiveam:test test-daemon-survives-junk
"Contract 3: daemon does not crash on junk input."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -101,7 +101,7 @@
(is (open-stream-p stream2))
(usocket:socket-close sock2))))
(test test-skill-registry-populated
(fiveam:test test-skill-registry-populated
"Contract 4: *skill-registry* is populated after daemon start."
(with-daemon ()
(is (hash-table-p passepartout::*skill-registry*))
@@ -109,7 +109,7 @@
"Expected at least 1 skill in registry, got ~a"
(hash-table-count passepartout::*skill-registry*))))
(test test-shell-safe-echo
(fiveam:test test-shell-safe-echo
"Contract 5: safe shell command does not crash the daemon."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -120,7 +120,7 @@
(usocket:socket-close sock))
(pass))))
(test test-shell-dangerous-blocked
(fiveam:test test-shell-dangerous-blocked
"Contract 5: rm -rf / is blocked by the security dispatcher."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -131,7 +131,7 @@
(usocket:socket-close sock))
(pass))))
(test test-cli-gateway-input
(fiveam:test test-cli-gateway-input
"Contract 6: text via TCP produces a response."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -142,7 +142,7 @@
(usocket:socket-close sock))
(pass))))
(test test-gateway-registry
(fiveam:test test-gateway-registry
"Contract 7: gateway-registry-initialize is available."
(with-daemon ()
(is (fboundp 'gateway-registry-initialize))
@@ -162,7 +162,7 @@
(format t " [SKIP] ~a not set~%" ,env-var)
(skip "~a not set" ,env-var))))
(test test-provider-openai-request
(fiveam:test test-provider-openai-request
"Contract Phase2: provider-openai-request returns :success with valid API key."
(skip-unless "OPENROUTER_API_KEY"
(let ((result (provider-openai-request "Say hello" "Be brief."
@@ -172,14 +172,28 @@
(eq (getf result :status) :error))
"Expected :success or :error, got: ~a" result))))
(test test-backend-cascade-real
(fiveam:test test-backend-cascade-real
"Contract Phase2: backend-cascade-call returns string content with real provider."
(skip-unless "OPENROUTER_API_KEY"
(let ((passepartout::*provider-cascade* '(:openrouter)))
(let ((result (backend-cascade-call "Say hello" :system-prompt "Be brief.")))
(is (stringp result) "Expected string response, got: ~a" result)))))
(test test-messaging-link-unlink
(fiveam:test test-provider-cascade-parsing
"Contract Phase2: PROVIDER_CASCADE env var parses to clean keywords matching backends."
(provider-cascade-initialize)
(let ((cascade passepartout::*provider-cascade*))
(is (listp cascade) "Cascade must be a list")
(is (>= (length cascade) 1) "Cascade must have at least one entry")
(dolist (entry cascade)
(is (keywordp entry) "Entry ~s must be a keyword" entry)
(let ((name (symbol-name entry)))
(is (not (find #\" name)) "Entry ~s must not contain double-quote" entry)
(is (not (find #\' name)) "Entry ~s must not contain single-quote" entry)))
(is (some (lambda (e) (gethash e passepartout::*probabilistic-backends*)) cascade)
"At least one cascade entry must match a registered backend")))
(fiveam:test test-messaging-link-unlink
"Contract Phase2: messaging-link stores token, configured-p returns T, unlink removes it."
(with-daemon ()
(messaging-link :test-platform :token "fake-token-123")
@@ -189,19 +203,19 @@
(is (not (gateway-configured-p :test-platform))
"Expected test-platform to be unconfigured after unlinking")))
(test test-gateway-configured-p-false
(fiveam:test test-gateway-configured-p-false
"Contract Phase2: gateway-configured-p returns nil for unknown platform."
(with-daemon ()
(is (not (gateway-configured-p :nonexistent-platform-xyz)))))
(test test-gateway-start-messaging
(fiveam:test test-gateway-start-messaging
"Contract Phase2: gateway registry initializes with expected platforms."
(with-daemon ()
(gateway-registry-initialize)
(is (hash-table-p passepartout::*gateway-registry*))
(is (>= (hash-table-count passepartout::*gateway-registry*) 1))))
(test test-flight-plan-message-format
(fiveam:test test-flight-plan-message-format
"Contract Phase3: dispatcher-flight-plan-create returns valid message."
(with-daemon ()
(load (merge-pathnames ".local/share/passepartout/lisp/security-dispatcher.lisp"
@@ -216,7 +230,7 @@
(is (string= "PLAN" (getf attrs :TODO)))
(is (member "FLIGHT_PLAN" (getf attrs :TAGS) :test #'string-equal))))))
(test test-emacs-daemon-connect
(fiveam:test test-emacs-daemon-connect
"Contract Phase3: Emacs daemon is reachable via emacsclient."
(handler-case
(let ((result (uiop:run-program '("emacsclient" "--eval" "(+ 1 2)")
@@ -224,4 +238,4 @@
:ignore-error-status t)))
(is (search "3" result) "Expected '3' from emacsclient, got: ~a" result))
(error (c)
(skip "Emacs daemon not available: ~a" c))))
(skip "Emacs daemon not available: ~a" c)))))


@@ -22,11 +22,20 @@ Phase 1 — In-process daemon (no external credentials):
6. CLI gateway: text injected via TCP reaches the pipeline.
7. Gateway registry: ~gateway-registry-initialize~ is available.
Phase 2 — LLM + messaging (gated on env vars, future):
Provider cascade, timeout, response parsing; messaging link/unlink.
Phase 2 — LLM + messaging:
Phase 3 — External processes (tmux + Emacs, future):
TUI rendering, /eval, connection drop; Emacs Flight Plan, node insertion.
8. Provider cascade: ~PROVIDER_CASCADE~ entries are clean keywords
matching registered backends (no quote contamination).
9. Backend cascade: real provider returns string content.
Phase 3 — TUI via tmux:
10. Agent response: TUI → daemon → LLM round-trip produces non-cascade
agent text on screen.
11. Cascade inspection: ~/eval *provider-cascade*~ shows clean keywords
on TUI screen (no quote artifacts).
12. Eval command: ~/eval (+ 1 2)~ displays ~~=> 3~~ on screen.
13. Status bar: rendered screen shows ~~msgs:~~ in status bar.
** Boundaries
@@ -44,13 +53,13 @@ Shared test harness: package, suite, helpers, and ~with-daemon~.
(ql:quickload :usocket :silent t))
(defpackage :passepartout-integration-tests
(:use :cl :fiveam :passepartout)
(:use :cl :passepartout)
(:export #:integration-suite))
(in-package :passepartout-integration-tests)
(def-suite integration-suite :description "Integration tests across process boundaries")
(in-suite integration-suite)
(fiveam:def-suite integration-suite :description "Integration tests across process boundaries")
(fiveam:in-suite integration-suite)
(defvar *daemon-port* nil)
@@ -68,7 +77,7 @@ Shared test harness: package, suite, helpers, and ~with-daemon~.
(passepartout:start-daemon :port *daemon-port*)
(sleep 2)
,@body)
(handler-case (passepartout:stop-daemon) (error ())))))
(values)))
(defun daemon-connect ()
(let* ((sock (usocket:socket-connect "127.0.0.1" *daemon-port*))
@@ -94,7 +103,7 @@ Shared test harness: package, suite, helpers, and ~with-daemon~.
Verifies the daemon starts, binds its port, and sends a valid handshake.
#+begin_src lisp
(test test-daemon-starts
(fiveam:test test-daemon-starts
"Contract 1: daemon binds port and sends valid handshake."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -107,7 +116,7 @@ Verifies the daemon starts, binds its port, and sends a valid handshake.
Sends a ~:user-input~ event and verifies the pipeline produces a response.
#+begin_src lisp
(test test-pipeline-user-input
(fiveam:test test-pipeline-user-input
"Contract 2: :user-input traverses pipeline and produces a response."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -119,7 +128,7 @@ Sends a ~:user-input~ event and verifies the pipeline produces a response.
(is (not (null resp)) "Expected a response")))
(usocket:socket-close sock)))))
(test test-pipeline-heartbeat
(fiveam:test test-pipeline-heartbeat
"Contract 2: heartbeat signals do not crash the daemon."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -135,7 +144,7 @@ Sends a ~:user-input~ event and verifies the pipeline produces a response.
Verifies framed TCP round-trip and malformed-input resilience.
#+begin_src lisp
(test test-tcp-round-trip
(fiveam:test test-tcp-round-trip
"Contract 3: framed health-check survives TCP round-trip."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -147,7 +156,7 @@ Verifies framed TCP round-trip and malformed-input resilience.
(is (member (getf resp :type) '(:HEALTH-RESPONSE)))))
(usocket:socket-close sock)))))
(test test-daemon-survives-junk
(fiveam:test test-daemon-survives-junk
"Contract 3: daemon does not crash on junk input."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -166,7 +175,7 @@ Verifies framed TCP round-trip and malformed-input resilience.
Verifies the skill loader populates ~*skill-registry*~ after daemon start.
#+begin_src lisp
(test test-skill-registry-populated
(fiveam:test test-skill-registry-populated
"Contract 4: *skill-registry* is populated after daemon start."
(with-daemon ()
(is (hash-table-p passepartout::*skill-registry*))
@@ -180,7 +189,7 @@ Verifies the skill loader populates ~*skill-registry*~ after daemon start.
Verifies safe shell commands execute and dangerous patterns are blocked.
#+begin_src lisp
(test test-shell-safe-echo
(fiveam:test test-shell-safe-echo
"Contract 5: safe shell command does not crash the daemon."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -191,7 +200,7 @@ Verifies safe shell commands execute and dangerous patterns are blocked.
(usocket:socket-close sock))
(pass))))
(test test-shell-dangerous-blocked
(fiveam:test test-shell-dangerous-blocked
"Contract 5: rm -rf / is blocked by the security dispatcher."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -208,7 +217,7 @@ Verifies safe shell commands execute and dangerous patterns are blocked.
Verifies text input over TCP reaches the pipeline.
#+begin_src lisp
(test test-cli-gateway-input
(fiveam:test test-cli-gateway-input
"Contract 6: text via TCP produces a response."
(with-daemon ()
(multiple-value-bind (stream sock) (daemon-connect)
@@ -225,7 +234,7 @@ Verifies text input over TCP reaches the pipeline.
Verifies the gateway registry function is available after daemon start.
#+begin_src lisp
(test test-gateway-registry
(fiveam:test test-gateway-registry
"Contract 7: gateway-registry-initialize is available."
(with-daemon ()
(is (fboundp 'gateway-registry-initialize))
@@ -252,7 +261,7 @@ credentials. Skipped silently if OPENROUTER_API_KEY is unset.
(format t " [SKIP] ~a not set~%" ,env-var)
(skip "~a not set" ,env-var))))
(test test-provider-openai-request
(fiveam:test test-provider-openai-request
"Contract Phase2: provider-openai-request returns :success with valid API key."
(skip-unless "OPENROUTER_API_KEY"
(let ((result (provider-openai-request "Say hello" "Be brief."
@@ -262,12 +271,26 @@ credentials. Skipped silently if OPENROUTER_API_KEY is unset.
(eq (getf result :status) :error))
"Expected :success or :error, got: ~a" result))))
(test test-backend-cascade-real
(fiveam:test test-backend-cascade-real
"Contract Phase2: backend-cascade-call returns string content with real provider."
(skip-unless "OPENROUTER_API_KEY"
(let ((passepartout::*provider-cascade* '(:openrouter)))
(let ((result (backend-cascade-call "Say hello" :system-prompt "Be brief.")))
(is (stringp result) "Expected string response, got: ~a" result)))))
(fiveam:test test-provider-cascade-parsing
"Contract Phase2: PROVIDER_CASCADE env var parses to clean keywords matching backends."
(provider-cascade-initialize)
(let ((cascade passepartout::*provider-cascade*))
(is (listp cascade) "Cascade must be a list")
(is (>= (length cascade) 1) "Cascade must have at least one entry")
(dolist (entry cascade)
(is (keywordp entry) "Entry ~s must be a keyword" entry)
(let ((name (symbol-name entry)))
(is (not (find #\" name)) "Entry ~s must not contain double-quote" entry)
(is (not (find #\' name)) "Entry ~s must not contain single-quote" entry)))
(is (some (lambda (e) (gethash e passepartout::*probabilistic-backends*)) cascade)
"At least one cascade entry must match a registered backend")))
#+end_src
* Messaging Link/Unlink
@@ -277,7 +300,7 @@ returns the correct status, and messaging-unlink removes it. No real
API credentials needed — these are management functions.
#+begin_src lisp
(test test-messaging-link-unlink
(fiveam:test test-messaging-link-unlink
"Contract Phase2: messaging-link stores token, configured-p returns T, unlink removes it."
(with-daemon ()
(messaging-link :test-platform :token "fake-token-123")
@@ -287,12 +310,12 @@ API credentials needed — these are management functions.
(is (not (gateway-configured-p :test-platform))
"Expected test-platform to be unconfigured after unlinking")))
(test test-gateway-configured-p-false
(fiveam:test test-gateway-configured-p-false
"Contract Phase2: gateway-configured-p returns nil for unknown platform."
(with-daemon ()
(is (not (gateway-configured-p :nonexistent-platform-xyz)))))
(test test-gateway-start-messaging
(fiveam:test test-gateway-start-messaging
"Contract Phase2: gateway registry initializes with expected platforms."
(with-daemon ()
(gateway-registry-initialize)
@@ -333,29 +356,43 @@ run_test() {
}
# ---- Setup ----
echo "Pre-warming FASL cache (speeds up TUI start from ~120s to ~10s)..."
sbcl --noinform --load ~/quicklisp/setup.lisp \
--eval '(ql:quickload :passepartout/tui :silent t)' \
--eval '(uiop:quit)' 2>/dev/null &
WARM_PID=$!
wait $WARM_PID 2>/dev/null
echo " Pre-warm complete"
echo "Starting TUI in tmux (daemon must already be running on port 9105)..."
tmux new-session -d -s tui-test "passepartout tui 2>&1 | tee $TUI_LOG"
for i in $(seq 1 40); do
sleep 3
for i in $(seq 1 15); do
sleep 2
if tmux capture-pane -t tui-test -p 2>/dev/null | grep -q 'Connected v[0-9]'; then
echo " TUI ready after $((i*3))s"
echo " TUI ready after $((i*2))s"
break
fi
if [ "$i" -eq 40 ]; then
echo " WARNING: TUI did not render after 120s"
if [ "$i" -eq 15 ]; then
echo " WARNING: TUI did not render after 30s"
fi
done
# ---- Tests ----
test_agent_responds() {
# Full round-trip: TUI → daemon → pipeline → TUI.
# Uses tmux capture-pane to read the rendered screen.
# Full round-trip: TUI → daemon → LLM → daemon → TUI.
# Must contain a real agent response (⬇), NOT a cascade failure.
local before_ts
before_ts=$(date +%s)
tmux send-keys -t tui-test "hello" Enter
tmux send-keys -t tui-test "Say hello in one word" Enter
while true; do
if tmux capture-pane -t tui-test -p -S -50 2>/dev/null | grep -q '⬇.*[a-zA-Z]\{3,\}'; then
local pane
pane=$(tmux capture-pane -t tui-test -p -S -60 2>/dev/null)
if echo "$pane" | grep -q '⬇.*[a-zA-Z]\{3,\}'; then
if echo "$pane" | grep '⬇' | grep -qi 'cascade.*fail\|exhausted\|neural cascade'; then
echo "FAIL: agent responded with cascade failure, not LLM content" >&2
return 1
fi
return 0
fi
local now_ts
@@ -368,12 +405,15 @@ test_agent_responds() {
done
}
test_agent_not_cascade_failure() {
if tmux capture-pane -t tui-test -p -S -50 2>/dev/null | grep '⬇' | grep -qi 'cascade.*fail\|exhausted\|neural cascade'; then
echo "NOTE: LLM cascade failure — no API key configured (warning only)" >&2
WARN=$((WARN + 1))
fi
return 0
test_cascade_parsing() {
# Via /eval, check that *provider-cascade* contains clean keywords.
# This catches the cl-dotenv quote contamination bug.
tmux send-keys -t tui-test "/eval *provider-cascade*" Enter
sleep 3
local pane
pane=$(tmux capture-pane -t tui-test -p -S -15 2>/dev/null)
# Must contain keyword syntax :SOMETHING (not "SOMETHING with quotes)
echo "$pane" | grep -q ':DEEPSEEK\|:OPENROUTER\|:OPENAI\|:ANTHROPIC\|:GROQ\|:GEMINI\|:NVIDIA'
}
test_eval_command() {
@@ -393,7 +433,7 @@ test_connection_drop() {
}
run_test "agent-responds" test_agent_responds
run_test "agent-not-cascade-fail" test_agent_not_cascade_failure
run_test "cascade-parsing" test_cascade_parsing
run_test "eval-command" test_eval_command
run_test "status-bar" test_status_bar
run_test "connection-drop" test_connection_drop
@@ -409,7 +449,7 @@ exit $(( FAIL > 0 ? 1 : 0 ))
Verifies Flight Plan message format and Emacs daemon connectivity.
#+begin_src lisp
(test test-flight-plan-message-format
(fiveam:test test-flight-plan-message-format
"Contract Phase3: dispatcher-flight-plan-create returns valid message."
(with-daemon ()
(load (merge-pathnames ".local/share/passepartout/lisp/security-dispatcher.lisp"
@@ -424,7 +464,7 @@ Verifies Flight Plan message format and Emacs daemon connectivity.
(is (string= "PLAN" (getf attrs :TODO)))
(is (member "FLIGHT_PLAN" (getf attrs :TAGS) :test #'string-equal))))))
(test test-emacs-daemon-connect
(fiveam:test test-emacs-daemon-connect
"Contract Phase3: Emacs daemon is reachable via emacsclient."
(handler-case
(let ((result (uiop:run-program '("emacsclient" "--eval" "(+ 1 2)")
@@ -432,5 +472,5 @@ Verifies Flight Plan message format and Emacs daemon connectivity.
:ignore-error-status t)))
(is (search "3" result) "Expected '3' from emacsclient, got: ~a" result))
(error (c)
(skip "Emacs daemon not available: ~a" c))))
(skip "Emacs daemon not available: ~a" c)))))
#+end_src

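The hardened `test_agent_responds` check combines two greps over the captured pane. A minimal sketch of the same decision as a pure function follows, so that it can be exercised without tmux; `classify_response` is a hypothetical helper, and the patterns are copied from the script above.

```shell
# Hypothetical helper: classify captured pane text without touching tmux.
# Prints "pending", "cascade-failure", or "ok".
classify_response() {
  local pane=$1
  if ! printf '%s\n' "$pane" | grep -q '⬇.*[a-zA-Z]\{3,\}'; then
    # No agent line with at least three letters yet: keep polling.
    echo pending
  elif printf '%s\n' "$pane" | grep '⬇' | grep -qi 'cascade.*fail\|exhausted\|neural cascade'; then
    # An agent line exists, but it reports a provider cascade failure.
    echo cascade-failure
  else
    echo ok
  fi
}
```

Separating the classification from the capture step would let the patterns be checked against canned pane text before running the full tmux session.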

@@ -25,29 +25,43 @@ run_test() {
}
# ---- Setup ----
echo "Pre-warming FASL cache (speeds up TUI start from ~120s to ~10s)..."
sbcl --noinform --load ~/quicklisp/setup.lisp \
--eval '(ql:quickload :passepartout/tui :silent t)' \
--eval '(uiop:quit)' 2>/dev/null &
WARM_PID=$!
wait $WARM_PID 2>/dev/null
echo " Pre-warm complete"
echo "Starting TUI in tmux (daemon must already be running on port 9105)..."
tmux new-session -d -s tui-test "passepartout tui 2>&1 | tee $TUI_LOG"
for i in $(seq 1 40); do
sleep 3
for i in $(seq 1 15); do
sleep 2
if tmux capture-pane -t tui-test -p 2>/dev/null | grep -q 'Connected v[0-9]'; then
echo " TUI ready after $((i*3))s"
echo " TUI ready after $((i*2))s"
break
fi
if [ "$i" -eq 40 ]; then
echo " WARNING: TUI did not render after 120s"
if [ "$i" -eq 15 ]; then
echo " WARNING: TUI did not render after 30s"
fi
done
# ---- Tests ----
test_agent_responds() {
# Full round-trip: TUI → daemon → pipeline → TUI.
# Uses tmux capture-pane to read the rendered screen.
# Full round-trip: TUI → daemon → LLM → daemon → TUI.
# Must contain a real agent response (⬇), NOT a cascade failure.
local before_ts
before_ts=$(date +%s)
tmux send-keys -t tui-test "hello" Enter
tmux send-keys -t tui-test "Say hello in one word" Enter
while true; do
if tmux capture-pane -t tui-test -p -S -50 2>/dev/null | grep -q '⬇.*[a-zA-Z]\{3,\}'; then
local pane
pane=$(tmux capture-pane -t tui-test -p -S -60 2>/dev/null)
if echo "$pane" | grep -q '⬇.*[a-zA-Z]\{3,\}'; then
if echo "$pane" | grep '⬇' | grep -qi 'cascade.*fail\|exhausted\|neural cascade'; then
echo "FAIL: agent responded with cascade failure, not LLM content" >&2
return 1
fi
return 0
fi
local now_ts
@@ -60,12 +74,15 @@ test_agent_responds() {
done
}
test_agent_not_cascade_failure() {
if tmux capture-pane -t tui-test -p -S -50 2>/dev/null | grep '⬇' | grep -qi 'cascade.*fail\|exhausted\|neural cascade'; then
echo "NOTE: LLM cascade failure — no API key configured (warning only)" >&2
WARN=$((WARN + 1))
fi
return 0
test_cascade_parsing() {
# Via /eval, check that *provider-cascade* contains clean keywords.
# This catches the cl-dotenv quote contamination bug.
tmux send-keys -t tui-test "/eval *provider-cascade*" Enter
sleep 3
local pane
pane=$(tmux capture-pane -t tui-test -p -S -15 2>/dev/null)
# Must contain keyword syntax :SOMETHING (not "SOMETHING with quotes)
echo "$pane" | grep -q ':DEEPSEEK\|:OPENROUTER\|:OPENAI\|:ANTHROPIC\|:GROQ\|:GEMINI\|:NVIDIA'
}
test_eval_command() {
@@ -85,7 +102,7 @@ test_connection_drop() {
}
run_test "agent-responds" test_agent_responds
run_test "agent-not-cascade-fail" test_agent_not_cascade_failure
run_test "cascade-parsing" test_cascade_parsing
run_test "eval-command" test_eval_command
run_test "status-bar" test_status_bar
run_test "connection-drop" test_connection_drop
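
The readiness loop in the setup section polls `tmux capture-pane` a fixed number of times. A generalized sketch of that polling pattern is shown here; `wait_for` and its argument order are assumptions for illustration, not part of the test script. Unlike the loop above, it checks before sleeping.

```shell
# Hypothetical helper: poll a command until its output matches a pattern.
# usage: wait_for PATTERN ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
wait_for() {
  local pattern=$1 attempts=$2 interval=$3 i
  shift 3
  for i in $(seq 1 "$attempts"); do
    # Run the remaining arguments as the command to sample.
    if "$@" 2>/dev/null | grep -q "$pattern"; then
      echo "ready after attempt $i"
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out after $attempts attempts" >&2
  return 1
}
```

With this helper, the setup wait could be written as `wait_for 'Connected v[0-9]' 15 2 tmux capture-pane -t tui-test -p`, keeping the 15 attempts and 2-second interval used in the diff.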