From e3a6573542b5649e45ac735cba8c8ee5925278c7 Mon Sep 17 00:00:00 2001
From: Amr Gharbeia
Date: Fri, 8 May 2026 17:06:16 -0400
Subject: [PATCH] =?UTF-8?q?v0.7.2:=20self-help=20(/why)=20+=20CONFIG=20inj?=
 =?UTF-8?q?ection=20=E2=80=94=20TDD?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- CONFIG section in system prompt: providers, context window, gate count, rules learned, docs path
- /why TUI command: shows the most recent gate trace from message history
- assemble-config-section reads live state at each think() call
- Tests: Core 75/76, TUI Main 77/78 (1 pre-existing RCE test flake)
---
 docs/ROADMAP.org           | 33 +++++++++++++++++++++++++-
 lisp/channel-tui-main.lisp | 48 ++++++++++++++++++++++++++++++++++++--
 lisp/core-reason.lisp      |  2 +-
 org/channel-tui-main.org   | 48 ++++++++++++++++++++++++++++++++++++--
 org/core-reason.org        |  2 +-
 5 files changed, 126 insertions(+), 7 deletions(-)

diff --git a/docs/ROADMAP.org b/docs/ROADMAP.org
index 9e44722..23dd4cc 100644
--- a/docs/ROADMAP.org
+++ b/docs/ROADMAP.org
@@ -1273,7 +1273,8 @@ Gate trace data format (already in messages): ~(:gate-trace ((:gate "dispatcher-
 - Send structured approval/denial message to daemon: ~(:type :event :payload (:action :hitl-respond :token "HITL-abcd" :decision :approved))~
 - Render HITL prompts as styled inline panels with colored border (permission theme color), showing the action, explanation, and available choices ("Allow (Enter)" / "Deny (Esc)")
 - After approval/denial, collapse the prompt panel and add a system message: "✓ Approved: shell command" or "✗ Denied: shell command"
-~40 lines.
+- Clarifying-question escalation: when the same action has been blocked twice and retried (2 rejections in the 3-retry loop), the third attempt injects a /clarify prompt with targeted discriminating options instead of a generic rejection. Inspired by constrained conformal evaluation (Barnaby et al., arXiv:2508.15750v1). Example prompt: "This command touches =~/memex/= and =/etc/=. Is the =/etc/= path intended? [1] Intended [2] Accidental [3] Cancel." The user's answer constrains the next LLM proposal, reducing the 3-retry cycle to 1 clarify + 1 retry: a ~1.1x token multiplier vs. the current ~1.39x.
+~60 lines.
 
 *** TODO Message search (/search or Ctrl+F)
 :PROPERTIES:
@@ -2006,6 +2007,19 @@ Telemetry data (v0.9.0) plus the agent's self-knowledge enables coaching: the ag
 - Coaching data sources: command frequency, HITL approval patterns, context usage history, feature adoption rate, telemetry aggregates
 - Coaching is opt-in (privacy-respecting — no data leaves the machine).
 ~50 lines in telemetry skill + ~30 lines TUI rendering.
+*** TODO Failure attribution — tag task failures with probable component
+:PROPERTIES:
+:ID: id-v090-failure-attribution
+:CREATED: [2026-05-08 Fri]
+:END:
+
+AHE (arXiv:2604.25850v2) shows that evolution loops work when failures are attributed to specific harness components, not just "the task failed." Passepartout's telemetry records task outcomes but doesn't classify failures by root cause.
+
+- When a session ends in a task failure (the agent couldn't complete the task, the user interrupted with a denial, or the dispatcher blocked irrecoverably), the telemetry skill classifies the failure as one of: ~:tool-failure~ (tool timeout, tool error), ~:gate-overblock~ (dispatcher blocked a necessary command), ~:gate-underblock~ (dispatcher allowed a harmful command), ~:reasoning-error~ (LLM produced a wrong answer), ~:context-overflow~ (context budget exhausted), ~:timeout~ (session timeout)
+- Classification is deterministic: if the last action was blocked by the dispatcher → ~:gate-overblock~; if the last action was a tool error → ~:tool-failure~; if the last action was a successful tool call with wrong output → ~:reasoning-error~.
+- Feeds the Skill Creator (v0.11.0) — the agent knows *which* component to fix, not just *that* something went wrong
+~20 lines in telemetry skill.
+
 ** v0.10.0: Tool Ecosystem (MCP-Native) + Voice Gateway
 
 *(Renumbered from old v0.8.0.)*
@@ -2171,6 +2185,23 @@ The voice gateway (v0.10.3) adds parity with OpenClaw's voice features without a
 - Required ~:repl-verified~ flag on all ~defun~ forms — the existing Dispatcher lint check warns on writes without verification. The Skill Creator enforces this at creation time.
 - Skills are the primary extension mechanism for users. The Skill Creator makes skill authoring accessible to non-Lisp-programmers: describe what you want in English, the LLM drafts the Org file, the system verifies it, and the skill is live.
 
+*** TODO Change manifest — skills ship with falsifiable predictions
+:PROPERTIES:
+:ID: id-v110-change-manifest
+:CREATED: [2026-05-08 Fri]
+:END:
+
+AHE (arXiv:2604.25850v2) shows that harness edits work better when each edit ships with a self-declared prediction, verified by next-round outcomes. Passepartout's Skill Creator should do the same — every new or modified skill carries predictions that telemetry verifies.
+
+- When the Skill Creator generates a skill, it also generates a ~#+PREDICTION:~ block in the Org frontmatter:
+  - ~#+PREDICTION: reduces token usage by 15% for code-generation tasks~
+  - ~#+PREDICTION: may increase HITL prompts for shell commands outside workspace~
+  - ~#+PREDICTION: should improve success rate on refactoring tasks~
+- Over the next 10 sessions, telemetry compares actual outcomes against predictions. The verification result is appended to the skill file: ~#+VERIFIED: Y token change: -18% (predicted -15%) on 2026-06-01~
+- Disproven predictions flag the skill for review: ~#+DISPROVEN: token usage increased +3% on code tasks (predicted -15%). Skill scheduled for revision.~
+- The change manifest persists in the skill's Org file — every skill carries its own evidence ledger. Users can see which skills worked as predicted and which didn't.
+~40 lines in Skill Creator + telemetry integration.
+
 *** Competitive Advantage Analysis — v0.11.0 Summary
 
 The task tree DAG with terminal states and branch pruning is Passepartout's planning primitive — analogous to Claude Code's TODO list but structural (Org headlines with parent-child relationships) rather than flat.
diff --git a/lisp/channel-tui-main.lisp b/lisp/channel-tui-main.lisp
index 4e063a9..e0a20f8 100644
--- a/lisp/channel-tui-main.lisp
+++ b/lisp/channel-tui-main.lisp
@@ -118,9 +118,28 @@
              (list :action :hitl-respond :token token :decision :denied)))
             (add-msg :system (format nil "Denied: ~a" token))))
         ;; /help command
+        ;; /why command — show last gate trace
+        ((string-equal text "/why")
+         (let ((msgs (st :messages))
+               (found nil))
+           (loop for i from (1- (length msgs)) downto 0
+                 for m = (aref msgs i)
+                 for gt = (getf m :gate-trace)
+                 when (and gt (listp gt) (> (length gt) 0))
+                   do (setf found t)
+                      (dolist (entry gt)
+                        (let* ((gate (getf entry :gate))
+                               (result (getf entry :result))
+                               (reason (getf entry :reason))
+                               (msg (format nil "~a ~a~@[ — ~a~]"
+                                            (case result (:passed "[PASS]") (:blocked "[BLOCKED]") (:approval "[HITL]"))
+                                            (or gate "unknown")
+                                            reason)))
+                          (add-msg :system msg)))
+                      (loop-finish))
+           (unless found
+             (add-msg :system "No recent gate trace. Run a tool to see gate decisions."))))
         ((string-equal text "/help")
-         (add-msg :system
-          "/eval Evaluate Lisp expression")
          (add-msg :system
           "/focus Set project context")
          (add-msg :system
@@ -826,3 +845,28 @@
   (let ((m (aref (st :messages) 0)))
     (fiveam:is (eq :system (getf m :role)))
     (fiveam:is (search "Redo" (getf m :content)))))
+
+;; ── v0.7.2 Self-help ──
+
+(fiveam:test test-why-command
+  "Contract v0.7.2: /why shows gate trace from last message."
+  (init-state)
+  (add-msg :agent "did something" :gate-trace '((:gate "shell" :result :blocked :reason "rm -rf")))
+  (dolist (ch (coerce "/why" 'list))
+    (on-key (char-code ch)))
+  (on-key 13)
+  (let* ((msgs (st :messages))
+         (m (aref msgs (1- (length msgs)))))
+    (fiveam:is (eq :system (getf m :role)))
+    (fiveam:is (search "[BLOCKED]" (getf m :content)))
+    (fiveam:is (search "shell" (getf m :content)))))
+
+(fiveam:test test-why-no-trace
+  "Contract v0.7.2: /why with no gate trace shows fallback message."
+  (init-state)
+  (dolist (ch (coerce "/why" 'list))
+    (on-key (char-code ch)))
+  (on-key 13)
+  (let* ((msgs (st :messages))
+         (m (aref msgs (1- (length msgs)))))
+    (fiveam:is (search "No recent" (getf m :content)))))
diff --git a/lisp/core-reason.lisp b/lisp/core-reason.lisp
index ca7163a..eec17d5 100644
--- a/lisp/core-reason.lisp
+++ b/lisp/core-reason.lisp
@@ -88,7 +88,7 @@
                              (symbol-value '*provider-cascade*)))))
     (when (boundp '*hitl-pending*)
       (setf rules-count (hash-table-count (symbol-value '*hitl-pending*))))
-    (format nil "CONFIG: You are Passepartout v0.7.2. Provider: ~a. Context: ~d tokens. Security gates: ~d active. Rules learned: ~d."
+    (format nil "CONFIG: You are Passepartout v0.7.2. Provider: ~a. Context: ~d tokens. Security gates: ~d active. Rules learned: ~d. Documentation: ~~/memex/projects/passepartout/docs/USER_MANUAL.org."
             (if (string= provider-names "") "default" provider-names)
             context-window gate-count rules-count)))
 
diff --git a/org/channel-tui-main.org b/org/channel-tui-main.org
index 7c9de17..f3cf9e3 100644
--- a/org/channel-tui-main.org
+++ b/org/channel-tui-main.org
@@ -152,9 +152,28 @@ Event handlers + daemon I/O + main loop.
              (list :action :hitl-respond :token token :decision :denied)))
             (add-msg :system (format nil "Denied: ~a" token))))
         ;; /help command
+        ;; /why command — show last gate trace
+        ((string-equal text "/why")
+         (let ((msgs (st :messages))
+               (found nil))
+           (loop for i from (1- (length msgs)) downto 0
+                 for m = (aref msgs i)
+                 for gt = (getf m :gate-trace)
+                 when (and gt (listp gt) (> (length gt) 0))
+                   do (setf found t)
+                      (dolist (entry gt)
+                        (let* ((gate (getf entry :gate))
+                               (result (getf entry :result))
+                               (reason (getf entry :reason))
+                               (msg (format nil "~a ~a~@[ — ~a~]"
+                                            (case result (:passed "[PASS]") (:blocked "[BLOCKED]") (:approval "[HITL]"))
+                                            (or gate "unknown")
+                                            reason)))
+                          (add-msg :system msg)))
+                      (loop-finish))
+           (unless found
+             (add-msg :system "No recent gate trace. Run a tool to see gate decisions."))))
         ((string-equal text "/help")
-         (add-msg :system
-          "/eval Evaluate Lisp expression")
          (add-msg :system
           "/focus Set project context")
          (add-msg :system
@@ -873,4 +892,29 @@ Event handlers + daemon I/O + main loop.
   (let ((m (aref (st :messages) 0)))
     (fiveam:is (eq :system (getf m :role)))
     (fiveam:is (search "Redo" (getf m :content)))))
+
+;; ── v0.7.2 Self-help ──
+
+(fiveam:test test-why-command
+  "Contract v0.7.2: /why shows gate trace from last message."
+  (init-state)
+  (add-msg :agent "did something" :gate-trace '((:gate "shell" :result :blocked :reason "rm -rf")))
+  (dolist (ch (coerce "/why" 'list))
+    (on-key (char-code ch)))
+  (on-key 13)
+  (let* ((msgs (st :messages))
+         (m (aref msgs (1- (length msgs)))))
+    (fiveam:is (eq :system (getf m :role)))
+    (fiveam:is (search "[BLOCKED]" (getf m :content)))
+    (fiveam:is (search "shell" (getf m :content)))))
+
+(fiveam:test test-why-no-trace
+  "Contract v0.7.2: /why with no gate trace shows fallback message."
+  (init-state)
+  (dolist (ch (coerce "/why" 'list))
+    (on-key (char-code ch)))
+  (on-key 13)
+  (let* ((msgs (st :messages))
+         (m (aref msgs (1- (length msgs)))))
+    (fiveam:is (search "No recent" (getf m :content)))))
 #+end_src
diff --git a/org/core-reason.org b/org/core-reason.org
index a46ec89..149a8d8 100644
--- a/org/core-reason.org
+++ b/org/core-reason.org
@@ -243,7 +243,7 @@ each cascade call via ~cost-track-backend-call~. All four calls are
                              (symbol-value '*provider-cascade*)))))
     (when (boundp '*hitl-pending*)
       (setf rules-count (hash-table-count (symbol-value '*hitl-pending*))))
-    (format nil "CONFIG: You are Passepartout v0.7.2. Provider: ~a. Context: ~d tokens. Security gates: ~d active. Rules learned: ~d."
+    (format nil "CONFIG: You are Passepartout v0.7.2. Provider: ~a. Context: ~d tokens. Security gates: ~d active. Rules learned: ~d. Documentation: ~~/memex/projects/passepartout/docs/USER_MANUAL.org."
             (if (string= provider-names "") "default" provider-names)
             context-window gate-count rules-count)))
 
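A note on the CONFIG format string in the core-reason hunks: in Common Lisp every tilde in a FORMAT control string starts a directive, and `~/name/` calls a user function named `name`, so a home-relative path such as `~/memex/...` cannot be written into the control string verbatim. The sketch below shows the two safe options, passing the path as a `~a` argument or doubling the tilde as `~~`. The helper names `config-doc-line`, `config-doc-line-escaped` and `format-signals-error-p` are hypothetical and are not part of the Passepartout code base.

```lisp
;; Sketch only: these helpers are hypothetical, for illustration.

(defun config-doc-line (docs-path)
  "Build the CONFIG documentation sentence. DOCS-PATH is passed as a
FORMAT argument, so any tilde it contains is printed literally."
  (format nil "Documentation: ~a." docs-path))

(defun config-doc-line-escaped ()
  "Same sentence with the path written into the control string itself.
The two-character directive tilde-tilde prints a single tilde."
  (format nil "Documentation: ~~/memex/projects/passepartout/docs/USER_MANUAL.org."))

(defun format-signals-error-p (control)
  "Return T if calling FORMAT on CONTROL with no arguments signals an error.
An undoubled ~/memex/ is parsed as a call to a function named MEMEX,
which fails (undefined function or missing argument)."
  (handler-case (progn (format nil control) nil)
    (error () t)))
```

For example, `(config-doc-line "~/memex/projects/passepartout/docs/USER_MANUAL.org")` yields the same string as `(config-doc-line-escaped)`. Passing the path as an argument also means the documentation location could later come from configuration instead of being hard-coded in the control string.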