passepartout: v0.4.2 Structured Output

- json-alist-to-plist: JSON alist-to-keyword-plist converter (core-loop-reason)
- provider-openai-request: accept :tools parameter, build tool definitions in request body, parse tool_calls from response (system-model-provider)
- think(): build tools from cognitive-tool-registry, pass to backend cascade, handle :tool-calls response via json-alist-to-plist (core-loop-reason)
- backend-cascade-call: accept and propagate :tools parameter
- Diagnostics: remove nc/socat from required binaries — health check passes
- Version: 0.4.0 -> 0.4.2 across handshake, ASDF, README badge
2026-05-07 17:39:08 -04:00
parent 639bc348d9
commit 791a0f9c3b
14 changed files with 476 additions and 79 deletions
--- a/docs/ROADMAP.org
+++ b/docs/ROADMAP.org
@@ -23,9 +23,9 @@ Feature releases increment the minor version (v0.X.0). Bugfix and hardening rele
 When a version's state changes (DONE → tested → released), update these locations:

 1. ~ROADMAP.org~ — mark item DONE, update LOGBOOK timestamp
-2. ~README.org~ — update Current Capabilities table (add new Stable rows for shipped features, remove Planned rows that have shipped)
+2. ~README.org~ — update version badge (line 6), update Current Capabilities table (add new Stable rows for shipped features, remove Planned rows that have shipped)
 3. ~.env.example~ — update version references as needed
-4. ~lisp/core-communication.lisp~ — update the ~make-hello-message~ version string (current: ~"0.2.0"~)
+4. ~lisp/core-transport.lisp~ — update the ~make-hello-message~ version string
 5. ~passepartout~ (bash entry point) — update version reference

 On release:
@@ -656,11 +656,14 @@ Rationale: Currently, several configurable values are hardcoded in source: the D

 The current ~think()~ function asks the LLM to produce raw S-expression plists. Four pieces of defensive infrastructure (~handler-case~ around ~read-from-string~, ~markdown-strip~, ~plist-keywords-normalize~, the RCE guard test) exist because LLMs cannot reliably produce balanced, keyword-prefixed plists. The fix: use the LLM API's native function calling / tool-use feature. The LLM always returns guaranteed-valid JSON. Convert to plist deterministically at the boundary.

-*** TODO Implement function-calling / tool-use API in provider requests
+*** DONE Implement function-calling / tool-use API in provider requests
 :PROPERTIES:
 :ID:       id-v042-function-calling
 :CREATED:  [2026-05-07 Thu]
 :END:
+:LOGBOOK:
+- State "DONE" from "TODO" [2026-05-07 Thu 17:17]
+:END:

 Rationale: Every major provider API (OpenAI, Anthropic, Groq, DeepSeek, OpenRouter) supports function calling. The LLM is sent tool definitions as JSON Schema. It returns ~tool_calls~ with guaranteed-valid JSON arguments. This eliminates the fragile ~read-from-string~ plist parsing entirely — the probabilistic layer speaks JSON (what it was trained on), the deterministic layer speaks plists (what the code controls). Conversion happens at a narrow, well-defined boundary.

@@ -670,15 +673,18 @@ Rationale: Every major provider API (OpenAI, Anthropic, Groq, DeepSeek, OpenRout
 - For providers that don't support function calling (local Ollama): keep ~:content~ path as fallback. LLM can still return raw text.
 - FiveAM test: send a request with a mock tool definition, verify the response shape.

-*** TODO Wire structured tool calls into ~think()~ — JSON→plist at boundary
+*** DONE Wire structured tool calls into ~think()~ — JSON→plist at boundary
 :PROPERTIES:
 :ID:       id-v042-wire-tool-calls
 :CREATED:  [2026-05-07 Thu]
 :END:
+:LOGBOOK:
+- State "DONE" from "TODO" [2026-05-07 Thu 17:17]
+:END:

 Rationale: Once the provider layer returns structured ~tool-calls~, the ~think()~ function must convert them to the internal plist format that ~cognitive-verify~ and ~loop-gate-act~ expect. This is a one-way, deterministic conversion at the architectural boundary.

- Add ~json-alist-to-plist~ helper in ~core-loop-reason.lisp~ or ~core-utils.lisp~: convert JSON alist (from ~cl-json:decode-json-from-string~) to keyword-prefixed plist. String keys → keywords. Nested objects recurse. JSON null → ~nil~. ~25 lines.
+- Add ~json-alist-to-plist~ helper in ~core-loop-reason.lisp~: convert JSON alist (from ~cl-json:decode-json-from-string~) to keyword-prefixed plist. String keys → keywords. Nested objects recurse. JSON null → ~nil~. ~25 lines.
 - In ~think()~ after ~backend-cascade-call~: if result contains ~:tool-calls~, convert each tool call's ~:arguments~ JSON to plist via ~json-alist-to-plist~, wrap in ~(:TYPE :REQUEST :PAYLOAD (:TOOL <name> :ARGS <plist> :EXPLANATION "..."))~.
 - Keep the existing ~read-from-string~ path as fallback for providers that return raw text (local Ollama, streaming).
 - The ~read-from-string~ path remains guarded by ~*read-eval* nil~ from v0.3.1.
@@ -719,11 +725,160 @@ Rationale: The current shell safety check treats all dangerous patterns equally
 - The severity classification is the foundation that ~dispatcher-learn~ (v0.5.0) builds on — learning only applies to ~:dangerous~ and ~:moderate~ tiers.
 - FiveAM test: ~echo hello~ returns ~:harmless~ severity and passes through; ~mkfs.ext4 /dev/sda~ returns ~:catastrophic~ and is always blocked.

-** v0.5.0: Token Economics & Prompt Efficiency
+** v0.5.0: File Reorganization & Token Economics

-The architecture's single largest gap versus SOTA: Passepartout currently spends tokens like a research prototype. Every ~think()~ call rebuilds and retransmits the full system prompt — IDENTITY + TOOLS + CONTEXT + LOGS — with no caching, no budget, and no incremental assembly. The foveal-peripheral model prunes memory content but doesn't touch the fixed overhead of IDENTITY, TOOLS, and LOGS sections, which together dominate the system prompt size. Standing mandates (~*standing-mandates*~) contribute negligible overhead (~40 tokens when the single active mandate fires).
+The foundation work: rename and restructure the codebase around the self-repair criterion, extract non-core fragments from core, then build the learning loop on clean foundations.

-Competitors (Claude Code, OpenClaw, Copilot) all implement some form of prefix caching — Anthropic's API gives 90% discount on cached tokens, OpenAI caches automatically. Passepartout's prompt structure is already naturally cacheable: IDENTITY, TOOLS, and LOGS format are static across calls. This version turns that structural property into a cost advantage.
+*** File Reorganization — self-repair criterion
+
+Rationale: The current file naming scheme mixes three concerns: architectural role (core-* = harness, system-* = skill), domain (security-*, programming-*, gateway-*), and implementation nature (system-model-* is LLM infrastructure, not a "system"). Worse, two fragments that can be extracted from core (context assembly, heartbeat) currently live there because the criterion for "what is core" was never defined. This reorganization establishes the criterion and applies it.
+
+The criterion: a file belongs in core if, when corrupted, the agent cannot fix it without human help. Corrupted core = dead brain, dead hands, or unreachable. Corrupted skill = degraded but self-repairable.
+
+*** TODO Extract core-context → symbolic-awareness
+:PROPERTIES:
+:ID:       id-v050-reorg-awareness
+:CREATED:  [2026-05-07 Thu]
+:END:
+
+Rationale: ~core-context.lisp~ (224 lines) handles ~context-assemble-global-awareness~, ~context-object-render~, ~context-query~, and related functions. If corrupted, the LLM receives empty awareness. But the agent still has tools, identity, and user input. It can reason about "no awareness", edit the context source file, reload the skill, and awareness returns. Degraded, not dead. Safe to extract.
+
+- Move ~core-context.lisp~ content to new ~symbolic-awareness.lisp~ (new ~org/symbolic-awareness.org~).
+- Register as a skill via ~defskill :passepartout-symbolic-awareness~.
+- In ~core-reason.lisp~'s ~think()~: wrap ~context-assemble-global-awareness~ and ~context-get-system-logs~ calls with ~fboundp~ guards. On skill failure, inject degraded awareness note.
+- Remove ~core-context~ from ~passepartout.asd~ ~:components~.
+- FiveAM: verify ~think()~ produces valid output when awareness skill is not loaded.
+
+*** TODO Extract heartbeat generation → symbolic-events
+:PROPERTIES:
+:ID:       id-v050-reorg-heartbeat
+:CREATED:  [2026-05-07 Thu]
+:END:
+
+Rationale: The heartbeat thread (~heartbeat-start~, ~*heartbeat-thread*~, auto-save counter) lives in ~core-loop.lisp~ (~50 lines), which this reorganization renames to ~core-pipeline.lisp~. If heartbeat is corrupted or missing, the agent has no background ticks — no cron jobs, no auto-save. But the agent is fully functional: it perceives, reasons, and acts. It can detect missing ticks, reload the events skill, and heartbeat returns. Safe to extract.
+
+- Move heartbeat generation (~heartbeat-start~, ~*heartbeat-thread*~, ~*heartbeat-save-counter*~, ~*memory-auto-save-interval*~) from ~core-pipeline.lisp~ to ~symbolic-events.lisp~.
+- Rename ~heartbeat-start~ → ~events-start-heartbeat~.
+- In ~core-pipeline.lisp~'s ~main()~: change ~(heartbeat-start)~ to ~(when (fboundp 'events-start-heartbeat) (events-start-heartbeat))~.
+- ~symbolic-events~ already processes ~:heartbeat~ signals for cron dispatch (existing code). Now it also generates them.
+
+*** TODO Relocate 6 utility fragments to correct files
+:PROPERTIES:
+:ID:       id-v050-reorg-utilities
+:CREATED:  [2026-05-07 Thu]
+:END:
+
+Rationale: Several functions live in core files not because they need core protection but because they were written there first. They are utility functions that can be extracted into skills.
+
+- ~markdown-strip~ (core-reason.lisp:51) → new ~programming-markdown.lisp~ (~org/programming-markdown.org~).
+- ~plist-keywords-normalize~ (core-reason.lisp:60) → ~programming-lisp.lisp~.
+- ~cognitive-tool-prompt~ / ~generate-tool-belt-prompt~ (core-defpackage.lisp:214-231) → ~programming-tools.lisp~.
+- ~lisp-syntax-validate~ (core-skills.lisp) → ~programming-lisp.lisp~.
+- ~VAULT-MASK-STRING~ + ~*VAULT-MEMORY*~ (core-skills.lisp) → ~security-vault.lisp~.
+- ~*backend-registry*~ dedup: merge with ~*probabilistic-backends*~ (core-reason.lisp:10-12), remove ~backend-register~ (core-reason.lisp:18-19), update ~backend-cascade-call~ to check only one hash table.
+
+*** TODO Rename 6 core files — shorter, clearer names
+:PROPERTIES:
+:ID:       id-v050-reorg-core-names
+:CREATED:  [2026-05-07 Thu]
+:END:
+
+Rename mapping:
+- ~core-defpackage~ → ~core-package~
+- ~core-communication~ → ~core-transport~
+- ~core-loop~ → ~core-pipeline~
+- ~core-loop-perceive~ → ~core-perceive~
+- ~core-loop-reason~ → ~core-reason~
+- ~core-loop-act~ → ~core-act~
+
+Update: ASDF ~:components~, all ~:tangle~ headers in ~.org~ files, cross-file references, ~README.org~, ~ARCHITECTURE.org~, ~AGENTS.md~, ~*dispatcher-protected-paths*~ (wildcard ~core-*~ still matches — no change needed).
+
+*** TODO Rename 13 system-* → symbolic-/neuro-/embedding-/channel-*
+:PROPERTIES:
+:ID:       id-v050-reorg-system-names
+:CREATED:  [2026-05-07 Thu]
+:END:
+
+Rename mapping:
+- ~system-config~ → ~symbolic-config~
+- ~system-diagnostics~ → ~symbolic-diagnostics~
+- ~system-archivist~ → ~symbolic-archivist~
+- ~system-event-orchestrator~ → ~symbolic-events~
+- ~system-self-improve~ → ~symbolic-self-improve~
+- ~system-context-manager~ → ~symbolic-scope~
+- ~system-memory~ → ~symbolic-memory~
+- ~system-model-provider~ → ~neuro-provider~
+- ~system-model-router~ → ~neuro-router~
+- ~system-model-explorer~ → ~neuro-explorer~
+- ~system-model-embedding~ → ~embedding-backends~
+- ~system-model-embedding-native~ → ~embedding-native~
+- ~system-actuator-shell~ → ~channel-shell~
+
+*** TODO Delete ~system-model.lisp~ (16-line wrapper)
+
+The file delegates to ~*probabilistic-backends*~ — dead code. No skill references it directly.
+
+*** TODO Rename 4 gateway-* → channel-*
+:PROPERTIES:
+:ID:       id-v050-reorg-channel-names
+:CREATED:  [2026-05-07 Thu]
+:END:
+
+Rename mapping:
+- ~gateway-cli~ → ~channel-cli~
+- ~gateway-tui-main~ → ~channel-tui-main~
+- ~gateway-tui-model~ → ~channel-tui-state~
+- ~gateway-tui-view~ → ~channel-tui-view~
+
+Update TUI package name: ~passepartout.gateway-tui~ → ~passepartout.channel-tui~.
+
+*** TODO Split ~gateway-messaging~ → 4 ~channel-*~ files
+:PROPERTIES:
+:ID:       id-v050-reorg-messaging-split
+:CREATED:  [2026-05-07 Thu]
+:END:
+
+Rationale: ~gateway-messaging.lisp~ (411 lines) bundles 4 independent platforms. A Telegram fix shouldn't touch Signal/Discord/Slack code. Each platform becomes its own skill — independently loadable, hot-reloadable, self-repairable.
+
+- ~channel-telegram~: poll + send via Telegram Bot API. ~register-actuator :telegram~.
+- ~channel-signal~: poll + send via ~signal-cli~ subprocess. ~register-actuator :signal~.
+- ~channel-discord~: WebSocket events + REST POST. Replace hardcoded channel IDs with env vars. ~register-actuator :discord~.
+- ~channel-slack~: Events API + ~chat.postMessage~. Replace hardcoded channel IDs. ~register-actuator :slack~.
+- Delete ~gateway-messaging.lisp~. Update ~DEFSKILL-FROM-ORG~ references in ~system-config~ setup wizard.
+
+*** TODO Document core/non-core self-repair criterion
+:PROPERTIES:
+:ID:       id-v050-reorg-docs
+:CREATED:  [2026-05-07 Thu]
+:END:
+
+Rationale: The criterion is the architectural foundation for every discussion about "should this be core or a skill?" It must be documented where developers look.
+
+- New section in ~docs/ARCHITECTURE.org~: "What Makes Core Different — The Self-Repair Criterion." Explain: core = can't self-repair when corrupted, needs human. Skill = agent degrades but self-repairs.
+- Include the dependency-chain analysis: which files block self-repair.
+- New section in ~docs/DESIGN_DECISIONS.org~: "The Self-Repair Criterion for Core Files." Explain why ~core-context~ and heartbeat were extracted.
+- Update ~README.org~ architecture summary to reflect new file map.
+
+*** TODO Update all cross-references after reorg
+:PROPERTIES:
+:ID:       id-v050-reorg-crossref
+:CREATED:  [2026-05-07 Thu]
+:END:
+
+After all renames complete, update every remaining reference:
+- ~passepartout.asd~: remove ~core-context~, rename 6 core entries.
+- All ~#+PROPERTY: header-args:lisp :tangle ../lisp/<old>.lisp~ lines in ~.org~ files.
+- All ~in-package~ / ~find-package~ / ~fboundp~ references to renamed packages.
+- ~skill-initialize-all~ / ~context-skill-source~: resolve org files under new names.
+- ~README.org~: Current Capabilities table, pipeline description, file references.
+- ~ARCHITECTURE.org~: layer tables, pipeline flow, dispatcher gate stack.
+- ~AGENTS.md~: Project Structure section, file path references.
+- ~.env.example~: remove stale ~SAFETY_BLOCK_SHELL~ (unused), update skill paths if any.
+- ~ROADMAP.org~: update v0.4.2 and v0.4.3 TODOs (system-model-provider → neuro-provider, core-loop-reason → core-reason, system-actuator-shell → channel-shell) to match new names.
+
+*** Verify: ASDF compiles, FiveAM suite passes, integration tests pass.
+
+*** Token Economics (foundation complete — now build features)

 **Design insight: why token economics is the structural differentiator.** Passepartout's sparse-tree rendering and deterministic safety gates should produce 2–3x fewer tokens than competitors for equivalent coding tasks, and 13–24x fewer for knowledge management. But without caching and budget enforcement, the fixed overhead per call eats these savings. A coding session that touches 30 files with competent context management costs ~72K tokens (Passepartout) versus ~185K (Claude Code). Without caching, the Passepartout number climbs toward ~150K because every call retransmits the static prefix. The architectural advantage exists in theory but requires operational plumbing to materialize.

@@ -902,7 +1057,7 @@ Rationale: The Merkle tree provides content-addressed storage. Combined with emb
 - ~memory-find-similar~ in ~core-memory.lisp~: given a vector, return N memory objects with highest cosine similarity. Uses ~memory-object-vector~ (already populated via ~ingest-ast~ → ~embeddings-compute~ since v0.4.0). ~30 lines.
 - ~memory-outcome-record~: store an outcome (success/failure plist) against a signal. Keyed by Merkle hash of the signal. ~25 lines.
 - ~memory-find-outcomes~: given a signal (current context), find similar past signals and their outcomes. Uses ~memory-find-similar~ on the signal's foveal vector. Returns ranked list of past approaches with success/failure labels. ~40 lines.
- Outcome data feeds into ~context-assemble-global-awareness~: when the foveal node has similar past interactions, include them in the context as "Historical: last 3 times you asked this, approach X succeeded, Y failed."
+- Outcome data feeds into ~symbolic-awareness~ (formerly core-context, extracted from core): when the foveal node has similar past interactions, include them in the context as "Historical: last 3 times you asked this, approach X succeeded, Y failed."
 - FiveAM test: record 3 outcomes for similar signals, verify ~memory-find-outcomes~ returns them ranked by similarity.

 *** TODO Merkle learning documentation in Design Decisions
@@ -926,7 +1081,7 @@ Rationale: The Merkle tree was designed for integrity, not learning. Its second

 Rationale: Without an evaluation harness, there is no way to know if the agent's capabilities improve or regress across releases. SWE-bench (v0.9.0) measures competitive ranking against other agents. The internal suite measures regression detection — it catches when v0.5.1 breaks something v0.5.0 could do. The suite starts with 10 tasks and grows with the codebase.

- New skill: ~system-evaluation.org~ (~system-evaluation.lisp~).
+- New skill: ~symbolic-evaluation.org~ (~symbolic-evaluation.lisp~).
 - ~deftask~ macro: define an eval task with ~:setup~ (create test environment), ~:prompt~ (what to ask the agent), ~:verify~ (function that checks the output), ~:teardown~ (cleanup). Similar to ~defskill~ but for agent capabilities, not code.
 - ~run-eval-task~: inject ~:prompt~ as ~:user-input~ signal via ~stimulus-inject~, wait for completion (poll ~*memory-store*~ or signal status), run ~:verify~ on the result, return ~(:passed)~ or ~(:failed :reason ...)~.
 - ~run-eval-suite~: run all registered eval tasks, produce score (pass count / total), per-task diagnostics, summary.