v0.7.2: Merkle provenance audit + RCE flake fix — TDD

audit-node exposes memory-object lineage (type, hash, scope, version). /audit <node-id> TUI command. /audit verify deferred. Fixed RCE test flake: assemble-config-section used getf on non-plist cascade entries. Wrapped in handler-case. Also fixed ~/ format directive escape. Core reason: 35/35. Core: 81/81.
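The flake fix described above can be illustrated with a minimal sketch. `safe-getf`, `home-relative`, and the sample values are hypothetical names, not the project's code. The sketch assumes only what the message states: `getf` was reaching cascade entries that are not property lists, and the lookup is now wrapped in `handler-case`. The second function shows the `~/` escape: in a `format` control string `~/name/` calls a function, so a literal `~/` has to be written `~~/`.

```lisp
;; Hypothetical helper; the actual fix lives in assemble-config-section.
(defun safe-getf (entry key &optional default)
  "Look up KEY in ENTRY, returning DEFAULT when ENTRY is not a usable plist."
  (handler-case (getf entry key default)
    ;; Implementations such as SBCL signal an error when ENTRY is an atom
    ;; or a malformed plist; treat that case as "key absent".
    (error () default)))

;; Hypothetical helper showing the format escape.
(defun home-relative (name)
  "Return NAME prefixed with a literal tilde and slash."
  ;; ~~ emits one tilde, so the following / is not read as a ~/ directive.
  (format nil "~~/~A" name))
```

Catching the broad `error` condition keeps the sketch short. An explicit `listp` check before calling `getf` would be the more portable choice, since the behaviour of `getf` on a non-list is not guaranteed to signal in every implementation or safety setting.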
2026-05-08 18:03:24 -04:00
parent df09ac321d
commit 11c43f76fa
7 changed files with 95 additions and 12 deletions
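The roadmap text changed in this commit describes the daemon protocol as hex-framed plists over TCP. Below is a minimal sketch of what such framing could look like. The frame layout (a six-digit hexadecimal length followed by the printed plist), the function names, and the message keys are assumptions made for illustration; the commit does not specify them.

```lisp
;; Assumed layout: a fixed-width hex length prefix, then the plist as
;; printed by PRIN1. The length here counts characters; framing on a real
;; socket would more likely count octets after encoding.
(defparameter *header-width* 6
  "Assumed number of hex digits in the length prefix.")

(defun frame-message (plist)
  "Return PLIST as a string prefixed with its printed length in hex."
  (let ((payload (with-standard-io-syntax (prin1-to-string plist))))
    (format nil "~V,'0X~A" *header-width* (length payload) payload)))

(defun unframe-message (frame)
  "Parse one frame produced by FRAME-MESSAGE back into a plist."
  (let* ((len (parse-integer frame :end *header-width* :radix 16))
         (payload (subseq frame *header-width* (+ *header-width* len))))
    (with-standard-io-syntax
      ;; Refuse #. read-time evaluation in data arriving from a client.
      (let ((*read-eval* nil))
        (values (read-from-string payload))))))
```

Under these assumptions a channel client only needs to write `(frame-message (list :type :user-input :text "hello"))` to its socket, then read the header and that many characters for each reply. That small surface is consistent with the roadmap's claim that a new channel amounts to a poll loop plus a send function.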
--- a/docs/ROADMAP.org
+++ b/docs/ROADMAP.org
@@ -2049,15 +2049,19 @@ The original roadmap placed MCP at v0.9.0 and planned "10+ cognitive tools" buil
 - Propose installation command and retry the failed action on user approval.
 - Cache resolved dependency paths to avoid repeated searches.

-*** v0.10.3 — TODO Voice Gateway
+*** TODO Channels + providers — match OpenClaw on demand
+:PROPERTIES:
+:ID:       id-v100-channels
+:CREATED:  [2026-05-08 Fri]
+:END:

-Rationale: OpenClaw ships voice wake words and talk mode on macOS/iOS/Android via ElevenLabs. Hermes Agent has voice memo transcription. Both treat voice as a first-class channel. Passepartout's daemon already handles text — voice is an I/O format conversion. Speech-to-text turns audio into ~:user-input~ signals. Text-to-speech turns agent responses into audio. The architecture requires no changes; the voice gateway is a skill that wraps existing REST APIs.
+The daemon protocol is client-agnostic hex-framed plists over TCP. Every new channel is a new client that speaks the same protocol. OpenClaw's 23+ channels are trivially copyable — each platform needs a poll loop + send function, ~30 lines each. LLM providers are a row in ~*provider-cascade*~ — a new entry in ~neuro-provider.lisp~ with API endpoint + token pricing. Neither deserves its own release.

-- Speech-to-text: POST audio to OpenAI Whisper API (~/v1/audio/transcriptions~) or local Whisper via Ollama. Receive text. Inject as a ~:user-input~ signal into the pipeline. The daemon processes it identically to a typed message.
-- Text-to-speech: POST text to ElevenLabs REST API (~/v1/text-to-speech/{voice-id}~) with stream response. Also support system ~say~ (macOS) / ~espeak~ (Linux) as zero-dependency fallbacks.
-- TUI voice toggle: ~/voice on~ enables voice capture, shows a ~🎤~ (listening) indicator in the status bar. ~/voice off~ returns to text-only. The microphone capture runs in a dedicated thread that feeds audio chunks to the speech-to-text backend.
-- Voice mode in messaging gateways: on Telegram and Discord, the voice gateway transcribes voice messages into text and injects them as ~:user-input~ signals. Agent responses can be optionally spoken back via text-to-speech if the user's message included a voice note (reply in kind).
-- The voice gateway is a skill (~defskill~~:passepartout-gateway-voice~). No core daemon changes required. The daemon receives text signals whether they originated from a keyboard, a messaging app, or a microphone.
+- Channels: match OpenClaw's 23+ channels on demand. The Emacs bridge (already done, v0.4.0) proves the pattern. Each new platform (WhatsApp, iMessage, Matrix, IRC, etc.) is a skill that registers a poll-fn + send-fn. ~30 lines per channel.
+- Providers: match OpenClaw/Hermes on provider count. Adding a new provider is a table entry in ~neuro-provider.lisp~: name, API endpoint, model list, pricing. ~20 lines per provider.
+- Voice: STT + TTS are REST wrappers (~whisper~ / ~elevenlabs~ / ~espeak~). Already spec'd as a skill. ~50 lines.
+
+No separate releases. Done when needed, shipped when ready.

 *** TODO Web search + web fetch tools — ~search-web~, ~fetch-web~
 :PROPERTIES:
@@ -2157,7 +2161,7 @@ The Git policy gate (commit-before-modify) is a safety feature no competitor pro

 The TUI tool visualization (v0.8.1) extends seamlessly to MCP tools — the rendering layer doesn't distinguish between native tools and MCP tools. The same colored backgrounds, collapsible outputs, and gate traces apply universally.

-The voice gateway (v0.10.3) adds parity with OpenClaw's voice features without architectural changes — speech-to-text and text-to-speech are thin REST wrappers that feed text signals into the existing pipeline. Combined with the Emacs bridge (v0.4.0), messaging gateways (v0.4.0), and the now-SOTA TUI (v0.7.0–v0.8.3), Passepartout supports four interaction surfaces by v0.10.3: terminal (TUI), messaging apps, Emacs, and voice.
+The voice gateway and additional channels add parity with OpenClaw's multi-surface approach without architectural changes — every channel is a thin client speaking the same framed TCP protocol to the same daemon. Channels and providers are trivially copyable: each new platform is ~30 lines of poll-loop, each new provider is ~20 lines of API config. Passepartout matches OpenClaw's channel count on demand, shipping when needed rather than as a scheduled milestone.

 ** v0.11.0: Planning, Self-Modification & Deterministic Routing