v0.7.2: Merkle provenance audit + RCE flake fix — TDD

audit-node exposes memory-object lineage (type, hash, scope, version).
/audit <node-id> TUI command. /audit verify deferred.

Fixed RCE test flake: assemble-config-section used getf on
non-plist cascade entries. Wrapped in handler-case. Also fixed
~/ format directive escape. Core reason: 35/35. Core: 81/81.
This commit is contained in:
2026-05-08 18:03:24 -04:00
parent df09ac321d
commit 11c43f76fa
7 changed files with 95 additions and 12 deletions

View File

@@ -2049,15 +2049,19 @@ The original roadmap placed MCP at v0.9.0 and planned "10+ cognitive tools" buil
- Propose installation command and retry the failed action on user approval.
- Cache resolved dependency paths to avoid repeated searches.
*** v0.10.3 — TODO Voice Gateway
*** TODO Channels + providers — match OpenClaw on demand
:PROPERTIES:
:ID: id-v100-channels
:CREATED: [2026-05-08 Fri]
:END:
Rationale: OpenClaw ships voice wake words and talk mode on macOS/iOS/Android via ElevenLabs. Hermes Agent has voice memo transcription. Both treat voice as a first-class channel. Passepartout's daemon already handles text — voice is an I/O format conversion. Speech-to-text turns audio into ~:user-input~ signals. Text-to-speech turns agent responses into audio. The architecture requires no changes; the voice gateway is a skill that wraps existing REST APIs.
The daemon protocol is client-agnostic hex-framed plists over TCP. Every new channel is a new client that speaks the same protocol. OpenClaw's 23+ channels are trivially copyable — each platform needs a poll loop + send function, ~30 lines each. LLM providers are a row in ~*provider-cascade*~ — a new entry in ~neuro-provider.lisp~ with API endpoint + token pricing. Neither deserves its own release.
- Speech-to-text: POST audio to OpenAI Whisper API (~/v1/audio/transcriptions~) or local Whisper via Ollama. Receive text. Inject as a ~:user-input~ signal into the pipeline. The daemon processes it identically to a typed message.
- Text-to-speech: POST text to ElevenLabs REST API (~/v1/text-to-speech/{voice-id}~) with stream response. Also support system ~say~ (macOS) / ~espeak~ (Linux) as zero-dependency fallbacks.
- TUI voice toggle: ~/voice on~ enables voice capture, shows a ~🎤~ (listening) indicator in the status bar. ~/voice off~ returns to text-only. The microphone capture runs in a dedicated thread that feeds audio chunks to the speech-to-text backend.
- Voice mode in messaging gateways: on Telegram and Discord, the voice gateway transcribes voice messages into text and injects them as ~:user-input~ signals. Agent responses can be optionally spoken back via text-to-speech if the user's message included a voice note (reply in kind).
- The voice gateway is a skill (~defskill~~:passepartout-gateway-voice~). No core daemon changes required. The daemon receives text signals whether they originated from a keyboard, a messaging app, or a microphone.
- Channels: match OpenClaw's 23+ channels on demand. The Emacs bridge (already done, v0.4.0) proves the pattern. Each new platform (WhatsApp, iMessage, Matrix, IRC, etc.) is a skill that registers a poll-fn + send-fn. ~30 lines per channel.
- Providers: match OpenClaw/Hermes on provider count. Adding a new provider is a table entry in ~neuro-provider.lisp~: name, API endpoint, model list, pricing. ~20 lines per provider.
- Voice: STT + TTS are REST wrappers (~whisper~ / ~elevenlabs~ / ~espeak~). Already spec'd as a skill. ~50 lines.
No separate releases. Done when needed, shipped when ready.
*** TODO Web search + web fetch tools — ~search-web~, ~fetch-web~
:PROPERTIES:
@@ -2157,7 +2161,7 @@ The Git policy gate (commit-before-modify) is a safety feature no competitor pro
The TUI tool visualization (v0.8.1) extends seamlessly to MCP tools — the rendering layer doesn't distinguish between native tools and MCP tools. The same colored backgrounds, collapsible outputs, and gate traces apply universally.
The voice gateway (v0.10.3) adds parity with OpenClaw's voice features without architectural changes — speech-to-text and text-to-speech are thin REST wrappers that feed text signals into the existing pipeline. Combined with the Emacs bridge (v0.4.0), messaging gateways (v0.4.0), and the now-SOTA TUI (v0.7.0v0.8.3), Passepartout supports four interaction surfaces by v0.10.3: terminal (TUI), messaging apps, Emacs, and voice.
The voice gateway and additional channels add parity with OpenClaw's multi-surface approach without architectural changes — every channel is a thin client speaking the same framed TCP protocol to the same daemon. Channels and providers are trivially copyable: each new platform is ~30 lines of poll-loop, each new provider is ~20 lines of API config. Passepartout matches OpenClaw's channel count on demand, shipping when needed rather than as a scheduled milestone.
** v0.11.0: Planning, Self-Modification & Deterministic Routing