v0.7.2: Merkle provenance audit + RCE flake fix — TDD
audit-node exposes memory-object lineage (type, hash, scope, version). /audit <node-id> TUI command. /audit verify deferred. Fixed RCE test flake: assemble-config-section used getf on non-plist cascade entries. Wrapped in handler-case. Also fixed ~/ format directive escape. Core reason: 35/35. Core: 81/81.
@@ -2049,15 +2049,19 @@ The original roadmap placed MCP at v0.9.0 and planned "10+ cognitive tools" buil
- Propose installation command and retry the failed action on user approval.
- Cache resolved dependency paths to avoid repeated searches.
*** v0.10.3 — TODO Voice Gateway
*** TODO Channels + providers — match OpenClaw on demand
:PROPERTIES:
:ID: id-v100-channels
:CREATED: [2026-05-08 Fri]
:END:
Rationale: OpenClaw ships voice wake words and talk mode on macOS/iOS/Android via ElevenLabs. Hermes Agent has voice memo transcription. Both treat voice as a first-class channel. Passepartout's daemon already handles text — voice is an I/O format conversion. Speech-to-text turns audio into ~:user-input~ signals. Text-to-speech turns agent responses into audio. The architecture requires no changes; the voice gateway is a skill that wraps existing REST APIs.
The daemon protocol is client-agnostic: hex-framed plists over TCP. Every new channel is a new client that speaks the same protocol. OpenClaw's 23+ channels are straightforward to replicate: each platform needs a poll loop and a send function, roughly 30 lines each. An LLM provider is a row in ~*provider-cascade*~, i.e. a new entry in ~neuro-provider.lisp~ with an API endpoint and token pricing. Neither deserves its own release.
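
As a rough illustration of what a client has to do to speak a framed plist protocol, here is a minimal Python sketch. The exact wire format is not specified in this roadmap, so the sketch assumes a fixed-width hexadecimal length prefix followed by the UTF-8 plist text. ~HEADER_WIDTH~, ~encode_frame~, and ~decode_frames~ are hypothetical names, not part of the daemon.

```python
# Assumption: a frame is a fixed-width hex length prefix followed by the
# UTF-8 encoded plist text. The 6-digit width is hypothetical.
HEADER_WIDTH = 6


def encode_frame(plist_text: str) -> bytes:
    """Prefix the encoded plist with its byte length written in hex."""
    body = plist_text.encode("utf-8")
    if len(body) >= 16 ** HEADER_WIDTH:
        raise ValueError("payload too large for the length header")
    header = f"{len(body):0{HEADER_WIDTH}x}".encode("ascii")
    return header + body


def decode_frames(buffer: bytes):
    """Split a receive buffer into complete messages plus leftover bytes."""
    messages = []
    while len(buffer) >= HEADER_WIDTH:
        length = int(buffer[:HEADER_WIDTH].decode("ascii"), 16)
        end = HEADER_WIDTH + length
        if len(buffer) < end:
            break  # incomplete frame: wait for more bytes from the socket
        messages.append(buffer[HEADER_WIDTH:end].decode("utf-8"))
        buffer = buffer[end:]
    return messages, buffer
```

A client would call ~encode_frame~ before writing to the socket and feed every received chunk through ~decode_frames~, keeping the leftover bytes for the next read.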
- Speech-to-text: POST audio to the OpenAI Whisper API (~/v1/audio/transcriptions~) or to a locally hosted Whisper model. Receive text. Inject it as a ~:user-input~ signal into the pipeline. The daemon processes it identically to a typed message.
- Text-to-speech: POST text to the ElevenLabs REST API (~/v1/text-to-speech/{voice-id}~) and stream the audio response. Also support system ~say~ (macOS) / ~espeak~ (Linux) as zero-dependency fallbacks.
- TUI voice toggle: ~/voice on~ enables voice capture, shows a ~🎤~ (listening) indicator in the status bar. ~/voice off~ returns to text-only. The microphone capture runs in a dedicated thread that feeds audio chunks to the speech-to-text backend.
- Voice mode in messaging gateways: on Telegram and Discord, the voice gateway transcribes voice messages into text and injects them as ~:user-input~ signals. Agent responses can be optionally spoken back via text-to-speech if the user's message included a voice note (reply in kind).
- The voice gateway is a skill (~defskill~ ~:passepartout-gateway-voice~). No core daemon changes required. The daemon receives text signals whether they originated from a keyboard, a messaging app, or a microphone.
- Channels: match OpenClaw's 23+ channels on demand. The Emacs bridge (already done in v0.4.0) proves the pattern. Each new platform (WhatsApp, iMessage, Matrix, IRC, etc.) is a skill that registers a poll-fn and a send-fn, about 30 lines per channel.
- Providers: match OpenClaw/Hermes on provider count. Adding a new provider is a table entry in ~neuro-provider.lisp~: name, API endpoint, model list, pricing. About 20 lines per provider.
- Voice: STT and TTS are REST wrappers (~whisper~ / ~elevenlabs~ / ~espeak~), already spec'd as a skill. About 50 lines.
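
The poll-fn plus send-fn pattern described in the list above can be sketched in a few lines. This is an illustrative Python model of the idea only, not the project's skill API: ~Channel~, ~register_channel~, ~pump_once~, and the shape of the signal dictionary are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Channel:
    """Hypothetical channel record: a name plus the two callbacks."""
    name: str
    poll_fn: Callable[[], List[str]]  # returns newly arrived inbound texts
    send_fn: Callable[[str], None]    # delivers one outbound reply


CHANNELS: Dict[str, Channel] = {}


def register_channel(name: str,
                     poll_fn: Callable[[], List[str]],
                     send_fn: Callable[[str], None]) -> Channel:
    """Add a channel to the registry, replacing any entry with the same name."""
    channel = Channel(name, poll_fn, send_fn)
    CHANNELS[name] = channel
    return channel


def pump_once(handle_input: Callable[[dict], str]) -> int:
    """Poll every channel once and send a reply for each inbound message."""
    handled = 0
    for channel in list(CHANNELS.values()):
        for text in channel.poll_fn():
            # Assumed signal shape; the real daemon uses plists.
            signal = {"type": "user-input", "channel": channel.name, "text": text}
            channel.send_fn(handle_input(signal))
            handled += 1
    return handled
```

Under this model a voice channel differs from a text channel only in its callbacks: the poll function returns transcribed text and the send function may speak the reply.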
No separate releases. Done when needed, shipped when ready.
*** TODO Web search + web fetch tools — ~search-web~, ~fetch-web~
:PROPERTIES:
@@ -2157,7 +2161,7 @@ The Git policy gate (commit-before-modify) is a safety feature no competitor pro
The TUI tool visualization (v0.8.1) extends seamlessly to MCP tools — the rendering layer doesn't distinguish between native tools and MCP tools. The same colored backgrounds, collapsible outputs, and gate traces apply universally.
The voice gateway (v0.10.3) adds parity with OpenClaw's voice features without architectural changes — speech-to-text and text-to-speech are thin REST wrappers that feed text signals into the existing pipeline. Combined with the Emacs bridge (v0.4.0), messaging gateways (v0.4.0), and the now-SOTA TUI (v0.7.0–v0.8.3), Passepartout supports four interaction surfaces by v0.10.3: terminal (TUI), messaging apps, Emacs, and voice.
The voice gateway and additional channels add parity with OpenClaw's multi-surface approach without architectural changes: every channel is a thin client speaking the same framed TCP protocol to the same daemon. Channels and providers are cheap to add: each new platform is roughly 30 lines of poll loop and send function, and each new provider is roughly 20 lines of API config. Passepartout matches OpenClaw's channel count on demand, shipping when needed rather than as a scheduled milestone.
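
To make the "provider as a table entry" idea concrete, here is a small Python sketch of such an entry and a cost estimate derived from it. The real table lives in ~neuro-provider.lisp~ and its layout is not shown here, so the field names, the ~estimate_cost~ helper, the endpoint, and every number are placeholders chosen for illustration.

```python
# Hypothetical provider table. Every name and value is a placeholder,
# not real pricing and not the project's actual schema.
PROVIDERS = {
    "example-provider": {
        "endpoint": "https://api.example.invalid/v1/chat",
        "models": ["example-small", "example-large"],
        "usd_per_million_tokens": {"input": 1.0, "output": 2.0},
    },
}


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request from the table's pricing."""
    price = PROVIDERS[provider]["usd_per_million_tokens"]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

Adding a provider under this model means adding one more dictionary entry; no other code changes.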
** v0.11.0: Planning, Self-Modification & Deterministic Routing