passepartout v0.7.2 (Gate Trace + HITL + Search + 11 more features): - Gate trace visualization with Ctrl+G toggle - HITL inline panels with styled collapse on approve/deny - Agent identity file + /identity command - Safe-tool read-only allowlist - Message search mode with Up/Down nav and highlights - Context budget visibility with section breakdown - Session rewind /sessions /resume /rewind - Undo/redo per operation - Context debugging /context why /context dropped - Tool hardening (timeouts, write verify, read-only cache) - Tag stack severity tiers + trigger counts - Merkle provenance audit + audit-verify - Self-help /help <topic> reads USER_MANUAL.org - Live CONFIG section in system prompts - Pads: Page Up/Down scroll by 10 lines Core 92/92 TUI Main 104/104 TUI View 29/29 Neuro 13/13
91 lines
7.2 KiB
Org Mode
91 lines
7.2 KiB
Org Mode
#+TITLE: Comparative Agent Loop & Recovery Study
|
|
#+FILETAGS: :notes:comparative-study:agent-loop:recovery:architecture:
|
|
|
|
* Purpose
|
|
|
|
Compare agent loop architectures and error recovery mechanisms across Claude Code, OpenCode, OpenClaw, and Hermes Agent. Inform Passepartout's signal pipeline (v0.9.0) and recovery strategies.
|
|
|
|
* Findings Summary
|
|
|
|
| Dimension | Claude Code | OpenCode | OpenClaw | Hermes | Passepartout |
|
|
|-----------+-------------+----------+----------+--------+--------------|
|
|
| Loop style | Async generator while(true) | Effect while(true) | Dual nested while(true) | Plain while + budget | while(process-signal) |
|
|
| Streaming tools | StreamingToolExecutor during API stream | AI SDK streamText + tool dispatch | Dual-loop attempt dispatch | Always streaming, chunk iteration | Not yet (v0.7.1) |
|
|
| Watchdog | 90s stream idle, 30s stall | Via Effect cancellation | TUI watchdog + 5-count idle breaker | 90s stale-stream, 60-120s read | Not implemented |
|
|
| Auto-retry | 10 max, 3 for 529, stream→non-stream fallback | Effect retry/fallback | Model fallback + profile rotation | 30+ error flags with specific recovery | Not implemented |
|
|
| Compaction layers | 5 (snip, micro, collapse, auto, reactive) | Auto-compaction + pruning | Post-compaction loop guard | ContextCompressor + 1-line tool pruning | Foveal-peripheral (single layer) |
|
|
| Interrupt | AbortController + signal.reason | Runner.cancel() + BusyError | AbortSignal propagation | Thread-scoped set + 3-level cascade | Single SIGINT handler |
|
|
| Cost control | Token budget + task_budget API | Per-step tracking | Idle-timeout breaker (cost runaway) | IterationBudget + subagent caps | Planned (v0.5.0 token economics) |
|
|
| Busy-mode | N/A (single REPL) | Interrupt running + queue | Interrupt + queue | interrupt/queue/steer (3 modes) | Not implemented |
|
|
|
|
* Claude Code — Loop Architecture
|
|
|
|
**Core loop**: Async generator queryLoop() with mutable State struct tracking messages, tool context, compaction state, recovery counts, transitions. Each iteration: snip → microcompact → collapse → auto-compact → blocking limit → call model → execute tools → loop.
|
|
|
|
**Stop conditions** (return Terminal): completed, blocking_limit, aborted_streaming, aborted_tools, max_turns, stop_hook_prevented, hook_stopped, model_error, prompt_too_long, image_error.
|
|
|
|
**StreamingToolExecutor**: Tools execute DURING the API stream. As tool_use blocks arrive, immediately dispatched. Concurrency model: concurrent-safe tools run in parallel; non-concurrent serialized. Bash tool errors trigger siblingAbortController.
|
|
|
|
**5-Layer Recovery:**
|
|
1. Auto-retry: 10 max, 3 for 529, persistent retry mode (CLAUDE_CODE_UNATTENDED_RETRY), fallback model switch
|
|
2. Stale LLM detection: 90s stream idle timeout, half-time warning at 45s, fallback to non-streaming
|
|
3. Tool error recovery: user_interrupted → REJECT_MESSAGE, streaming_fallback → executor recreated, missing tool results → synthetic errors
|
|
4. Compaction retry: reactive compact on 413 prompt-too-long. Two-stage: context-collapse drain → reactive compact
|
|
5. Watchdog/context overflow: blocking limit pre-check before model call
|
|
|
|
**Unique**: Task budgets carried across compaction boundaries. Thinking block validation at query level. Memory prefetch concurrent with LLM stream.
|
|
|
|
* OpenCode — Effect-TS Functional Pipeline
|
|
|
|
**Core loop**: Effect.fn(SessionPrompt.run) → while(true). Exit on: last assistant has finish reason not tool-calls, structured output produced, processor returns stop/compact, doom loop detected, max steps reached.
|
|
|
|
**Doom loop detection**: 3 consecutive identical tool calls → permission prompt.
|
|
|
|
**Interrupt**: SessionRunState with Runner.onInterrupt. Cancel() sets status to idle. BusyError when starting while running. New request → cancel current → interrupt work returns last assistant message.
|
|
|
|
**Undo/Redo**: SessionRevert with file system snapshots. Revert restores file patches in reverse. Unrevert restores pre-revert snapshot. Per-step diff tracking.
|
|
|
|
**Streaming**: AI SDK streamText with event handling (reasoning-start/delta/end, tool-input-start, tool-call, tool-result, tool-error, text-delta, finish-step). Snapshot diff on each step.
|
|
|
|
* OpenClaw — Dual-Loop with Failover
|
|
|
|
**Outer loop** (runWithModelFallback): Model/provider fallback chain with result classification.
|
|
**Inner loop** (runEmbeddedAttemptWithBackend): Attempt dispatch with auth/profile rotation.
|
|
|
|
**Idle-timeout cost-runaway breaker**: 5 consecutive idle timeouts with no progress → halt paid model calls. Prevents $20-30 runaway from a single code bug.
|
|
|
|
**Auth profile rotation**: Multiple profiles per provider, rotated on failures. MAX_SAME_MODEL_IDLE_TIMEOUT_RETRIES=1 before model switch.
|
|
|
|
**Post-Compaction Loop Guard**: Detects loops where compaction happens but produces no progress.
|
|
|
|
**Streaming watchdog**: Armed on every delta. TUI detects idle stream, updates status. 30s timeout.
|
|
|
|
* Hermes Agent — Monolithic 14,672-line Agent
|
|
|
|
**Core loop**: while(api_call_count < max_iterations AND iteration_budget.remaining > 0). Checks: interrupt flag, iteration budget, steer drain, API retry loop (with rate limiting), compaction loop (max 3), process tool calls.
|
|
|
|
**3-Level Ctrl+C Cascade:**
|
|
1. Graceful interrupt: sets _interrupt_requested, per-thread signal, propagates to children
|
|
2. Clear & Queue: clear_interrupt(), busy mode queue (next turn)
|
|
3. /steer: Non-disruptive injection into next tool result (thread-safe)
|
|
|
|
**IterationBudget**: Parent gets 90 iterations, subagents get 50. execute_code iterations refunded. _budget_grace_call for one extra.
|
|
|
|
**Streaming**: Always preferred path. 90s stale detection, 60-120s read timeout. Chunk iteration with last_chunk_time tracking. Ollama-specific tool call reuse fix.
|
|
|
|
**Recovery flags**: 30+ provider-specific retry flags. JSON repair (5 passes: strict, commas, braces, control chars, unicode). Surrogate sanitization. Dead connection cleanup. Message sequence repair. Credential pool rotation.
|
|
|
|
* Passepartout Blindspot Assessment
|
|
|
|
1. **No streaming tool execution** — Passepartout's pipeline is strictly sequential. Claude Code executes tools during the API stream (hiding ~1s latency). Passepartout's streaming (v0.7.1) should consider whether tool calls can execute during stream. [Action: v0.7.1 protocol design]
|
|
|
|
2. **No recovery layers** — Passepartout has handler-case around think() but no auto-retry, no stale detection, no model fallback. All 3 competitors have multi-layer recovery. [Action: add to v0.9.0 signal pipeline]
|
|
|
|
3. **No busy-mode** — When the agent is running and user types, Passepartout has no defined behavior. Hermes has interrupt/queue/steer. Passepartout should add queue mode at minimum. [Action: v0.9.0 priority queue]
|
|
|
|
4. **Single SIGINT handler** — The 3-level Ctrl+C cascade is universal. Passepartout should match this. [Action: v0.7.0 Ctrl keys]
|
|
|
|
5. **No compaction layers** — Passepartout's foveal-peripheral pruning is a single strategy. Claude Code has 5. Should Passepartout add reactive compaction (compact on PTL) and tool output summarization? [Action: context/compaction study needed]
|
|
|
|
6. **No doom loop detection** — OpenCode detects 3 identical tool calls. Hermes detects budget exhaustion. Passepartout could loop forever on a stuck tool. [Action: add to v0.9.0]
|