memex/notes/comparative-agent-loop.org

#+TITLE: Comparative Agent Loop & Recovery Study
#+FILETAGS: :notes:comparative-study:agent-loop:recovery:architecture:

* Purpose

Compare agent loop architectures and error recovery mechanisms across Claude Code, OpenCode, OpenClaw, and Hermes Agent. Inform Passepartout's signal pipeline (v0.9.0) and recovery strategies.

* Findings Summary

| Dimension | Claude Code | OpenCode | OpenClaw | Hermes | Passepartout |
|-----------+-------------+----------+----------+--------+--------------|
| Loop style | Async generator while(true) | Effect while(true) | Dual nested while(true) | Plain while + budget | while(process-signal) |
| Streaming tools | StreamingToolExecutor during API stream | AI SDK streamText + tool dispatch | Dual-loop attempt dispatch | Always streaming, chunk iteration | Not yet (v0.7.1) |
| Watchdog | 90s stream idle, 30s stall | Via Effect cancellation | TUI watchdog + 5-count idle breaker | 90s stale-stream, 60-120s read | Not implemented |
| Auto-retry | 10 max, 3 for 529, stream→non-stream fallback | Effect retry/fallback | Model fallback + profile rotation | 30+ error flags with specific recovery | Not implemented |
| Compaction layers | 5 (snip, micro, collapse, auto, reactive) | Auto-compaction + pruning | Post-compaction loop guard | ContextCompressor + 1-line tool pruning | Foveal-peripheral (single layer) |
| Interrupt | AbortController + signal.reason | Runner.cancel() + BusyError | AbortSignal propagation | Thread-scoped set + 3-level cascade | Single SIGINT handler |
| Cost control | Token budget + task_budget API | Per-step tracking | Idle-timeout breaker (cost runaway) | IterationBudget + subagent caps | Planned (v0.5.0 token economics) |
| Busy-mode | N/A (single REPL) | Interrupt running + queue | Interrupt + queue | interrupt/queue/steer (3 modes) | Not implemented |

* Claude Code — Loop Architecture

**Core loop**: Async generator queryLoop() with mutable State struct tracking messages, tool context, compaction state, recovery counts, transitions. Each iteration: snip → microcompact → collapse → auto-compact → blocking limit → call model → execute tools → loop.

**Stop conditions** (return Terminal): completed, blocking_limit, aborted_streaming, aborted_tools, max_turns, stop_hook_prevented, hook_stopped, model_error, prompt_too_long, image_error.

**StreamingToolExecutor**: Tools execute DURING the API stream: each tool_use block is dispatched as soon as it arrives. Concurrency model: concurrent-safe tools run in parallel; non-concurrent tools are serialized. Bash tool errors trigger siblingAbortController.
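
A minimal Python sketch of that concurrency policy, for discussion only. Claude Code itself is not written in Python, and =ToolCall=, =concurrent_safe=, and =execute_as_streamed= are hypothetical names, not its API. The sibling-abort behaviour on Bash errors is not modelled.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, Dict, List


@dataclass
class ToolCall:
    name: str
    run: Callable[[], Awaitable[str]]
    concurrent_safe: bool  # hypothetical flag, not a Claude Code field


async def execute_as_streamed(calls: AsyncIterator[ToolCall]) -> List[str]:
    """Dispatch tool calls as they arrive; return results in arrival order."""
    results: Dict[int, str] = {}
    in_flight: List[asyncio.Task] = []

    async def run_into(index: int, call: ToolCall) -> None:
        results[index] = await call.run()

    index = 0
    async for call in calls:
        if call.concurrent_safe:
            # Safe tools start immediately and overlap with each other.
            in_flight.append(asyncio.create_task(run_into(index, call)))
        else:
            # Unsafe tools wait for everything in flight, then run alone.
            if in_flight:
                await asyncio.gather(*in_flight)
                in_flight.clear()
            await run_into(index, call)
        index += 1

    if in_flight:
        await asyncio.gather(*in_flight)
    return [results[i] for i in range(index)]
```

The point for Passepartout is the ordering rule: results are reported in arrival order even when execution overlaps.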

**5-Layer Recovery:**
1. Auto-retry: 10 max, 3 for 529, persistent retry mode (CLAUDE_CODE_UNATTENDED_RETRY), fallback model switch
2. Stale LLM detection: 90s stream idle timeout, half-time warning at 45s, fallback to non-streaming
3. Tool error recovery: user_interrupted → REJECT_MESSAGE, streaming_fallback → executor recreated, missing tool results → synthetic errors
4. Compaction retry: reactive compact on 413 prompt-too-long. Two-stage: context-collapse drain → reactive compact
5. Watchdog/context overflow: blocking limit pre-check before model call
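
Layer 1 can be sketched roughly as below. The caps (10 and 3) come from the notes above. Everything else is an assumption made for illustration: that "10 max" counts total attempts, that the 529 cap is what triggers the fallback model switch, and the names =ApiError= and =call_with_retry=. Backoff delays are omitted.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 10            # "10 max" from the notes, read here as total attempts
MAX_OVERLOADED_RETRIES = 3   # "3 for 529" from the notes


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(
    call: Callable[[str], str],
    primary: str,
    fallback: Optional[str] = None,
) -> str:
    model = primary
    overloaded = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return call(model)
        except ApiError as err:
            if attempt == MAX_ATTEMPTS:
                raise  # overall cap reached
            if err.status == 529:
                overloaded += 1
                if overloaded >= MAX_OVERLOADED_RETRIES:
                    if fallback is None or model == fallback:
                        raise  # nothing left to switch to
                    # Assumption: repeated 529s trigger the model switch.
                    model, overloaded = fallback, 0
    raise AssertionError("unreachable")
```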

**Unique**: Task budgets carried across compaction boundaries. Thinking block validation at query level. Memory prefetch concurrent with LLM stream.

* OpenCode — Effect-TS Functional Pipeline

**Core loop**: Effect.fn(SessionPrompt.run) → while(true). Exits when: the last assistant message has a finish reason other than tool-calls, structured output is produced, the processor returns stop/compact, a doom loop is detected, or max steps is reached.

**Doom loop detection**: 3 consecutive identical tool calls → permission prompt.
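
A possible shape for this check in Python, assuming "identical" means same tool name and same arguments. The class name and the JSON-based comparison are illustrative choices, not OpenCode's implementation.

```python
import json
from collections import deque
from typing import Any, Deque, Dict, Tuple

DOOM_LOOP_THRESHOLD = 3  # from the notes: 3 consecutive identical calls


class DoomLoopDetector:
    def __init__(self, threshold: int = DOOM_LOOP_THRESHOLD) -> None:
        self._threshold = threshold
        # Only the last `threshold` calls matter, so a bounded deque suffices.
        self._recent: Deque[Tuple[str, str]] = deque(maxlen=threshold)

    def record(self, tool: str, args: Dict[str, Any]) -> bool:
        """Record a call; return True if the last `threshold` calls are identical."""
        # sort_keys makes argument order irrelevant to the comparison.
        self._recent.append((tool, json.dumps(args, sort_keys=True)))
        return len(self._recent) == self._threshold and len(set(self._recent)) == 1
```

When =record= returns True, the caller would pause and ask for permission rather than abort outright, matching the behaviour noted above.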

**Interrupt**: SessionRunState with Runner.onInterrupt. Runner.cancel() sets status to idle. Starting a run while another is active raises BusyError. New request → cancel current run → the interrupted work returns the last assistant message.

**Undo/Redo**: SessionRevert with file system snapshots. Revert restores file patches in reverse. Unrevert restores pre-revert snapshot. Per-step diff tracking.

**Streaming**: AI SDK streamText with event handling (reasoning-start/delta/end, tool-input-start, tool-call, tool-result, tool-error, text-delta, finish-step). Snapshot diff on each step.

* OpenClaw — Dual-Loop with Failover

**Outer loop** (runWithModelFallback): Model/provider fallback chain with result classification.
**Inner loop** (runEmbeddedAttemptWithBackend): Attempt dispatch with auth/profile rotation.
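
The nesting can be pictured with the simplified Python sketch below. It ignores result classification and treats every failure the same way. =AttemptFailed= and =run_with_fallback= are hypothetical names, and the real functions named above are not reproduced here.

```python
from typing import Callable, Dict, List


class AttemptFailed(Exception):
    """Hypothetical error raised when one attempt fails."""


def run_with_fallback(
    models: List[str],
    profiles: Dict[str, List[str]],
    attempt: Callable[[str, str], str],
) -> str:
    failures: List[str] = []
    for model in models:                          # outer loop: model fallback chain
        for profile in profiles.get(model, []):   # inner loop: auth profile rotation
            try:
                return attempt(model, profile)
            except AttemptFailed as err:
                failures.append(f"{model}/{profile}: {err}")
    raise AttemptFailed("; ".join(failures) or "no model/profile configured")
```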

**Idle-timeout cost-runaway breaker**: 5 consecutive idle timeouts with no progress → halt paid model calls. Prevents $20-30 runaway from a single code bug.
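
The breaker reduces to a counter that resets on progress. A Python sketch, with the limit of 5 taken from the notes and all names hypothetical:

```python
class IdleTimeoutBreaker:
    def __init__(self, limit: int = 5) -> None:
        self.limit = limit
        self.consecutive_idle = 0

    def on_idle_timeout(self) -> bool:
        """Record an idle timeout; return True once the breaker has tripped."""
        self.consecutive_idle += 1
        return self.tripped

    def on_progress(self) -> None:
        # Any real progress resets the streak.
        self.consecutive_idle = 0

    @property
    def tripped(self) -> bool:
        return self.consecutive_idle >= self.limit
```

The caller checks =tripped= before each paid model call and halts once it is set.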

**Auth profile rotation**: Multiple profiles per provider, rotated on failures. MAX_SAME_MODEL_IDLE_TIMEOUT_RETRIES=1 before model switch.

**Post-Compaction Loop Guard**: Detects loops where compaction happens but produces no progress.

**Streaming watchdog**: Armed on every delta. TUI detects idle stream, updates status. 30s timeout.

* Hermes Agent — Monolithic 14,672-line Agent

**Core loop**: while(api_call_count < max_iterations AND iteration_budget.remaining > 0). Checks: interrupt flag, iteration budget, steer drain, API retry loop (with rate limiting), compaction loop (max 3), process tool calls.

**3-Level Ctrl+C Cascade:**
1. Graceful interrupt: sets _interrupt_requested, per-thread signal, propagates to children
2. Clear & Queue: clear_interrupt(), busy mode queue (next turn)
3. /steer: Non-disruptive injection into next tool result (thread-safe)

**IterationBudget**: Parent gets 90 iterations, subagents get 50. execute_code iterations refunded. _budget_grace_call for one extra.
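
A Python sketch of the budget mechanics as described. The limits (90 and 50) are from the notes. The exact semantics of the refund and of the grace call are assumptions, and the field and method names are hypothetical rather than Hermes's own.

```python
from dataclasses import dataclass

PARENT_ITERATIONS = 90    # from the notes
SUBAGENT_ITERATIONS = 50  # from the notes


@dataclass
class IterationBudget:
    limit: int
    used: int = 0
    grace_used: bool = False

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    def consume(self) -> bool:
        """Spend one iteration; return False when the budget is exhausted."""
        if self.used < self.limit:
            self.used += 1
            return True
        if not self.grace_used:
            # Assumption: exactly one extra call is allowed after exhaustion.
            self.grace_used = True
            return True
        return False

    def refund(self) -> None:
        """Give back one iteration, e.g. after an execute_code step."""
        if self.used > 0:
            self.used -= 1
```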

**Streaming**: Always preferred path. 90s stale detection, 60-120s read timeout. Chunk iteration with last_chunk_time tracking. Ollama-specific tool call reuse fix.
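
Stale-stream detection can be sketched as a per-chunk timeout that re-arms after every chunk. This is an asyncio illustration of the idea, not Hermes's code, which tracks last_chunk_time as noted above. =StaleStreamError= and =iter_with_idle_timeout= are hypothetical names.

```python
import asyncio
from typing import AsyncIterator

STALE_STREAM_SECONDS = 90.0  # from the notes


class StaleStreamError(Exception):
    """Hypothetical error raised when no chunk arrives in time."""


async def iter_with_idle_timeout(
    stream: AsyncIterator[str],
    idle_timeout: float = STALE_STREAM_SECONDS,
) -> AsyncIterator[str]:
    iterator = stream.__aiter__()
    while True:
        try:
            # The timeout applies to each chunk, so it re-arms on every delta.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=idle_timeout)
        except StopAsyncIteration:
            return
        except asyncio.TimeoutError:
            raise StaleStreamError(f"no chunk for {idle_timeout}s") from None
        yield chunk
```

A caller would catch =StaleStreamError= and hand off to whatever recovery path applies, such as a retry.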

**Recovery flags**: 30+ provider-specific retry flags. JSON repair (5 passes: strict, commas, braces, control chars, unicode). Surrogate sanitization. Dead connection cleanup. Message sequence repair. Credential pool rotation.
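
The multi-pass JSON repair might look like the sketch below: try strict parsing first, then apply progressively more invasive fixes. The individual fixes are guesses at what each named pass does. The unicode pass is omitted because the notes do not say what it covers.

```python
import json
import re
from typing import Any


def _strip_trailing_commas(text: str) -> str:
    return re.sub(r",\s*([}\]])", r"\1", text)


def _close_open_brackets(text: str) -> str:
    """Append closers for any brackets left open outside of strings."""
    stack, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    return text + "".join(reversed(stack))


def _strip_control_chars(text: str) -> str:
    return re.sub(r"[\x00-\x1f]", "", text)


# Fixes are cumulative: each pass works on the output of the previous one.
PASSES = [lambda s: s, _strip_trailing_commas, _close_open_brackets, _strip_control_chars]


def repair_json(text: str) -> Any:
    candidate = text
    for fix in PASSES:
        candidate = fix(candidate)
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("could not repair JSON")
```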

* Passepartout Blindspot Assessment

1. **No streaming tool execution** — Passepartout's pipeline is strictly sequential, whereas Claude Code executes tools during the API stream (hiding ~1s latency). The v0.7.1 streaming design should decide whether tool calls can execute mid-stream. [Action: v0.7.1 protocol design]

2. **No recovery layers** — Passepartout wraps think() in handler-case but has no auto-retry, no stale-stream detection, and no model fallback. All four compared agents have multi-layer recovery. [Action: add to v0.9.0 signal pipeline]

3. **No busy-mode** — When the agent is running and the user types, Passepartout has no defined behavior. Hermes has interrupt/queue/steer; OpenCode and OpenClaw interrupt and queue. Passepartout should add queue mode at minimum. [Action: v0.9.0 priority queue]

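4. **Single SIGINT handler** — Hermes has a 3-level Ctrl+C cascade, and the other three agents propagate abort or cancel signals (AbortController, Runner.cancel(), AbortSignal). Passepartout should at least match the cascade. [Action: v0.7.0 Ctrl keys]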

5. **Single compaction layer** — Passepartout's foveal-peripheral pruning is a single strategy; Claude Code has 5 layers. Open question: should Passepartout add reactive compaction (compact on prompt-too-long) and tool output summarization? [Action: context/compaction study needed]

6. **No doom loop detection** — OpenCode detects 3 consecutive identical tool calls. Hermes bounds runs with IterationBudget. Passepartout could loop forever on a stuck tool. [Action: add to v0.9.0]