:PROPERTIES:
:ID: 3aa22300-2f25-57b0-8787-9f199cc978b1
:CREATED: [2026-05-22 Thu]
:END:
#+title: Competitive Analysis: AI Agent Landscape (May 2026)
#+filetags: :passepartout:strategy:competitive:

* Overview

Analyzed 9 competitor codebases alongside Passepartout. The competitive landscape divides into three categories:

1. Coding agents (Aider, OpenCode, Codex CLI, Claude Code, Gemini CLI)
2. Personal AI assistants (Hermes, OpenClaw, Thoth)
3. CI/check-based systems (Continue)

None of the nine compete with Passepartout on all axes simultaneously. Passepartout's strongest differentiators (Org-mode data model, deterministic gate stack, ACL2 verification, Merkle-treed memory, and the triad architecture) are absent from every competitor.

* Category 1: Coding Agents

** Aider (Python, ~40K lines, MIT)

Language: Python. ~6.8M pip installs. The oldest and most mature open-source coding agent.

Architecture: Chat-based Coder class with 5 edit formats (diff, udiff, patch, whole, architect). Uses litellm for universal provider access (50+ providers). RepoMap provides codebase awareness via cosine-similarity embedding.

Safety model: Purely prompt-based plus user-confirmation dialogs. No deterministic gate stack. No sandboxing. No model output validator. The allowed_to_edit() gate is a single user confirmation call. The --yes flag auto-approves. Aider can edit its own source code with no special protection; self-modification is undetectable.

Data model: Ad-hoc. Chat messages in memory. Git commits for persistence. RepoMap is a cosine-similarity index. No persistent memory across sessions. No knowledge graph.

Self-modification: Full. No guard against editing its own files.

Verification: None.

Key gap vs Passepartout: No safety gates, no persistent memory model, no knowledge representation, no verification, no self-modification protection, no architecture for neurosymbolic reasoning. It is a thin shell around litellm + edit format parsers.
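
The single-confirmation gate described in Aider's safety model can be illustrated with a minimal sketch. This is not Aider's code: the function name, its parameters, and the file paths are hypothetical, chosen only to show why an auto-approve flag removes the one check there is, and why the agent's own source gets no special treatment.

```python
from typing import Callable


def allowed_to_edit_sketch(path: str, auto_yes: bool,
                           ask: Callable[[str], bool]) -> bool:
    """Hypothetical stand-in for a single-confirmation edit gate."""
    if auto_yes:
        # With auto-approval enabled, the question is never asked.
        return True
    # Otherwise the user's answer is the only barrier.
    return ask(f"Allow edits to {path}?")


def always_refuse(question: str) -> bool:
    """A user who answers 'no' to every prompt."""
    return False


# Hypothetical paths: the first stands for the agent's own source file.
decisions = [
    allowed_to_edit_sketch("agent/own_source.py", True, always_refuse),
    allowed_to_edit_sketch("README.md", False, always_refuse),
]
print(decisions)
```

The gate never inspects the path, so a file that belongs to the agent itself is approved exactly like any other once auto-approval is on.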
** OpenCode (TypeScript/Bun, anomalyco/opencode, 163K★)

The dominant open-source coding agent by adoption. Bun runtime, Effect-TS functional core, Solid.js TUI, Turborepo monorepo.

Architecture: Dual LLM runtime: the default AI SDK (streamText/generateText) plus an opt-in native Effect-Schema runtime (@opencode-ai/llm) with 4-axis route decomposition (Protocol/Endpoint/Auth/Framing). 30+ provider plugins. Agent workflow DSL with plan/build agent switching. Agent Communication Protocol (ACP) for inter-agent messaging. Subagents inherit permission boundaries from parent. 18+ built-in tools + custom tools from config. Effect-TS ScopedCache per-project state management.

Safety model: Explicitly documents /not/ sandboxing the agent. The permission system is rule-based (glob matching, actions: allow/ask/deny) and exists as a UX feature, not security isolation. Built-in agents have carefully scoped defaults (build allows most, prompts on doom_loop; plan denies all edits except plan files; explore denies everything except grep/glob/bash/webfetch/read; question defaults to deny). Permission rules are inherited by subagents. The shell tool dynamically scans commands for filesystem-impacting operations to determine ask patterns.

Data model: SQLite via Drizzle ORM with bun:sqlite or better-sqlite3. Key tables: SessionTable (project, workspace, parent hierarchy, cost, tokens, model JSON, agent config JSON, permission JSON, revert snapshot), MessageTable, PartTable. Project model stores worktree, VCS, sandbox config. Config is a JSON chain (user home → project root → worktree) with remote config fetch and mergeDeep with concatenating array semantics. 20 config modules covering agents, permissions, providers, MCP, LSP, plugins, skills, references, variable.

Self-modification: The Agent.generate() interface lets the LLM create new agent definitions; the system grows its own subagent roster. The skills system loads domain-specific knowledge packs dynamically.

Verification: None.
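
The rule-based permission model described above (glob matching with allow/ask/deny actions) can be sketched in a few lines. This is an illustration in Python, not OpenCode's TypeScript implementation. The rule list, the "last matching rule wins" ordering, the default action, and the file paths are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Each rule is (glob pattern, action); action is "allow", "ask" or "deny".
# Hypothetical rules approximating a plan-style agent: deny all edits
# except plan files.
PLAN_AGENT_EDIT_RULES = [
    ("*", "deny"),
    ("plans/*.md", "allow"),
]


def resolve(rules, target, default="ask"):
    """Return the action of the last rule whose glob matches target.

    'Last match wins' is an assumption of this sketch, not a documented
    guarantee of any particular tool.
    """
    action = default
    for pattern, rule_action in rules:
        if fnmatchcase(target, pattern):
            action = rule_action
    return action


print(resolve(PLAN_AGENT_EDIT_RULES, "plans/roadmap.md"))
print(resolve(PLAN_AGENT_EDIT_RULES, "src/main.ts"))
```

A matcher like this decides what to prompt for. It does not constrain what the process can actually do, which is the difference between a UX feature and security isolation.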

Key gap vs Passepartout: No deterministic safety architecture, no knowledge graph, no Org-mode, no verification/proof system, no neurosymbolic architecture. The permission system is explicitly labeled "not security isolation": it is UX, not a gate stack. Largest userbase and most polished product of any coding agent, but architecturally conventional.

** Codex CLI (OpenAI, Rust, ~950K lines)

OpenAI's open-source coding agent, written in Rust and sandboxed.

Architecture: ~116-crate Rust workspace with a protocol layer (SQ/EQ session types), sandbox manager (macOS Seatbelt, Linux nsjail), multi-provider support (via a defined protocol, not directly), configurable TUI.

Safety model: The strongest sandboxing of any coding agent analyzed. Multi-layer:

- Process hardening (macOS Seatbelt with 4 profile tiers)
- Execution policy engine (defined policy in execpolicy crate)
- Sandboxing via nsjail on Linux, Seatbelt on macOS
- Guardian module for tool permission gating
- No prompt-based safety; all deterministic through policy definitions

Data model: Protocol-defined session types. Structured request/response models. Config through TOML files with schema validation.

Self-modification: Protected by sandbox. The agent cannot escape to modify its own binary or config without explicit policy override.

Verification: None (no proof system).

Key gap vs Passepartout: No knowledge graph (Org or otherwise), no persistent memory model, no deterministic gate stack for agent behavior (only OS-level sandboxing), no ACL2/prover, no neurosymbolic architecture. Strongest sandbox but weakest cognitive architecture.

** Claude Code (Anthropic, TypeScript/Bun, ~512K lines leaked)

Anthropic's proprietary coding agent. Only available via leaked source analysis. Not open source.

Architecture: Bun-bundled TypeScript single-file executable. Ink/React terminal UI. 23+ core tools. Subagent forking with byte-identical API prefixes for prompt cache sharing. Multi-agent coordination mode.

Safety model: Layered deterministic safety, NOT prompt-based:

1. Permission mode system (7 modes: default, acceptEdits, bypassPermissions, etc.)
2. Persistent permission rules (alwaysAllow, alwaysDeny, alwaysAsk, rule sources from userSettings, projectSettings, localSettings, policySettings)
3. Bash security validator: 2,592 lines of dedicated code with 23+ named security checks using tree-sitter AST parsing
4. Sandbox runtime for filesystem/network containment
5. Path/mode validation
6. Optional ML bash classifier (ant-only feature)

This is the most sophisticated safety system of any coding agent. Passepartout's gate stack is architecturally similar (deterministic multi-layer) but Claude Code's implementation is vastly more mature; 2,592 lines of bash validation alone is ~50x the equivalent in Passepartout.

Data model: File-based markdown memdir at ~/.claude/projects//memory/. 4 memory types: user, feedback, project, reference. YAML frontmatter in .md files. PROJECT.md and CLAUDE.md for project-level config. No database.

Self-modification: HIGH. Skill system writes SKILL.md files that change future behavior. Plugin system, cron scheduling, agent spawning.

Verification: None.

Key gap vs Passepartout: No proof system, no neurosymbolic architecture, no self-verification, no persistent knowledge graph (flat markdown files, not Org-mode with cross-references), markdown data model lacks semantic depth. Proprietary: Anthropic controls it completely. Sandboxing is macOS-first (it uses macOS sandbox profiles natively). The permission rules system is impressive but structurally inferior to Passepartout's gate stack because rules are heuristic (regex-based pattern matching) rather than typed (type-level gates with structural guarantees).

** Gemini CLI (Google, TypeScript, ~525K lines, Apache 2.0)

Google's open-source coding agent. Node.js 20+, Ink/React TUI.

Architecture: 7-package npm monorepo.
Core backend handles Gemini API orchestration, tool execution, policy engine, safety checks, sandbox management, session management, MCP client. 7-strategy composite model routing chain.

Safety model: Multi-layered:

1. CONSECA (Contextual Security Checker): AI-driven per-request policy generation using a separate Gemini Flash model. Principle of least privilege.
2. Policy engine: 4 approval modes (PLAN, DEFAULT, AUTO_EDIT, YOLO), hierarchical rules with priority scores and wildcard matching
3. 6 sandbox methods (macOS Seatbelt, Docker/Podman, bwrap, gVisor, LXC, Windows)
4. Trusted folders with discovery phase and path traversal protection
5. Policy integrity verification via cryptographic hashes
6. Built-in safety checkers (AllowedPathChecker, CONSECA)
7. Loop detection service

Data model: JSONL session files. Turn-based conversation model. 4-layer config precedence (system-defaults → user → project → system-override). TOML policy files.

Self-modification: Modifiable hooks system, MCP extensions, custom commands. Core binaries are protected on disk by file permissions.

Verification: None.

Key gap vs Passepartout: No proof system, no persistent knowledge graph, no self-verification, no neurosymbolic architecture, lock-in to Google Gemini models (though it can use others via routing). The CONSECA approach is interesting (AI-generated policies) but introduces a second LLM call for every security decision, the opposite of Passepartout's approach of zero-token deterministic gating.

* Category 2: Personal AI Assistants

** Hermes Agent (Python, ~17K core, MIT)

The agent running this conversation. Python, OpenAI-format conversations.

Architecture: Synchronous conversation loop with OpenAI-format messages. 60+ built-in tools. 109+ providers via pluggable transport layer. 15+ messaging platforms via gateway. MCP client (native, not bridge). Ink/React TUI as Node.js subprocess. Cron jobs, Kanban board, subagent delegation.

Safety model: Multi-layer but NOT a deterministic gate stack:

1. Message sanitization (surrogates, control chars, malformed JSON)
2. Tirith binary scanner (pre-execution terminal command analysis)
3. Command approval system (manual/smart/off modes)
4. Memory injection detection (prompt injection pattern matching)
5. Secret/PII redaction
6. Tool call guardrails (loop detection)
7. MCP security (env filtering, credential stripping)
8. Context fencing (memory injection span scrubbing)

These are all heuristic or prompt-based, with no structural type-level gates. Tirith is a separate binary, not in-process. The approval system is good but reactive (LLM proposes → system blocks) rather than preventive (type system prevents by construction).

Data model: SQLite session DB (FTS5 full-text search). File-based memory (MEMORY.md + USER.md). YAML config. No knowledge graph. No Org-mode.

Self-modification: Skill system writes SKILL.md files. Memory tool edits MEMORY.md/USER.md. Config YAML editable. Core Python code is read-only in execution but the LLM could request modifications to its own source files (no gate specifically prevents this).

Verification: None.

Key gap vs Passepartout: No deterministic gate stack (heuristic layers, not structural/typed), no knowledge graph, no Org-mode, no neurosymbolic architecture, no self-verification, no proof system. Hermes's strength is breadth: 109 providers, 15 platforms, MCP ecosystem, big tool surface. But it has no depth in safety, knowledge representation, or reasoning architecture.

** OpenClaw (TypeScript/Node.js, ~3.5M lines)

The largest codebase analyzed. Personal AI assistant with 25+ messaging channel support.

Architecture: pnpm workspace with ~135 bundled plugins. Gateway control plane routes messages through multi-agent routing. Per-agent sessions, workspaces, skill registries. Companion native apps (macOS, iOS, Android).

Safety model: Tiered: the main agent runs tools directly on the host (trusted-operator), non-main sessions are sandboxed via Docker (read-only rootfs, capability dropping, seccomp/AppArmor, memory/cpu/PID limits, SSH/OpenShell backends).

Data model: Typed JSON/YAML config (openclaw.json). Multi-source model catalog. Plugin SDK with narrow typed subpath exports.

Self-modification: ACP (Agent Control Protocol) for spawning child sessions. Skill system with npm distribution and ClawHub registry.

Verification: None.

Key gap vs Passepartout: Same as Hermes: no gate stack, no knowledge graph, no Org-mode, no verification, no neurosymbolic architecture. Differentiated by vastly broader channel support and mature plugin ecosystem. But architecturally conventional: LLM + tools + channels, no cognitive architecture innovation.

** Thoth (Python, ~151K lines, Apache 2.0)

https://github.com/siddsachar/Thoth ("Personal AI Sovereignty"). Local-first desktop AI assistant with knowledge graph, tools, voice, vision, shell, browser automation, workflow engine, and messaging channels.

Architecture: LangGraph create_react_agent (prebuilt ReAct pattern). Dual-mode streaming via agent.stream(). NiceGUI web UI served by Python app.py with desktop launcher (tray icon, Ollama auto-start, browser/OS window). Context trimming via tiktoken to ~85% of model window, base64 data redaction, stale browser snapshot compression (keeps last 8), MD5 tool result dedup, old tool result summarization. 50-step recursion limit (chat), 100 (tasks), 120 (Developer Studio). Agent graph cached by tool set + model override. Checkpoints via LangGraph's SQLite-backed checkpointer. 30+ tool modules.

Safety model: Shell command classification (tools/shell_tool.py) with 17 blocked patterns (rm -rf /, mkfs, dd of=/dev/, shutdown, fork bombs, pipe-to-bash, etc.), 30+ safe auto-execute prefixes (ls, cat, grep, git status, etc.), needs-approval for compound commands (;, &&, ||, |, $(), backticks).
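
The three-way shell classification just described (blocked patterns, safe prefixes, approval for compound commands) can be sketched as follows. This is not the code in tools/shell_tool.py. The regexes, the shortened lists, the check order, and the function name classify are assumptions for illustration, and only a handful of the patterns named above are included.

```python
import re

# A small, illustrative subset of blocked patterns (assumed regexes).
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),   # rm -rf /
    re.compile(r"\bmkfs\b"),               # filesystem creation
    re.compile(r"\bdd\b.*\bof=/dev/"),     # raw writes to a device
    re.compile(r"\bshutdown\b"),
]

# Prefixes that may run without confirmation (subset of those listed).
SAFE_PREFIXES = ("ls", "cat", "grep", "git status")

# Tokens that make a command "compound" and therefore need approval.
COMPOUND_TOKENS = (";", "&&", "||", "|", "$(", "`")


def classify(command: str) -> str:
    """Return 'blocked', 'needs_approval' or 'safe' for a shell command."""
    cmd = command.strip()
    if any(p.search(cmd) for p in BLOCKED_PATTERNS):
        return "blocked"
    if any(tok in cmd for tok in COMPOUND_TOKENS):
        return "needs_approval"
    if any(cmd == p or cmd.startswith(p + " ") for p in SAFE_PREFIXES):
        return "safe"
    # Anything unrecognized falls back to asking the user.
    return "needs_approval"


for example in ("ls -la", "rm -rf /", "cat a.txt | grep foo", "python run.py"):
    print(example, "->", classify(example))
```

A classifier of this kind is only as good as its lists: a command that matches no blocked pattern and contains no compound token is judged purely by its prefix.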
Interactive interrupt() for non-safe shell: LangGraph human-in-the-loop pauses the graph. Per-workflow safety modes: block (default, refuse non-safe), approve (pause), allow_all. Prompt-injection defense: scans tool outputs and user inputs for 5 categories (role overrides, instruction hijacking, data exfiltration, invisible unicode, hidden HTML directives); detection-only, no stripping. Filesystem workspace boundary (~/Documents/Thoth). Opt-in Docker Sandbox for Developer Studio. Destructive ops (file delete, moderate shell, Gmail send, calendar delete, memory/task/tracker delete) require confirmation. MCP servers disabled until tested. Custom Tools reviewed and promoted. No sandboxing of the agent runtime itself; the agent runs in-process. No response-level guardrails.

Data model: SQLite (WAL mode) at ~/.thoth/memory.db, shared between knowledge graph and legacy memory. Knowledge graph: SQLite (durable) + NetworkX MultiDiGraph (in-memory, rebuilt on startup) + FAISS vector index (semantic recall, rebuilt on every entity write). 11 entity types (person, preference, fact, event, place, project, organisation, concept, skill, media, self_knowledge). 67+ typed relations with 30+ LLM-produced aliases mapped to canonical forms. Dream Cycle refinement pipeline for entity dedup/merge/stale-confidence decay. Config: JSON files (skills_config.json, api_keys.json, providers.json, channels_config.json). Keys in OS credential store (Windows Credential Manager, macOS Keychain, Linux Secret Service/KWallet). Memory extraction background daemon scanning past conversations every ~2 hours.

Self-modification: Agent CAN create/update/delete skills via dedicated tools (thoth_create_skill, thoth_patch_skill, thoth_delete_skill). SKILL.md files with YAML frontmatter at ~/.thoth/skills/. Bundled skills (read-only) at app root; user skills override by name. Skill patching requires user confirmation + auto backup. Maximum 1 patch proposal per conversation. Tool guides cannot be patched.
Self-knowledge block injected into system prompt. No tool to modify agent.py, prompts.py, or system prompt directly. Developer Studio provides code editing through approval-gated tools (tool-assisted human workflow, not agent self-mod).

Verification: None formal. Update signature verification (updater.py). Comprehensive test suite at tests/test_suite.py. No tool-call verification beyond LangGraph schema validation. No output verification or fact-checking.

Key differentiators vs other assistants: LangGraph ReAct agent with structured streaming event model. Personal knowledge graph (11 entity types, 67 relations, NetworkX + FAISS). Developer Studio (Docker sandbox, code threads, Git operations, approval modes). Designer Studio (decks, documents, landing pages, sandboxed interactive runtime). 5 messaging channels (Telegram, Discord, Slack, WhatsApp, SMS) with streaming, reactions, media processing. Background workflow engine (schedules, webhooks, step pipelines, conditions, approvals, concurrency groups). 30+ tool modules including browser automation, shell, Gmail, Calendar, X, image/video generation. 39 curated Ollama tool-calling models. 10 LLM providers (Ollama, OpenAI, Anthropic, Google AI/Gemini, xAI/Grok, MiniMax, OpenRouter, Ollama Cloud, ChatGPT/Codex subscription, custom endpoints). MCP client (stdio, Streamable HTTP, SSE) with namespaced tools, approval gates. No accounts, no telemetry, no hosted server. Local-first with OS credential store.

Key gap vs Passepartout: No deterministic gate stack: shell safety is a pattern list (17 blocked, 30 safe), not typed gates. No sandboxed agent runtime. No proof system. No output guardrails. No neurosymbolic architecture. No Org-mode. No Merkle-tree memory. The knowledge graph (SQLite+FAISS) is richer than Hermes's memory but relies on LLM-driven entity extraction, with no structural integrity guarantees.
Thoth's differentiation from Hermes/OpenClaw is the knowledge graph + Developer/Designer studios + embedded LangGraph framework: a broader product scope, but still architecturally conventional (LLM + tools + channels + KG), not a new cognitive architecture.

* Category 3: CI/Check Systems

** Continue (TypeScript, ~328K lines, Apache 2.0)

Source-controlled AI checks for CI/CD. Markdown-as-gate-policy.

Architecture: Shared core (@continuedev/core) with ~80 provider implementations, tool-calling engine, config system (YAML/JSON/Markdown). Serves CLI (Ink/React TUI + headless CI mode), IDE extensions (VS Code, JetBrains), web dashboard.

Safety model: Three permission levels (allow/ask/exclude). Precedence: mode policies → CLI flags → permissions.yaml → built-in defaults. Terminal security package for shell command analysis via shell-quote parsing. Workspace-scoped file access.

Data model: Markdown files for checks, agents, rules. Source-controlled in-repo. YAML frontmatter for metadata.

Self-modification: Checks are source-controlled; any change goes through git.

Verification: None (the checks are themselves unverified).

Key gap vs Passepartout: The "checks as markdown" concept is philosophically similar to Passepartout's gate rules (deterministic policies checked before execution) but the implementation is dramatically simpler: regex-based policy objects, not a type-level gate stack with structural guarantees. No persistent agent, no memory, no knowledge graph, no neurosymbolic architecture. It is a gate system without an agent to gate.

* The Passepartout Advantage

| Dimension | Passepartout | Best Competitor | Gap |
|-----------|--------------|-----------------|-----|
| Safety model | Type-level gates + 11-vector deterministic stack | Claude Code (7 permission modes + 23 bash checks) | Structural vs heuristic. Passepartout's type-level gates prevent self-modification at the category level; competitors block patterns. |
| Knowledge model | Org-mode (tree, properties, TODOs, timestamps, cross-refs, IDs, tags) | Claude Code (flat markdown memdir) | Org-mode offers ~15 semantic primitives that markdown lacks. |
| Memory integrity | Merkle tree + SHA-256 + rollback | Hermes (file-based); Claude Code (flat files + git) | Content-addressed, tamper-evident memory no competitor has. |
| Self-verification | ACL2 → CIC prover (planned) | None | No competitor does provable correctness. |
| Cognitive architecture | 10-80-10 symbolic-first (planned) | 100% LLM (every competitor) | Post-flip, Passepartout uses ~10% of the tokens competitors use. |
| Data format | Org-mode (human-editable, machine-parseable, single file) | JSONL/Markdown/YAML/DB (competitors use 2-5 formats) | Unified format reduces translation layers to zero. |
| Self-modification | Type-level gates + hot-reload | Claude Code (skills), Hermes (skills) | Passepartout's guard against self-modification is structural (type level), not heuristic (pattern list). |
| Triad | Passepartout + Stoa + Agora | None | No competitor is building a full computing stack + social network. |
| Provider independence | Any OpenAI-compatible API | Hermes (109+), Gemini CLI (1 primary) | Comparable to Hermes, better than most. |

* Where Competitors Lead

| Dimension | Leader | Passepartout Status |
|-----------|--------|---------------------|
| Safety implementation maturity | Claude Code (2,592 lines bash security) | Gate stack exists but bash validation is minimal in comparison |
| Provider breadth | Hermes (109+), OpenClaw (50+) | 8 providers: adequate but not competitive |
| Channel/platform support | OpenClaw (25+ channels) | TUI only, no multi-channel |
| Plugin ecosystem | OpenClaw (ClawHub, npm registry) | No plugin marketplace |
| Subagent delegation | Claude Code (fork with context inheritance) | Planned via Screamer planner |
| Codebase size / features shipped | All competitors have working products | In development |
| MCP integration | Hermes, Codex (native), Continue | Planned |
| Sandboxing | Codex CLI (Seatbelt+nsjail), Gemini CLI (6 methods) | None |
| Business model | Hermes (MIT+services), Codex (tokens) | AGPL + appliances + SaaS |
| Cross-platform | Claude Code (macOS/*nix), Codex (macOS) | Linux only |

* Strategic Positioning

Passepartout is not competing in the existing AI agent market. It is building a new category: provable personal infrastructure.

Competitors optimize for:

- Token efficiency (Aider's edit formats, OpenCode's LSP integration)
- Model flexibility (Hermes' 109 providers)
- Platform reach (OpenClaw's 25 channels)
- UI polish (Gemini CLI's Ink/React, Claude Code's permission dialogs)
- Sandbox security (Codex's Seatbelt, Gemini's gVisor)

Passepartout optimizes for:

- Provable correctness (ACL2 → CIC)
- Data integrity (Merkle tree)
- Cognitive architecture (10-80-10 symbolic-first)
- Safety by construction (type-level gates)
- Unified data model (Org-mode as everything)
- Network effects (Agora)
- Full-stack ownership (Stoa)

These are not axes any competitor cares about. The risk is not that a competitor builds a better Passepartout; it is that the market never develops a preference for provable agents.
If token-burning LLM agents remain the default and users don't demand verification, the entire category Passepartout addresses may not exist yet.

* Immediate Implications for Development

1. Claude Code's safety system is the benchmark to exceed. The type-level gate architecture is theoretically superior to Claude Code's heuristic patterns, but the implementation needs to prove it catches things Claude Code misses.
2. No competitor has anything resembling a neurosymbolic architecture. The 10-80-10 plan has zero competition, but that also means zero market validation.
3. The Org-mode bet is invisible to competitors. They don't see the advantage because they've never tried to build a knowledge graph from flat markdown files. This is Passepartout's widest moat: it depends on a skill (Org-mode literate programming) that no competitor's team has.
4. Hermes is the closest full-stack competitor (tools, skills, cron, subagents, multi-platform), but architecturally conventional. For Hermes to match Passepartout, it would need to be rewritten.
5. The coding agents (Aider, OpenCode, Codex) are not competitors; they are single-purpose tools Passepartout could eventually replace entirely when the planner matures.

* File references

Repository dumps and analysis artifacts at /tmp/:

- /tmp/aider/ :: Aider source (Python)
- /tmp/opencode/ :: OpenCode archived source (Go)
- /tmp/codex/ :: OpenAI Codex CLI (Rust)
- /tmp/claude-code-leaked-source/ :: Claude Code leaked (TypeScript/Bun)
- /tmp/gemini-cli/ :: Google Gemini CLI (TypeScript)
- /tmp/openclaw/ :: OpenClaw source (TypeScript)
- /tmp/thoth/ :: Thoth source (Python)
- /tmp/continue/ :: Continue source (TypeScript)
- /usr/local/lib/hermes-agent/ :: Hermes Agent (Python)