hermes-brain/ideas/competitive-analysis-2026-05.org
Hermes 3f38e87f4f Remove versioned roadmap from Passepartout docs
All version numbers stripped from roadmap across all brain documents:
- passepartout-economics.org: v0.x.x version table → phase-name-only table,
  v1.0.0 → 'neurosymbolic maturity', versioned text references → capability
  descriptions. Retained phase names (Phase 0-7) and line estimates as they
  describe capabilities, not version milestones.
- competitive-analysis-2026-05.org: version references removed
- time-estimates.org: v0.4.0 → 'initial state', v1.0.0 → 'neurosymbolic maturity'
- native-org-knowledge-base.org: v0.8.0-v0.9.0 → capability-based target
2026-05-23 06:24:20 +00:00


Competitive Analysis — AI Agent Landscape (May 2026)

Overview

Analyzed 9 competitor codebases alongside Passepartout. The competitive landscape divides into three categories:

  1. Coding agents (Aider, OpenCode, Codex CLI, Claude Code, Gemini CLI)
  2. Personal AI assistants (Hermes, OpenClaw, Thoth)
  3. CI/check-based systems (Continue)

None of the nine compete with Passepartout on all axes simultaneously. Passepartout's strongest differentiators — Org-mode data model, deterministic gate stack, ACL2 verification, Merkle-treed memory, and the triad architecture — are absent from every competitor.

Category 1: Coding Agents

Aider (Python, ~40K lines, MIT)

Language: Python. ~6.8M pip installs. The oldest and most mature open-source coding agent.

Architecture: Chat-based Coder class with 5 edit formats (diff, udiff, patch, whole, architect). Uses litellm for universal provider access (50+ providers). RepoMap provides codebase awareness via cosine-similarity embedding.

Safety model: Purely prompt-based plus user-confirmation dialogs. No deterministic gate stack. No sandboxing. No model output validator. The allowed_to_edit() gate is a single user confirmation call, and the yes flag auto-approves it. Aider can edit its own source code with no special protection, so self-modification goes undetected.

Data model: Ad-hoc. Chat messages in memory. Git commits for persistence. RepoMap is a cosine-similarity index. No persistent memory across sessions. No knowledge graph.

Self-modification: Full. No guard against editing its own files.

Verification: None.

Key gap vs Passepartout: No safety gates, no persistent memory model, no knowledge representation, no verification, no self-modification protection, no architecture for neurosymbolic reasoning. It is a thin shell around litellm + edit format parsers.

OpenCode (TypeScript/Bun, anomalyco/opencode, 163K★)

The dominant open-source coding agent by adoption. Bun runtime, Effect-TS functional core, Solid.js TUI, Turborepo monorepo.

Architecture: Dual LLM runtime — default AI SDK (streamText/generateText) + opt-in native Effect-Schema runtime (@opencode-ai/llm) with 4-axis route decomposition (Protocol/Endpoint/Auth/Framing). 30+ provider plugins. Agent workflow DSL with plan/build agent switching. Agent Communication Protocol (ACP) for inter-agent messaging. Subagents inherit permission boundaries from parent. 18+ built-in tools + custom tools from config. Effect-TS ScopedCache per-project state management.

Safety model: Explicitly documents that it does not sandbox the agent. The permission system is rule-based (glob matching, actions: allow/ask/deny) and exists as a UX feature, not security isolation. Built-in agents have carefully scoped defaults (build allows most, prompts on doom_loop; plan denies all edits except plan files; explore denies everything except grep/glob/bash/webfetch/read; question defaults to deny). Permission rules are inherited by subagents. Shell tool dynamically scans commands for filesystem-impacting operations to determine ask patterns.
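A minimal sketch of a glob-based allow/ask/deny rule table of the kind described above. It is written in Python for illustration only (OpenCode itself is TypeScript); the rule tuples, the `decide` helper, and the "last matching rule wins" ordering are assumptions, not OpenCode's actual API or semantics.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (tool, glob pattern, action). Illustrative only.
RULES = [
    ("edit", "*", "ask"),
    ("edit", "plans/*.md", "allow"),
    ("bash", "git status*", "allow"),
    ("bash", "rm *", "deny"),
]

def decide(tool: str, target: str, rules=RULES, default: str = "ask") -> str:
    """Return the action of the last rule whose tool and glob both match."""
    action = default
    for rule_tool, pattern, rule_action in rules:
        if rule_tool == tool and fnmatchcase(target, pattern):
            action = rule_action  # assumption: later rules override earlier ones
    return action
```

Under these assumed rules, `decide("edit", "plans/roadmap.md")` resolves to allow while `decide("edit", "src/main.py")` falls back to ask. Nothing here isolates the process, which is the sense in which such a system is UX rather than security isolation.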

Data model: SQLite via Drizzle ORM with bun:sqlite or better-sqlite3. Key tables: SessionTable (project, workspace, parent hierarchy, cost, tokens, model JSON, agent config JSON, permission JSON, revert snapshot), MessageTable, PartTable. Project model stores worktree, VCS, sandbox config. Config is JSON-chain (user home → project root → worktree) with remote config fetch and mergeDeep with concatenating array semantics. 20 config modules covering agents, permissions, providers, MCP, LSP, plugins, skills, references, variable.
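The config chain described above (user home → project root → worktree, deep-merged with concatenating arrays) can be sketched roughly as follows. This is a Python illustration of the merge semantics only; `merge_deep`, `resolve_config`, and the sample keys are hypothetical names, not OpenCode's implementation.

```python
from functools import reduce

def merge_deep(base: dict, override: dict) -> dict:
    """Merge two config dicts: nested dicts merge, lists concatenate, scalars override."""
    merged = dict(base)
    for key, value in override.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = merge_deep(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            merged[key] = current + value  # concatenating array semantics
        else:
            merged[key] = value
    return merged

def resolve_config(layers: list) -> dict:
    """Fold the layers in order; later layers take precedence for scalars."""
    return reduce(merge_deep, layers, {})

# Hypothetical layers in the order: user home, project root, worktree.
user_home = {"model": "a", "plugins": ["x"], "permission": {"edit": "ask"}}
project = {"plugins": ["y"], "permission": {"bash": "deny"}}
worktree = {"model": "b"}
config = resolve_config([user_home, project, worktree])
```

With these sample layers the worktree's `model` wins, the `plugins` lists are joined rather than replaced, and the nested `permission` maps are combined.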

Self-modification: Agent.generate() interface lets the LLM create new agent definitions — the system grows its own subagent roster. Skills system loads domain-specific knowledge packs dynamically.

Verification: None.

Key gap vs Passepartout: No deterministic safety architecture, no knowledge graph, no Org-mode, no verification/proof system, no neurosymbolic architecture. The permission system is explicitly labeled "not security isolation": it is UX, not a gate stack. It has the largest user base and the most polished product of any coding agent, but it is architecturally conventional.

Codex CLI (OpenAI, Rust, ~950K lines)

OpenAI's open-source coding agent, written in Rust and sandboxed.

Architecture: ~116-crate Rust workspace with a protocol layer (SQ/EQ session types), sandbox manager (macOS Seatbelt, Linux nsjail), multi-provider support (through a defined protocol rather than direct integrations), and a configurable TUI.

Safety model: Strongest OS-level sandboxing of any coding agent analyzed. Multi-layer:

  • Process hardening (macOS Seatbelt with 4 profile tiers)
  • Execution policy engine (defined policy in execpolicy crate)
  • Sandboxing via nsjail on Linux, seatbelt on macOS
  • Guardian module for tool permission gating
  • No prompt-based safety — all deterministic through policy definitions

Data model: Protocol-defined session types. Structured request/response models. Config through TOML files with schema validation.

Self-modification: Protected by sandbox — the agent cannot escape to modify its own binary or config without explicit policy override.

Verification: None (no proof system).

Key gap vs Passepartout: No knowledge graph (Org or otherwise), no persistent memory model, no deterministic gate stack for agent behavior (only OS-level sandboxing), no ACL2/prover, no neurosymbolic architecture. Strongest sandbox but weakest cognitive architecture.

Claude Code (Anthropic, TypeScript/Bun, ~512K lines leaked)

Anthropic's proprietary coding agent. Not open source; the source is available for analysis only through a leak.

Architecture: Bun-bundled TypeScript single-file executable. Ink/React terminal UI. 23+ core tools. Subagent forking with byte-identical API prefixes for prompt cache sharing. Multi-agent coordination mode.

Safety model: Layered deterministic safety — NOT prompt-based:

  1. Permission mode system (7 modes: default, acceptEdits, bypassPermissions, etc.)
  2. Persistent permission rules (alwaysAllow, alwaysDeny, alwaysAsk, rule sources from userSettings, projectSettings, localSettings, policySettings)
  3. Bash security validator — 2,592 lines of dedicated code with 23+ named security checks using tree-sitter AST parsing
  4. Sandbox runtime for filesystem/network containment
  5. Path/mode validation
  6. Optional ML bash classifier (ant-only feature)

This is the most sophisticated safety system of any coding agent. Passepartout's gate stack is architecturally similar (deterministic multi-layer) but Claude Code's implementation is vastly more mature — 2,592 lines of bash validation alone is ~50x the equivalent in Passepartout.

Data model: File-based markdown memdir at ~/.claude/projects/<slug>/memory/. 4 memory types: user, feedback, project, reference. YAML frontmatter in .md files. PROJECT.md and CLAUDE.md for project-level config. No database.

Self-modification: HIGH. Skill system writes SKILL.md files that change future behavior. Plugin system, cron scheduling, agent spawning.

Verification: None.

Key gap vs Passepartout: No proof system, no neurosymbolic architecture, no self-verification, and no persistent knowledge graph (flat markdown files, not Org-mode with cross-references); the markdown data model lacks semantic depth. Proprietary: Anthropic controls it completely. Native sandboxing is macOS-centric (it uses macOS sandbox profiles natively). The permission rules system is impressive but structurally inferior to Passepartout's gate stack because rules are heuristic (regex-based pattern matching) rather than typed (type-level gates with structural guarantees).

Gemini CLI (Google, TypeScript, ~525K lines, Apache 2.0)

Google's open-source coding agent. Node.js 20+, Ink/React TUI.

Architecture: 7-package npm monorepo. Core backend handles Gemini API orchestration, tool execution, policy engine, safety checks, sandbox management, session management, MCP client. 7-strategy composite model routing chain.

Safety model: Multi-layered:

  1. CONSECA (Contextual Security Checker) — AI-driven per-request policy generation using a separate Gemini Flash model. Principle of least privilege.
  2. Policy engine — 4 approval modes (PLAN, DEFAULT, AUTO_EDIT, YOLO), hierarchical rules with priority scores and wildcard matching
  3. 6 sandbox methods (macOS Seatbelt, Docker/Podman, bwrap, gVisor, LXC, Windows)
  4. Trusted folders with discovery phase and path traversal protection
  5. Policy integrity verification via cryptographic hashes
  6. Built-in safety checkers (AllowedPathChecker, CONSECA)
  7. Loop detection service
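Item 5 in the list above (policy integrity verification via cryptographic hashes) can be illustrated with a short Python sketch. The function names and the choice of SHA-256 are assumptions for illustration; the source does not say which hash Gemini CLI uses or how digests are stored.

```python
import hashlib
import hmac

def policy_digest(policy_text: str) -> str:
    """Hex SHA-256 digest of a policy file's contents (hash choice is an assumption)."""
    return hashlib.sha256(policy_text.encode("utf-8")).hexdigest()

def is_policy_intact(policy_text: str, recorded_digest: str) -> bool:
    """Compare the current digest with a previously recorded one in constant time."""
    return hmac.compare_digest(policy_digest(policy_text), recorded_digest)

# Hypothetical TOML policy fragment used only to exercise the check.
policy = 'approval_mode = "DEFAULT"\n'
recorded = policy_digest(policy)
tampered = policy.replace("DEFAULT", "YOLO")
```

A policy loader built this way would refuse to apply `tampered`, because its digest no longer matches `recorded`. The check costs no model tokens, unlike the per-request CONSECA call.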

Data model: JSONL session files. Turn-based conversation model. 4-layer config precedence (system-defaults → user → project → system-override). TOML policy files.

Self-modification: Modifiable hooks system, MCP extensions, custom commands. Core binaries are protected on disk by file permissions.

Verification: None.

Key gap vs Passepartout: No proof system, no persistent knowledge graph, no self-verification, no neurosymbolic architecture, lock-in to Google Gemini models (though it can use others via routing). The CONSECA approach is interesting (AI-generated policies) but introduces a second LLM call for every security decision — the opposite of Passepartout's approach of zero-token deterministic gating.

Category 2: Personal AI Assistants

Hermes Agent (Python, ~17K core, MIT)

The agent running this conversation. Python, OpenAI-format conversations.

Architecture: Synchronous conversation loop with OpenAI-format messages. 60+ built-in tools. 109+ providers via pluggable transport layer. 15+ messaging platforms via gateway. MCP client (native, not bridge). Ink/React TUI as Node.js subprocess. Cron jobs, Kanban board, subagent delegation.

Safety model: Multi-layer but NOT a deterministic gate stack:

  1. Message sanitization (surrogates, control chars, malformed JSON)
  2. Tirith binary scanner (pre-execution terminal command analysis)
  3. Command approval system (manual/smart/off modes)
  4. Memory injection detection (prompt injection pattern matching)
  5. Secret/PII redaction
  6. Tool call guardrails (loop detection)
  7. MCP security (env filtering, credential stripping)
  8. Context fencing (memory injection span scrubbing)

These are all heuristic or prompt-based — no structural type-level gates. Tirith is a separate binary, not in-process. The approval system is good but reactive (LLM proposes → system blocks) rather than preventive (type system prevents by construction).

Data model: SQLite session DB (FTS5 full-text search). File-based memory (MEMORY.md + USER.md). YAML config. No knowledge graph. No Org-mode.

Self-modification: Skill system writes SKILL.md files. Memory tool edits MEMORY.md/USER.md. Config YAML editable. Core Python code is read-only in execution but the LLM could request modifications to its own source files (no gate specifically prevents this).

Verification: None.

Key gap vs Passepartout: No deterministic gate stack (heuristic layers, not structural/typed), no knowledge graph, no Org-mode, no neurosymbolic architecture, no self-verification, no proof system. Hermes's strength is breadth — 109 providers, 15 platforms, MCP ecosystem, big tool surface. But it has no depth in safety, knowledge representation, or reasoning architecture.

OpenClaw (TypeScript/Node.js, ~3.5M lines)

The largest codebase analyzed. Personal AI assistant with 25+ messaging channel support.

Architecture: pnpm workspace with ~135 bundled plugins. Gateway control plane routes messages through multi-agent routing. Per-agent sessions, workspaces, skill registries. Companion native apps (macOS, iOS, Android).

Safety model: Tiered — main agent runs tools directly on host (trusted-operator), non-main sessions sandboxed via Docker (read-only rootfs, capability dropping, seccomp/AppArmor, memory/cpu/PID limits, SSH/OpenShell backends).

Data model: Typed JSON/YAML config (openclaw.json). Multi-source model catalog. Plugin SDK with narrow typed subpath exports.

Self-modification: ACP (Agent Control Protocol) for spawning child sessions. Skill system with npm distribution and ClawHub registry.

Verification: None.

Key gap vs Passepartout: Same as Hermes — no gate stack, no knowledge graph, no Org-mode, no verification, no neurosymbolic architecture. Differentiated by vastly broader channel support and mature plugin ecosystem. But architecturally conventional — LLM + tools + channels, no cognitive architecture innovation.

Thoth (Python, ~151K lines, Apache 2.0)

https://github.com/siddsachar/Thoth — Personal AI Sovereignty. Local-first desktop AI assistant with knowledge graph, tools, voice, vision, shell, browser automation, workflow engine, and messaging channels.

Architecture: LangGraph create_react_agent (prebuilt ReAct pattern). Dual-mode streaming via agent.stream(). NiceGUI web UI served by Python app.py with desktop launcher (tray icon, Ollama auto-start, browser/OS window). Context trimming via tiktoken to ~85% of model window, base64 data redaction, stale browser snapshot compression (keeps last 8), MD5 tool result dedup, old tool result summarization. 50-step recursion limit (chat), 100 (tasks), 120 (Developer Studio). Agent graph cached by tool set + model override. Checkpoints via LangGraph's SQLite-backed checkpointer. 30+ tool modules.
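One of the context-trimming steps above, MD5 tool-result dedup, might look roughly like the sketch below. The placeholder text, function name, and "keep the first occurrence" policy are assumptions; Thoth's actual behavior may differ.

```python
import hashlib

def dedup_tool_results(results: list) -> list:
    """Replace repeated tool outputs with a short pointer to their first occurrence."""
    first_seen = {}  # digest -> index of the first result with that content
    trimmed = []
    for index, text in enumerate(results):
        # MD5 is used purely as a cheap content fingerprint, not for security.
        digest = hashlib.md5(text.encode("utf-8"), usedforsecurity=False).hexdigest()
        if digest in first_seen:
            trimmed.append(f"[duplicate of tool result #{first_seen[digest]}]")
        else:
            first_seen[digest] = index
            trimmed.append(text)
    return trimmed
```

The saving comes from repeated large outputs, such as the same file read twice, collapsing to a one-line marker in the context window.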

Safety model: Pattern-based classification plus confirmation gates:

  • Shell command classification (tools/shell_tool.py): 17 blocked patterns (rm -rf, mkfs, dd of=/dev, shutdown, fork bombs, pipe-to-bash, etc.), 30+ safe auto-execute prefixes (ls, cat, grep, git status, etc.), and needs-approval for compound commands (;, &&, ||, |, $(), backticks).
  • Interactive interrupt() for non-safe shell commands: LangGraph human-in-the-loop pauses the graph.
  • Per-workflow safety modes: block (default, refuse non-safe), approve (pause), allow_all.
  • Prompt-injection defense: scans tool outputs and user inputs for 5 categories (role overrides, instruction hijacking, data exfiltration, invisible unicode, hidden HTML directives). Detection only, no stripping.
  • Filesystem workspace boundary (~/Documents/Thoth).
  • Opt-in Docker Sandbox for Developer Studio.
  • Destructive ops (file delete, moderate shell, Gmail send, calendar delete, memory/task/tracker delete) require confirmation.
  • MCP servers are disabled until tested.
  • Custom Tools are reviewed and promoted.
  • No sandboxing of the agent runtime itself: the agent runs in-process.
  • No response-level guardrails.
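The three-way shell classification described above (blocked, safe, needs approval) can be sketched as follows. The patterns and prefixes are a small illustrative subset, and the check order is an assumption; this is not the contents of Thoth's tools/shell_tool.py.

```python
import re

# Illustrative subset; the real lists are described as 17 patterns and 30+ prefixes.
BLOCKED_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bmkfs\b",
    r"\bdd\b.*\bof=/dev",
    r"\bshutdown\b",
    r"\|\s*(ba)?sh\b",  # pipe-to-shell
]
SAFE_PREFIXES = ("ls", "cat", "grep", "git status")
COMPOUND_TOKENS = (";", "&&", "||", "|", "$(", "`")

def classify(command: str) -> str:
    """Return 'blocked', 'safe', or 'needs_approval' for a shell command string."""
    cmd = command.strip()
    if any(re.search(pattern, cmd) for pattern in BLOCKED_PATTERNS):
        return "blocked"
    if any(token in cmd for token in COMPOUND_TOKENS):
        return "needs_approval"  # compound commands are never auto-executed
    if any(cmd == prefix or cmd.startswith(prefix + " ") for prefix in SAFE_PREFIXES):
        return "safe"
    return "needs_approval"
```

A list like this only catches the spellings someone thought to enumerate. That is the limitation the gap analysis points to when it contrasts pattern lists with typed gates.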

Data model: SQLite (WAL mode) at ~/.thoth/memory.db — shared between knowledge graph and legacy memory. Knowledge graph: SQLite (durable) + NetworkX MultiDiGraph (in-memory, rebuilt on startup) + FAISS vector index (semantic recall, rebuilt on every entity write). 11 entity types (person, preference, fact, event, place, project, organisation, concept, skill, media, self_knowledge). 67+ typed relations with 30+ LLM-produced aliases mapped to canonical forms. Dream Cycle refinement pipeline for entity dedup/merge/stale-confidence decay. Config: JSON files (skills_config.json, api_keys.json, providers.json, channels_config.json). Keys in OS credential store (Windows Credential Manager, macOS Keychain, Linux Secret Service/KWallet). Memory extraction background daemon scanning past conversations every ~2 hours.

Self-modification: Agent CAN create/update/delete skills via dedicated tools (thoth_create_skill, thoth_patch_skill, thoth_delete_skill). SKILL.md files with YAML frontmatter at ~/.thoth/skills/. Bundled skills (read-only) at app root; user skills override by name. Skill patching requires user confirmation + auto backup. Maximum 1 patch proposal per conversation. Tool guides cannot be patched. Self-knowledge block injected into system prompt. No tool to modify agent.py, prompts.py, or system prompt directly. Developer Studio provides code editing through approval-gated tools (tool-assisted human workflow, not agent self-mod).

Verification: None formal. Update signature verification (updater.py). Comprehensive test suite at tests/test_suite.py. No tool-call verification beyond LangGraph schema validation. No output verification or fact-checking.

Key differentiators vs other assistants: LangGraph ReAct agent with structured streaming event model. Personal knowledge graph (11 entity types, 67 relations, NetworkX + FAISS). Developer Studio (Docker sandbox, code threads, Git operations, approval modes). Designer Studio (decks, documents, landing pages, sandboxed interactive runtime). 5 messaging channels (Telegram, Discord, Slack, WhatsApp, SMS) with streaming, reactions, media processing. Background workflow engine (schedules, webhooks, step pipelines, conditions, approvals, concurrency groups). 30+ tool modules including browser automation, shell, Gmail, Calendar, X, image/video generation. 39 curated Ollama tool-calling models. 10 LLM providers (Ollama, OpenAI, Anthropic, Google AI/Gemini, xAI/Grok, MiniMax, OpenRouter, Ollama Cloud, ChatGPT/Codex subscription, custom endpoints). MCP client (stdio, Streamable HTTP, SSE) with namespaced tools, approval gates. No accounts, no telemetry, no hosted server. Local-first with OS credential store.

Key gap vs Passepartout: No deterministic gate stack: shell safety is a pattern list (17 blocked, 30 safe), not typed gates. No sandboxed agent runtime. No proof system. No output guardrails. No neurosymbolic architecture. No Org-mode. No Merkle-tree memory. The knowledge graph (SQLite+FAISS) is richer than Hermes's memory but relies on LLM-driven entity extraction, with no structural integrity guarantees. Thoth's differentiation from Hermes/OpenClaw is the knowledge graph + Developer/Designer studios + embedded LangGraph framework: a broader product scope, but still architecturally conventional (LLM + tools + channels + KG), not a new cognitive architecture.

Category 3: CI/Check Systems

Continue (TypeScript, ~328K lines, Apache 2.0)

Source-controlled AI checks for CI/CD. Markdown-as-gate-policy.

Architecture: Shared core (@continuedev/core) with ~80 provider implementations, tool-calling engine, config system (YAML/JSON/Markdown). Serves CLI (Ink/React TUI + headless CI mode), IDE extensions (VS Code, JetBrains), web dashboard.

Safety model: Three permission levels (allow/ask/exclude). Precedence: mode policies → CLI flags → permissions.yaml → built-in defaults. Terminal security package for shell command analysis via shell-quote parsing. Workspace-scoped file access.
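The precedence chain above can be sketched with Python's `collections.ChainMap`, which consults mappings in order and returns the first hit. This assumes the listed order runs from highest to lowest precedence; the layer contents and the fallback to ask are hypothetical, and Continue itself is TypeScript.

```python
from collections import ChainMap

# Hypothetical layer contents, listed from highest to lowest precedence.
mode_policies = {"write_file": "exclude"}
cli_flags = {"run_terminal": "allow"}
permissions_yaml = {"run_terminal": "ask", "read_file": "allow"}
builtin_defaults = {"read_file": "ask", "write_file": "ask", "run_terminal": "ask"}

permissions = ChainMap(mode_policies, cli_flags, permissions_yaml, builtin_defaults)

def resolve_permission(tool: str) -> str:
    """Return allow/ask/exclude from the first layer that mentions the tool."""
    return permissions.get(tool, "ask")  # assumption: unknown tools fall back to ask
```

Here the CLI flag for `run_terminal` shadows the entry in permissions.yaml, and the mode policy for `write_file` shadows everything below it.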

Data model: Markdown files for checks, agents, rules. Source-controlled in-repo. YAML frontmatter for metadata.

Self-modification: Checks are source-controlled, so any change goes through git.

Verification: None (the checks are themselves unverified).

Key gap vs Passepartout: The "checks as markdown" concept is philosophically similar to Passepartout's gate rules (deterministic policies checked before execution) but the implementation is dramatically simpler — regex-based policy objects, not a type-level gate stack with structural guarantees. No persistent agent, no memory, no knowledge graph, no neurosymbolic architecture. It is a gate system without an agent to gate.

The Passepartout Advantage

| Dimension | Passepartout | Best Competitor | Gap |
|-----------+--------------+-----------------+-----|
| Safety model | Type-level gates + 11-vector deterministic stack | Claude Code (7 permission modes + 23 bash checks) | Structural vs heuristic. Passepartout's type-level gates prevent self-modification at the category level; competitors block patterns. |
| Knowledge model | Org-mode (tree, properties, TODOs, timestamps, cross-refs, IDs, tags) | Claude Code (flat markdown memdir) | Org-mode's semantic richness is ~15 primitives markdown doesn't have. |
| Memory integrity | Merkle tree + SHA-256 + rollback | Hermes (file-based); Claude Code (flat files + git) | Content-addressed, tamper-evident memory no competitor has. |
| Self-verification | ACL2 → CIC prover (planned) | None | No competitor does provable correctness. |
| Cognitive architecture | 10-80-10 symbolic-first (planned) | 100% LLM (every competitor) | Post-flip, Passepartout uses ~10% of the tokens competitors use. |
| Data format | Org-mode (human-editable, machine-parseable, single file) | JSONL/Markdown/YAML/DB (competitors use 2-5 formats) | Unified format reduces translation layers to zero. |
| Self-modification | Type-level gates + hot-reload | Claude Code (skills), Hermes (skills) | Passepartout's guard against self-modification is structural (type level), not heuristic (pattern list). |
| Triad | Passepartout + Stoa + Agora | None | No competitor is building a full computing stack + social network. |
| Provider independence | Any OpenAI-compatible API | Hermes (109+), Gemini CLI (1 primary) | Comparable to Hermes, better than most. |
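To make the "Merkle tree + SHA-256" memory-integrity row concrete, here is a minimal Python sketch of a Merkle root over memory entries. It illustrates the general technique only; the leaf/node prefixes, odd-level duplication rule, and function names are assumptions, not Passepartout's actual layout.

```python
import hashlib

def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(entries: list) -> str:
    """Hex Merkle root over a list of text entries (SHA-256, binary tree)."""
    if not entries:
        return _sha256(b"").hex()
    # Distinct prefixes for leaves and inner nodes avoid leaf/node confusion.
    level = [_sha256(b"\x00" + entry.encode("utf-8")) for entry in entries]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()

# Hypothetical memory entries used only to exercise the function.
memory = ["* TODO review gate stack", "* Note on ACL2", "* Meeting 2026-05-23"]
root_before = merkle_root(memory)
root_after = merkle_root(memory[:2] + ["* Meeting 2026-05-24"])
```

Any edit to any entry changes the root, so storing `root_before` is enough to detect tampering later. Rollback then amounts to restoring the entry set whose root matches a trusted value.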

Where Competitors Lead

| Dimension | Leader | Passepartout Status |
|-----------+--------+---------------------|
| Safety implementation maturity | Claude Code (2,592 lines bash security) | Gate stack exists but bash validation is minimal in comparison |
| Provider breadth | Hermes (109+), OpenClaw (50+) | 8 providers; adequate but not competitive |
| Channel/platform support | OpenClaw (25+ channels) | TUI only; no multi-channel |
| Plugin ecosystem | OpenClaw (ClawHub, npm registry) | No plugin marketplace |
| Subagent delegation | Claude Code (fork with context inheritance) | Planned via Screamer planner |
| Codebase size / features shipped | All competitors have working products | In development |
| MCP integration | Hermes, Codex (native), Continue | Planned |
| Sandboxing | Codex CLI (Seatbelt+nsjail), Gemini CLI (6 methods) | None |
| Business model | Hermes (MIT+services), Codex (tokens) | AGPL + appliances + SaaS |
| Cross-platform | Claude Code (macOS/*nix), Codex (macOS) | Linux only |

Strategic Positioning

Passepartout is not competing in the existing AI agent market. It is building a new category: provable personal infrastructure.

Competitors optimize for:

  • Token efficiency (Aider's edit formats, OpenCode's LSP integration)
  • Model flexibility (Hermes' 109 providers)
  • Platform reach (OpenClaw's 25 channels)
  • UI polish (Gemini CLI's Ink/React, Claude Code's permission dialogs)
  • Sandbox security (Codex's Seatbelt, Gemini's gVisor)

Passepartout optimizes for:

  • Provable correctness (ACL2 → CIC)
  • Data integrity (Merkle tree)
  • Cognitive architecture (10-80-10 symbolic-first)
  • Safety by construction (type-level gates)
  • Unified data model (Org-mode as everything)
  • Network effects (Agora)
  • Full-stack ownership (Stoa)

These are not axes any competitor cares about. The risk is not that a competitor builds a better Passepartout; it is that the market never develops a preference for provable agents. If token-burning LLM agents remain the default and users don't demand verification, the category Passepartout addresses may never come into existence.

Immediate Implications for Development

  1. Claude Code's safety system is the benchmark to exceed. The type-level gate architecture is theoretically superior to Claude Code's heuristic patterns, but the implementation needs to prove it catches things Claude Code misses.
  2. No competitor has anything resembling a neurosymbolic architecture. The 10-80-10 plan has zero competition — but that also means zero market validation.
  3. The Org-mode bet is invisible to competitors. They don't see the advantage because they've never tried to build a knowledge graph from flat markdown files. This is Passepartout's widest moat — it depends on a skill (Org-mode literate programming) that no competitor's team has.
  4. Hermes is the closest full-stack competitor (tools, skills, cron, subagents, multi-platform), but architecturally conventional. For Hermes to match Passepartout, it would need to be rewritten.
  5. The coding agents (Aider, OpenCode, Codex) are not competitors — they are single-purpose tools Passepartout could eventually replace entirely when the planner matures.

File references

Repository dumps and analysis artifacts at tmp:

  • tmp/aider — Aider source (Python)
  • tmp/opencode — OpenCode archived source (Go)
  • tmp/codex — OpenAI Codex CLI (Rust)
  • tmp/claude-code-leaked-source — Claude Code leaked (TypeScript/Bun)
  • tmp/gemini-cli — Google Gemini CLI (TypeScript)
  • tmp/openclaw — OpenClaw source (TypeScript)
  • tmp/thoth — Thoth source (Python)
  • tmp/continue — Continue source (TypeScript)
  • usr/local/lib/hermes-agent — Hermes Agent (Python)