hermes-brain/ideas/competitive-analysis-2026-05.org
Hermes 7b2ea7f28d Add Thoth to competitive analysis; refine compute marketplace thesis
- Thoth: new Category 2 entry (Personal AI Assistants), LangGraph ReAct
  agent with knowledge graph, Developer/Designer studios, 151K LOC
- Compute marketplace: answer the structural question 'why buy compute
  if every user runs their own Passepartout?' — three structural reasons:
  specialized proof libraries, certification weight, bootstrap verification
2026-05-23 05:36:27 +00:00

:PROPERTIES:
:ID: 3aa22300-2f25-57b0-8787-9f199cc978b1
:CREATED: [2026-05-22 Thu]
:END:
#+title: Competitive Analysis — AI Agent Landscape (May 2026)
#+filetags: :passepartout:strategy:competitive:
* Overview
Analyzed 9 competitor codebases alongside Passepartout. The competitive landscape
divides into three categories:
1. Coding agents (Aider, OpenCode, Codex CLI, Claude Code, Gemini CLI)
2. Personal AI assistants (Hermes, OpenClaw, Thoth)
3. CI/check-based systems (Continue)
None of the nine compete with Passepartout on all axes simultaneously. Passepartout's
strongest differentiators — Org-mode data model, deterministic gate stack, ACL2
verification, Merkle-treed memory, and the triad architecture — are absent from
every competitor.
* Category 1: Coding Agents
** Aider (Python, ~40K lines, Apache 2.0)
Language: Python. ~6.8M pip installs. The oldest and most mature open-source
coding agent.
Architecture: Chat-based Coder class with 5 edit formats (diff, udiff, patch,
whole, architect). Uses litellm for universal provider access (50+ providers).
RepoMap provides codebase awareness via a tree-sitter symbol map ranked with PageRank.
Safety model: Purely prompt-based plus user-confirmation dialogs. No deterministic
gate stack. No sandboxing. No model output validator. The allowed_to_edit() gate
is a single user-confirmation call, and the --yes flag auto-approves it. Aider can
edit its own source code with no special protection; nothing distinguishes
self-modification from any other edit.
Data model: Ad-hoc. Chat messages in memory. Git commits for persistence. RepoMap
is a ranked symbol map rebuilt from the repository, not a knowledge store. No
persistent memory across sessions. No knowledge graph.
Self-modification: Full. No guard against editing its own files.
Verification: None.
Key gap vs Passepartout: No safety gates, no persistent memory model, no knowledge
representation, no verification, no self-modification protection, no architecture
for neurosymbolic reasoning. It is a thin shell around litellm + edit format
parsers.
** OpenCode (TypeScript/Bun, anomalyco/opencode, 163K★)
The dominant open-source coding agent by adoption. Bun runtime, Effect-TS
functional core, Solid.js TUI, Turborepo monorepo.
Architecture: Dual LLM runtime — default AI SDK (streamText/generateText) +
opt-in native Effect-Schema runtime (@opencode-ai/llm) with 4-axis route
decomposition (Protocol/Endpoint/Auth/Framing). 30+ provider plugins.
Agent workflow DSL with plan/build agent switching. Agent Client Protocol
(ACP) support for driving the agent from compatible editors. Subagents inherit permission
boundaries from parent. 18+ built-in tools + custom tools from config.
Effect-TS ScopedCache per-project state management.
Safety model: The docs state explicitly that the agent is /not/ sandboxed. The
permission system is rule-based (glob matching, actions: allow/ask/deny)
and exists as a UX feature, not security isolation. Built-in agents have
carefully scoped defaults (build allows most, prompts on doom_loop;
plan denies all edits except plan files; explore denies everything except
grep/glob/bash/webfetch/read; question defaults to deny). Permission
rules are inherited by subagents. Shell tool dynamically scans commands
for filesystem-impacting operations to determine ask patterns.
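A minimal sketch of how glob-matched allow/ask/deny rules can be evaluated, assuming rules are scanned in order with the last match winning and "ask" as the fallback. OpenCode's actual rule syntax and precedence may differ; evaluate, plan_rules, and the "tool argument" request strings are hypothetical.

#+begin_src python
from fnmatch import fnmatchcase
from typing import Literal

Action = Literal["allow", "ask", "deny"]

def evaluate(rules: list[tuple[str, Action]], request: str, default: Action = "ask") -> Action:
    """Return the action of the last rule whose glob matches request (last match wins)."""
    decision = default
    for pattern, action in rules:
        if fnmatchcase(request, pattern):
            decision = action
    return decision

# Hypothetical plan-style rules: deny edits in general, allow plan files, allow git status.
plan_rules: list[tuple[str, Action]] = [
    ("edit *", "deny"),
    ("edit plans/*.md", "allow"),
    ("bash git status*", "allow"),
]
#+end_src

Under this assumption a narrow allow placed after a broad deny carves out an exception, which is how a plan-style agent could refuse edits in general yet still write its plan files. Note that this is pattern matching over strings, the property the gap analysis below contrasts with a gate stack.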
Data model: SQLite via Drizzle ORM with bun:sqlite or better-sqlite3.
Key tables: SessionTable (project, workspace, parent hierarchy, cost,
tokens, model JSON, agent config JSON, permission JSON, revert snapshot),
MessageTable, PartTable. Project model stores worktree, VCS, sandbox
config. Config is JSON-chain (user home → project root → worktree) with
remote config fetch and mergeDeep with concatenating array semantics.
20 config modules covering agents, permissions, providers, MCP, LSP,
plugins, skills, references, and variables.
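The user home → project root → worktree chain can be pictured as a fold over a deep merge in which nested objects merge recursively, arrays concatenate, and scalars from the later (more specific) layer win. This is a sketch of those semantics as described above, not OpenCode's mergeDeep; merge_deep, resolve, and the config keys are hypothetical.

#+begin_src python
from functools import reduce
from typing import Any

def merge_deep(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge override into a copy of base: dicts recurse, lists concatenate, scalars replace."""
    result = dict(base)
    for key, value in override.items():
        current = result.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            result[key] = merge_deep(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            result[key] = current + value  # concatenating array semantics
        else:
            result[key] = value  # more specific layer wins
    return result

def resolve(layers: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold layers from least to most specific (user home, project root, worktree)."""
    return reduce(merge_deep, layers, {})

user = {"model": "a", "plugins": ["p1"], "permission": {"bash": "ask"}}
project = {"plugins": ["p2"], "permission": {"edit": "deny"}}
worktree = {"model": "b"}
#+end_src

With these layers, resolve([user, project, worktree]) keeps both plugins, merges the two permission maps, and takes the worktree's model. Concatenation means a project can add plugins but cannot remove one a user layer declared.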
Self-modification: Agent.generate() interface lets the LLM create new
agent definitions — the system grows its own subagent roster. Skills
system loads domain-specific knowledge packs dynamically.
Verification: None.
Key gap vs Passepartout: No deterministic safety architecture, no
knowledge graph, no Org-mode, no verification/proof system, no
neurosymbolic architecture. The permission system is explicitly labeled
"not security isolation"; it's UX, not a gate stack. Largest userbase
and most polished product of any coding agent, but architecturally
conventional.
** Codex CLI (OpenAI, Rust, ~950K lines)
OpenAI's open-source coding agent, written in Rust and sandboxed at the OS level.
Architecture: ~116 crate Rust workspace with a protocol layer (SQ/EQ session types),
sandbox manager (macOS Seatbelt, Linux nsjail), multi-provider support (via defined
protocol, not directly), configurable TUI.
Safety model: Strongest OS-level sandboxing of any coding agent analyzed.
Multi-layer:
- Process hardening (macOS Seatbelt with 4 profile tiers)
- Execution policy engine (defined policy in execpolicy crate)
- Sandboxing via nsjail on Linux, seatbelt on macOS
- Guardian module for tool permission gating
- No prompt-based safety — all deterministic through policy definitions
Data model: Protocol-defined session types. Structured request/response models.
Config through TOML files with schema validation.
Self-modification: Protected by sandbox — the agent cannot escape to modify its
own binary or config without explicit policy override.
Verification: None (no proof system).
Key gap vs Passepartout: No knowledge graph (Org or otherwise), no persistent
memory model, no deterministic gate stack for agent behavior (only OS-level
sandboxing), no ACL2/prover, no neurosymbolic architecture. Strongest sandbox
but weakest cognitive architecture.
** Claude Code (Anthropic, TypeScript/Bun, ~512K lines leaked)
Anthropic's proprietary, closed-source coding agent; this analysis is based on
its leaked source.
Architecture: Bun-bundled TypeScript single-file executable. Ink/React terminal UI.
23+ core tools. Subagent forking with byte-identical API prefixes for prompt cache
sharing. Multi-agent coordination mode.
Safety model: Layered deterministic safety — NOT prompt-based:
1. Permission mode system (7 modes: default, acceptEdits, bypassPermissions, etc.)
2. Persistent permission rules (alwaysAllow, alwaysDeny, alwaysAsk, rule sources
from userSettings, projectSettings, localSettings, policySettings)
3. Bash security validator — 2,592 lines of dedicated code with 23+ named
security checks using tree-sitter AST parsing
4. Sandbox runtime for filesystem/network containment
5. Path/mode validation
6. Optional ML bash classifier (Anthropic-internal feature)
This is the most sophisticated agent-level safety system of any coding agent
analyzed (Codex CLI has the stronger OS sandbox). Passepartout's gate stack is
architecturally similar (deterministic multi-layer) but Claude Code's
implementation is vastly more mature: its 2,592 lines of bash validation alone
are ~50x the equivalent in Passepartout.
Data model: File-based markdown memdir at ~/.claude/projects/<slug>/memory/.
4 memory types: user, feedback, project, reference. YAML frontmatter in .md files.
PROJECT.md and CLAUDE.md for project-level config. No database.
Self-modification: HIGH. Skill system writes SKILL.md files that change future
behavior. Plugin system, cron scheduling, agent spawning.
Verification: None.
Key gap vs Passepartout: No proof system, no neurosymbolic architecture, no
self-verification, no persistent knowledge graph (flat markdown files, not
Org-mode with cross-references), markdown data model lacks semantic depth.
Proprietary; Anthropic controls it completely. Runs on macOS and Linux (its sandbox
uses macOS Seatbelt profiles natively). The permission rules system is impressive but structurally
inferior to Passepartout's gate stack because rules are heuristic (regex-based
pattern matching) rather than typed (type-level gates with structural guarantees).
** Gemini CLI (Google, TypeScript, ~525K lines, Apache 2.0)
Google's open-source coding agent. Node.js 20+, Ink/React TUI.
Architecture: 7-package npm monorepo. Core backend handles Gemini API orchestration,
tool execution, policy engine, safety checks, sandbox management, session management,
MCP client. 7-strategy composite model routing chain.
Safety model: Multi-layered:
1. CONSECA (Contextual Security Checker) — AI-driven per-request policy generation
using a separate Gemini Flash model. Principle of least privilege.
2. Policy engine — 4 approval modes (PLAN, DEFAULT, AUTO_EDIT, YOLO), hierarchical
rules with priority scores and wildcard matching
3. 6 sandbox methods (macOS Seatbelt, Docker/Podman, bwrap, gVisor, LXC, Windows)
4. Trusted folders with discovery phase and path traversal protection
5. Policy integrity verification via cryptographic hashes
6. Built-in safety checkers (AllowedPathChecker, CONSECA)
7. Loop detection service
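Item 5 (policy integrity via cryptographic hashes) can be sketched as recording a SHA-256 digest for each policy file when it is trusted and refusing to load any file whose digest has since changed. The function names and the in-memory trust store are assumptions for illustration; this is not Gemini CLI's implementation.

#+begin_src python
import hashlib
from pathlib import Path

def digest(path: Path) -> str:
    """SHA-256 of the file's bytes, hex-encoded."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def trust(paths: list[Path]) -> dict[str, str]:
    """Record the digest of each policy file at the moment the user approves it."""
    return {str(p): digest(p) for p in paths}

def verify(path: Path, trusted: dict[str, str]) -> bool:
    """True only if the file was trusted and its contents are unchanged since."""
    expected = trusted.get(str(path))
    return expected is not None and expected == digest(path)
#+end_src

A file never trusted fails verification without being read, and any edit to a trusted file (by a user or by the agent) changes its digest and fails it as well.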
Data model: JSONL session files. Turn-based conversation model. 4-layer config
precedence (system-defaults → user → project → system-override). TOML policy files.
Self-modification: Modifiable hooks system, MCP extensions, custom commands.
Core binaries are protected on disk by file permissions.
Verification: None.
Key gap vs Passepartout: No proof system, no persistent knowledge graph, no
self-verification, no neurosymbolic architecture, lock-in to Google Gemini models
(though it can use others via routing). The CONSECA approach is interesting
(AI-generated policies) but introduces a second LLM call for every security
decision — the opposite of Passepartout's approach of zero-token deterministic gating.
* Category 2: Personal AI Assistants
** Hermes Agent (Python, ~17K lines core, MIT)
The agent that produced this analysis. Python, OpenAI-format conversations.
Architecture: Synchronous conversation loop with OpenAI-format messages. 60+
built-in tools. 109+ providers via pluggable transport layer. 15+ messaging
platforms via gateway. MCP client (native, not bridge). Ink/React TUI as Node.js
subprocess. Cron jobs, Kanban board, subagent delegation.
Safety model: Multi-layer but NOT a deterministic gate stack:
1. Message sanitization (surrogates, control chars, malformed JSON)
2. Tirith binary scanner (pre-execution terminal command analysis)
3. Command approval system (manual/smart/off modes)
4. Memory injection detection (prompt injection pattern matching)
5. Secret/PII redaction
6. Tool call guardrails (loop detection)
7. MCP security (env filtering, credential stripping)
8. Context fencing (memory injection span scrubbing)
These are all heuristic or prompt-based — no structural type-level gates.
Tirith is a separate binary, not in-process. The approval system is good but
reactive (LLM proposes → system blocks) rather than preventive (type system
prevents by construction).
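The reactive/preventive distinction can be made concrete with a smart-constructor sketch: the only way to obtain a writable-path value is through a constructor that rejects the agent's own source tree, so a write tool typed to accept that value cannot be handed a forbidden path without construction failing first. In Python the check runs at construction time, and a static checker such as mypy flags a bare Path passed where the gated type is required. Every name here (AGENT_ROOT, WritablePath, SelfModificationError, write_text_gated) is hypothetical and not taken from Hermes or Passepartout.

#+begin_src python
from pathlib import Path
from typing import Union

# Hypothetical protected tree: the agent's own installed source.
AGENT_ROOT = Path("/usr/local/lib/hermes-agent").resolve()

class SelfModificationError(PermissionError):
    """Raised when a caller tries to build a writable path inside AGENT_ROOT."""

class WritablePath:
    """A path checked, at construction, to lie outside the agent's own source tree."""
    __slots__ = ("path",)

    def __init__(self, raw: Union[str, Path]) -> None:
        candidate = Path(raw).resolve()  # normalise before checking containment
        if candidate == AGENT_ROOT or AGENT_ROOT in candidate.parents:
            raise SelfModificationError(f"refusing to target agent source: {candidate}")
        self.path = candidate

def write_text_gated(target: WritablePath, text: str) -> int:
    """Write tools accept only WritablePath, never a bare str or Path."""
    return target.path.write_text(text)
#+end_src

The guarantee comes from there being a single construction site to audit rather than a list of patterns to keep complete; Python enforces it only by convention and tooling, whereas a language with abstract types could make the bypass unrepresentable.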
Data model: SQLite session DB (FTS5 full-text search). File-based memory
(MEMORY.md + USER.md). YAML config. No knowledge graph. No Org-mode.
Self-modification: Skill system writes SKILL.md files. Memory tool edits
MEMORY.md/USER.md. Config YAML editable. Core Python code is read-only in
execution but the LLM could request modifications to its own source files
(no gate specifically prevents this).
Verification: None.
Key gap vs Passepartout: No deterministic gate stack (heuristic layers, not
structural/typed), no knowledge graph, no Org-mode, no neurosymbolic architecture,
no self-verification, no proof system. Hermes's strength is breadth —
109 providers, 15 platforms, MCP ecosystem, big tool surface. But it has no
depth in safety, knowledge representation, or reasoning architecture.
** OpenClaw (TypeScript/Node.js, ~3.5M lines)
The largest codebase analyzed. Personal AI assistant with 25+ messaging channel
support.
Architecture: pnpm workspace with ~135 bundled plugins. Gateway control plane
routes messages through multi-agent routing. Per-agent sessions, workspaces,
skill registries. Companion native apps (macOS, iOS, Android).
Safety model: Tiered — main agent runs tools directly on host (trusted-operator),
non-main sessions sandboxed via Docker (read-only rootfs, capability dropping,
seccomp/AppArmor, memory/cpu/PID limits, SSH/OpenShell backends).
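A generic sketch of assembling the kind of hardened docker run invocation described above. The flags are standard Docker CLI options; the limit values, image name, and sandbox_argv function are assumptions and do not reproduce OpenClaw's sandbox configuration (which also covers seccomp/AppArmor profiles and SSH/OpenShell backends not shown here).

#+begin_src python
import shlex

def sandbox_argv(image: str, command: list[str], *, memory: str = "512m",
                 cpus: str = "1.0", pids: int = 256, network: str = "none") -> list[str]:
    """Build a docker run argv with a read-only rootfs, no capabilities, and resource caps."""
    return [
        "docker", "run", "--rm",
        "--read-only",                           # immutable root filesystem
        "--tmpfs", "/tmp",                       # writable scratch space only
        "--cap-drop", "ALL",                     # drop every Linux capability
        "--security-opt", "no-new-privileges",   # block setuid privilege escalation
        "--memory", memory,
        "--cpus", cpus,
        "--pids-limit", str(pids),               # caps fork bombs
        "--network", network,
        image, *command,
    ]

print(shlex.join(sandbox_argv("python:3.12-slim", ["python", "-c", "print(1)"])))
#+end_src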
Data model: Typed JSON/YAML config (openclaw.json). Multi-source model catalog.
Plugin SDK with narrow typed subpath exports.
Self-modification: ACP (Agent Control Protocol) for spawning child sessions.
Skill system with npm distribution and ClawHub registry.
Verification: None.
Key gap vs Passepartout: Same as Hermes — no gate stack, no knowledge graph,
no Org-mode, no verification, no neurosymbolic architecture. Differentiated by
vastly broader channel support and mature plugin ecosystem. But architecturally
conventional — LLM + tools + channels, no cognitive architecture innovation.
** Thoth (Python, ~151K lines, Apache 2.0)
https://github.com/siddsachar/Thoth — Personal AI Sovereignty. Local-first
desktop AI assistant with knowledge graph, tools, voice, vision, shell,
browser automation, workflow engine, and messaging channels.
Architecture: LangGraph create_react_agent (prebuilt ReAct pattern). Dual-mode
streaming via agent.stream(). NiceGUI web UI served by Python app.py with
desktop launcher (tray icon, Ollama auto-start, browser/OS window). Context
trimming via tiktoken to ~85% of model window, base64 data redaction, stale
browser snapshot compression (keeps last 8), MD5 tool result dedup, old tool
result summarization. 50-step recursion limit (chat), 100 (tasks), 120 (Developer
Studio). Agent graph cached by tool set + model override. Checkpoints via
LangGraph's SQLite-backed checkpointer. 30+ tool modules.
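Two of the context-management steps above, MD5-based deduplication of repeated tool results and trimming history to ~85% of the model window, can be sketched as follows. The four-characters-per-token estimate stands in for tiktoken, and the message shape and function names are assumptions, not Thoth's code.

#+begin_src python
import hashlib

def estimate_tokens(text: str) -> int:
    """Crude stand-in for tiktoken: roughly four characters per token."""
    return max(1, len(text) // 4)

def dedup_tool_results(messages: list[dict]) -> list[dict]:
    """Replace a tool result identical to an earlier one with a back-reference."""
    first_seen: dict[str, int] = {}
    out: list[dict] = []
    for i, msg in enumerate(messages):
        if msg["role"] == "tool":
            key = hashlib.md5(msg["content"].encode("utf-8")).hexdigest()
            if key in first_seen:
                msg = {**msg, "content": f"[duplicate of tool result #{first_seen[key]}]"}
            else:
                first_seen[key] = i
        out.append(msg)
    return out

def trim_to_budget(messages: list[dict], window: int, ratio: float = 0.85) -> list[dict]:
    """Keep messages[0] (system prompt) plus the newest messages that fit in ratio * window."""
    budget = int(window * ratio)
    system, rest = messages[:1], messages[1:]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    for msg in reversed(rest):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break  # stop at the first message that does not fit, so history stays contiguous
        kept.append(msg)
        used += cost
    return system + kept[::-1]
#+end_src

Deduplicating before trimming matters: collapsing repeated outputs frees budget, so more distinct history survives the cut.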
Safety model: Shell command classification (tools/shell_tool.py) with 17 blocked
patterns (rm -rf /, mkfs, dd of=/dev/, shutdown, fork bombs, pipe-to-bash, etc.),
30+ safe auto-execute prefixes (ls, cat, grep, git status, etc.), needs-approval
for compound commands (;, &&, ||, |, $(), backticks). Interactive interrupt() for
non-safe shell — LangGraph human-in-the-loop pauses the graph. Per-workflow safety
modes: block (default, refuse non-safe), approve (pause), allow_all.
Prompt-injection defense: scans tool outputs and user inputs for 5 categories
(role overrides, instruction hijacking, data exfiltration, invisible unicode,
hidden HTML directives) — detection-only, no stripping. Filesystem workspace
boundary (~/Documents/Thoth). Opt-in Docker Sandbox for Developer Studio.
Destructive ops (file delete, moderate shell, Gmail send, calendar delete,
memory/task/tracker delete) require confirmation. MCP servers disabled until
tested. Custom Tools reviewed and promoted. No sandboxing of agent runtime
itself — agent runs in-process. No response-level guardrails.
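The three-way shell classification above (blocked patterns first, then compound commands always needing approval, then safe prefixes auto-executing, everything else needing approval) can be sketched like this. The pattern and prefix lists are a small illustrative subset, not Thoth's actual 17 blocked patterns or 30+ safe prefixes, and classify and Verdict are hypothetical names.

#+begin_src python
import re
from enum import Enum

class Verdict(Enum):
    BLOCKED = "blocked"
    SAFE = "safe"
    NEEDS_APPROVAL = "needs_approval"

# Illustrative subset only.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),              # rm -rf /
    re.compile(r"\bmkfs(\.\w+)?\b"),                   # format a filesystem
    re.compile(r"\bdd\b.*\bof=/dev/"),                 # raw device writes
    re.compile(r"\b(curl|wget)\b.*\|\s*(ba)?sh\b"),    # pipe-to-shell
    re.compile(r":\(\)\s*\{\s*:\|:&\s*\};:"),          # classic fork bomb
]
SAFE_PREFIXES = ("ls", "cat", "grep", "pwd", "git status", "git log")
COMPOUND = re.compile(r";|&&|\|\||\||\$\(|`")          # ; && || | $( backticks

def classify(command: str) -> Verdict:
    cmd = command.strip()
    if any(p.search(cmd) for p in BLOCKED_PATTERNS):
        return Verdict.BLOCKED
    if COMPOUND.search(cmd):
        return Verdict.NEEDS_APPROVAL  # a safe prefix could hide a second command
    if any(cmd == p or cmd.startswith(p + " ") for p in SAFE_PREFIXES):
        return Verdict.SAFE
    return Verdict.NEEDS_APPROVAL
#+end_src

The ordering is the design choice: blocked patterns are checked before the compound rule so a pipe-to-shell is refused outright rather than merely queued for approval, and prefixes match on a word boundary so lsblk does not inherit the safety of ls.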
Data model: SQLite (WAL mode) at ~/.thoth/memory.db — shared between knowledge
graph and legacy memory. Knowledge graph: SQLite (durable) + NetworkX MultiDiGraph
(in-memory, rebuilt on startup) + FAISS vector index (semantic recall, rebuilt on
every entity write). 11 entity types (person, preference, fact, event, place,
project, organisation, concept, skill, media, self_knowledge). 67+ typed relations
with 30+ LLM-produced aliases mapped to canonical forms. Dream Cycle refinement
pipeline for entity dedup/merge/stale-confidence decay. Config: JSON files
(skills_config.json, api_keys.json, providers.json, channels_config.json). Keys in
OS credential store (Windows Credential Manager, macOS Keychain, Linux Secret
Service/KWallet). Memory extraction background daemon scanning past conversations
every ~2 hours.
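Mapping LLM-produced relation aliases onto canonical typed relations, as described above, amounts to a normalisation lookup applied before an edge is written. The relation names, aliases, and canonical_relation function below are invented for illustration; Thoth's 67+ relations and 30+ aliases are not reproduced.

#+begin_src python
# Hypothetical subset of canonical relation types.
CANONICAL_RELATIONS = {"works_at", "lives_in", "knows", "part_of"}

# Hypothetical aliases an LLM extractor might emit.
ALIASES = {
    "employed_by": "works_at",
    "works_for": "works_at",
    "resides_in": "lives_in",
    "acquainted_with": "knows",
    "member_of": "part_of",
}

def canonical_relation(raw: str) -> str:
    """Normalise case and separators, resolve aliases, and reject unknown relation types."""
    key = raw.strip().lower().replace(" ", "_").replace("-", "_")
    rel = ALIASES.get(key, key)
    if rel not in CANONICAL_RELATIONS:
        raise ValueError(f"unknown relation type: {raw!r}")
    return rel
#+end_src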
Self-modification: Agent CAN create/update/delete skills via dedicated tools
(thoth_create_skill, thoth_patch_skill, thoth_delete_skill). SKILL.md files with
YAML frontmatter at ~/.thoth/skills/. Bundled skills (read-only) at app root;
user skills override by name. Skill patching requires user confirmation + auto
backup. Maximum 1 patch proposal per conversation. Tool guides cannot be patched.
Self-knowledge block injected into system prompt. No tool to modify agent.py,
prompts.py, or system prompt directly. Developer Studio provides code editing
through approval-gated tools (tool-assisted human workflow, not agent self-mod).
Verification: None formal. Update signature verification (updater.py).
Comprehensive test suite at tests/test_suite.py. No tool-call verification beyond
LangGraph schema validation. No output verification or fact-checking.
Key differentiators vs other assistants: LangGraph ReAct agent with structured
streaming event model. Personal knowledge graph (11 entity types, 67 relations,
NetworkX + FAISS). Developer Studio (Docker sandbox, code threads, Git operations,
approval modes). Designer Studio (decks, documents, landing pages, sandboxed
interactive runtime). 5 messaging channels (Telegram, Discord, Slack, WhatsApp,
SMS) with streaming, reactions, media processing. Background workflow engine
(schedules, webhooks, step pipelines, conditions, approvals, concurrency groups).
30+ tool modules including browser automation, shell, Gmail, Calendar, X, and
image/video generation. 39 curated Ollama tool-calling models. 10 LLM providers (Ollama,
OpenAI, Anthropic, Google AI/Gemini, xAI/Grok, MiniMax, OpenRouter, Ollama Cloud,
ChatGPT/Codex subscription, custom endpoints). MCP client (stdio, Streamable HTTP,
SSE) with namespaced tools, approval gates. No accounts, no telemetry, no hosted
server. Local-first with OS credential store.
Key gap vs Passepartout: No deterministic gate stack — shell safety is pattern
list (17 blocked, 30 safe), not typed gates. No sandboxed agent runtime. No
proof system. No output guardrails. No neurosymbolic architecture. No Org-mode.
No Merkle-tree memory. Knowledge graph (SQLite+FAISS) is richer than Hermes's file-based memory but
is LLM-driven entity extraction — no structural integrity guarantees. Thoth's
differentiation from Hermes/OpenClaw is the knowledge graph + Developer/Designer
studios + embedded LangGraph framework — a broader product scope, but still
architecturally conventional (LLM + tools + channels + KG), not a new cognitive
architecture.
* Category 3: CI/Check Systems
** Continue (TypeScript, ~328K lines, Apache 2.0)
Source-controlled AI checks for CI/CD. Markdown-as-gate-policy.
Architecture: Shared core (@continuedev/core) with ~80 provider implementations,
tool-calling engine, config system (YAML/JSON/Markdown). Serves CLI (Ink/React TUI
+ headless CI mode), IDE extensions (VS Code, JetBrains), web dashboard.
Safety model: Three permission levels (allow/ask/exclude). Precedence: mode policies
→ CLI flags → permissions.yaml → built-in defaults. Terminal security package for
shell command analysis via shell-quote parsing. Workspace-scoped file access.
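The precedence chain above can be read as first-match-wins across ordered sources: mode policies are consulted first, then CLI flags, then permissions.yaml, then built-in defaults. This sketch assumes exact tool-name keys and one dict per source; Continue's real policy objects and matching rules are richer, and the tool names and resolve_permission are hypothetical.

#+begin_src python
from typing import Literal, Mapping, Optional, Sequence

Permission = Literal["allow", "ask", "exclude"]

def resolve_permission(tool: str,
                       sources: Sequence[Mapping[str, Permission]]) -> Optional[Permission]:
    """Return the first source's verdict for tool; sources are ordered highest precedence first."""
    for source in sources:
        if tool in source:
            return source[tool]
    return None  # no source mentions the tool

mode_policy = {"writeFile": "exclude"}                 # e.g. a read-only mode
cli_flags = {"runTerminalCommand": "ask"}
permissions_yaml = {"writeFile": "allow", "readFile": "allow"}
builtin_defaults = {"readFile": "allow", "writeFile": "ask", "runTerminalCommand": "ask"}

chain = [mode_policy, cli_flags, permissions_yaml, builtin_defaults]
#+end_src

Here permissions.yaml allows writeFile, but the mode policy sits higher in the chain, so resolve_permission("writeFile", chain) yields "exclude".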
Data model: Markdown files for checks, agents, rules. Source-controlled in-repo.
YAML frontmatter for metadata.
Self-modification: Checks source-controlled — any change goes through git.
Verification: None (the checks are themselves unverified).
Key gap vs Passepartout: The "checks as markdown" concept is philosophically
similar to Passepartout's gate rules (deterministic policies checked before
execution) but the implementation is dramatically simpler — regex-based policy
objects, not a type-level gate stack with structural guarantees. No persistent
agent, no memory, no knowledge graph, no neurosymbolic architecture. It is a
gate system without an agent to gate.
* The Passepartout Advantage
| Dimension | Passepartout | Best Competitor | Gap |
|-----------|--------------|-----------------|-----|
| Safety model | Type-level gates + 11-vector deterministic stack | Claude Code (7 permission modes + 23 bash checks) | Structural vs heuristic. Passepartout's type-level gates prevent self-modification at the category level; competitors block patterns. |
| Knowledge model | Org-mode (tree, properties, TODOs, timestamps, cross-refs, IDs, tags) | Claude Code (flat markdown memdir) | Org-mode's semantic richness is ~15 primitives markdown doesn't have. |
| Memory integrity | Merkle tree + SHA-256 + rollback | Hermes (file-based); Claude Code (flat files + git) | Content-addressed, tamper-evident memory no competitor has. |
| Self-verification | ACL2 → CIC prover (planned) | None | No competitor does provable correctness. |
| Cognitive architecture | 10-80-10 symbolic-first (planned) | 100% LLM (every competitor) | Once the planned symbolic-first flip ships, Passepartout would use ~10% of the tokens competitors use. |
| Data format | Org-mode (human-editable, machine-parseable, single file) | JSONL/Markdown/YAML/DB (competitors use 2-5 formats) | Unified format reduces translation layers to zero. |
| Self-modification | Type-level gates + hot-reload | Claude Code (skills), Hermes (skills) | Passepartout's guard against self-modification is structural (type level), not heuristic (pattern list). |
| Triad | Passepartout + Stoa + Agora | None | No competitor is building a full computing stack + social network. |
| Provider independence | Any OpenAI-compatible API | Hermes (109+), Gemini CLI (1 primary) | Comparable to Hermes, better than most. |
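The memory-integrity row rests on a simple property: if each memory entry is hashed with SHA-256 and hashes are combined pairwise up to a single root, changing any entry changes the root, so recorded roots make tampering evident and give rollback a checkpoint to compare against. A minimal sketch, assuming entries are byte strings and an odd node is paired with a copy of itself (one common convention); merkle_root is hypothetical and this is not Passepartout's memory format.

#+begin_src python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(entries: list[bytes]) -> bytes:
    """Hash leaves, then hash adjacent pairs level by level until one root remains."""
    if not entries:
        return _h(b"")
    level = [_h(b"\x00" + e) for e in entries]  # 0x00 prefix marks leaf hashes
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pair an odd node with a copy of itself
        level = [_h(b"\x01" + level[i] + level[i + 1])  # 0x01 prefix marks interior nodes
                 for i in range(0, len(level), 2)]
    return level[0]
#+end_src

The distinct leaf and interior prefixes keep a leaf hash from being passed off as an interior node. Comparing the current root against a stored one detects any change; locating which entry changed then takes only a walk down the mismatching branches.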
* Where Competitors Lead
| Dimension | Leader | Passepartout Status |
|-----------|--------|---------------------|
| Safety implementation maturity | Claude Code (2,592 lines bash security) | Gate stack exists but bash validation is minimal in comparison |
| Provider breadth | Hermes (109+), OpenClaw (50+) | 8 providers — adequate but not competitive |
| Channel/platform support | OpenClaw (25+ channels) | TUI only — no multi-channel |
| Plugin ecosystem | OpenClaw (ClawHub, npm registry) | No plugin marketplace |
| Subagent delegation | Claude Code (fork with context inheritance) | Planned via Screamer planner |
| Codebase size / features shipped | All competitors have working products | v0.7.2 in development |
| MCP integration | Hermes, Codex (native), Continue | Planned v0.53.0 |
| Sandboxing | Codex CLI (Seatbelt+nsjail), Gemini CLI (6 methods) | None |
| Business model | Hermes (MIT+services), Codex (tokens) | AGPL + appliances + SaaS |
| Cross-platform | Claude Code (macOS/*nix), Codex (macOS) | Linux only |
* Strategic Positioning
Passepartout is not competing in the existing AI agent market. It is building a
new category: provable personal infrastructure.
Competitors optimize for:
- Token efficiency (Aider's edit formats, OpenCode's LSP integration)
- Model flexibility (Hermes' 109 providers)
- Platform reach (OpenClaw's 25 channels)
- UI polish (Gemini CLI's Ink/React, Claude Code's permission dialogs)
- Sandbox security (Codex's Seatbelt, Gemini's gVisor)
Passepartout optimizes for:
- Provable correctness (ACL2 → CIC)
- Data integrity (Merkle tree)
- Cognitive architecture (10-80-10 symbolic-first)
- Safety by construction (type-level gates)
- Unified data model (Org-mode as everything)
- Network effects (Agora)
- Full-stack ownership (Stoa)
These are not axes any competitor cares about. The risk is not that a competitor
builds a better Passepartout — it's that the market never develops a preference
for provable agents. If token-burning LLM agents remain the default and users
don't demand verification, the entire category Passepartout addresses may never
materialize.
* Immediate Implications for Development
1. Claude Code's safety system is the benchmark to exceed. The type-level gate
architecture is theoretically superior to Claude Code's heuristic patterns,
but the implementation at v0.11.0 needs to prove it catches things Claude Code
misses.
2. No competitor has anything resembling a neurosymbolic architecture. The 10-80-10
plan has zero competition — but that also means zero market validation.
3. The Org-mode bet is invisible to competitors. They don't see the advantage
because they've never tried to build a knowledge graph from flat markdown files.
This is Passepartout's widest moat — it depends on a skill (Org-mode literate
programming) that no competitor's team has.
4. Hermes is the closest full-stack competitor (tools, skills, cron, subagents,
multi-platform), but architecturally conventional. For Hermes to match
Passepartout, it would need to be rewritten.
5. The coding agents (Aider, OpenCode, Codex) are not competitors — they are
single-purpose tools Passepartout could eventually replace entirely when the
planner matures.
* File references
Repository dumps and analysis artifacts at /tmp/:
- /tmp/aider/ — Aider source (Python)
- /tmp/opencode/ — OpenCode source (TypeScript/Bun)
- /tmp/codex/ — OpenAI Codex CLI (Rust)
- /tmp/claude-code-leaked-source/ — Claude Code leaked (TypeScript/Bun)
- /tmp/gemini-cli/ — Google Gemini CLI (TypeScript)
- /tmp/openclaw/ — OpenClaw source (TypeScript)
- /tmp/thoth/ — Thoth source (Python)
- /tmp/continue/ — Continue source (TypeScript)
- /usr/local/lib/hermes-agent/ — Hermes Agent (Python)