hermes-brain/ideas/competitive-analysis-2026-05.org

:PROPERTIES:
:ID:       3aa22300-2f25-57b0-8787-9f199cc978b1
:CREATED:  [2026-05-22 Thu]
:END:
#+title: Competitive Analysis — AI Agent Landscape (May 2026)
#+filetags: :passepartout:strategy:competitive:

* Overview

Analyzed 8 competitor codebases alongside Passepartout. The competitive landscape
divides into three categories:

1. Coding agents (Aider, OpenCode, Codex CLI, Claude Code, Gemini CLI)
2. Personal AI assistants (Hermes, OpenClaw)
3. CI/check-based systems (Continue)

None of the eight competes with Passepartout on all axes simultaneously. Passepartout's
strongest differentiators (Org-mode data model, deterministic gate stack, ACL2
verification, Merkle-tree memory, and the triad architecture) are absent from
every competitor.

* Category 1: Coding Agents

** Aider (Python, ~40K lines, MIT)

~6.8M pip installs. The oldest and most mature open-source coding agent analyzed.

Architecture: Chat-based Coder class with 5 edit formats (diff, udiff, patch,
whole, architect). Uses litellm for universal provider access (50+ providers).
RepoMap provides codebase awareness by extracting symbols with tree-sitter and
ranking them with a graph algorithm (PageRank), not by embedding similarity.

Safety model: Purely prompt-based plus user-confirmation dialogs. No deterministic
gate stack. No sandboxing. No model output validator. The allowed_to_edit() gate
is a single user confirmation call, and the --yes flag auto-approves it. Aider can
edit its own source code with no special protection, so self-modification goes
undetected.

Data model: Ad-hoc. Chat messages in memory. Git commits for persistence. RepoMap
is a ranked symbol map of the repository, not a memory store. No persistent memory
across sessions. No knowledge graph.

Self-modification: Full. No guard against editing its own files.

Verification: None.

Key gap vs Passepartout: No safety gates, no persistent memory model, no knowledge
representation, no verification, no self-modification protection, no architecture
for neurosymbolic reasoning. It is a thin shell around litellm + edit format
parsers.

** OpenCode / Crush (Go, ~42K lines, MIT)

Now archived and succeeded by Crush (github.com/charmbracelet/crush). Go-based,
Bubble Tea TUI.

Architecture: Pubsub-driven layered architecture with LSP integration, 9+ provider
clients (Anthropic, OpenAI, Gemini, Bedrock, Copilot, Azure, Vertex, Groq, xAI),
and hierarchical subagent delegation (child agents have read-only tools).

Safety model: Hybrid prompt-based + deterministic permission gating. The permission
dialog blocks on a channel until the user approves. Bash commands are checked against
a banned-list (no curl/wget/nc/telnet). A read-before-write invariant ensures edits
apply only to freshly-read content.
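
The read-before-write invariant can be illustrated with a minimal Python sketch.
It assumes freshness is checked by comparing a SHA-256 digest taken at read time
against the file's current content. OpenCode's actual mechanism is not described
in this note, and ReadTracker, StaleReadError, read, and write are hypothetical names.

#+begin_src python
import hashlib
from pathlib import Path


class StaleReadError(Exception):
    """Raised when a write targets content the agent has not freshly read."""


class ReadTracker:
    """Hypothetical tracker that remembers a digest of each file at read time."""

    def __init__(self) -> None:
        self._seen: dict[Path, str] = {}

    @staticmethod
    def _digest(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def read(self, path: Path) -> str:
        """Read a file and record what its content looked like."""
        path = path.resolve()
        self._seen[path] = self._digest(path)
        return path.read_text()

    def write(self, path: Path, new_text: str) -> None:
        """Write only if the file is unchanged since the recorded read."""
        path = path.resolve()
        if path not in self._seen:
            raise StaleReadError(f"{path} was never read in this session")
        if self._digest(path) != self._seen[path]:
            raise StaleReadError(f"{path} changed on disk since it was read")
        path.write_text(new_text)
        # The agent's own write counts as the new known state.
        self._seen[path] = self._digest(path)
#+end_src

A write to a file that was never read, or that changed on disk after the last
read, raises StaleReadError instead of silently overwriting newer content.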

Data model: SQLite with 3 tables — sessions, messages (JSON parts column),
files (versioned file history per session). Hierarchical sessions via
parent_session_id.
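
A rough sketch of what such a three-table layout could look like, using Python's
sqlite3 module. Only the table names, the JSON parts column, and parent_session_id
come from the description above; every other column name is an assumption made
for illustration and is not copied from the OpenCode schema.

#+begin_src python
import json
import sqlite3

# Hypothetical schema: column names other than parts and
# parent_session_id are invented for this example.
SCHEMA = """
CREATE TABLE sessions (
    id                TEXT PRIMARY KEY,
    parent_session_id TEXT REFERENCES sessions(id),
    title             TEXT
);
CREATE TABLE messages (
    id         INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    role       TEXT NOT NULL,
    parts      TEXT NOT NULL  -- JSON-encoded list of message parts
);
CREATE TABLE files (
    id         INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    path       TEXT NOT NULL,
    version    INTEGER NOT NULL,
    content    TEXT NOT NULL,
    UNIQUE (session_id, path, version)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_message(conn: sqlite3.Connection, session_id: str, role: str, parts: list) -> None:
    """Store message parts as a JSON string in the parts column."""
    conn.execute(
        "INSERT INTO messages (session_id, role, parts) VALUES (?, ?, ?)",
        (session_id, role, json.dumps(parts)),
    )


def child_sessions(conn: sqlite3.Connection, parent_id: str) -> list[str]:
    """List sessions whose parent_session_id points at parent_id."""
    rows = conn.execute(
        "SELECT id FROM sessions WHERE parent_session_id = ? ORDER BY id",
        (parent_id,),
    )
    return [row[0] for row in rows]
#+end_src

The self-referencing parent_session_id column is what makes sessions hierarchical:
a subagent session is an ordinary row that points at its parent.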

Self-modification: No protection against editing its own Go source.

Verification: None.

Key gap vs Passepartout: No typed gate stack (permission prompts and a banned-list
only), no knowledge graph, no Org-mode, no neurosymbolic architecture. The archived
project status is a risk.

** Codex CLI (OpenAI, Rust, ~950K lines)

OpenAI's open-source coding agent. Written in Rust and sandboxed.

Architecture: ~116-crate Rust workspace with a protocol layer (SQ/EQ session types),
sandbox manager (macOS Seatbelt, Linux nsjail), multi-provider support (through a
defined protocol rather than direct provider clients), and a configurable TUI.

Safety model: The strongest sandbox-based safety system of any coding agent analyzed.
Multi-layer:
- Process hardening (macOS Seatbelt with 4 profile tiers)
- Execution policy engine (defined policy in execpolicy crate)
- Sandboxing via nsjail on Linux, Seatbelt on macOS
- Guardian module for tool permission gating
- No prompt-based safety: all enforcement is deterministic, through policy definitions

Data model: Protocol-defined session types. Structured request/response models.
Config through TOML files with schema validation.

Self-modification: Protected by sandbox — the agent cannot escape to modify its
own binary or config without explicit policy override.

Verification: None (no proof system).

Key gap vs Passepartout: No knowledge graph (Org or otherwise), no persistent
memory model, no deterministic gate stack for agent behavior (only OS-level
sandboxing), no ACL2/prover, no neurosymbolic architecture. Strongest sandbox
but weakest cognitive architecture.

** Claude Code (Anthropic, TypeScript/Bun, ~512K lines leaked)

Anthropic's proprietary coding agent. It is not open source; this analysis is based
on leaked source.

Architecture: Bun-bundled TypeScript single-file executable. Ink/React terminal UI.
23+ core tools. Subagent forking with byte-identical API prefixes for prompt cache
sharing. Multi-agent coordination mode.

Safety model: Layered deterministic safety — NOT prompt-based:
1. Permission mode system (7 modes: default, acceptEdits, bypassPermissions, etc.)
2. Persistent permission rules (alwaysAllow, alwaysDeny, alwaysAsk, rule sources
   from userSettings, projectSettings, localSettings, policySettings)
3. Bash security validator — 2,592 lines of dedicated code with 23+ named
   security checks using tree-sitter AST parsing
4. Sandbox runtime for filesystem/network containment
5. Path/mode validation
6. Optional ML bash classifier (ant-only feature)

This is the most sophisticated application-level safety system of any coding agent
analyzed (Codex CLI leads on OS-level sandboxing). Passepartout's gate stack is
architecturally similar (deterministic multi-layer), but Claude Code's implementation
is vastly more mature: 2,592 lines of bash validation alone is ~50x the equivalent
in Passepartout.

Data model: File-based markdown memdir at ~/.claude/projects/<slug>/memory/.
4 memory types: user, feedback, project, reference. YAML frontmatter in .md files.
PROJECT.md and CLAUDE.md for project-level config. No database.

Self-modification: HIGH. Skill system writes SKILL.md files that change future
behavior. Plugin system, cron scheduling, agent spawning.

Verification: None.

Key gap vs Passepartout: No proof system, no neurosymbolic architecture, no
self-verification, no persistent knowledge graph (flat markdown files, not
Org-mode with cross-references); the markdown data model lacks semantic depth.
Proprietary: Anthropic controls it completely. Sandboxing is platform-dependent
(it uses macOS sandbox profiles natively). The permission rules system is impressive
but structurally inferior to Passepartout's gate stack because rules are heuristic
(regex-based pattern matching) rather than typed (type-level gates with structural
guarantees).
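
The heuristic-versus-typed distinction can be made concrete with a small Python
analogy. This is not Passepartout's or Claude Code's implementation: the names
(heuristic_allows, WorkspacePath, admit, write_file, GateError) and the agent_src
directory are hypothetical. Python only checks type hints through external
tools, so the "typed" half shows the pattern rather than a compiler-enforced
guarantee.

#+begin_src python
import re
from dataclasses import dataclass
from pathlib import Path

# Heuristic style: deny patterns applied to the raw string the model produced.
DENY_PATTERNS = [re.compile(r"^agent_src/")]


def heuristic_allows(raw_path: str) -> bool:
    """Allow anything that does not match a known-bad pattern."""
    return not any(p.search(raw_path) for p in DENY_PATTERNS)


class GateError(Exception):
    """Raised when a path cannot be admitted into the writable set."""


@dataclass(frozen=True)
class WorkspacePath:
    """A path checked at construction time; obtain instances via admit()."""
    path: Path


def admit(raw_path: str, workspace: Path, protected: Path) -> WorkspacePath:
    """The only intended constructor: resolve first, then check containment."""
    root = workspace.resolve()
    resolved = (root / raw_path).resolve()
    if not resolved.is_relative_to(root):
        raise GateError(f"{raw_path} escapes the workspace")
    if resolved.is_relative_to(protected.resolve()):
        raise GateError(f"{raw_path} targets protected agent source")
    return WorkspacePath(resolved)


def write_file(target: WorkspacePath, text: str) -> None:
    """Accepts only admitted paths, so callers cannot pass a raw string."""
    target.path.write_text(text)
#+end_src

A path such as docs/../agent_src/core.py slips past the anchored pattern but is
rejected by admit(), because the check runs on the resolved path and write_file
has no entry point that takes an unchecked string.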

** Gemini CLI (Google, TypeScript, ~525K lines, Apache 2.0)

Google's open-source coding agent. Node.js 20+, Ink/React TUI.

Architecture: 7-package npm monorepo. Core backend handles Gemini API orchestration,
tool execution, policy engine, safety checks, sandbox management, session management,
MCP client. 7-strategy composite model routing chain.

Safety model: Multi-layered:
1. CONSECA (Contextual Security Checker) — AI-driven per-request policy generation
   using a separate Gemini Flash model. Principle of least privilege.
2. Policy engine — 4 approval modes (PLAN, DEFAULT, AUTO_EDIT, YOLO), hierarchical
   rules with priority scores and wildcard matching
3. 6 sandbox methods (macOS Seatbelt, Docker/Podman, bwrap, gVisor, LXC, Windows)
4. Trusted folders with discovery phase and path traversal protection
5. Policy integrity verification via cryptographic hashes
6. Built-in safety checkers (AllowedPathChecker, CONSECA)
7. Loop detection service
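
Item 2 above (rules with priority scores and wildcard matching) can be sketched
as follows. This is a generic illustration, not Gemini CLI's policy engine: the
Rule fields, the decision strings, the tool names, and the "highest priority
wins" tie-breaking are all assumptions.

#+begin_src python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    tool_pattern: str  # shell-style wildcard, e.g. "read_*"
    decision: str      # "allow", "deny", or "ask_user" (hypothetical values)
    priority: int      # larger number wins


def decide(tool_name: str, rules: list[Rule], default: str = "ask_user") -> str:
    """Return the decision of the highest-priority rule matching tool_name."""
    matching = [r for r in rules if fnmatchcase(tool_name, r.tool_pattern)]
    if not matching:
        return default
    # max() returns the first of several equal-priority rules, so list
    # order breaks ties in this sketch.
    return max(matching, key=lambda r: r.priority).decision


RULES = [
    Rule("*", "ask_user", 0),
    Rule("read_*", "allow", 10),
    Rule("read_secret", "deny", 100),
]
#+end_src

With these example rules, decide("read_file", RULES) yields "allow", while
decide("read_secret", RULES) yields "deny" because the more specific rule
carries a higher priority. No model call is involved in either decision.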

Data model: JSONL session files. Turn-based conversation model. 4-layer config
precedence (system-defaults → user → project → system-override). TOML policy files.

Self-modification: Modifiable hooks system, MCP extensions, custom commands.
Core binaries are protected on disk by file permissions.

Verification: None.

Key gap vs Passepartout: No proof system, no persistent knowledge graph, no
self-verification, no neurosymbolic architecture, and a strong tilt toward Google
Gemini models (though it can use others via routing). The CONSECA approach is
interesting (AI-generated policies) but spends a second LLM call on every security
decision, the opposite of Passepartout's zero-token deterministic gating.

* Category 2: Personal AI Assistants

** Hermes Agent (Python, ~17K core, MIT)

The agent running this conversation. Python, OpenAI-format conversations.

Architecture: Synchronous conversation loop with OpenAI-format messages. 60+
built-in tools. 109+ providers via pluggable transport layer. 15+ messaging
platforms via gateway. MCP client (native, not bridge). Ink/React TUI as Node.js
subprocess. Cron jobs, Kanban board, subagent delegation.

Safety model: Multi-layer but NOT a deterministic gate stack:
1. Message sanitization (surrogates, control chars, malformed JSON)
2. Tirith binary scanner (pre-execution terminal command analysis)
3. Command approval system (manual/smart/off modes)
4. Memory injection detection (prompt injection pattern matching)
5. Secret/PII redaction
6. Tool call guardrails (loop detection)
7. MCP security (env filtering, credential stripping)
8. Context fencing (memory injection span scrubbing)

These are all heuristic or prompt-based — no structural type-level gates.
Tirith is a separate binary, not in-process. The approval system is good but
reactive (LLM proposes → system blocks) rather than preventive (type system
prevents by construction).
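
As an illustration of the reactive style, item 6 (loop detection) can be sketched
as a sliding-window counter over recent tool calls. This is a guess at the general
technique, not Hermes's code: LoopDetector, observe, and the window and repeat
thresholds are hypothetical.

#+begin_src python
import json
from collections import deque


class LoopDetector:
    """Flags a tool call repeated too often within a sliding window."""

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        self._recent: deque = deque(maxlen=window)
        self._max_repeats = max_repeats

    def observe(self, tool: str, args: dict) -> bool:
        """Record a call; return True if it now looks like a loop."""
        # Canonical JSON makes argument order irrelevant to the comparison.
        key = (tool, json.dumps(args, sort_keys=True))
        self._recent.append(key)
        return self._recent.count(key) >= self._max_repeats
#+end_src

The detector only reacts after the model has already proposed the repeated call,
which is the "LLM proposes, system blocks" pattern described above.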

Data model: SQLite session DB (FTS5 full-text search). File-based memory
(MEMORY.md + USER.md). YAML config. No knowledge graph. No Org-mode.

Self-modification: Skill system writes SKILL.md files. Memory tool edits
MEMORY.md/USER.md. Config YAML editable. Core Python code is not modified during
normal execution, but no gate specifically prevents the LLM from requesting edits
to its own source files.

Verification: None.

Key gap vs Passepartout: No deterministic gate stack (heuristic layers, not
structural/typed), no knowledge graph, no Org-mode, no neurosymbolic architecture,
no self-verification, no proof system. Hermes's strength is breadth: 109+ providers,
15+ platforms, an MCP ecosystem, and a large tool surface. It lacks comparable
depth in safety, knowledge representation, and reasoning architecture.

** OpenClaw (TypeScript/Node.js, ~3.5M lines)

The largest codebase analyzed. A personal AI assistant with support for 25+
messaging channels.

Architecture: pnpm workspace with ~135 bundled plugins. Gateway control plane
routes messages through multi-agent routing. Per-agent sessions, workspaces,
skill registries. Companion native apps (macOS, iOS, Android).

Safety model: Tiered — main agent runs tools directly on host (trusted-operator),
non-main sessions sandboxed via Docker (read-only rootfs, capability dropping,
seccomp/AppArmor, memory/cpu/PID limits, SSH/OpenShell backends).

Data model: Typed JSON/YAML config (openclaw.json). Multi-source model catalog.
Plugin SDK with narrow typed subpath exports.

Self-modification: ACP (Agent Control Protocol) for spawning child sessions.
Skill system with npm distribution and ClawHub registry.

Verification: None.

Key gap vs Passepartout: The same gaps as Hermes (no gate stack, no knowledge graph,
no Org-mode, no verification, no neurosymbolic architecture). It is differentiated by
vastly broader channel support and a mature plugin ecosystem, but it is architecturally
conventional: LLM + tools + channels, with no cognitive architecture innovation.

* Category 3: CI/Check Systems

** Continue (TypeScript, ~328K lines, Apache 2.0)

Source-controlled AI checks for CI/CD. Markdown-as-gate-policy.

Architecture: Shared core (@continuedev/core) with ~80 provider implementations,
tool-calling engine, config system (YAML/JSON/Markdown). Serves CLI (Ink/React TUI
+ headless CI mode), IDE extensions (VS Code, JetBrains), web dashboard.

Safety model: Three permission levels (allow/ask/exclude). Precedence: mode policies
→ CLI flags → permissions.yaml → built-in defaults. Terminal security package for
shell command analysis via shell-quote parsing. Workspace-scoped file access.
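
A minimal sketch of tokenize-then-classify shell analysis mapped onto the three
permission levels. It uses Python's shlex as a stand-in for the shell-quote
library and is not Continue's implementation. BANNED reuses the banned-list quoted
in the OpenCode section; SAFE, command_names, and classify are hypothetical.
Subshells, redirections, and environment-variable prefixes are not handled.

#+begin_src python
import shlex

# Hypothetical policy sets for illustration only.
BANNED = {"curl", "wget", "nc", "telnet"}
SAFE = {"ls", "cat", "grep", "echo"}
SEPARATORS = {"|", "||", "&&", ";", "&"}


def command_names(line: str) -> list[str]:
    """Return the first word of each simple command in a shell line."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    names = []
    expect_command = True
    for token in lexer:
        if token in SEPARATORS:
            expect_command = True
        elif expect_command:
            names.append(token)
            expect_command = False
    return names


def classify(line: str) -> str:
    """Map a command line to one of: allow, ask, exclude."""
    names = command_names(line)
    if any(name in BANNED for name in names):
        return "exclude"
    if names and all(name in SAFE for name in names):
        return "allow"
    return "ask"
#+end_src

Tokenizing first means a quoted string such as echo "curl is banned" is treated
as an argument to echo, whereas a plain substring match on "curl" would block it.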

Data model: Markdown files for checks, agents, rules. Source-controlled in-repo.
YAML frontmatter for metadata.

Self-modification: Checks are source-controlled, so any change goes through git.

Verification: None (the checks are themselves unverified).

Key gap vs Passepartout: The "checks as markdown" concept is philosophically
similar to Passepartout's gate rules (deterministic policies checked before
execution) but the implementation is dramatically simpler — regex-based policy
objects, not a type-level gate stack with structural guarantees. No persistent
agent, no memory, no knowledge graph, no neurosymbolic architecture. It is a
gate system without an agent to gate.

* The Passepartout Advantage

| Dimension | Passepartout | Best Competitor | Gap |
|-----------|--------------|-----------------|-----|
| Safety model | Type-level gates + 11-vector deterministic stack | Claude Code (7 permission modes + 23 bash checks) | Structural vs heuristic. Passepartout's type-level gates prevent self-modification at the category level; competitors block patterns. |
| Knowledge model | Org-mode (tree, properties, TODOs, timestamps, cross-refs, IDs, tags) | Claude Code (flat markdown memdir) | Org-mode provides ~15 semantic primitives that markdown lacks. |
| Memory integrity | Merkle tree + SHA-256 + rollback | Hermes (file-based); Claude Code (flat files + git) | Content-addressed, tamper-evident memory no competitor has. |
| Self-verification | ACL2 → CIC prover (planned) | None | No competitor does provable correctness. |
| Cognitive architecture | 10-80-10 symbolic-first (planned) | 100% LLM (every competitor) | Once the symbolic-first flip lands, Passepartout is expected to use ~10% of the tokens competitors use. |
| Data format | Org-mode (human-editable, machine-parseable, single file) | JSONL/Markdown/YAML/DB (competitors use 2-5 formats) | Unified format reduces translation layers to zero. |
| Self-modification | Type-level gates + hot-reload | Claude Code (skills), Hermes (skills) | Passepartout's guard against self-modification is structural (type level), not heuristic (pattern list). |
| Triad | Passepartout + Stoa + Agora | None | No competitor is building a full computing stack + social network. |
| Provider independence | Any OpenAI-compatible API | Hermes (109+), Gemini CLI (1 primary) | Comparable to Hermes in principle and better than most, though only 8 providers are integrated today. |
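
To make the "Merkle tree + SHA-256" row concrete, here is a minimal sketch of a
tamper-evident root over a list of memory entries. It is not Passepartout's
implementation: merkle_root, the leaf and node prefixes, and the choice to promote
an unpaired node unchanged are assumptions, and other conventions exist.

#+begin_src python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(entries: list[bytes]) -> str:
    """Return the hex Merkle root of the entries, in order."""
    if not entries:
        return _h(b"").hex()
    # Distinct prefixes keep leaf hashes and interior hashes from colliding.
    level = [_h(b"\x00" + entry) for entry in entries]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(_h(b"\x01" + level[i] + level[i + 1]))
            else:
                # An unpaired node is promoted unchanged in this sketch.
                next_level.append(level[i])
        level = next_level
    return level[0].hex()
#+end_src

Storing the root after each accepted change gives a cheap integrity check: if any
entry is edited, reordered, or dropped, the recomputed root no longer matches the
stored one, and the last snapshot with a matching root is a rollback target.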

* Where Competitors Lead

| Dimension | Leader | Passepartout Status |
|-----------|--------|---------------------|
| Safety implementation maturity | Claude Code (2,592 lines bash security) | Gate stack exists but bash validation is minimal in comparison |
| Provider breadth | Hermes (109+), OpenClaw (50+) | 8 providers — adequate but not competitive |
| Channel/platform support | OpenClaw (25+ channels) | TUI only — no multi-channel |
| Plugin ecosystem | OpenClaw (ClawHub, npm registry) | No plugin marketplace |
| Subagent delegation | Claude Code (fork with context inheritance) | Planned via Screamer planner |
| Codebase size / features shipped | All competitors have working products | v0.7.2 in development |
| MCP integration | Hermes, Codex (native), Continue | Planned v0.53.0 |
| Sandboxing | Codex CLI (Seatbelt+nsjail), Gemini CLI (6 methods) | None |
| Business model | Hermes (MIT+services), Codex (tokens) | AGPL + appliances + SaaS |
| Cross-platform | Claude Code (macOS/*nix), Codex CLI (macOS/Linux) | Linux only |

* Strategic Positioning

Passepartout is not competing in the existing AI agent market. It is building a
new category: provable personal infrastructure.

Competitors optimize for:
- Token efficiency (Aider's edit formats, OpenCode's LSP integration)
- Model flexibility (Hermes's 109+ providers)
- Platform reach (OpenClaw's 25+ channels)
- UI polish (Gemini CLI's Ink/React, Claude Code's permission dialogs)
- Sandbox security (Codex CLI's Seatbelt, Gemini CLI's gVisor)

Passepartout optimizes for:
- Provable correctness (ACL2 → CIC)
- Data integrity (Merkle tree)
- Cognitive architecture (10-80-10 symbolic-first)
- Safety by construction (type-level gates)
- Unified data model (Org-mode as everything)
- Network effects (Agora)
- Full-stack ownership (Stoa)

No competitor analyzed optimizes for these axes. The risk is not that a competitor
builds a better Passepartout; it is that the market never develops a preference
for provable agents. If token-burning LLM agents remain the default and users
do not demand verification, the category Passepartout is building may never
materialize.

* Immediate Implications for Development

1. Claude Code's safety system is the benchmark to exceed. The type-level gate
   architecture is theoretically superior to Claude Code's heuristic patterns,
   but the implementation at v0.11.0 needs to prove it catches things Claude Code
   misses.

2. No competitor has anything resembling a neurosymbolic architecture. The 10-80-10
   plan has zero competition — but that also means zero market validation.

3. The Org-mode bet is invisible to competitors. They don't see the advantage
   because they've never tried to build a knowledge graph from flat markdown files.
   This is Passepartout's widest moat: it depends on a skill (Org-mode literate
   programming) that no competitor's team appears to have.

4. Hermes is the closest full-stack competitor (tools, skills, cron, subagents,
   multi-platform), but architecturally conventional. For Hermes to match
   Passepartout, it would need to be rewritten.

5. The coding agents (Aider, OpenCode, Codex) are not competitors — they are
   single-purpose tools Passepartout could eventually replace entirely when the
   planner matures.

* File references

Repository dumps and analysis artifacts (under /tmp/ unless noted):
- /tmp/aider/ — Aider source (Python)
- /tmp/opencode/ — OpenCode archived source (Go)
- /tmp/codex/ — OpenAI Codex CLI (Rust)
- /tmp/claude-code-leaked-source/ — Claude Code leaked (TypeScript/Bun)
- /tmp/gemini-cli/ — Google Gemini CLI (TypeScript)
- /tmp/openclaw/ — OpenClaw source (TypeScript)
- /tmp/continue/ — Continue source (TypeScript)
- /usr/local/lib/hermes-agent/ — Hermes Agent (Python)