amr/hermes-brain


Hermes 7610d3a457 Add comprehensive competitive analysis of 8 AI agent platforms

2026-05-22 10:58:10 +00:00


Competitive Analysis — AI Agent Landscape (May 2026)

Overview
Category 1: Coding Agents
Category 2: Personal AI Assistants
- Hermes Agent (Python, ~17K core, MIT)
- OpenClaw (TypeScript/Node.js, ~3.5M lines)
Category 3: CI/Check Systems
- Continue (TypeScript, ~328K lines, Apache 2.0)
The Passepartout Advantage
Where Competitors Lead
Strategic Positioning
Immediate Implications for Development
File references

Overview

This analysis covers eight competitor codebases alongside Passepartout. The competitive landscape divides into three categories:

Coding agents (Aider, OpenCode, Codex CLI, Claude Code, Gemini CLI)
Personal AI assistants (Hermes, OpenClaw)
CI/check-based systems (Continue)

None of the eight compete with Passepartout on all axes simultaneously. Passepartout's strongest differentiators — Org-mode data model, deterministic gate stack, ACL2 verification, Merkle-treed memory, and the triad architecture — are absent from every competitor.

Category 1: Coding Agents

Aider (Python, ~40K lines, MIT)

Language: Python. ~6.8M pip installs. The oldest and most mature open-source coding agent.

Architecture: Chat-based Coder class with 5 edit formats (diff, udiff, patch, whole, architect). Uses litellm for universal provider access (50+ providers). RepoMap provides codebase awareness by extracting symbols with tree-sitter and ranking them with a graph algorithm.

Safety model: Purely prompt-based plus user-confirmation dialogs. No deterministic gate stack. No sandboxing. No model output validator. The allowed_to_edit() gate is a single user confirmation call. The --yes flag auto-approves. Aider can edit its own source code with no special protection, so self-modification goes undetected.
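To make the weakness concrete, here is a minimal sketch of a single-confirmation edit gate with an auto-approve switch. The names `confirm_edit`, `ask`, and `auto_yes` are hypothetical and are not taken from Aider's source; the point is only that such a gate has no notion of which files are sensitive.

```python
from typing import Callable


def confirm_edit(path: str, ask: Callable[[str], bool], auto_yes: bool = False) -> bool:
    """Hypothetical single-confirmation gate: one question, no other checks."""
    if auto_yes:
        # With auto-approve on, the user is never consulted.
        return True
    # Otherwise the whole decision rests on one yes/no answer.
    return ask(f"Allow edits to {path}?")
```

With `auto_yes=True`, a path inside the agent's own source tree is approved exactly like any other file, because the gate never inspects the path.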

Data model: Ad-hoc. Chat messages in memory. Git commits for persistence. RepoMap is a ranked symbol index built from tree-sitter tags. No persistent memory across sessions. No knowledge graph.

Self-modification: Full. No guard against editing its own files.

Verification: None.

Key gap vs Passepartout: No safety gates, no persistent memory model, no knowledge representation, no verification, no self-modification protection, no architecture for neurosymbolic reasoning. It is a thin shell around litellm + edit format parsers.

OpenCode / Crush (Go, ~42K lines, MIT)

Now archived and succeeded by Crush (github.com/charmbracelet/crush). Go-based, Bubble Tea TUI.

Architecture: Pubsub-driven layered architecture with LSP integration, 8+ provider clients (Anthropic, OpenAI, Gemini, Bedrock, Copilot, Azure, Vertex, Groq, xAI), hierarchical subagent delegation (child agents have read-only tools).

Safety model: Hybrid prompt-based + deterministic permission gating. Permission dialog blocks on channel until user approves. Bash commands have a banned-list (no curl/wget/nc/telnet). Read-before-write invariant ensures edits only on freshly-read content.
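The two deterministic checks described above can be sketched roughly as follows. This is an illustration in Python, not OpenCode's Go implementation; the class and function names are hypothetical, and only the banned command names come from the text.

```python
import shlex

BANNED_COMMANDS = {"curl", "wget", "nc", "telnet"}  # names taken from the list above


def is_banned(command: str) -> bool:
    """Reject a shell command whose first token is on the banned list."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in BANNED_COMMANDS


class ReadBeforeWrite:
    """Allow a write only if the file was read and has not changed since."""

    def __init__(self) -> None:
        self._mtime_at_read: dict[str, float] = {}

    def record_read(self, path: str, mtime: float) -> None:
        self._mtime_at_read[path] = mtime

    def may_write(self, path: str, current_mtime: float) -> bool:
        seen = self._mtime_at_read.get(path)
        # Never read, or modified after the last read: refuse the edit.
        return seen is not None and seen == current_mtime
```

A first-token check like this is easy to sidestep (for example by wrapping the command in `sh -c`), which is one reason a banned list counts as heuristic rather than structural.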

Data model: SQLite with 3 tables — sessions, messages (JSON parts column), files (versioned file history per session). Hierarchical sessions via parent_session_id.
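A schema of roughly this shape would match the description. Only the table names, the JSON parts column, and `parent_session_id` come from the text; every other column name here is an assumption made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sessions (
    id TEXT PRIMARY KEY,
    parent_session_id TEXT REFERENCES sessions(id)  -- NULL for a root session
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    parts TEXT NOT NULL  -- JSON-encoded message parts
);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    path TEXT NOT NULL,
    version INTEGER NOT NULL,  -- per-session file history
    content TEXT NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def child_sessions(conn: sqlite3.Connection, parent_id: str) -> list[str]:
    """List the direct children of a session via parent_session_id."""
    rows = conn.execute(
        "SELECT id FROM sessions WHERE parent_session_id = ? ORDER BY id",
        (parent_id,),
    )
    return [row[0] for row in rows]
```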

Self-modification: No protection against editing its own Go source.

Verification: None.

Key gap vs Passepartout: Permission gating exists but is not a type-level gate stack; no knowledge graph, no Org-mode, no neurosymbolic architecture. The archived project status is a risk.

Codex CLI (OpenAI, Rust, ~950K lines)

OpenAI's open-source coding agent, written in Rust and sandboxed.

Architecture: ~116 crate Rust workspace with a protocol layer (SQ/EQ session types), sandbox manager (macOS Seatbelt, Linux nsjail), multi-provider support (via defined protocol, not directly), configurable TUI.

Safety model: The strongest sandboxing of any coding agent analyzed. Multi-layer:

Process hardening (macOS Seatbelt with 4 profile tiers)
Execution policy engine (defined policy in execpolicy crate)
Sandboxing via nsjail on Linux, seatbelt on macOS
Guardian module for tool permission gating
No prompt-based safety — all deterministic through policy definitions

Data model: Protocol-defined session types. Structured request/response models. Config through TOML files with schema validation.

Self-modification: Protected by sandbox — the agent cannot escape to modify its own binary or config without explicit policy override.

Verification: None (no proof system).

Key gap vs Passepartout: No knowledge graph (Org or otherwise), no persistent memory model, no deterministic gate stack for agent behavior (only OS-level sandboxing), no ACL2/prover, no neurosymbolic architecture. Strongest sandbox but weakest cognitive architecture.

Claude Code (Anthropic, TypeScript/Bun, ~512K lines leaked)

Anthropic's proprietary coding agent. It is not open source; the source was available for this analysis only through a leak.

Architecture: Bun-bundled TypeScript single-file executable. Ink/React terminal UI. 23+ core tools. Subagent forking with byte-identical API prefixes for prompt cache sharing. Multi-agent coordination mode.

Safety model: Layered deterministic safety — NOT prompt-based:

Permission mode system (7 modes: default, acceptEdits, bypassPermissions, etc.)
Persistent permission rules (alwaysAllow, alwaysDeny, alwaysAsk, rule sources from userSettings, projectSettings, localSettings, policySettings)
Bash security validator — 2,592 lines of dedicated code with 23+ named security checks using tree-sitter AST parsing
Sandbox runtime for filesystem/network containment
Path/mode validation
Optional ML bash classifier (ant-only feature)
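The persistent permission rules can be pictured as pattern lists consulted in a fixed order. The sketch below is not Claude Code's implementation: the precedence (deny, then ask, then allow), the fallback to "ask", and the glob-style rule syntax are all assumptions; only the three list names come from the text.

```python
from fnmatch import fnmatchcase


def decide(tool_call: str, rules: dict[str, list[str]]) -> str:
    """Return 'deny', 'ask', or 'allow' for a tool call string.

    Assumed precedence (not confirmed by the source): deny > ask > allow,
    falling back to 'ask' when no rule matches.
    """
    order = (("deny", "alwaysDeny"), ("ask", "alwaysAsk"), ("allow", "alwaysAllow"))
    for behavior, key in order:
        if any(fnmatchcase(tool_call, pattern) for pattern in rules.get(key, [])):
            return behavior
    return "ask"
```

For example, with `alwaysAllow = ["Bash(git *)"]` and `alwaysDeny = ["Bash(git push*)"]`, a `git status` call is allowed while a `git push` call is denied, because the deny list is consulted first.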

This is the most sophisticated safety system of any coding agent. Passepartout's gate stack is architecturally similar (deterministic multi-layer) but Claude Code's implementation is vastly more mature — 2,592 lines of bash validation alone is ~50x the equivalent in Passepartout.

Data model: File-based markdown memdir at ~/.claude/projects/<slug>/memory/. 4 memory types: user, feedback, project, reference. YAML frontmatter in .md files. PROJECT.md and CLAUDE.md for project-level config. No database.
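A memory file of this kind is easy to read with a few lines of code, which also shows how little structure it carries. The sketch below handles only flat `key: value` frontmatter, not full YAML, and the `type` field name is an assumption; the four memory types are the ones listed above.

```python
MEMORY_TYPES = {"user", "feedback", "project", "reference"}


def parse_memory(text: str) -> tuple[dict[str, str], str]:
    """Split a memory file into (frontmatter, body).

    Simplified: supports flat 'key: value' lines between '---' delimiters.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the file as plain body
```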

Self-modification: HIGH. Skill system writes SKILL.md files that change future behavior. Plugin system, cron scheduling, agent spawning.

Verification: None.

Key gap vs Passepartout: No proof system, no neurosymbolic architecture, no self-verification, no persistent knowledge graph (flat markdown files, not Org-mode with cross-references), and a markdown data model that lacks semantic depth. Proprietary: Anthropic controls it completely. macOS-first (uses macOS sandbox profiles natively). The permission rules system is impressive but structurally inferior to Passepartout's gate stack because rules are heuristic (regex-based pattern matching) rather than typed (type-level gates with structural guarantees).
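The difference between a pattern rule and a typed gate can be illustrated with a small sketch. This is not Passepartout's implementation; `AGENT_ROOT`, `WritablePath`, `gate`, and `write_file` are hypothetical names, and Python only enforces the type when a static checker such as mypy is run.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

AGENT_ROOT = PurePosixPath("/opt/agent")  # hypothetical location of the agent's own code


@dataclass(frozen=True)
class WritablePath:
    """A path that has already passed the gate; writers accept only this type."""
    value: PurePosixPath


def gate(raw: str) -> WritablePath:
    """The single intended way to obtain a WritablePath."""
    path = PurePosixPath(raw)
    if not path.is_absolute() or ".." in path.parts:
        raise PermissionError("path must be absolute and free of '..'")
    if path == AGENT_ROOT or AGENT_ROOT in path.parents:
        raise PermissionError("refusing to write inside the agent's own source tree")
    return WritablePath(path)


def write_file(target: WritablePath, data: str) -> str:
    # A static type checker rejects write_file("/opt/agent/x.py", ...),
    # because a plain str is not a WritablePath.
    return f"would write {len(data)} bytes to {target.value}"
```

The writer never sees an unchecked string, so there is no list of forbidden patterns to keep up to date. In Python the guarantee is only as strong as the discipline of not constructing `WritablePath` directly; a language with private constructors or dependent types can make it airtight.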

Gemini CLI (Google, TypeScript, ~525K lines, Apache 2.0)

Google's open-source coding agent. Node.js 20+, Ink/React TUI.

Architecture: 7-package npm monorepo. Core backend handles Gemini API orchestration, tool execution, policy engine, safety checks, sandbox management, session management, MCP client. 7-strategy composite model routing chain.

Safety model: Multi-layered:

CONSECA (Contextual Security Checker) — AI-driven per-request policy generation using a separate Gemini Flash model. Principle of least privilege.
Policy engine — 4 approval modes (PLAN, DEFAULT, AUTO_EDIT, YOLO), hierarchical rules with priority scores and wildcard matching
6 sandbox methods (macOS Seatbelt, Docker/Podman, bwrap, gVisor, LXC, Windows)
Trusted folders with discovery phase and path traversal protection
Policy integrity verification via cryptographic hashes
Built-in safety checkers (AllowedPathChecker, CONSECA)
Loop detection service
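The policy engine's combination of priority scores and wildcard matching can be sketched as below. The `Rule` fields, the decision strings, and the tool names are hypothetical; this is an illustration of the idea, not Gemini CLI's TypeScript code.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    pattern: str   # wildcard pattern over tool names, e.g. "shell.*"
    decision: str  # "allow", "deny", or "ask_user"
    priority: int  # higher priority wins


def evaluate(tool: str, rules: list[Rule], default: str = "ask_user") -> str:
    """Return the decision of the highest-priority rule matching the tool."""
    matching = [rule for rule in rules if fnmatchcase(tool, rule.pattern)]
    if not matching:
        return default
    return max(matching, key=lambda rule: rule.priority).decision
```

A broad low-priority rule such as `Rule("*", "ask_user", 0)` then acts as a catch-all, while narrower high-priority rules carve out exceptions.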

Data model: JSONL session files. Turn-based conversation model. 4-layer config precedence (system-defaults → user → project → system-override). TOML policy files.
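Layered config precedence of this kind usually reduces to a merge in which later layers override earlier ones. The sketch below assumes that the arrow order runs from lowest to highest precedence and that nested sections are merged key by key; neither detail is confirmed by the source, and the function names are hypothetical.

```python
def merge(base: dict, override: dict) -> dict:
    """Return base updated with override, merging nested dicts recursively."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


def resolve(system_defaults: dict, user: dict, project: dict, system_override: dict) -> dict:
    """Apply the four layers in order; the last layer to set a key wins."""
    result: dict = {}
    for layer in (system_defaults, user, project, system_override):
        result = merge(result, layer)
    return result
```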

Self-modification: Modifiable hooks system, MCP extensions, custom commands. Core binaries are protected on disk by file permissions.

Verification: None.

Key gap vs Passepartout: No proof system, no persistent knowledge graph, no self-verification, no neurosymbolic architecture, lock-in to Google Gemini models (though it can use others via routing). The CONSECA approach is interesting (AI-generated policies) but introduces a second LLM call for every security decision — the opposite of Passepartout's approach of zero-token deterministic gating.

Category 2: Personal AI Assistants

Hermes Agent (Python, ~17K core, MIT)

The agent running this conversation. Python, OpenAI-format conversations.

Architecture: Synchronous conversation loop with OpenAI-format messages. 60+ built-in tools. 109+ providers via pluggable transport layer. 15+ messaging platforms via gateway. MCP client (native, not bridge). Ink/React TUI as Node.js subprocess. Cron jobs, Kanban board, subagent delegation.

Safety model: Multi-layer but NOT a deterministic gate stack:

Message sanitization (surrogates, control chars, malformed JSON)
Tirith binary scanner (pre-execution terminal command analysis)
Command approval system (manual/smart/off modes)
Memory injection detection (prompt injection pattern matching)
Secret/PII redaction
Tool call guardrails (loop detection)
MCP security (env filtering, credential stripping)
Context fencing (memory injection span scrubbing)

These are all heuristic or prompt-based — no structural type-level gates. Tirith is a separate binary, not in-process. The approval system is good but reactive (LLM proposes → system blocks) rather than preventive (type system prevents by construction).
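The loop-detection guardrail is a good example of a heuristic layer. A minimal sketch, assuming a sliding window over recent calls and a repeat threshold (both values and all names are hypothetical, not taken from Hermes):

```python
import json
from collections import deque


class LoopDetector:
    """Flag a tool call repeated too often within a window of recent calls."""

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        self._recent: deque = deque(maxlen=window)
        self._max_repeats = max_repeats

    def observe(self, tool: str, args: dict) -> bool:
        """Record a call; return True if it should be blocked as a loop."""
        # Serialize arguments with sorted keys so equal calls compare equal.
        key = (tool, json.dumps(args, sort_keys=True))
        self._recent.append(key)
        return self._recent.count(key) >= self._max_repeats
```

The check fires only after the model has already proposed the repeated call, which is the reactive pattern described above.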

Data model: SQLite session DB (FTS5 full-text search). File-based memory (MEMORY.md + USER.md). YAML config. No knowledge graph. No Org-mode.

Self-modification: Skill system writes SKILL.md files. Memory tool edits MEMORY.md/USER.md. Config YAML editable. Core Python code is read-only in execution but the LLM could request modifications to its own source files (no gate specifically prevents this).

Verification: None.

Key gap vs Passepartout: No deterministic gate stack (heuristic layers, not structural/typed), no knowledge graph, no Org-mode, no neurosymbolic architecture, no self-verification, no proof system. Hermes's strength is breadth — 109 providers, 15 platforms, MCP ecosystem, big tool surface. But it has no depth in safety, knowledge representation, or reasoning architecture.

OpenClaw (TypeScript/Node.js, ~3.5M lines)

The largest codebase analyzed. Personal AI assistant with 25+ messaging channel support.

Architecture: pnpm workspace with ~135 bundled plugins. Gateway control plane routes messages through multi-agent routing. Per-agent sessions, workspaces, skill registries. Companion native apps (macOS, iOS, Android).

Safety model: Tiered — main agent runs tools directly on host (trusted-operator), non-main sessions sandboxed via Docker (read-only rootfs, capability dropping, seccomp/AppArmor, memory/cpu/PID limits, SSH/OpenShell backends).
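The container hardening listed for non-main sessions corresponds to standard `docker run` options. The helper below is a sketch that only builds the argument list; the function name and the default limit values are assumptions and do not come from OpenClaw.

```python
from typing import Optional


def docker_sandbox_argv(
    image: str,
    command: list[str],
    *,
    seccomp_profile: Optional[str] = None,
    apparmor_profile: Optional[str] = None,
    memory: str = "512m",   # illustrative limits, not OpenClaw's values
    cpus: str = "1.0",
    pids_limit: int = 128,
) -> list[str]:
    """Build a `docker run` argv with a read-only rootfs and dropped capabilities."""
    argv = [
        "docker", "run", "--rm",
        "--read-only",                # read-only root filesystem
        "--cap-drop=ALL",             # drop all Linux capabilities
        f"--memory={memory}",
        f"--cpus={cpus}",
        f"--pids-limit={pids_limit}",
    ]
    if seccomp_profile:
        argv += ["--security-opt", f"seccomp={seccomp_profile}"]
    if apparmor_profile:
        argv += ["--security-opt", f"apparmor={apparmor_profile}"]
    return argv + [image, *command]
```

The resulting list can be passed to `subprocess.run` on a host with Docker installed.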

Data model: Typed JSON/YAML config (openclaw.json). Multi-source model catalog. Plugin SDK with narrow typed subpath exports.

Self-modification: ACP (Agent Control Protocol) for spawning child sessions. Skill system with npm distribution and ClawHub registry.

Verification: None.

Key gap vs Passepartout: Same as Hermes — no gate stack, no knowledge graph, no Org-mode, no verification, no neurosymbolic architecture. Differentiated by vastly broader channel support and mature plugin ecosystem. But architecturally conventional — LLM + tools + channels, no cognitive architecture innovation.

Category 3: CI/Check Systems

Continue (TypeScript, ~328K lines, Apache 2.0)

Source-controlled AI checks for CI/CD. Markdown-as-gate-policy.

Architecture: Shared core (@continuedev/core) with ~80 provider implementations, tool-calling engine, config system (YAML/JSON/Markdown). Serves CLI (Ink/React TUI + headless CI mode), IDE extensions (VS Code, JetBrains), web dashboard.

Safety model: Three permission levels (allow/ask/exclude). Precedence: mode policies → CLI flags → permissions.yaml → built-in defaults. Terminal security package for shell command analysis via shell-quote parsing. Workspace-scoped file access.
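The precedence chain can be read as "the first source with an opinion wins". The sketch below assumes that the arrow order runs from highest to lowest precedence and that each source is a simple tool-to-level mapping; both are assumptions, and the function name is hypothetical.

```python
LEVELS = {"allow", "ask", "exclude"}


def resolve_permission(
    tool: str,
    mode_policies: dict[str, str],
    cli_flags: dict[str, str],
    permissions_yaml: dict[str, str],
    defaults: dict[str, str],
    fallback: str = "ask",
) -> str:
    """Return the permission level from the first source that names the tool."""
    for source in (mode_policies, cli_flags, permissions_yaml, defaults):
        level = source.get(tool)
        if level is not None:
            if level not in LEVELS:
                raise ValueError(f"unknown permission level: {level!r}")
            return level
    return fallback
```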

Data model: Markdown files for checks, agents, rules. Source-controlled in-repo. YAML frontmatter for metadata.

Self-modification: Checks source-controlled — any change goes through git.

Verification: None (the checks are themselves unverified).

Key gap vs Passepartout: The "checks as markdown" concept is philosophically similar to Passepartout's gate rules (deterministic policies checked before execution) but the implementation is dramatically simpler — regex-based policy objects, not a type-level gate stack with structural guarantees. No persistent agent, no memory, no knowledge graph, no neurosymbolic architecture. It is a gate system without an agent to gate.

The Passepartout Advantage

| Dimension | Passepartout | Best Competitor | Gap |
| --- | --- | --- | --- |
| Safety model | Type-level gates + 11-vector deterministic stack | Claude Code (7 permission modes + 23 bash checks) | Structural vs heuristic. Passepartout's type-level gates prevent self-modification at the category level; competitors block patterns. |
| Knowledge model | Org-mode (tree, properties, TODOs, timestamps, cross-refs, IDs, tags) | Claude Code (flat markdown memdir) | Org-mode's semantic richness is ~15 primitives markdown doesn't have. |
| Memory integrity | Merkle tree + SHA-256 + rollback | Hermes (file-based); Claude Code (flat files + git) | Content-addressed, tamper-evident memory no competitor has. |
| Self-verification | ACL2 → CIC prover (planned) | None | No competitor does provable correctness. |
| Cognitive architecture | 10-80-10 symbolic-first (planned) | 100% LLM (every competitor) | Post-flip, Passepartout would use ~10% of the tokens competitors use. |
| Data format | Org-mode (human-editable, machine-parseable, single file) | JSONL/Markdown/YAML/DB (competitors use 2-5 formats) | Unified format reduces translation layers to zero. |
| Self-modification | Type-level gates + hot-reload | Claude Code (skills), Hermes (skills) | Passepartout's guard against self-modification is structural (type level), not heuristic (pattern list). |
| Triad | Passepartout + Stoa + Agora | None | No competitor is building a full computing stack + social network. |
| Provider independence | Any OpenAI-compatible API | Hermes (109+), Gemini CLI (1 primary) | Comparable to Hermes, better than most. |
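The memory-integrity row rests on a standard construction: hash every entry, then hash pairs of hashes up to a single root. The sketch below shows one common convention (SHA-256, domain-separated leaves, last node duplicated on odd levels); it illustrates the technique and is not Passepartout's actual layout.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(entries: list[bytes]) -> str:
    """Return the hex Merkle root over the given memory entries."""
    if not entries:
        return _h(b"").hex()
    # Prefix leaves and inner nodes differently so one cannot pose as the other.
    level = [_h(b"\x00" + entry) for entry in entries]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Changing any single entry changes the root, so a stored root is enough to detect tampering, and keeping earlier roots gives the checkpoints a rollback needs.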

Where Competitors Lead

| Dimension | Leader | Passepartout Status |
| --- | --- | --- |
| Safety implementation maturity | Claude Code (2,592 lines bash security) | Gate stack exists but bash validation is minimal in comparison |
| Provider breadth | Hermes (109+), OpenClaw (50+) | 8 providers; adequate but not competitive |
| Channel/platform support | OpenClaw (25+ channels) | TUI only; no multi-channel |
| Plugin ecosystem | OpenClaw (ClawHub, npm registry) | No plugin marketplace |
| Subagent delegation | Claude Code (fork with context inheritance) | Planned via Screamer planner |
| Codebase size / features shipped | All competitors have working products | v0.7.2 in development |
| MCP integration | Hermes, Codex (native), Continue | Planned v0.53.0 |
| Sandboxing | Codex CLI (Seatbelt+nsjail), Gemini CLI (6 methods) | None |
| Business model | Hermes (MIT+services), Codex (tokens) | AGPL + appliances + SaaS |
| Cross-platform | Claude Code (macOS/*nix), Codex (macOS) | Linux only |

Strategic Positioning

Passepartout is not competing in the existing AI agent market. It is building a new category: provable personal infrastructure.

Competitors optimize for:

Token efficiency (Aider's edit formats, OpenCode's LSP integration)
Model flexibility (Hermes' 109 providers)
Platform reach (OpenClaw's 25 channels)
UI polish (Gemini CLI's Ink/React, Claude Code's permission dialogs)
Sandbox security (Codex's Seatbelt, Gemini's gVisor)

Passepartout optimizes for:

Provable correctness (ACL2 → CIC)
Data integrity (Merkle tree)
Cognitive architecture (10-80-10 symbolic-first)
Safety by construction (type-level gates)
Unified data model (Org-mode as everything)
Network effects (Agora)
Full-stack ownership (Stoa)

These are not axes any competitor cares about. The risk is not that a competitor builds a better Passepartout — it's that the market never develops a preference for provable agents. If token-burning LLM agents remain the default and users don't demand verification, the entire category Passepartout addresses may not exist yet.

Immediate Implications for Development

Claude Code's safety system is the benchmark to exceed. The type-level gate architecture is theoretically superior to Claude Code's heuristic patterns, but the implementation at v0.11.0 needs to prove it catches things Claude Code misses.
No competitor has anything resembling a neurosymbolic architecture. The 10-80-10 plan has zero competition — but that also means zero market validation.
The Org-mode bet is invisible to competitors. They don't see the advantage because they've never tried to build a knowledge graph from flat markdown files. This is Passepartout's widest moat — it depends on a skill (Org-mode literate programming) that no competitor's team has.
Hermes is the closest full-stack competitor (tools, skills, cron, subagents, multi-platform), but architecturally conventional. For Hermes to match Passepartout, it would need to be rewritten.
The coding agents (Aider, OpenCode, Codex) are not competitors — they are single-purpose tools Passepartout could eventually replace entirely when the planner matures.

File references

Repository dumps and analysis artifacts at tmp:

tmp/aider — Aider source (Python)
tmp/opencode — OpenCode archived source (Go)
tmp/codex — OpenAI Codex CLI (Rust)
tmp/claude-code-leaked-source — Claude Code leaked (TypeScript/Bun)
tmp/gemini-cli — Google Gemini CLI (TypeScript)
tmp/openclaw — OpenClaw source (TypeScript)
tmp/continue — Continue source (TypeScript)
usr/local/lib/hermes-agent — Hermes Agent (Python)
