Add comprehensive competitive analysis of 8 AI agent platforms

2026-05-22 10:58:10 +00:00
parent 77b3bb6e6a
commit 7610d3a457
1 changed files with 364 additions and 0 deletions
--- a/ideas/competitive-analysis-2026-05.org
+++ b/ideas/competitive-analysis-2026-05.org
@@ -0,0 +1,364 @@
+:PROPERTIES:
+:ID:       3aa22300-2f25-57b0-8787-9f199cc978b1
+:CREATED:  [2026-05-22 Thu]
+:END:
+#+title: Competitive Analysis — AI Agent Landscape (May 2026)
+#+filetags: :passepartout:strategy:competitive:
+
+* Overview
+
+Analyzed 8 competitor codebases alongside Passepartout. The competitive landscape
+divides into three categories:
+
+1. Coding agents (Aider, OpenCode, Codex CLI, Claude Code, Gemini CLI)
+2. Personal AI assistants (Hermes, OpenClaw)
+3. CI/check-based systems (Continue)
+
+None of the eight compete with Passepartout on all axes simultaneously. Passepartout's
+strongest differentiators — Org-mode data model, deterministic gate stack, ACL2
+verification, Merkle-treed memory, and the triad architecture — are absent from
+every competitor.
+
+* Category 1: Coding Agents
+
+** Aider (Python, ~40K lines, Apache 2.0)
+
+Language: Python. ~6.8M pip installs. The oldest and most mature open-source
+coding agent.
+
+Architecture: Chat-based Coder class with 5 edit formats (diff, udiff, patch,
+whole, architect). Uses litellm for universal provider access (50+ providers).
+RepoMap provides codebase awareness by parsing the repository with tree-sitter and
+ranking files and symbols with PageRank over their reference graph.
+
+Safety model: Purely prompt-based plus user-confirmation dialogs. No deterministic
+gate stack. No sandboxing. No model output validator. The allowed_to_edit() gate
+is a single user-confirmation call, and the --yes flag auto-approves it. Aider can
+edit its own source code with no special protection; nothing detects or flags
+self-modification.
+
+Data model: Ad-hoc. Chat messages in memory. Git commits for persistence. RepoMap
+is a ranked tree-sitter symbol index. No persistent memory across sessions. No knowledge
+graph.
+
+Self-modification: Full. No guard against editing its own files.
+
+Verification: None.
+
+Key gap vs Passepartout: No safety gates, no persistent memory model, no knowledge
+representation, no verification, no self-modification protection, no architecture
+for neurosymbolic reasoning. It is a thin shell around litellm + edit format
+parsers.
+
+** OpenCode / Crush (Go, ~42K lines, MIT)
+
+Now archived and succeeded by Crush (github.com/charmbracelet/crush). Go-based,
+Bubble Tea TUI.
+
+Architecture: Pubsub-driven layered architecture with LSP integration, 8+ provider
+clients (Anthropic, OpenAI, Gemini, Bedrock, Copilot, Azure, Vertex, Groq, xAI),
+hierarchical subagent delegation (child agents have read-only tools).
+
+Safety model: Hybrid prompt-based + deterministic permission gating. Permission
+dialog blocks on channel until user approves. Bash commands have a banned-list
+(no curl/wget/nc/telnet). Read-before-write invariant ensures edits only on
+freshly-read content.
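+
+The two deterministic checks described above can be sketched in Python. This sketch
+assumes the banned list contains exactly the four commands named, and that the read
+tracker keys on each file's modification time. The names command_words, is_banned and
+ReadTracker are hypothetical; this is not OpenCode's Go implementation.
```python
import shlex
from pathlib import Path

# Hypothetical banned list: the four commands named in the analysis.
BANNED_COMMANDS = {"curl", "wget", "nc", "telnet"}
# Shell operators after which a new command starts.
SEPARATORS = {"|", "||", "&&", ";", "&", "("}


def command_words(command: str) -> list[str]:
    """Return the executable name found at each command position in a shell string."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    words, expect_command = [], True
    for token in lexer:
        if token in SEPARATORS:
            expect_command = True
        elif expect_command:
            words.append(Path(token).name)  # "/usr/bin/curl" -> "curl"
            expect_command = False
    return words


def is_banned(command: str) -> bool:
    try:
        return any(word in BANNED_COMMANDS for word in command_words(command))
    except ValueError:  # e.g. unbalanced quotes: refuse rather than guess
        return True


class ReadTracker:
    """Read-before-write: an edit is allowed only on content read since its last change."""

    def __init__(self) -> None:
        self._seen: dict[Path, int] = {}

    def read(self, path: str) -> str:
        p = Path(path).resolve()
        self._seen[p] = p.stat().st_mtime_ns
        return p.read_text()

    def check_write(self, path: str) -> None:
        p = Path(path).resolve()
        if p not in self._seen:
            raise PermissionError(f"{path}: read the file before editing it")
        if p.stat().st_mtime_ns != self._seen[p]:
            raise PermissionError(f"{path}: changed since it was read; read it again")
```
+The sketch checks only words in command position: the first word, plus any word after
+|, ;, & or &&. An environment-variable prefix such as FOO=1 curl would therefore slip
+past it, which shows how much parsing a banned list needs before it is reliable.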
+
+Data model: SQLite with 3 tables — sessions, messages (JSON parts column),
+files (versioned file history per session). Hierarchical sessions via
+parent_session_id.
+
+Self-modification: No protection against editing its own Go source.
+
+Verification: None.
+
+Key gap vs Passepartout: No type-level gate stack (only permission dialogs and a
+command banned-list), no knowledge graph, no Org-mode, no neurosymbolic
+architecture. The project's archived status is itself a risk.
+
+** Codex CLI (OpenAI, Rust, ~950K lines)
+
+OpenAI's open-source coding agent, written in Rust and sandboxed at the OS level.
+
+Architecture: ~116-crate Rust workspace with a protocol layer (SQ/EQ, i.e.
+submission-queue/event-queue, session types), a sandbox manager (macOS Seatbelt,
+Linux nsjail), multi-provider support (via the defined protocol, not directly), and
+a configurable TUI.
+
+Safety model: The strongest OS-level sandboxing of any coding agent analyzed.
+Multi-layer:
+- Process hardening (macOS Seatbelt with 4 profile tiers)
+- Execution policy engine (defined policy in execpolicy crate)
+- Sandboxing via nsjail on Linux, seatbelt on macOS
+- Guardian module for tool permission gating
+- No prompt-based safety — all deterministic through policy definitions
+
+Data model: Protocol-defined session types. Structured request/response models.
+Config through TOML files with schema validation.
+
+Self-modification: Protected by sandbox — the agent cannot escape to modify its
+own binary or config without explicit policy override.
+
+Verification: None (no proof system).
+
+Key gap vs Passepartout: No knowledge graph (Org or otherwise), no persistent
+memory model, no gate stack for agent behavior beyond execution policy and
+OS-level sandboxing, no ACL2/prover, no neurosymbolic architecture. Strongest
+sandbox but weakest cognitive architecture.
+
+** Claude Code (Anthropic, TypeScript/Bun, ~512K lines leaked)
+
+Anthropic's proprietary, closed-source coding agent. This analysis is based on its
+leaked source.
+
+Architecture: Bun-bundled TypeScript single-file executable. Ink/React terminal UI.
+23+ core tools. Subagent forking with byte-identical API prefixes for prompt cache
+sharing. Multi-agent coordination mode.
+
+Safety model: Layered deterministic safety — NOT prompt-based:
+1. Permission mode system (7 modes: default, acceptEdits, bypassPermissions, etc.)
+2. Persistent permission rules (alwaysAllow, alwaysDeny, alwaysAsk, rule sources
+   from userSettings, projectSettings, localSettings, policySettings)
+3. Bash security validator — 2,592 lines of dedicated code with 23+ named
+   security checks using tree-sitter AST parsing
+4. Sandbox runtime for filesystem/network containment
+5. Path/mode validation
+6. Optional ML bash classifier (Anthropic-internal feature)
+
+This is the most sophisticated safety system of any coding agent. Passepartout's
+gate stack is architecturally similar (deterministic multi-layer) but Claude
+Code's implementation is vastly more mature — 2,592 lines of bash validation
+alone is ~50x the equivalent in Passepartout.
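+
+The persistent rules in item 2 can be modeled as a resolver over (decision, tool,
+pattern, source) tuples. This is a hypothetical Python sketch, not Claude Code's
+TypeScript. It assumes glob-style argument patterns, and it assumes a matching deny
+beats ask, which beats allow, whichever settings source the rule comes from. The names
+Rule, Decision and resolve are illustrative.
```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase
from typing import Optional


class Decision(Enum):
    DENY = "alwaysDeny"
    ASK = "alwaysAsk"
    ALLOW = "alwaysAllow"


@dataclass(frozen=True)
class Rule:
    decision: Decision
    tool: str     # e.g. "Bash"
    pattern: str  # glob over the tool argument, e.g. "git push*"
    source: str   # e.g. "userSettings", "policySettings"


# Assumed precedence: deny beats ask, ask beats allow.
PRECEDENCE = (Decision.DENY, Decision.ASK, Decision.ALLOW)


def resolve(rules: list[Rule], tool: str, argument: str,
            default: Decision = Decision.ASK) -> tuple[Decision, Optional[str]]:
    """Return the winning decision and the settings source it came from."""
    matches = [r for r in rules if r.tool == tool and fnmatchcase(argument, r.pattern)]
    for decision in PRECEDENCE:
        for rule in matches:
            if rule.decision is decision:
                return decision, rule.source
    return default, None  # no rule matched: fall back to an assumed default (ask)
```
+In this sketch, git  push origin (two spaces) misses the deny pattern "git push*" but
+matches the allow pattern "git *", so it is allowed. This is the kind of gap that makes
+string-pattern rules heuristic rather than structural.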
+
+Data model: File-based markdown memdir at ~/.claude/projects/<slug>/memory/.
+4 memory types: user, feedback, project, reference. YAML frontmatter in .md files.
+PROJECT.md and CLAUDE.md for project-level config. No database.
+
+Self-modification: HIGH. Skill system writes SKILL.md files that change future
+behavior. Plugin system, cron scheduling, agent spawning.
+
+Verification: None.
+
+Key gap vs Passepartout: No proof system, no neurosymbolic architecture, no
+self-verification, no persistent knowledge graph (flat markdown files, not
+Org-mode with cross-references), and a markdown data model that lacks semantic
+depth. Proprietary: Anthropic controls it completely. Sandboxing relies natively
+on macOS sandbox profiles. The permission rules system is impressive but
+structurally inferior to Passepartout's gate stack because its rules are heuristic
+(regex-based pattern matching) rather than typed (type-level gates with structural
+guarantees).
+
+** Gemini CLI (Google, TypeScript, ~525K lines, Apache 2.0)
+
+Google's open-source coding agent. Node.js 20+, Ink/React TUI.
+
+Architecture: 7-package npm monorepo. Core backend handles Gemini API orchestration,
+tool execution, policy engine, safety checks, sandbox management, session management,
+MCP client. 7-strategy composite model routing chain.
+
+Safety model: Multi-layered:
+1. CONSECA (Contextual Security Checker) — AI-driven per-request policy generation
+   using a separate Gemini Flash model. Principle of least privilege.
+2. Policy engine — 4 approval modes (PLAN, DEFAULT, AUTO_EDIT, YOLO), hierarchical
+   rules with priority scores and wildcard matching
+3. 6 sandbox methods (macOS Seatbelt, Docker/Podman, bwrap, gVisor, LXC, Windows)
+4. Trusted folders with discovery phase and path traversal protection
+5. Policy integrity verification via cryptographic hashes
+6. Built-in safety checkers (AllowedPathChecker, CONSECA)
+7. Loop detection service
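+
+In item 2's hierarchical rules, each rule carries a numeric priority score, and the
+highest-priority match decides. Below is a hypothetical Python sketch. It assumes
+wildcard matching on tool names only and lets the first-listed rule win ties.
+PolicyRule, decide and the tool names are illustrative, not Gemini CLI's API.
```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class PolicyRule:
    tool_pattern: str  # wildcard over tool names, e.g. "read_*"
    decision: str      # "allow", "deny" or "ask_user"
    priority: float    # higher scores win


def decide(rules: list[PolicyRule], tool_name: str, default: str = "ask_user") -> str:
    """Return the decision of the highest-priority rule matching tool_name."""
    matching = [r for r in rules if fnmatchcase(tool_name, r.tool_pattern)]
    if not matching:
        return default
    # max() returns the first of several equal maxima, so earlier rules win ties.
    return max(matching, key=lambda r: r.priority).decision
```
+A narrow high-priority rule such as read_secret* can override a broad lower-priority
+read_* allowance without reordering the file.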
+
+Data model: JSONL session files. Turn-based conversation model. 4-layer config
+precedence (system-defaults → user → project → system-override). TOML policy files.
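+
+The four config layers can be read as a left-to-right merge in which each later layer
+overrides the one before it, so system-override has the last word. This is a minimal
+sketch. It assumes nested tables merge key by key and any other value is replaced
+outright. Gemini CLI's real merge rules (for arrays, for example) may differ, and
+deep_merge and effective_config are hypothetical names.
```python
from typing import Any

# Lowest to highest precedence, as listed in the analysis.
LAYERS = ("system-defaults", "user", "project", "system-override")


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge override into a copy of base; nested dicts merge, other values replace."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_config(layers: dict[str, dict[str, Any]]) -> dict[str, Any]:
    result: dict[str, Any] = {}
    for name in LAYERS:
        result = deep_merge(result, layers.get(name, {}))
    return result
```
+Under this reading, a project file cannot undo a value pinned in system-override.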
+
+Self-modification: Modifiable hooks system, MCP extensions, custom commands.
+Core binaries are protected on disk by file permissions.
+
+Verification: None.
+
+Key gap vs Passepartout: No proof system, no persistent knowledge graph, no
+self-verification, no neurosymbolic architecture, lock-in to Google Gemini models
+(though it can use others via routing). The CONSECA approach is interesting
+(AI-generated policies) but introduces a second LLM call for every security
+decision — the opposite of Passepartout's approach of zero-token deterministic gating.
+
+* Category 2: Personal AI Assistants
+
+** Hermes Agent (Python, ~17K core, MIT)
+
+The agent that produced this analysis. Python, OpenAI-format conversations.
+
+Architecture: Synchronous conversation loop with OpenAI-format messages. 60+
+built-in tools. 109+ providers via pluggable transport layer. 15+ messaging
+platforms via gateway. MCP client (native, not bridge). Ink/React TUI as Node.js
+subprocess. Cron jobs, Kanban board, subagent delegation.
+
+Safety model: Multi-layer but NOT a deterministic gate stack:
+1. Message sanitization (surrogates, control chars, malformed JSON)
+2. Tirith binary scanner (pre-execution terminal command analysis)
+3. Command approval system (manual/smart/off modes)
+4. Memory injection detection (prompt injection pattern matching)
+5. Secret/PII redaction
+6. Tool call guardrails (loop detection)
+7. MCP security (env filtering, credential stripping)
+8. Context fencing (memory injection span scrubbing)
+
+These are all heuristic or prompt-based — no structural type-level gates.
+Tirith is a separate binary, not in-process. The approval system is good but
+reactive (LLM proposes → system blocks) rather than preventive (type system
+prevents by construction).
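+
+A hypothetical loop guardrail in the spirit of item 6 makes the reactive/preventive
+distinction concrete: it can refuse a tool call only after the model has already
+proposed it. Several details are assumptions, not Hermes's code: the class name, the
+three-repeat threshold, and the definition of an identical call (same tool name, same
+arguments after sorting keys).
```python
import json
from collections import deque


class ToolLoopGuard:
    """Refuses a tool call once the previous max_repeats calls were identical to it."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._recent: deque[str] = deque(maxlen=max_repeats)

    def check(self, tool: str, args: dict) -> bool:
        """Return True if the call may proceed, False if it looks like a loop."""
        # Same tool + same arguments (key order ignored) counts as the same call.
        key = tool + ":" + json.dumps(args, sort_keys=True)
        if len(self._recent) == self.max_repeats and all(k == key for k in self._recent):
            return False  # blocked calls are not recorded
        self._recent.append(key)
        return True
```
+Any different call in between resets the streak, so a model that alternates two calls
+indefinitely would not trip this particular guard.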
+
+Data model: SQLite session DB (FTS5 full-text search). File-based memory
+(MEMORY.md + USER.md). YAML config. No knowledge graph. No Org-mode.
+
+Self-modification: Skill system writes SKILL.md files. Memory tool edits
+MEMORY.md/USER.md. Config YAML editable. Core Python code is read-only in
+execution but the LLM could request modifications to its own source files
+(no gate specifically prevents this).
+
+Verification: None.
+
+Key gap vs Passepartout: No deterministic gate stack (heuristic layers, not
+structural/typed), no knowledge graph, no Org-mode, no neurosymbolic architecture,
+no self-verification, no proof system. Hermes's strength is breadth —
+109 providers, 15 platforms, MCP ecosystem, big tool surface. But it has no
+depth in safety, knowledge representation, or reasoning architecture.
+
+** OpenClaw (TypeScript/Node.js, ~3.5M lines)
+
+The largest codebase analyzed. Personal AI assistant with 25+ messaging channel
+support.
+
+Architecture: pnpm workspace with ~135 bundled plugins. Gateway control plane
+routes messages through multi-agent routing. Per-agent sessions, workspaces,
+skill registries. Companion native apps (macOS, iOS, Android).
+
+Safety model: Tiered — main agent runs tools directly on host (trusted-operator),
+non-main sessions sandboxed via Docker (read-only rootfs, capability dropping,
+seccomp/AppArmor, memory/cpu/PID limits, SSH/OpenShell backends).
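+
+The non-main sandbox described above maps onto standard docker run options. Below is
+a hypothetical Python helper that assembles such an invocation. The flag names are
+real Docker CLI options, but the limit values are placeholders. OpenClaw's actual
+configuration (profiles, SSH/OpenShell backends) is not reproduced.
```python
def sandbox_argv(image: str, command: list[str], *, memory: str = "512m",
                 cpus: str = "1", pids_limit: int = 256) -> list[str]:
    """Build a `docker run` argv for a locked-down, throwaway container."""
    return [
        "docker", "run", "--rm",
        "--read-only",                               # read-only root filesystem
        "--tmpfs", "/tmp",                           # small writable scratch area
        "--cap-drop", "ALL",                         # drop every Linux capability
        "--security-opt", "no-new-privileges:true",  # stop setuid binaries gaining privileges
        # Docker applies its default seccomp profile unless told otherwise.
        "--memory", memory,
        "--cpus", cpus,
        "--pids-limit", str(pids_limit),
        image, *command,
    ]
```
+Passing the list to subprocess.run without shell=True avoids a second round of shell
+parsing on the sandboxed command.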
+
+Data model: Typed JSON/YAML config (openclaw.json). Multi-source model catalog.
+Plugin SDK with narrow typed subpath exports.
+
+Self-modification: ACP (Agent Control Protocol) for spawning child sessions.
+Skill system with npm distribution and ClawHub registry.
+
+Verification: None.
+
+Key gap vs Passepartout: Same as Hermes — no gate stack, no knowledge graph,
+no Org-mode, no verification, no neurosymbolic architecture. Differentiated by
+vastly broader channel support and mature plugin ecosystem. But architecturally
+conventional — LLM + tools + channels, no cognitive architecture innovation.
+
+* Category 3: CI/Check Systems
+
+** Continue (TypeScript, ~328K lines, Apache 2.0)
+
+Source-controlled AI checks for CI/CD. Markdown-as-gate-policy.
+
+Architecture: Shared core (@continuedev/core) with ~80 provider implementations,
+tool-calling engine, config system (YAML/JSON/Markdown). Serves the CLI (Ink/React
+TUI plus a headless CI mode), IDE extensions (VS Code, JetBrains), and a web dashboard.
+
+Safety model: Three permission levels (allow/ask/exclude). Precedence: mode policies
+→ CLI flags → permissions.yaml → built-in defaults. Terminal security package for
+shell command analysis via shell-quote parsing. Workspace-scoped file access.
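+
+If the precedence chain is read from highest to lowest, a tool's permission comes from
+the first layer that mentions it, so a mode policy cannot be loosened by a CLI flag.
+Below is a hypothetical sketch under that reading. The layer names, tool names and
+permission_for are illustrative, not Continue's configuration keys.
```python
from collections.abc import Mapping, Sequence
from typing import Literal

Level = Literal["allow", "ask", "exclude"]


def permission_for(tool: str,
                   layers: Sequence[tuple[str, Mapping[str, Level]]]) -> tuple[Level, str]:
    """Return (level, source) from the first layer, highest precedence first, naming the tool."""
    for source, policy in layers:
        if tool in policy:
            return policy[tool], source
    raise KeyError(f"no layer sets a permission for {tool!r}")
```
+In practice, the built-in defaults layer would name every known tool, so the KeyError
+fires only for a tool no layer has heard of.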
+
+Data model: Markdown files for checks, agents, rules. Source-controlled in-repo.
+YAML frontmatter for metadata.
+
+Self-modification: Checks source-controlled — any change goes through git.
+
+Verification: None (the checks are themselves unverified).
+
+Key gap vs Passepartout: The "checks as markdown" concept is philosophically
+similar to Passepartout's gate rules (deterministic policies checked before
+execution) but the implementation is dramatically simpler — regex-based policy
+objects, not a type-level gate stack with structural guarantees. No persistent
+agent, no memory, no knowledge graph, no neurosymbolic architecture. It is a
+gate system without an agent to gate.
+
+* The Passepartout Advantage
+
+| Dimension | Passepartout | Best Competitor | Gap |
+|-----------|--------------|-----------------|-----|
+| Safety model | Type-level gates + 11-vector deterministic stack | Claude Code (7 permission modes + 23 bash checks) | Structural vs heuristic. Passepartout's type-level gates prevent self-modification at the category level; competitors block patterns. |
+| Knowledge model | Org-mode (tree, properties, TODOs, timestamps, cross-refs, IDs, tags) | Claude Code (flat markdown memdir) | Org-mode provides ~15 semantic primitives that markdown lacks. |
+| Memory integrity | Merkle tree + SHA-256 + rollback | Hermes (file-based); Claude Code (flat files + git) | Content-addressed, tamper-evident memory no competitor has. |
+| Self-verification | ACL2 → CIC prover (planned) | None | No competitor does provable correctness. |
+| Cognitive architecture | 10-80-10 symbolic-first (planned) | 100% LLM (every competitor) | After the planned symbolic-first flip, Passepartout is projected to use ~10% of the tokens competitors use. |
+| Data format | Org-mode (human-editable, machine-parseable, single file) | JSONL/Markdown/YAML/DB (competitors use 2-5 formats) | Unified format reduces translation layers to zero. |
+| Self-modification | Type-level gates + hot-reload | Claude Code (skills), Hermes (skills) | Passepartout's guard against self-modification is structural (type level), not heuristic (pattern list). |
+| Triad | Passepartout + Stoa + Agora | None | No competitor is building a full computing stack + social network. |
+| Provider independence | Any OpenAI-compatible API | Hermes (109+), Gemini CLI (1 primary) | Comparable to Hermes, better than most. |
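+
+The memory-integrity row depends on one property: changing any entry changes the root
+hash. Below is a minimal Python sketch. It assumes memory is an ordered list of
+serialized entries (for example, Org subtrees). It borrows RFC 6962's 0x00/0x01
+prefixes to separate leaf hashes from interior hashes. merkle_root is hypothetical and
+is not Passepartout's implementation.
```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(entries: list[str]) -> str:
    """Hex SHA-256 Merkle root over an ordered list of serialized memory entries."""
    if not entries:
        return _sha256(b"").hex()
    # 0x00 prefixes leaf hashes and 0x01 interior hashes, so the two cannot be confused.
    level = [_sha256(b"\x00" + e.encode("utf-8")) for e in entries]
    while len(level) > 1:
        parents = [_sha256(b"\x01" + level[i] + level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            parents.append(level[-1])  # promote an unpaired node unchanged
        level = parents
    return level[0].hex()
```
+Storing the root with each memory snapshot makes tampering evident: recompute the root
+and compare. Promoting an unpaired node, rather than duplicating it, keeps [a, b, c] and
+[a, b, c, c] from sharing a root.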
+
+* Where Competitors Lead
+
+| Dimension | Leader | Passepartout Status |
+|-----------|--------|---------------------|
+| Safety implementation maturity | Claude Code (2,592 lines bash security) | Gate stack exists but bash validation is minimal in comparison |
+| Provider breadth | Hermes (109+), OpenClaw (50+) | 8 providers — adequate but not competitive |
+| Channel/platform support | OpenClaw (25+ channels) | TUI only — no multi-channel |
+| Plugin ecosystem | OpenClaw (ClawHub, npm registry) | No plugin marketplace |
+| Subagent delegation | Claude Code (fork with context inheritance) | Planned via Screamer planner |
+| Codebase size / features shipped | All competitors have working products | v0.7.2 in development |
+| MCP integration | Hermes, Codex (native), Continue | Planned v0.53.0 |
+| Sandboxing | Codex CLI (Seatbelt+nsjail), Gemini CLI (6 methods) | None |
+| Business model | Hermes (MIT+services), Codex (tokens) | AGPL + appliances + SaaS |
+| Cross-platform | Claude Code (macOS/*nix), Codex (macOS/Linux) | Linux only |
+
+* Strategic Positioning
+
+Passepartout is not competing in the existing AI agent market. It is building a
+new category: provable personal infrastructure.
+
+Competitors optimize for:
+- Token efficiency (Aider's edit formats, OpenCode's LSP integration)
+- Model flexibility (Hermes' 109 providers)
+- Platform reach (OpenClaw's 25 channels)
+- UI polish (Gemini CLI's Ink/React, Claude Code's permission dialogs)
+- Sandbox security (Codex's Seatbelt, Gemini's gVisor)
+
+Passepartout optimizes for:
+- Provable correctness (ACL2 → CIC)
+- Data integrity (Merkle tree)
+- Cognitive architecture (10-80-10 symbolic-first)
+- Safety by construction (type-level gates)
+- Unified data model (Org-mode as everything)
+- Network effects (Agora)
+- Full-stack ownership (Stoa)
+
+These are not axes any competitor cares about. The risk is not that a competitor
+builds a better Passepartout — it's that the market never develops a preference
+for provable agents. If token-burning LLM agents remain the default and users
+don't demand verification, the entire category Passepartout addresses may not
+exist yet.
+
+* Immediate Implications for Development
+
+1. Claude Code's safety system is the benchmark to exceed. The type-level gate
+   architecture is theoretically superior to Claude Code's heuristic patterns,
+   but the v0.11.0 implementation needs to demonstrate that it catches cases
+   Claude Code misses.
+
+2. No competitor has anything resembling a neurosymbolic architecture. The 10-80-10
+   plan has zero competition — but that also means zero market validation.
+
+3. The Org-mode bet is invisible to competitors. They don't see the advantage
+   because they've never tried to build a knowledge graph from flat markdown files.
+   This is Passepartout's widest moat — it depends on a skill (Org-mode literate
+   programming) that no competitor's team has.
+
+4. Hermes is the closest full-stack competitor (tools, skills, cron, subagents,
+   multi-platform), but architecturally conventional. For Hermes to match
+   Passepartout, it would need to be rewritten.
+
+5. The coding agents (Aider, OpenCode, Codex) are not competitors — they are
+   single-purpose tools Passepartout could eventually replace entirely when the
+   planner matures.
+
+* File references
+
+Repository dumps and analysis artifacts at /tmp/:
+- /tmp/aider/ — Aider source (Python)
+- /tmp/opencode/ — OpenCode archived source (Go)
+- /tmp/codex/ — OpenAI Codex CLI (Rust)
+- /tmp/claude-code-leaked-source/ — Claude Code leaked (TypeScript/Bun)
+- /tmp/gemini-cli/ — Google Gemini CLI (TypeScript)
+- /tmp/openclaw/ — OpenClaw source (TypeScript)
+- /tmp/continue/ — Continue source (TypeScript)
+- /usr/local/lib/hermes-agent/ — Hermes Agent (Python)