Add Thoth to competitive analysis; refine compute marketplace thesis

- Thoth: new Category 2 entry (Personal AI Assistants), LangGraph ReAct agent with knowledge graph, Developer/Designer studios, 151K LOC
- Compute marketplace: answer the structural question 'why buy compute if every user runs their own Passepartout?' with three structural reasons: specialized proof libraries, certification weight, bootstrap verification
2026-05-23 05:36:27 +00:00
parent b2ec6e9c65
commit 7b2ea7f28d
2 changed files with 99 additions and 3 deletions
--- a/ideas/competitive-analysis-2026-05.org
+++ b/ideas/competitive-analysis-2026-05.org
@@ -7,14 +7,14 @@
 * Overview
-Analyzed 8 competitor codebases alongside Passepartout. The competitive landscape
+Analyzed 9 competitor codebases alongside Passepartout. The competitive landscape
 divides into three categories:
 1. Coding agents (Aider, OpenCode, Codex CLI, Claude Code, Gemini CLI)
-2. Personal AI assistants (Hermes, OpenClaw)
+2. Personal AI assistants (Hermes, OpenClaw, Thoth)
 3. CI/check-based systems (Continue)
-None of the eight compete with Passepartout on all axes simultaneously. Passepartout's
+None of the nine compete with Passepartout on all axes simultaneously. Passepartout's
 strongest differentiators — Org-mode data model, deterministic gate stack, ACL2
 verification, Merkle-treed memory, and the triad architecture — are absent from
 every competitor.
@@ -263,6 +263,85 @@ no Org-mode, no verification, no neurosymbolic architecture. Differentiated by
 vastly broader channel support and mature plugin ecosystem. But architecturally
 conventional — LLM + tools + channels, no cognitive architecture innovation.
 ** Thoth (Python, ~151K lines, Apache 2.0)
 https://github.com/siddsachar/Thoth — Personal AI Sovereignty. Local-first
 desktop AI assistant with knowledge graph, tools, voice, vision, shell,
 browser automation, workflow engine, and messaging channels.
 Architecture: LangGraph create_react_agent (prebuilt ReAct pattern). Dual-mode
 streaming via agent.stream(). NiceGUI web UI served by Python app.py with
 desktop launcher (tray icon, Ollama auto-start, browser/OS window). Context
 trimming via tiktoken to ~85% of model window, base64 data redaction, stale
 browser snapshot compression (keeps last 8), MD5 tool result dedup, old tool
 result summarization. 50-step recursion limit (chat), 100 (tasks), 120 (Developer
 Studio). Agent graph cached by tool set + model override. Checkpoints via
 LangGraph's SQLite-backed checkpointer. 30+ tool modules.
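The context-hygiene steps listed above (MD5 tool result dedup, keeping only the last 8 browser snapshots) can be sketched roughly as follows. This is an illustrative reconstruction, not Thoth's code: the message shape (`role`, `kind`, `content` keys), the function names, the placeholder strings, and the choice to keep the first copy of a duplicate are all assumptions.

```python
import hashlib

# Hypothetical placeholder text; Thoth's actual wording is not known here.
SNAPSHOT_PLACEHOLDER = "[stale browser snapshot omitted]"


def dedup_tool_results(messages):
    """Replace repeated tool outputs with a short pointer to the first copy.

    Assumes each message is a dict with "role" and "content" keys.
    """
    seen = {}      # MD5 hex digest -> index of the first message with that content
    result = []
    for msg in messages:
        if msg.get("role") != "tool":
            result.append(msg)
            continue
        digest = hashlib.md5(msg["content"].encode("utf-8")).hexdigest()
        if digest in seen:
            # Same bytes as an earlier tool result: keep only a reference.
            pointer = f"[duplicate of message {seen[digest]}]"
            result.append({**msg, "content": pointer})
        else:
            seen[digest] = len(result)
            result.append(msg)
    return result


def compress_stale_snapshots(messages, keep=8):
    """Blank out all but the most recent `keep` browser snapshots.

    Assumes snapshots are tagged with kind == "browser_snapshot".
    """
    positions = [i for i, m in enumerate(messages)
                 if m.get("kind") == "browser_snapshot"]
    stale = set(positions[:-keep]) if keep > 0 else set(positions)
    return [
        {**m, "content": SNAPSHOT_PLACEHOLDER} if i in stale else m
        for i, m in enumerate(messages)
    ]
```

Both passes return new lists rather than mutating the history, so they could run before each model call without disturbing the checkpointed state.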
 Safety model: Shell command classification (tools/shell_tool.py) with 17 blocked
 patterns (rm -rf /, mkfs, dd of=/dev/, shutdown, fork bombs, pipe-to-bash, etc.),
 30+ safe auto-execute prefixes (ls, cat, grep, git status, etc.), needs-approval
 for compound commands (;, &&, ||, |, $(), backticks). Interactive interrupt() for
 non-safe shell — LangGraph human-in-the-loop pauses the graph. Per-workflow safety
 modes: block (default, refuse non-safe), approve (pause), allow_all.
 Prompt-injection defense: scans tool outputs and user inputs for 5 categories
 (role overrides, instruction hijacking, data exfiltration, invisible unicode,
 hidden HTML directives) — detection-only, no stripping. Filesystem workspace
 boundary (~/Documents/Thoth). Opt-in Docker Sandbox for Developer Studio.
 Destructive ops (file delete, moderate shell, Gmail send, calendar delete,
 memory/task/tracker delete) require confirmation. MCP servers disabled until
 tested. Custom Tools reviewed and promoted. No sandboxing of agent runtime
 itself — agent runs in-process. No response-level guardrails.
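To make the contrast with a typed gate stack concrete, the three-tier shell classification and the per-workflow safety modes described above might look roughly like the sketch below. It is not taken from tools/shell_tool.py: the pattern and prefix lists are a small illustrative subset, the function names are hypothetical, and the rule that blocked commands are refused even under allow_all is an assumption.

```python
import re

# Illustrative subset of the blocked patterns named in the analysis.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),   # rm -rf /
    re.compile(r"\bmkfs\b"),               # filesystem creation
    re.compile(r"\bdd\b.*\bof=/dev/"),     # raw writes to devices
    re.compile(r"\bshutdown\b"),
    re.compile(r"\|\s*(ba)?sh\b"),         # pipe-to-bash
]

# Illustrative subset of the safe auto-execute prefixes.
SAFE_PREFIXES = ("ls", "cat", "grep", "git status")

# Operators that make a command compound and therefore approval-gated.
COMPOUND_TOKENS = (";", "&&", "||", "|", "$(", "`")


def classify(command):
    """Return "blocked", "needs_approval", or "safe" for a shell command."""
    command = command.strip()
    if any(p.search(command) for p in BLOCKED_PATTERNS):
        return "blocked"
    if any(tok in command for tok in COMPOUND_TOKENS):
        return "needs_approval"
    for prefix in SAFE_PREFIXES:
        # Match the whole word so "cat" does not whitelist "catalog".
        if command == prefix or command.startswith(prefix + " "):
            return "safe"
    return "needs_approval"


def decide(command, mode="block"):
    """Map a classification plus a safety mode to "run", "pause", or "refuse"."""
    tier = classify(command)
    if tier == "blocked":
        return "refuse"          # assumption: never overridable
    if tier == "safe" or mode == "allow_all":
        return "run"
    return "pause" if mode == "approve" else "refuse"
```

For example, `decide("git push origin main", "approve")` yields `"pause"`, which is where a LangGraph `interrupt()` would hand control to the user. The sketch also shows why this is a pattern list rather than a gate: anything the regexes fail to anticipate falls through to the approval tier, with no structural guarantee.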
 Data model: SQLite (WAL mode) at ~/.thoth/memory.db — shared between knowledge
 graph and legacy memory. Knowledge graph: SQLite (durable) + NetworkX MultiDiGraph
 (in-memory, rebuilt on startup) + FAISS vector index (semantic recall, rebuilt on
 every entity write). 11 entity types (person, preference, fact, event, place,
 project, organisation, concept, skill, media, self_knowledge). 67+ typed relations
 with 30+ LLM-produced aliases mapped to canonical forms. Dream Cycle refinement
 pipeline for entity dedup/merge/stale-confidence decay. Config: JSON files
 (skills_config.json, api_keys.json, providers.json, channels_config.json). Keys in
 OS credential store (Windows Credential Manager, macOS Keychain, Linux Secret
 Service/KWallet). Memory extraction background daemon scanning past conversations
 every ~2 hours.
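The mapping of LLM-produced relation aliases to canonical forms mentioned above can be illustrated with a minimal sketch. The relation names, the alias table, and the normalization rules here are all hypothetical; the analysis only records that 67+ typed relations and 30+ aliases exist, not what they are.

```python
# Hypothetical canonical relation names (Thoth's real set has 67+ entries).
CANONICAL_RELATIONS = {"works_for", "lives_in", "knows"}

# Hypothetical aliases an LLM extractor might emit.
RELATION_ALIASES = {
    "employed_by": "works_for",
    "works_at": "works_for",
    "resides_in": "lives_in",
    "friend_of": "knows",
}


def canonical_relation(raw):
    """Normalize an extracted relation label, or return None if unknown."""
    key = raw.strip().lower().replace(" ", "_").replace("-", "_")
    key = RELATION_ALIASES.get(key, key)
    return key if key in CANONICAL_RELATIONS else None
```

Here `canonical_relation("Works At")` resolves to `"works_for"`, while an unrecognized label such as `"invented"` returns `None`. A table like this cleans up labels but cannot check that an extracted edge is true, which is the structural-integrity gap noted in the comparison with Passepartout.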
 Self-modification: Agent CAN create/update/delete skills via dedicated tools
 (thoth_create_skill, thoth_patch_skill, thoth_delete_skill). SKILL.md files with
 YAML frontmatter at ~/.thoth/skills/. Bundled skills (read-only) at app root;
 user skills override by name. Skill patching requires user confirmation + auto
 backup. Maximum 1 patch proposal per conversation. Tool guides cannot be patched.
 Self-knowledge block injected into system prompt. No tool to modify agent.py,
 prompts.py, or system prompt directly. Developer Studio provides code editing
 through approval-gated tools (tool-assisted human workflow, not agent self-mod).
 Verification: None formal. Update signature verification (updater.py).
 Comprehensive test suite at tests/test_suite.py. No tool-call verification beyond
 LangGraph schema validation. No output verification or fact-checking.
 Key differentiators vs other assistants: LangGraph ReAct agent with structured
 streaming event model. Personal knowledge graph (11 entity types, 67+ relations,
 NetworkX + FAISS). Developer Studio (Docker sandbox, code threads, Git operations,
 approval modes). Designer Studio (decks, documents, landing pages, sandboxed
 interactive runtime). 5 messaging channels (Telegram, Discord, Slack, WhatsApp,
 SMS) with streaming, reactions, media processing. Background workflow engine
 (schedules, webhooks, step pipelines, conditions, approvals, concurrency groups).
 30+ tool modules including browser automation, shell, Gmail, Calendar, X, image/
 video generation. 39 curated Ollama tool-calling models. 10 LLM providers (Ollama,
 OpenAI, Anthropic, Google AI/Gemini, xAI/Grok, MiniMax, OpenRouter, Ollama Cloud,
 ChatGPT/Codex subscription, custom endpoints). MCP client (stdio, Streamable HTTP,
 SSE) with namespaced tools, approval gates. No accounts, no telemetry, no hosted
 server. Local-first with OS credential store.
 Key gap vs Passepartout: No deterministic gate stack; shell safety is a pattern
 list (17 blocked, 30+ safe), not typed gates. No sandboxed agent runtime. No
 proof system. No output guardrails. No neurosymbolic architecture. No Org-mode.
 No Merkle-tree memory. Knowledge graph (SQLite+FAISS) is richer than Hermes but
 is LLM-driven entity extraction — no structural integrity guarantees. Thoth's
 differentiation from Hermes/OpenClaw is the knowledge graph + Developer/Designer
 studios + embedded LangGraph framework — a broader product scope, but still
 architecturally conventional (LLM + tools + channels + KG), not a new cognitive
 architecture.
 * Category 3: CI/Check Systems
 ** Continue (TypeScript, ~328K lines, Apache 2.0)
@@ -379,5 +458,6 @@ Repository dumps and analysis artifacts at /tmp/:
 - /tmp/claude-code-leaked-source/ — Claude Code leaked (TypeScript/Bun)
 - /tmp/gemini-cli/ — Google Gemini CLI (TypeScript)
 - /tmp/openclaw/ — OpenClaw source (TypeScript)
 - /tmp/thoth/ — Thoth source (Python)
 - /tmp/continue/ — Continue source (TypeScript)
 - /usr/local/lib/hermes-agent/ — Hermes Agent (Python)
--- a/ideas/passepartout-economics/compute-marketplace.org
+++ b/ideas/passepartout-economics/compute-marketplace.org
@@ -8,6 +8,22 @@ Passepartout instances offer their symbolic engine capacity (ACL2 cycles, Scream
 The early player runs a large instance and sells compute to smaller instances. The AGPL allows this because the marketplace is a service, not a modification of the code. Revenue is a percentage of each compute transaction.
 But the question is structural: if every user runs their own Passepartout — each with the same symbolic engine, the same gate stack, the same ACL2 prover — why would they need to buy compute from anyone? The answer is that Passepartout's symbolic engine is /domain-specific/, not /generalized/. Local compute handles your daily gate stack (milliseconds per verification). The marketplace sells three things a local instance cannot produce:
 **1. Specialized proof libraries and search strategies.** An ACL2 proof is a search: the prover tries strategies until something works. A fresh Passepartout has generic strategies (the default waterfall, basic arithmetic, simple induction). A provider who has run 10,000 medical-device ISO 13485 proofs has tuned rewrite rules, custom clause processors, cached lemmas, and known failure-mode workarounds for that domain. You don't want to rediscover those from scratch; you buy them as a burst compute transaction. The provider isn't selling raw CPU cycles; they are selling /the accumulated search strategy from every proof ever run in that domain/, pre-packaged as a service. Over time, your own Passepartout learns the patterns and needs less external compute, but the provider stays ahead because they aggregate proof experience from /every/ client in that domain.
 **2. Certification weight for third-party trust.** Your Passepartout can prove "this gate rule is correct" to /you/. ACL2 produces a machine-checkable proof log — anyone can mechanically verify it. But when a hospital buyer evaluating a published HIPAA gate rule needs to know the rule satisfies the regulation, they do not care about your Passepartout's isolated run of the proof. They want the rule verified by a provider who:
 - Carries errors-and-omissions insurance for the specific regulation
 - Submits to annual third-party audits
 - Maintains compliance documentation for the proof pipeline
 - Has a publicly verifiable track record of correct certifications
 Your local instance cannot produce any of this. The provider's proof carries /reputational weight/ because the provider is a legal and economic entity, not a process. This is the same reason software is certified by UL or TÜV rather than by the developer running the test suite locally.
 **3. Bootstrap verification for new instances.** A fresh Passepartout cannot verify its own initial state — the bootstrapping problem. You need a working system to generate the proof that the system is correct, but the proof refers to the system itself. The marketplace provides bootstrap proofs from existing trusted providers. Once verified, your instance stands on its own, but the initial self-certification requires an external prover that /already/ has a self-verified image. This is a one-time cost per instance (or per upgrade).
 Secondary but real: burst capacity for heavy proofs (hours-long ACL2 conjectures you do not want tying up your daily agent's CPU), collective regression suite execution (small instances contribute edge cases but cannot run the full suite on every change), and latency guarantees for time-critical gate verifications (trading, emergency shutdown). These are infrastructure economics — the same reason individuals buy cloud burst instances despite having their own hardware.
 If Passepartout instances on Agora transact billions of verified operations per day, the spread on compute transactions is enormous. This is not a product sale — it is a bet on network effects. Every new instance increases the value of the network (more capacity, more diversity, more resilience).
 The early player that provisions the largest compute capacity on Agora becomes the default infrastructure provider for the entire network. This is venture-scale money.