remediation: backfill v0.1.0/v0.2.0 gaps (P0+P1)
- vault: add vault-get-secret/vault-set-secret wrappers - programming-org: implement org-modify (text search-replace) and org-ast-render (AST to Org text) - programming-literate: implement literate-block-balance-check (paren validation) and literate-tangle-sync-check (org→lisp diff) - system-self-improve: replace stubs with surgical text editing and error diagnosis; remove dead first defskill - system-event-orchestrator: implement orchestrator-bootstrap (scan Org files for HOOK/CRON) - system-archivist: implement Scribe distillation (daily logs→atomic notes) and Gardener link/orphan repair - system-memory: implement memory-inspect with type/todo/orphan statistics - core-skills, core-context: fix path relic (skills/ → lisp/, org/) - docs: add Token Economics section to DESIGN_DECISIONS, remediation roadmap entries
This commit is contained in:
@@ -336,4 +336,125 @@ The long-term goal is a single =passepartout= binary that the user runs. It star
|
||||
|
||||
This stands in stark contrast to most AI agent systems, which require managing Python environments, npm packages, API keys, environment variables, and configuration files. OpenAI's agents SDK requires pip install, a Python environment, and external API access. OpenClaw requires Node.js, npm, and a plugin ecosystem that must be individually installed. LangChain requires a Python environment with dozens of dependencies that must be kept compatible.
|
||||
|
||||
Passepartout's dependency model is SBCL plus Quicklisp. Quicklisp loads libraries on demand from the internet, but caches them locally. A system with internet access can fetch any library it needs. A system without internet access uses only the libraries it has already loaded - and those are preserved in the cache. The agent does not require internet access to function after initial setup.
|
||||
Passepartout's dependency model is SBCL plus Quicklisp. Quicklisp loads libraries on demand from the internet, but caches them locally. A system with internet access can fetch any library it needs. A system without internet access uses only the libraries it has already loaded - and those are preserved in the cache. The agent does not require internet access to function after initial setup.
|
||||
|
||||
* Token Economics and Performance Advantage
|
||||
:PROPERTIES:
|
||||
:ID: design-token-economics
|
||||
:END:
|
||||
|
||||
This section analyzes how Passepartout's architectural decisions translate into token usage, latency, and cost versus competing agent designs (OpenClaw, Hermes, Claude Code).
|
||||
|
||||
** The Core Insight: LLM as Expensive Resource, Not Default Engine
|
||||
|
||||
Passepartout treats the LLM as a resource to be minimized. Every operation is designed to reduce LLM dependency. Competitors treat the LLM as the core engine through which all operations flow. This is not a difference of degree but of architecture.
|
||||
|
||||
The three structural multipliers are:
|
||||
|
||||
1. *Sparse tree retrieval* — loading relevant subtrees (200-800 tokens per file) rather than full files (1,500-5,000 tokens) = ~5-10x reduction per file access
|
||||
2. *Deterministic safety* — 9-vector dispatcher gate runs in pure Lisp (0 LLM tokens per verification) versus prompt-based guardrails (200-500 tokens per action) = infinite multiplier
|
||||
3. *REPL verification* — catches errors in-image (milliseconds, 0 LLM tokens) versus LLM correction round-trips (500-2,000 tokens per retry)
|
||||
|
||||
These compound. A coding session touching 20 files, performing 10 actions, and triggering 3 errors saves ~50,000-100,000 tokens compared to the same session with Claude Code.
|
||||
|
||||
** Per-Task Type Analysis
|
||||
|
||||
*** Coding (debugging, refactoring, PR review)
|
||||
|
||||
| Operation | Passepartout | Claude Code | Hermes (3-agent) | Savings vs Claude |
|
||||
|-----------|-------------|-------------|-------------------|--------------------|
|
||||
| File access (30 files) | 30 × 400 tok = 12,000 | 30 × 3,000 tok = 90,000 | 30 × 3,000 tok × 3 = 270,000 | 78,000 tok |
|
||||
| Reasoning rounds (20) | 20 × 3,000 tok = 60,000 | 20 × 4,000 tok = 80,000 | 20 × 3,000 tok × 3 = 180,000 | 20,000 tok |
|
||||
| Error correction (5 caught by REPL) | 0 (REPL) | 5 × 1,000 tok = 5,000 | 5 × 1,000 tok × 3 = 15,000 | 5,000 tok |
|
||||
| Safety verification | 0 (deterministic) | 500 tok/round × 20 = 10,000 | 200 tok/round × agents | 10,000 tok |
|
||||
| Agent coordination | 0 | 0 | 3,000-5,000 tok/task | 0 |
|
||||
| *Total* | *~72,000 tok* | *~185,000 tok* | *~475,000 tok* | *~113,000 tok (2.6x)* |
|
||||
|
||||
Over a month of daily coding (20 sessions): ~2.3 million tokens saved. At typical API pricing ($2-15/M tokens), this saves $5-35/month.
|
||||
|
||||
*** Knowledge Management (Zettelkasten, research, note-taking)
|
||||
|
||||
Passepartout's strongest domain. The Org-mode native format and sparse tree retrieval create a 10-40x advantage because knowledge bases are the worst case for "load everything" architectures.
|
||||
|
||||
| Operation | Passepartout | Competitor | Savings |
|
||||
|-----------|-------------|------------|---------|
|
||||
| Context assembly (500-node KB) | Peripheral outline + ~5 foveal nodes = 2,000-4,000 tok | Full serialization = 80,000-150,000 tok | 40-75x |
|
||||
| Semantic search (10 queries) | Vector lookup in-image = 0 LLM tok | LLM-assisted search = 5,000 tok | 5,000 tok |
|
||||
| Note creation (10 notes) | Deterministic Org writes = 0 LLM tok | 10 × 800 tok = 8,000 | 8,000 tok |
|
||||
| *Total per session* | *~7,000 tok* | *~95,000-165,000 tok* | *~13-24x* |
|
||||
|
||||
*** Day-to-Day Life Management (calendar, tasks, reminders)
|
||||
|
||||
| Operation | Passepartout | Competitor | Savings |
|
||||
|-----------|-------------|------------|---------|
|
||||
| Background maintenance | Deterministic heartbeat-driven = 0 LLM tok | Scheduled LLM calls or skipped | Variable |
|
||||
| User interactions (30/day) | 30 × 2,000 tok = 60,000 | 30 × 4,000 tok = 120,000 | 60,000 tok |
|
||||
| Context queries by TODO/tag | Hash table scan = 0 LLM tok | LLM-based search = 2,500 tok | 2,500 tok |
|
||||
| *Total per day* | *~60,000 tok* | *~122,500 tok* | *~2x* |
|
||||
|
||||
The defining advantage: background maintenance (compaction, archiving, link repair) costs zero LLM tokens. Competing systems either skip this or pay LLM costs for it.
|
||||
|
||||
*** Chatting (casual conversation)
|
||||
|
||||
Chatting is inherently LLM-bound. Passepartout's edge is privacy filtering before content reaches the LLM and slightly smaller context footprint. Token savings are marginal (~1.3x).
|
||||
|
||||
** The Dispatcher Learning Curve: Cost Decreases Over Time
|
||||
|
||||
A unique architectural property: Passepartout's cost curve descends while competitors' ascends.
|
||||
|
||||
Passepartout: As the dispatcher accumulates deterministic rules from Human-in-the-Loop decisions, fewer actions require LLM proposals. A file write that initially triggered a full LLM proposal → dispatcher review → HITL approval → rule extraction loop eventually becomes a deterministic rule check. Each hardened rule permanently reduces future token costs.
|
||||
|
||||
Competitors: As context histories grow, safety instructions accumulate, and guardrails become more elaborate, each interaction costs more than the last. The only way to reduce cost is to cap context — sacrificing capability.
|
||||
|
||||
After 12 months of learning, Passepartout's core reasoning costs could drop to 40-60% of baseline, while competitors' costs rise to 125-140% of baseline.
|
||||
|
||||
The crossover point where Passepartout becomes structurally cheaper is estimated at 3-6 months depending on usage volume and task diversity.
|
||||
|
||||
** Local LLM Viability
|
||||
|
||||
Reduced context requirements change which model sizes deliver acceptable performance:
|
||||
|
||||
| Model | Passepartout Viability | Competitor Viability |
|
||||
|-------|----------------------|---------------------|
|
||||
| Phi-3-mini 3.8B (4K ctx) | Viable for structured tasks | Context starvation |
|
||||
| Llama 3.1 8B (8K ctx) | Comfortable daily driver | Marginal |
|
||||
| Qwen 2.5 7B (4K ctx) | Viable for most tasks | Not viable |
|
||||
| Mistral 7B (8K ctx) | Comfortable | Marginal |
|
||||
| Llama 3.1 70B (128K ctx) | Overkill (but works) | Comfortable |
|
||||
|
||||
KV cache memory scales with context length:
|
||||
|
||||
| Context Window | KV Cache (Llama 3.1 8B, FP16) |
|
||||
|---------------|-------------------------------|
|
||||
| 4K tokens | ~67 MB |
|
||||
| 32K tokens | ~540 MB |
|
||||
| 128K tokens | ~2.1 GB |
|
||||
|
||||
Passepartout at 4K effective context: ~67 MB KV cache. Competitor at 128K: ~2.1 GB. A 7-8B model on an RTX 3060 Ti (8 GB VRAM) or MacBook (16 GB unified memory) is a practical daily driver with Passepartout. Competitors at full context require 16-32 GB VRAM or cloud APIs.
|
||||
|
||||
** Open Questions and Risks
|
||||
|
||||
1. *Retrieval accuracy is the bottleneck.* If sparse tree retrieval loads the wrong subtree (low-similarity but causally relevant), the LLM makes unfixable errors. The architecture assumes embedding quality is "good enough" — this is untested at scale.
|
||||
|
||||
2. *System prompt overhead can consume savings.* Every =think= cycle iterates all registered skills and calls every =system-prompt-augment= function. With 20+ skills, a trivial interaction could carry 3,000-8,000 tokens of overhead before user input is even processed. This overhead is flat per-call, so it disproportionately affects short interactions.
|
||||
|
||||
3. *Model size vs context quality.* A 3.8B model with perfect context cannot match a 70B model on complex multi-file refactors regardless of context quality. Model size independently determines reasoning depth. The minimum viable model is likely 7-13B parameters for engineering work.
|
||||
|
||||
4. *The 3-retry dispatcher loop.* When the dispatcher rejects a proposal, the rejection trace feeds back to the LLM for self-correction (up to 3 retries). If the dispatcher rejects 30% of proposals, the effective token multiplier is 1.39x per action. At 50% rejection (plausible during early use), it is 1.75x. This penalty decreases as the dispatcher accumulates rules.
|
||||
|
||||
5. *Competitor evolution.* Sparse retrieval is not patentable. Claude Code, Copilot, and others will implement similar mechanisms. The architectural advantage is real but finite in duration. The deterministic safety gate is the harder-to-replicate differentiator.
|
||||
|
||||
** Comparison Summary
|
||||
|
||||
| Metric | Passepartout | Claude Code | Hermes | OpenClaw |
|
||||
|--------|-------------|-------------|--------|----------|
|
||||
| Active context (tokens) | 2,000-4,000 | 10,000-50,000+ | 5,000-15,000/agent | 10,000-40,000 |
|
||||
| File access cost (per file) | 200-800 tok | 1,500-5,000 tok | 1,500-5,000 tok × agents | 1,500-5,000 tok |
|
||||
| Safety verification cost | 0 (deterministic) | 200-500 tok/action | 200-500 tok/action × agents | 100-300 tok/action |
|
||||
| Agent coordination cost | 0 | 0 | 1,000-3,000 tok/task | 500-2,000 tok/task |
|
||||
| Error recovery cost | 0 (REPL) | 500-2,000 tok/retry | 500-2,000 tok/retry × agents | 500-2,000 tok/retry |
|
||||
| Long-term cost trend | Decreasing | Increasing | Increasing | Flat/Increasing |
|
||||
| Min viable local model | 3-4B params, 4K ctx | 30-70B params, 32K+ ctx | 30-70B params, 32K+ ctx | 7-13B params, 8K+ ctx |
|
||||
| Min VRAM for local | 4-6 GB | 16-32 GB | 24-48 GB | 8-16 GB |
|
||||
|
||||
*Conclusion:* Passepartout's architecture is designed to produce 2-3x token savings for coding, 13-24x for knowledge management, and 2x for life management at v1.0.0 maturity. The three structural advantages — sparse trees, deterministic safety, and REPL verification — compound. The critical risk is implementation gap: achieving the retrieval precision, dispatcher learning, and REPL integration depth required to realize the design.
|
||||
Reference in New Issue
Block a user