passepartout: v0.4.1 Design Cleanup

- Remove system-prompt-augment mechanism, introduce *standing-mandates* - Fix false token-overhead claims in DESIGN_DECISIONS + ROADMAP - Update security vector count 9-10 across all docs and dispatcher docstring - Rewrite README with agent section, soften aspirational claims - Register 10 cognitive tools in programming-tools.org with test suite - Enforce NO-HARDCODED-CONSTANTS in .env.example - ROADMAP: mark v0.3.x patches DONE, add LOGBOOKs, mark releases - AGENTS.md: rewrite compact (180 to 50 lines), move refs to CONTRIBUTING - Normalize org tangle directives to file-level PROPERTY inheritance
2026-05-07 16:44:59 -04:00
parent d3b74f5c88
commit 639bc348d9
25 changed files with 1555 additions and 144 deletions
--- a/docs/DESIGN_DECISIONS.org
+++ b/docs/DESIGN_DECISIONS.org
@@ -321,7 +321,7 @@ The structural multipliers are:

 1. *Sparse tree retrieval* — the foveal-peripheral model renders relevant Org subtrees (titles and properties for peripheral nodes, full content for foveal and semantically relevant nodes). Active context stays at 2,000–4,000 tokens. A "load everything" architecture serializes the entire knowledge base at 50,000–150,000 tokens. The mechanism is provably cheaper; the exact multiplier depends on memex size and complexity.

-2. *Deterministic safety* — the 9-vector Dispatcher gate stack runs in pure Lisp. Every gate is a Common Lisp function. Verification costs 0 LLM tokens per action. Competitors use prompt-based guardrails consuming 100–500 LLM tokens per verification. This multiplier is mathematically infinite — a Lisp function call costs no tokens, a guardrail paragraph in a system prompt costs tokens proportional to its length.
+2. *Deterministic safety* — the 10-vector Dispatcher gate stack runs in pure Lisp. Every gate is a Common Lisp function. Verification costs 0 LLM tokens per action. Competitors use prompt-based guardrails consuming 100–500 LLM tokens per verification. This multiplier is mathematically infinite — a Lisp function call costs no tokens, a guardrail paragraph in a system prompt costs tokens proportional to its length.

 3. *REPL verification* — code is tested in the running image before it is committed. Errors surface in milliseconds at 0 LLM tokens. Competitors discover errors after generation and pay 500–2,000 tokens per correction round-trip. The REPL eliminates the most expensive kind of LLM call: the one that produced wrong code and needs a do-over.

@@ -432,7 +432,7 @@ The critical risk is implementation: achieving the retrieval precision, Dispatch

 1. *Retrieval accuracy is the bottleneck.* If sparse tree retrieval loads the wrong subtree (low-similarity but causally relevant), the LLM makes unfixable errors. The architecture assumes embedding quality is "good enough" — this is untested at scale.

-2. *System prompt overhead can consume savings.* Every =think= cycle iterates all registered skills and calls every =system-prompt-augment= function. With 20+ skills, a trivial interaction could carry 3,000-8,000 tokens of overhead before user input is even processed. This overhead is flat per-call, so it disproportionately affects short interactions.
+2. *System prompt overhead can consume savings.* Every =think= cycle builds the full system prompt from IDENTITY + TOOLS + CONTEXT + LOGS. With the foveal-peripheral context model growing over time and the tool belt expanding with skills, the fixed overhead is non-trivial. However, it is driven by context and tool descriptions, not by the ~*standing-mandates*~ list (which contributes ~40 tokens when a single mandate fires, and 0 otherwise). Prefix caching (v0.5.0) is the primary mitigation for this overhead.

 3. *Model size vs context quality.* A 3.8B model with perfect context cannot match a 70B model on complex multi-file refactors regardless of context quality. Model size independently determines reasoning depth. The minimum viable model is likely 7-13B parameters for engineering work.