passepartout: v0.4.1 Design Cleanup
Some checks failed
Deploy (Gitea) / deploy (push) Failing after 2s

- Remove system-prompt-augment mechanism, introduce *standing-mandates*
- Fix false token-overhead claims in DESIGN_DECISIONS + ROADMAP
- Update security vector count 9-10 across all docs and dispatcher docstring
- Rewrite README with agent section, soften aspirational claims
- Register 10 cognitive tools in programming-tools.org with test suite
- Enforce NO-HARDCODED-CONSTANTS in .env.example
- ROADMAP: mark v0.3.x patches DONE, add LOGBOOKs, mark releases
- AGENTS.md: rewrite compact (180 to 50 lines), move refs to CONTRIBUTING
- Normalize org tangle directives to file-level PROPERTY inheritance
This commit is contained in:
2026-05-07 16:44:59 -04:00
parent d3b74f5c88
commit 639bc348d9
25 changed files with 1555 additions and 144 deletions

View File

@@ -321,7 +321,7 @@ The structural multipliers are:
1. *Sparse tree retrieval* — the foveal-peripheral model renders relevant Org subtrees (titles and properties for peripheral nodes, full content for foveal and semantically relevant nodes). Active context stays at 2,0004,000 tokens. A "load everything" architecture serializes the entire knowledge base at 50,000150,000 tokens. The mechanism is provably cheaper; the exact multiplier depends on memex size and complexity.
2. *Deterministic safety* — the 9-vector Dispatcher gate stack runs in pure Lisp. Every gate is a Common Lisp function. Verification costs 0 LLM tokens per action. Competitors use prompt-based guardrails consuming 100500 LLM tokens per verification. This multiplier is mathematically infinite — a Lisp function call costs no tokens, a guardrail paragraph in a system prompt costs tokens proportional to its length.
2. *Deterministic safety* — the 10-vector Dispatcher gate stack runs in pure Lisp. Every gate is a Common Lisp function. Verification costs 0 LLM tokens per action. Competitors use prompt-based guardrails consuming 100500 LLM tokens per verification. This multiplier is mathematically infinite — a Lisp function call costs no tokens, a guardrail paragraph in a system prompt costs tokens proportional to its length.
3. *REPL verification* — code is tested in the running image before it is committed. Errors surface in milliseconds at 0 LLM tokens. Competitors discover errors after generation and pay 5002,000 tokens per correction round-trip. The REPL eliminates the most expensive kind of LLM call: the one that produced wrong code and needs a do-over.
@@ -432,7 +432,7 @@ The critical risk is implementation: achieving the retrieval precision, Dispatch
1. *Retrieval accuracy is the bottleneck.* If sparse tree retrieval loads the wrong subtree (low-similarity but causally relevant), the LLM makes unfixable errors. The architecture assumes embedding quality is "good enough" — this is untested at scale.
2. *System prompt overhead can consume savings.* Every =think= cycle iterates all registered skills and calls every =system-prompt-augment= function. With 20+ skills, a trivial interaction could carry 3,000-8,000 tokens of overhead before user input is even processed. This overhead is flat per-call, so it disproportionately affects short interactions.
2. *System prompt overhead can consume savings.* Every =think= cycle builds the full system prompt from IDENTITY + TOOLS + CONTEXT + LOGS. With the foveal-peripheral context model growing over time and the tool belt expanding with skills, the fixed overhead is non-trivial. However, it is driven by context and tool descriptions, not by the ~*standing-mandates*~ list (which contributes ~40 tokens when a single mandate fires, and 0 otherwise). Prefix caching (v0.5.0) is the primary mitigation for this overhead.
3. *Model size vs context quality.* A 3.8B model with perfect context cannot match a 70B model on complex multi-file refactors regardless of context quality. Model size independently determines reasoning depth. The minimum viable model is likely 7-13B parameters for engineering work.