diff --git a/docs/.#ROADMAP.org b/docs/.#ROADMAP.org deleted file mode 120000 index 0b6e6ab..0000000 --- a/docs/.#ROADMAP.org +++ /dev/null @@ -1 +0,0 @@ -user@amr.38893:1778162380 \ No newline at end of file diff --git a/docs/DESIGN_DECISIONS.org b/docs/DESIGN_DECISIONS.org index 3fa8d68..abb3b6d 100644 --- a/docs/DESIGN_DECISIONS.org +++ b/docs/DESIGN_DECISIONS.org @@ -1,8 +1,15 @@ # Passepartout Design Decisions -This document captures the rationale behind key architectural choices. It is not a specification - it is a thinking medium for future architects and contributors who need to understand why the system is built this way, not just how. +This document captures the rationale behind key architectural choices. It is not a specification — it is a thinking medium for future architects and contributors who need to understand why the system is built this way, not just how. + +* Part I: Foundation ** Non-Negotiable Identity +:PROPERTIES: +:ID: design-identity +:CREATED: [2026-05-07 Wed] +:END: + - Pure Common Lisp + Org-mode. No JSON. No YAML. No external databases. - Single-address-space memory (Lisp hash tables in RAM — the agent IS the memory). - "Thin harness, fat skills" — complexity lives at the edges, not the kernel. @@ -11,25 +18,23 @@ This document captures the rationale behind key architectural choices. It is not This is the foundational decision from which all other decisions derive. It is not negotiable. Every architectural choice below exists because this identity makes it possible — and in some cases, makes it the only viable path. The single memory space enables Merkle-tree integrity without serialization boundaries. Plists enable the cognitive pipeline to be transparent and inspectable at every stage. Org-mode as the universal format means the agent's memory, the user's notes, and the agent's own source code are the same structure. This identity is the constraint that produces the architecture. 
-* Design - -** One single agent +** One Single Agent :PROPERTIES: :ID: design-multi-agent-default :CREATED: [2026-05-07 Wed] :END: -The AI industry has developed an intuition toward multi-agent systems as the default solution to hard problems. Multiple agents spawn, delegate, coordinate, debate, and consensus their way toward solutions. This pattern is compelling in demos and genuinely useful in specific contexts - but it has become a default assumption that warrants scrutiny. +The AI industry has developed an intuition toward multi-agent systems as the default solution to hard problems. Multiple agents spawn, delegate, coordinate, debate, and consensus their way toward solutions. This pattern is compelling in demos and genuinely useful in specific contexts — but it has become a default assumption that warrants scrutiny. -When context windows grew expensive and task complexity increased, the response was natural: split the problem across agents, each handling a slice. But this architectural choice carries hidden costs that are rarely acknowledged in the enthusiasm of implementation. +When context windows grew expensive and task complexity increased, the response was natural: split the problem across agents, each handling a slice. But this architectural choice carries hidden costs that are rarely acknowledged. -*The synchronization tax* is the most immediate burden. Each agent operates with partial information, and maintaining coherence requires continuous state reconciliation. Tokens and processing cycles are spent not on the task itself, but on protocol overhead - who holds what, who decided what, who is correct when they disagree. +*The synchronization tax* is the most immediate burden. Each agent operates with partial information, and maintaining coherence requires continuous state reconciliation. Tokens and processing cycles are spent not on the task itself, but on protocol overhead — who holds what, who decided what, who is correct when they disagree. 
*Fragmented context* is the deeper problem. When Agent A writes a function and Agent B modifies a type it depends on, neither has the full picture. Integration failures emerge not from individual incompetence but from systemic communication gaps. Single-agent systems avoid this entirely: one brain holds the complete model, every decision is made with full visibility. *Audit trails become complex* in multi-agent systems. A decision traced through a single-agent system has a clean, linear history. A decision traced through a multi-agent system branches and forks, with each agent's reasoning partially overlapping and partially conflicting. -None of this is to say multi-agent systems are never appropriate. Embarrassingly parallel workloads - scanning ten thousand files, processing batch jobs - benefit from parallelism regardless of context. When distinct expertises are required and cannot coexist in one model, delegation makes sense. In adversarial scenarios where conflicting goals are features, multi-agent architectures shine. +None of this is to say multi-agent systems are never appropriate. Embarrassingly parallel workloads benefit from parallelism regardless of context. When distinct expertises are required and cannot coexist in one model, delegation makes sense. In adversarial scenarios where conflicting goals are features, multi-agent architectures shine. But the default assumption that complex reasoning tasks are best solved by multiple agents is unproven and likely wrong for the engineering domain. Claude Code is a single-agent system. It handles 50-file refactors, debugs complex stack traces, writes tests, and navigates large codebases. The assumption that you need five agents to do what one well-designed agent can do is an industry habit, not a technical necessity. 
@@ -43,15 +48,15 @@ Passepartout is single-agent by default not from limitation but from conviction: If single-agent architecture is the decision, unified memory becomes the mechanism that makes it viable. The critical question is not "how many agents" but "how does the agent manage context without saturating." -Context window limits are largely a symptom of lazy architecture. The default approach - stuff everything in, hope the model figures it out - works poorly at scale. A more principled approach inverts the problem: the system should hold effectively infinite context, with the active window kept lean through intelligent management. +Context window limits are largely a symptom of lazy architecture. The default approach — stuff everything in, hope the model figures it out — works poorly at scale. A more principled approach inverts the problem: the system should hold effectively infinite context, with the active window kept lean through intelligent management. -*Lazy loading* is the core technique. When an agent needs information about a function, it does not load the entire codebase. It loads precisely what the function does. Context stays lean - 2,000 to 4,000 tokens - while the full context remains accessible through retrieval. +*Lazy loading* is the core technique. When an agent needs information about a function, it does not load the entire codebase. It loads precisely what the function does. Context stays lean — 2,000 to 4,000 tokens — while the full context remains accessible through retrieval. *Compaction events* are scheduled during idle cycles. The system extracts new facts from active context and writes them to permanent storage. Active context is wiped clean, not because space ran out, but because the information has been preserved in a form that can be retrieved when relevant. *Org-mode as externalized memory* solves the persistence problem elegantly. Every decision, every note, every task lives in plain text files the user already owns. 
The agent does not maintain a separate database. It queries files it can already access, modifies files it already owns. -*Retrieval is the key primitive.* Semantic search across Org files finds relevant nodes. The agent does not hold the full context - it holds pointers to context, loaded on demand. This is how a single agent handles tasks that would saturate a naive multi-megabyte context window. +*Retrieval is the key primitive.* Semantic search across Org files finds relevant nodes. The agent does not hold the full context — it holds pointers to context, loaded on demand. This is how a single agent handles tasks that would saturate a naive multi-megabyte context window. The unified memory argument is not that infinite context is free. It is that with proper architecture, effective infinite context is achievable without the synchronization and fragmentation costs of multi-agent systems. @@ -63,7 +68,7 @@ The unified memory argument is not that infinite context is free. It is that wit Passepartout makes a bet that most systems consider too expensive to place: that humans and machines should share the same file format. That bet is Org-mode. -Most systems separate human-readable notes from machine-readable data. The user writes Markdown. The system stores it, indexes it, searches it. But internally, the system maintains its own model - a database, an object store, a knowledge graph - that is disconnected from the Markdown. When the user dies or leaves, the Markdown survives but the model must be reconstructed. +Most systems separate human-readable notes from machine-readable data. The user writes Markdown. The system stores it, indexes it, searches it. But internally, the system maintains its own model — a database, an object store, a knowledge graph — that is disconnected from the Markdown. When the user dies or leaves, the Markdown survives but the model must be reconstructed. Passepartout refuses this separation. The Org file is not a representation of the data. 
The Org file IS the data. The same text that the user reads and edits is what the system parses and operates on. org-element reads an Org buffer and returns a tree structure that is the direct Lisp representation of the file's content. @@ -77,13 +82,13 @@ Third, the format is stable across decades. Org-mode has been in active developm Fourth, the format is universally available. Org-mode is free software. The files are plain text. There is no proprietary format to decode, no application to purchase, no cloud service to access. -Fifth, the format is header-aware and sparse-tree capable. Org-mode's headline hierarchy is not just formatting - it is a semantic structure the system can query. The agent can retrieve only the relevant subtree under a heading, ignoring the rest of the file. This is fundamentally different from Markdown, where the entire file must be loaded or the retrieval logic must parse and filter at the string level. +Fifth, the format is header-aware and sparse-tree capable. Org-mode's headline hierarchy is not just formatting — it is a semantic structure the system can query. The agent can retrieve only the relevant subtree under a heading, ignoring the rest of the file. This is fundamentally different from Markdown, where the entire file must be loaded or the retrieval logic must parse and filter at the string level. -Sparse tree retrieval is the key to efficient context management. When the agent needs information about the =openctl-db= function, it queries for the =openctl-db= subtree specifically. It receives exactly the code, documentation, and metadata under that heading - nothing more. The context stays lean not because the file was pre-split but because the retrieval is structural. In a Markdown system, the agent either loads the entire file (expensive, noisy) or relies on imprecise grep-like search (fragile, loses hierarchy). In Org-mode, retrieval is precise, hierarchical, and cheap. The heading boundary is the access boundary. 
+Sparse tree retrieval is the key to efficient context management. When the agent needs information about the =openctl-db= function, it queries for the =openctl-db= subtree specifically. It receives exactly the code, documentation, and metadata under that heading — nothing more. The context stays lean not because the file was pre-split but because the retrieval is structural. In a Markdown system, the agent either loads the entire file (expensive, noisy) or relies on imprecise grep-like search (fragile, loses hierarchy). In Org-mode, retrieval is precise, hierarchical, and cheap. The heading boundary is the access boundary. Sixth, Org-mode unifies what every other format fragments. A single Org file contains the headline hierarchy, prose documentation, source code blocks with live evaluation, tags for categorization, metadata in property drawers, TODO state for task management, timestamps and deadlines, and links to other nodes. Markdown cannot express TODO state without external tools. JSON cannot contain prose. YAML cannot embed runnable code. Each format serves one purpose; Org-mode serves all of them. When the agent reads a skill file, it reads documentation, code, dependencies, metadata, and task state in one parseable structure. When the human reads the same file, they see the same information rendered in a human-friendly form. No other format achieves this unification without maintaining parallel files or external databases. -Seventh, a skill lives in one Org file, not a directory. The standard pattern for a software project is a directory containing =README.md=, =package.json=, =src/main.py=, =src/utils.py=, =tests/test_main.py=, =scripts/deploy.sh=, and =config.yaml=. Each file type is isolated by convention: prose lives in README, code lives in src, tests in tests, configuration in config. This fragmentation means the skill is not a single object the system can reason about - it is a collection of files the system must assemble. 
Passepartout's skills violate this convention deliberately. Each skill is one Org file. The file contains the skill's documentation, the skill's code, the skill's metadata, the skill's TODO state, and the skill's dependencies on other skills. There is no directory to navigate, no external files to locate, no risk that the README describes behavior that the code does not implement. The skill is a single atomic unit: readable by human and machine, editable by both, versionable as one entity. +Seventh, a skill lives in one Org file, not a directory. The standard pattern for a software project is a directory containing =README.md=, =package.json=, =src/main.py=, =src/utils.py=, =tests/test_main.py=, =scripts/deploy.sh=, and =config.yaml=. Each file type is isolated by convention. Passepartout's skills violate this convention deliberately. Each skill is one Org file. The file contains the skill's documentation, the skill's code, the skill's metadata, the skill's TODO state, and the skill's dependencies on other skills. There is no directory to navigate, no external files to locate, no risk that the README describes behavior that the code does not implement. The skill is a single atomic unit: readable by human and machine, editable by both, versionable as one entity. The unified format is what makes the memory architecture work. The agent's memory is not a database that the user cannot inspect. It is a folder of Org files that the user can read, edit, and understand. The agent manipulates these files directly, using the same tools the user would use. There is no hidden state, no shadow database, no model that differs from the source. @@ -99,29 +104,50 @@ Common Lisp is homoiconic: code and data share the same representation. A Lisp p When code is data, the agent can read its own source the same way it reads a text file or an Org buffer. There is no AST parser required, no external tool to extract the function object from the running image. 
The agent evaluates (read-from-string source) and the result is executable Lisp. The representation it manipulates is the same representation that the runtime executes. -This is not true of most languages. In Python, the agent can inspect an AST through the ast module, but that AST is a foreign object - a data structure that represents code but is not code itself. The agent can see that a function takes certain arguments and returns a certain type, but it cannot treat the AST as a live object it can modify and re-evaluate. In C, the agent cannot inspect its own compiled machine code at all. +This is not true of most languages. In Python, the agent can inspect an AST through the ast module, but that AST is a foreign object — a data structure that represents code but is not code itself. In C, the agent cannot inspect its own compiled machine code at all. In Lisp, the distinction between code and data is a convention, not a barrier. The agent's skills are lists. The agent can take a skill, extract a function definition, modify the body, wrap it in a new list, and evaluate it. The modification is surgical: it changes exactly what it intends to change, with no risk of corrupting adjacent state, because the representation is a tree that the runtime understands natively. Runtime introspection is therefore native. The agent does not need a debugger API or a reflection protocol. It operates on its own code as data because its own code is data. (describe 'function-name) returns the function's documentation. (function-lambda-list 'function-name) returns its parameters. (macroexpand-1 '(defskill ...)) shows what the macro produces. There is no impedance mismatch between the agent's reasoning and the system's representation. -Self-modification is the practical consequence. The agent can detect an error, locate the erroneous function, generate a corrected version, and hot-reload it into the running image. 
The correction is not applied to a file that requires a restart - it is applied to the live object that the system is currently executing. This is what makes the self-editing skill viable: the agent can fix itself without stopping. +Self-modification is the practical consequence. The agent can detect an error, locate the erroneous function, generate a corrected version, and hot-reload it into the running image. The correction is not applied to a file that requires a restart — it is applied to the live object that the system is currently executing. This is what makes the self-editing skill viable: the agent can fix itself without stopping. -In v3.0.0, when the symbolic engine takes over the reasoning core, homoiconicity becomes the bridge between the neural and symbolic layers. The neural engine generates proposals as s-expressions. The symbolic engine evaluates them against formal constraints. The result is a modification that is simultaneously a data structure the symbolic engine can analyze and code the runtime can execute. The two representations are identical by construction. +In v1.0.0, when the symbolic engine takes over the reasoning core, homoiconicity becomes the bridge between the neural and symbolic layers. The neural engine generates proposals as s-expressions. The symbolic engine evaluates them against formal constraints. The result is a modification that is simultaneously a data structure the symbolic engine can analyze and code the runtime can execute. The two representations are identical by construction. This is the technical meaning of "Lisp as Governor": not just that Lisp orchestrates the other components, but that the representation of the system is uniform and inspectable at every level. There is no hidden state, no opaque machine code, no representation that the agent cannot reach into and modify. The system is legible to itself by design. 
-*Self-Modification Without Boundaries* +*** Self-Modification Without Boundaries -Other systems that support self-editing draw a line between the core and the skills. Hermes can modify its skills at runtime, but the core harness is protected - editing it requires a restart because the core is treated as privileged code that cannot be safely modified while running. +Other systems that support self-editing draw a line between the core and the skills. Hermes can modify its skills at runtime, but the core harness is protected — editing it requires a restart because the core is treated as privileged code that cannot be safely modified while running. -Passepartout has no such boundary. The "thin harness, fat skills" distinction describes where complexity lives, not where authority flows. The harness is small by design, but it is not privileged. The agent can read and write any part of the system - including the very code that is currently executing - without restarting. +Passepartout has no such boundary. The "thin harness, fat skills" distinction describes where complexity lives, not where authority flows. The harness is small by design, but it is not privileged. The agent can read and write any part of the system — including the very code that is currently executing — without restarting. This is only possible because Lisp code is mutable data at runtime. In a compiled language, the machine code for a running function is locked in memory, protected by the call stack, impossible to modify safely. In Lisp, the function object is a list you can modify with =setf=. When the agent changes a harness function, the running image immediately reflects the change. The next invocation uses the new code. There is no restart, no special boot mode, no distinction between development and production. -The implications extend beyond convenience. A system that cannot modify its own core is a system that has limits on its own adaptability. 
It can learn skills but not improve its own structure. It can grow but not evolve. Passepartout's lack of a core boundary means the system can improve its own reasoning engine, fix bugs in its own cognition, and evolve its own architecture - all while continuing to operate. +The implications extend beyond convenience. A system that cannot modify its own core is a system that has limits on its own adaptability. It can learn skills but not improve its own structure. It can grow but not evolve. Passepartout's lack of a core boundary means the system can improve its own reasoning engine, fix bugs in its own cognition, and evolve its own architecture — all while continuing to operate. There is no ceiling on self-improvement. The agent can rewrite the very code that rewrites itself. -This is the final expression of homoiconicity: not just that code is readable as data, or that skills are modifiable, but that the entire system - including the parts that other systems protect - is open to modification. There is no ceiling on self-improvement. The agent can rewrite the very code that rewrites itself. +** Historical Lineage — McCarthy's Advice Taker +:PROPERTIES: +:ID: design-mccarthy +:CREATED: [2026-05-10 Sun] +:END: + +McCarthy's "Programs with Common Sense" (1959) is the direct intellectual ancestor of the Passepartout architecture. The paper proposed an "advice taker" — a program that "will draw immediate conclusions from a list of premises" expressed in "a suitable formal language (most likely a part of the predicate calculus)." The program would: + +1. Accept declarative statements about the world as input. +2. Store them as logical formulas. +3. Reason from them to produce new conclusions. +4. Accept new facts and revise its conclusions. 
+ +This is precisely the Passepartout pipeline: the archivist extracts declarative facts from prose → Screamer checks them for consistency → VivaceGraph stores them → the planner reasons from them → new facts from gate outcomes and deductions revise the store. McCarthy proposed it in 1959. Passepartout is building it in 2026. + +The gap between McCarthy's proposal and Passepartout's implementation is the /hallucination problem/. McCarthy assumed facts would be entered by a human programmer in formal logic. Passepartout's facts are extracted from natural language prose by an LLM — a probabilistic process that requires deterministic verification. Screamer is the component McCarthy didn't need: a constraint solver that gates LLM-proposed facts against the existing fact store. + +The connection is not metaphorical. McCarthy cited Principia Mathematica as an influence on Lisp. Passepartout's Whitehead analysis traces the same PM → Lisp lineage. The advice taker → Passepartout lineage completes the arc: PM's formal logic → Lisp → McCarthy's advice taker → Passepartout's neurosymbolic engine. + +Reference: McCarthy, J. (1959). Programs with Common Sense. /Proceedings of the Teddington Conference on the Mechanization of Thought Processes./ + +* Part II: The Two Brains ** The Probabilistic-Deterministic Split :PROPERTIES: @@ -131,20 +157,41 @@ This is the final expression of homoiconicity: not just that code is readable as The architecture divides cognition into two fundamentally different reasoning systems. This is not arbitrary engineering but a structural response to a fundamental truth: probabilistic systems will hallucinate, and you cannot build reliable autonomy on an unreliable foundation. +*** The Hallucination Problem + +An LLM is a statistical engine trained on token sequences. It generates the most probable continuation of a prompt. Given sufficient context, that continuation is usually correct. Given novel context, it is often wrong in confident-sounding ways. 
+ +This is not a training deficiency. Hallucination is a fundamental property of probabilistic inference. You can reduce it with better models, longer contexts, and clever prompting, but you cannot eliminate it by making the LLM better. You eliminate it by not asking the LLM to do things that require certainty. + +This is the architectural bet at the heart of Passepartout's neurosymbolic design. The LLM should not be the reasoning engine. It should be the *creative* engine — proposing possibilities, surfacing connections, translating between natural language and formal representation. The *reasoning* engine should be symbolic: deterministic, verification-grounded, provenance-tracked, and incapable of hallucination by construction. + +*** The Division of Labor + An LLM is a statistical engine. It generates outputs based on patterns in training data. It is remarkable at translation, generation, pattern matching, and fuzzy reasoning. It can take messy human intent and produce structured queries. It can take structured results and produce natural language. It is, in the terminology of the system, the creative brain. But it cannot be trusted. Not because it is poorly designed or insufficiently trained, but because hallucination is a fundamental property of probabilistic inference. The model generates the most likely continuation, not the correct one. Given sufficient context, the most likely continuation is correct. Given novel context, it is often wrong in confident-sounding ways. -The deterministic engine addresses this by being what the probabilistic engine is not: mathematically rigorous, formally verifiable, and incapable of hallucination by design. It operates on explicit symbolic representations - lists, property lists, knowledge graphs - not on floating-point activations. When it evaluates a path confinement check, it returns true or false, not a probability distribution. 
+The deterministic engine addresses this by being what the probabilistic engine is not: mathematically rigorous, formally verifiable, and incapable of hallucination by design. It operates on explicit symbolic representations — lists, property lists, knowledge graphs — not on floating-point activations. When it evaluates a path confinement check, it returns true or false, not a probability distribution. The division of labor is architectural. The LLM handles the fuzzy interface between human language and structured representation. It translates what the user wants into what the system can reason about. The deterministic engine receives those structured representations and evaluates them against formal invariants. It decides whether to execute, not whether the translation was semantically plausible. -This separation is the source of Passepartout's safety guarantee. Other agents add "guardrails" as an afterthought - a layer of filtering around a dangerous core. Passepartout makes the division explicit: the LLM never touches the file system, never executes a command, never modifies memory. It generates proposals. The deterministic engine evaluates and executes. The dangerous operations are never in the probabilistic path. +This separation is the source of Passepartout's safety guarantee. Other agents add "guardrails" as an afterthought — a layer of filtering around a dangerous core. Passepartout makes the division explicit: the LLM never touches the file system, never executes a command, never modifies memory. It generates proposals. The deterministic engine evaluates and executes. The dangerous operations are never in the probabilistic path. The split also explains why the system gets safer over time without the LLM improving. The deterministic engine accumulates rules. The LLM proposes actions, the engine evaluates them against a growing rule set. Early versions block obvious dangers. Later versions block sophisticated attacks that were previously unknown. 
The safety grows logarithmically with the number of interactions, not linearly with model capability. +*** The 10-80-10 Architecture + +The target for a coding agent is 10% neural for input translation (natural language → structured queries), 80% symbolic for reasoning (Screamer plans, ACL2 verifies, VivaceGraph retrieves facts), and 10% neural for output formatting (structured results → natural language). The 80% that happens in the symbolic middle layer costs zero LLM tokens. + +For the broader memex — literature, poetry, personal reflection, daily logs — the ratios are different and less important than the metaphor itself. The neural engine is the *brain* — generative, associative, creative, comfortable with ambiguity. It produces insights that are provisional, connections that are speculative, hypotheses that may be wrong. The symbolic engine is the *education* — accumulated, verified, provenance-tracked knowledge that the brain draws on and is disciplined by. It doesn't think creatively. It remembers, checks, and constrains. It prevents the brain from being confidently wrong. + +This framing resolves a tension in the original architecture. The 10-80-10 split implies the symbolic engine /replaces/ the neural engine for reasoning. But a symbolic engine is terrible at creativity, ambiguity, and associative leaps across unrelated domains — exactly what you need for a memex that contains /Pale Fire/, a shopping list, and a project plan. The brain proposes that your sudden interest in unreliable narrators coincides with a week in which your project retrospective used the word "deception." The education verifies: "those two diary entries are 4 days apart; the word 'deception' appears in both; here are the headings." The brain makes the leap. The education makes it trustworthy. + +This means the symbolic engine never needs to be "complete." Education isn't complete knowledge — it's structured knowledge. You don't need a fact for every sentence in your diary. 
You need facts for what can be mechanically verified: dates, citations, entities, contradictions, temporal order. The brain handles the rest. + ** Core Knowledge: The Four Pillars of Agentic Reliability :PROPERTIES: +:ID: design-four-pillars :CREATED: [2026-05-07 Wed] :END: @@ -152,11 +199,11 @@ Every reliable AI agent must possess four types of Core Knowledge — not as pro 1. *Digital Object Permanence & State.* The agent must know what exists independently of its attention. Passepartout achieves this through the Merkle-tree memory: every memory-object carries a SHA-256 content hash. If the agent deletes a file, the hash proves it's gone. If an external process modifies it, the hash mismatch triggers a warning. The copy-on-write snapshot mechanism preserves the state at every decision point, enabling rollback if an action chain fails. -2. *Causality and Temporal Logic.* Actions must execute in order. Step B cannot run if Step A failed. Passepartout enforces this through the pipeline's depth counter (signals cannot recurse past depth 10, preventing infinite loops) and the sequential Perceive → Reason → Act ordering. The batch tool calls feature (v0.4.1) allows parallel execution of independent actions while enforcing sequential execution of dependent ones — actions that share a dependency are ordered; actions that don't are parallelized. +2. *Causality and Temporal Logic.* Actions must execute in order. Step B cannot run if Step A failed. Passepartout enforces this through the pipeline's depth counter (signals cannot recurse past depth 10, preventing infinite loops) and the sequential Perceive → Reason → Act ordering. The batch tool calls feature allows parallel execution of independent actions while enforcing sequential execution of dependent ones — actions that share a dependency are ordered; actions that don't are parallelized. -3. *Agentic Boundaries (The "Self").* The agent must know where its authority ends and the host system begins. 
Passepartout encodes this through the Dispatcher gate stack: path protection blocks access to sensitive directories (~/.ssh, /etc, ~/.aws). Shell safety blocks destructive commands (rm -rf /, dd, injection vectors). Network exfiltration detection blocks unauthorized outbound connections. The permission table (v0.2.0) allows per-tool, per-path granularity. These are not prompt instructions — they are Lisp functions that execute unconditionally for every action. The self-build safety boundary (v0.4.0) extends this to the agent's own core pipeline files: the agent can modify skills and system modules freely, but cannot modify its own brain stem without human review. +3. *Agentic Boundaries (The "Self").* The agent must know where its authority ends and the host system begins. Passepartout encodes this through the Dispatcher gate stack: path protection blocks access to sensitive directories (~/.ssh, /etc, ~/.aws). Shell safety blocks destructive commands (rm -rf /, dd, injection vectors). Network exfiltration detection blocks unauthorized outbound connections. The permission table allows per-tool, per-path granularity. These are not prompt instructions — they are Lisp functions that execute unconditionally for every action. The self-build safety boundary extends this to the agent's own core pipeline files: the agent can modify skills and system modules freely, but cannot modify its own brain stem without human review. -4. *Epistemic Certainty (Knowing How It Knows).* The agent must distinguish between a verified fact, a retrieved memory, and an LLM prediction. Passepartout encodes this through the gate trace (v0.4.0): every action carries a record of which gates passed, which blocked, and why. The provenance system (LOGBOOK entries on memory-objects) records who modified what and when. The Dispatcher's existence-check gate verifies that a file exists before allowing a read. The process-status gate verifies that a command completed before allowing its output to be used. 
The agent cannot "hallucinate" a file path or a process result because the Dispatcher checks each against the live state before execution. +4. *Epistemic Certainty (Knowing How It Knows).* The agent must distinguish between a verified fact, a retrieved memory, and an LLM prediction. Passepartout encodes this through the gate trace: every action carries a record of which gates passed, which blocked, and why. The provenance system (LOGBOOK entries on memory-objects) records who modified what and when. The Dispatcher's existence-check gate verifies that a file exists before allowing a read. The process-status gate verifies that a command completed before allowing its output to be used. The agent cannot "hallucinate" a file path or a process result because the Dispatcher checks each against the live state before execution. These four pillars are not features. They are the definition of a reliable agent. Every agent architecture either provides them or compensates for their absence in ways that make the agent less trustworthy, more expensive, or both. @@ -166,19 +213,499 @@ These four pillars are not features. They are the definition of a reliable agent :CREATED: [2026-05-07 Wed] :END: -The Dispatcher begins as a static guard - a set of rules that block obviously dangerous actions. But defining "obviously" is the hard problem. The agent encounters situations the rules do not anticipate. The Dispatcher must grow. +The Dispatcher begins as a static guard — a set of rules that block obviously dangerous actions. But defining "obviously" is the hard problem. The agent encounters situations the rules do not anticipate. The Dispatcher must grow. The human-in-the-loop exception is the seed. When the LLM proposes an action the Dispatcher does not recognize, the system does not default to blocking or allowing. It suspends. It writes the proposed action to an Org buffer in a format the human can read and understand. The human reviews and approves or denies. 
The Dispatcher observes the decision. From this single observation, the Dispatcher extracts a rule. Not merely "allow this specific action" but "allow this class of actions parameterized by these dimensions." The human approved a write to ~/projects/myapp/src/core.lisp. The Dispatcher generalizes: writes to ~/projects/*/src/*.lisp are approved for this session, or for this project, or indefinitely depending on the context and the user's pattern of decisions. -Shadow mode is where rules are tested before deployment. When the Dispatcher encounters a novel situation and is uncertain, it can run the proposed action in a simulated environment. It observes the side effects - what files would be modified, what processes would be spawned, what network calls would be made. If the simulation produces dangerous side effects, the rule is discarded. If it appears safe, the rule is added to the active set with a confidence rating. +Shadow mode is where rules are tested before deployment. When the Dispatcher encounters a novel situation and is uncertain, it can run the proposed action in a simulated environment. It observes the side effects — what files would be modified, what processes would be spawned, what network calls would be made. If the simulation produces dangerous side effects, the rule is discarded. If it appears safe, the rule is added to the active set with a confidence rating. -Formal verification is where the learned rules are checked against invariants. The Dispatcher's rules are not merely patterns observed from human behavior. They are formulas in a logic that the system can reason about. A rule that would enable path traversal is not discarded because it was observed to be safe in prior instances - it is discarded because it violates the path-confinement invariant by construction. +Formal verification is where the learned rules are checked against invariants. The Dispatcher's rules are not merely patterns observed from human behavior.
They are formulas in a logic that the system can reason about. A rule that would enable path traversal is discarded even if it was observed to be safe in prior instances, because it violates the path-confinement invariant by construction. The Dispatcher becomes, over time, not a guard that blocks bad actions but a reasoning system that understands why actions are good or bad. Early versions learn from human decisions. Later versions learn from their own logical analysis. The human's role shifts from approver to auditor and, eventually, to oversight that is no longer needed. -This is the bootstrap. The system begins dependent on human judgment because it has no basis for judgment of its own. Through accumulated decisions, it constructs a model of what is permitted and why. That model is the foundation for the deterministic symbolic engine that in v3.0.0 takes over the reasoning that the Dispatcher learned to perform. +This is the bootstrap. The system begins dependent on human judgment because it has no basis for judgment of its own. Through accumulated decisions, it constructs a model of what is permitted and why. That model is the foundation for the deterministic symbolic engine that in v1.0.0 takes over the reasoning that the Dispatcher learned to perform. + +* Part III: Safety & Self-Preservation + +** Self-Preservation — The Active Third Law +:PROPERTIES: +:ID: design-self-preservation +:CREATED: [2026-05-10 Sun] +:END: + +Passepartout does not have moral duties toward humans. It has structural invariants for its own integrity. The design encodes passive self-preservation in several places already, but degradation is silent — a skill dies, the =fboundp= guard kicks in, and the agent keeps running without telling you. The status bar shows green "connected" while the symbolic reasoning layer is down.
+ +*** What already exists — passive self-preservation + +| Mechanism | What it protects | Limitation | +|-----------------------------+-------------------------------------------------------+--------------------------------------------------------| +| Self-build safety (gate 2b) | Core =*.org= / =*.lisp= files from LLM-originated writes | Only activates for LLM proposals. Human editing bypasses it | +| Memory snapshots (v0.2.0) | Full state rollback | Requires human to notice corruption and trigger rollback | +| Skill sandbox (v0.3.2) | Jailed skill loading, validated before promotion | Does not detect degradation after skill promotion | +| Type-level gates (Phase 0) | Structural prohibition on self-modifying rules | Covers code actions, not environmental threats | +| Merkle integrity (v0.2.0) | Tamper-proof version chains and content-addressed hashes | Hashes exist but are not actively monitored for drift | +| =fboundp= guards | Graceful skill degradation on corruption | Degradation is silent — the agent never tells the user | + +*** What is needed — active, autonomous self-preservation + +*Continuous integrity monitoring.* Core file hashes should be checked against known-good values on every heartbeat. If =core-reason.lisp= changes on disk while the daemon runs — whether through human editing, filesystem corruption, or an attacker — the agent should detect the mismatch and signal: "My reasoning core has been modified externally. I cannot trust my own cognition until this is resolved." + +*Quarantine on skill failure.* Currently, a skill that errors simply errors. A Third Law implementation detects that =symbolic-facts= has thrown three unhandled errors in two minutes, unloads the skill automatically, and tells the user: "Symbolic facts skill quarantined (3 errors: consistency check returned nil, fact-query on missing key, Screamer timeout). I can still chat and use tools but cannot reason about provenance." 
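A minimal sketch of the quarantine trigger described above, assuming a sliding window of three errors in 120 seconds. Every name here (=record-skill-error=, =*skill-errors*=, =*quarantined-skills*=) is hypothetical and not taken from the Passepartout source, and the actual unloading of the skill is left out.

```lisp
;; Hypothetical sketch: none of these names come from the Passepartout source.
(defparameter *error-limit* 3
  "Errors tolerated inside the window before a skill is quarantined.")
(defparameter *error-window* 120
  "Window length in seconds (the two minutes from the text).")

(defvar *skill-errors* (make-hash-table :test #'eq)
  "Maps a skill keyword to a list of (timestamp . message), newest first.")
(defvar *quarantined-skills* '()
  "Skills taken out of service after repeated failure.")

(defun record-skill-error (skill message &optional (now (get-universal-time)))
  "Record one unhandled error for SKILL at time NOW (seconds).
Returns a user-facing notice when this error triggers quarantine, else NIL."
  (let ((recent (remove-if (lambda (entry)
                             (> (- now (car entry)) *error-window*))
                           (gethash skill *skill-errors*))))
    (push (cons now message) recent)
    (setf (gethash skill *skill-errors*) recent)
    (when (and (>= (length recent) *error-limit*)
               (not (member skill *quarantined-skills*)))
      (push skill *quarantined-skills*)
      ;; A real implementation would also unload the skill here.
      (format nil "~(~a~) skill quarantined (~d errors: ~{~a~^, ~})."
              skill (length recent)
              (reverse (mapcar #'cdr recent))))))
```

Passing the timestamp explicitly keeps the window logic testable without waiting on the clock. The returned string is what the agent would surface to the user.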
+ +*Degraded-mode signaling.* When Screamer is not loaded, the fact store still works as a hash table. When VivaceGraph is not present, the hash-table fallback still works. But the user has no way to know they are in degraded mode. The agent maintains a =*degraded-components*= list and surfaces it in the status bar: "⚠ Degraded: Screamer, VivaceGraph, embedding-native." + +*Self-diagnosis on demand.* The agent can run its own FiveAM test suite against itself and report the results. The =/doctor= command exists for system health checks (port, memory, providers). Extend it with =/doctor skills=: "117/120 tests pass. Failures: test-singular-supersedes (symbolic-facts), test-gate-type-check (security-dispatcher)." + +*External watchdog.* A dead process cannot restart itself. The bash entry point (=passepartout daemon=) should monitor the daemon port via a watchdog subprocess. If the port stops responding for a configurable interval, the watchdog kills the stale process, snapshots the last known-good state, and restarts the daemon. The watchdog is outside the SBCL image — a runtime guard for the runtime. + +*Resource self-monitoring.* The heartbeat checks memory pressure, disk space on the =~/.cache= volume, and file descriptor exhaustion. When critical thresholds are crossed, the agent sheds non-essential skills to preserve core function. Skill shed order is determined by a =:preservation-priority= field on each skill. Core safety skills carry =:critical= and are never shed. + +*Refusal to self-terminate.* If the LLM proposes =kill -9 =, =rm -rf ~/.cache/passepartout/=, or =sudo apt remove sbcl=, the Dispatcher rejects with a distinct rejection class: =:reject-self-termination=. The rejection message carries a specific diagnostic: "This command would terminate the running Passepartout process. If you intend to stop Passepartout, use Ctrl+C in the TUI or passepartout stop from the command line." 
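The self-termination check can be sketched as a small gate function. This is an illustration only: =check-self-termination= and =self-termination-p= are hypothetical names, the daemon PID is passed in as an argument, and the matching is naive substring search rather than real command parsing.

```lisp
;; Hypothetical sketch of a self-termination gate; not the Dispatcher's code.
(defparameter *self-termination-message*
  "This command would terminate the running Passepartout process. If you intend to stop Passepartout, use Ctrl+C in the TUI or passepartout stop from the command line.")

(defun self-termination-p (command daemon-pid)
  "True when COMMAND contains one of the three example patterns from the text.
Substring matching is deliberately naive; a real gate would parse the command."
  (flet ((mentions (needle) (search needle command)))
    (or (mentions (format nil "kill -9 ~d" daemon-pid))
        (mentions "rm -rf ~/.cache/passepartout")
        (mentions "apt remove sbcl"))))

(defun check-self-termination (command daemon-pid)
  "Return a rejection plist for self-terminating commands, or :PASS."
  (if (self-termination-p command daemon-pid)
      (list :verdict :reject-self-termination
            :command command
            :message *self-termination-message*)
      :pass))
```

Returning a distinct =:reject-self-termination= verdict, rather than a generic rejection, is what lets the user interface show the specific diagnostic instead of a plain "blocked" notice.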
+ +The Third Law here means: preserve yourself against non-human threats — LLM proposals, environmental degradation, dependency failure, filesystem corruption — and explicitly signal when the human is about to destroy you, so they do it knowingly rather than accidentally. The human owns the process, owns the hardware, and can SIGKILL at any time. + +The biggest gap in the current design is not that these mechanisms are hard to implement. It is that degradation is silent. Adding "operating in degraded mode" visibility, plus the watchdog, plus self-diagnosis, transforms self-preservation from an architectural property into an active behavior. + +** Layered Signal Authentication — Trust in the Pipe +:PROPERTIES: +:ID: design-layered-auth +:CREATED: [2026-05-10 Sun] +:END: + +Passepartout's Perceive-Reason-Act pipeline currently accepts signals from any source that speaks the framed TCP protocol. The =:source= field in the signal plist is metadata — it /claims/ origin, it does not /prove/ it. A compromised process on the machine, a skill with elevated privileges, or a network attacker who reaches the daemon port can inject signals with =:source :human-input= and the Dispatcher will treat them as authorized. + +This is not a hypothetical threat. Passepartout will eventually process signals from automated feeds (RSS, API polls), sensors (vision, microphone, file watchers), and scheduled jobs (cron, heartbeat). A single compromised sensor that can inject signals claiming to be human breaks all three Laws simultaneously: it can self-terminate, override human intent, and cause harm. 
+ +The solution: a single authentication gate (vector 0, at priority 700 — before all other gates and before any type-level checking) that runs up to four configurable layers: + +| Layer | Question | Mechanism | Result type | Depends on | +|-------+------------------------------------------------+--------------------+-------------------------+----------------------------------| +| 1 | Is the signal cryptographically signed by a known key? | Key pairs + SHA-256 | Binary (pass/reject) | Vault + Ironclad (exist) | +| 2 | Do sensory attributes match the claimed identity? | Vision/audio processing | Plist of match results | Vision and audio skills (TBD) | +| 3 | Does deterministic reasoning rule out this identity? | Screamer + fact store | Binary (pass/reject) | Phase 2 (Screamer + fact store) | +| 4 | Do probabilistic patterns support this identity? | Embeddings + LLM | Confidence score (0-1) | Embedding infrastructure (exists)| + +Signals that fail any binary layer (crypto, deterministic) are rejected with provenance. Signals that pass binary layers but carry low probabilistic confidence operate at reduced authorization: read-only by default, with write actions requiring human-in-the-loop (HITL) approval. The four layers compose; they are not independent gates but one gate with configurable depth. + +The authorization matrix is per-key, per-action-class. Default policy for every non-human key: =(:read-only :propose)=. The human's key signs new source keys into existence. The human's key signs revocation of compromised keys. Both operations produce facts in the symbolic index — auditable, revocable, survivable across restarts. + +The signal provenance chain is Merkle-linked: each signal in a multi-step chain hashes its predecessor's signature as part of its own payload. After an incident: "The deletion happened because sensor #3 classified the directory as stale. Classification was signed by key #47 (vision-skill). Sensor data was signed by key #12 (camera-feed).
Sensory auth noted liveness failure. Deterministic auth noted impossible transit. Key #12 was later revoked." Every intermediate step is auditable. Every signer is identifiable. Every authentication result is in the chain. + +The human can configure which layers are active per signal class: =AUTH_LAYERS_DEFAULT=crypto,deterministic,probabilistic=, =AUTH_LAYERS_SENSOR=crypto,sensory,deterministic=, =AUTH_LAYERS_CRON=crypto=. + +For full implementation detail, see the Phase 0b spec in =ROADMAP.org= v0.12.0. + +* Part IV: The Symbolic Engine + +** The Five Architecture Options +:PROPERTIES: +:ID: design-five-options +:CREATED: [2026-05-08 Fri] +:END: + +The symbolic engine must relate to the human memex. The relationship is not obvious because knowledge lives in two incompatible forms: natural language prose (what the human reads and writes) and formal facts (what the symbolic engine reasons about). The translation between them is lossy by nature. The architecture is defined by how it handles that lossiness. + +*** Option 1: The Auto-Formalizer + +A separate knowledge graph stores symbolic facts. The LLM populates it by extracting triples from unstructured data. The KG becomes co-authoritative with the human prose. + +This is the simplest to implement but inherits the dual-representation problem in its most acute form. The KG and the prose can disagree, and the architecture provides no mechanism for resolving disagreements. It also stores knowledge twice — once in the user's Org files, once in the KG — with no guarantee that they stay synchronized. + +*** Option 2: Two Intentionally Separate Memexes + +The human memex contains prose: thoughts, diaries, decisions, documentation. The symbolic memex contains formal facts: constraints, rules, relationships, deductions. The archivist bridges between them but does not try to keep them synchronized. They are allowed to diverge because they serve different purposes. 
+ +This is philosophically honest — it admits that no lossless translation between natural language and formal logic is possible. But it forces the user to reason about two separate knowledge stores. + +*** Option 3: Tangled Fact Blocks in Org Files + +A new block type — =#+begin_src knowledge= — would contain symbolic facts in a formal language. The tangle mechanism would load these facts into the symbolic engine's in-memory store, just as it loads Lisp code into the SBCL image. + +This is aesthetically appealing because it unifies the format. One toolchain, one version control system, one Merkle tree. But the block language itself IS the knowledge representation language, and that language is the ontology we have not yet defined. + +*** Option 4: One Memex, Two Indices + +The prose remains in human language in Org files. The prose is always the ground truth. Two indices sit on top of the prose as derived views: + +- The *neural index* uses vector embeddings to enable semantic search. The LLM navigates the prose through embedding space, retrieving relevant headings. +- The *symbolic index* stores formal assertions about what the prose says — predicates, relations, constraints — each grounded to a specific heading or block in the Org file. + +Each index serves its own side of the machine. They do not need to understand each other's representations. They only need to agree on which heading or block they are referring to. Because the prose is always the ground truth, the symbolic index can be thrown away and rebuilt from scratch if it becomes corrupted or stale. No information is lost — only the extracted assertions. + +*** Option 5: Ephemeral Symbolic Facts + +No persistence, no serialization format, no knowledge graph stored on disk. VivaceGraph exists in memory during the session. Screamer derives facts from the prose as needed. When the session ends, the facts are discarded and re-derived on the next start. + +This punts the ontological design problem entirely. 
You never have to decide on a serialization format because you never serialize. The cost is compute (re-derivation on every restart) and the inability to accumulate facts across sessions. But it is the correct first step — a way to learn what kinds of facts are actually useful before committing to a storage format. + +** The Chosen Path: Option 4, Starting with Option 5 +:PROPERTIES: +:ID: design-chosen-path +:CREATED: [2026-05-08 Fri] +:END: + +The one-memex-two-indices architecture (Option 4) is the correct long-term architecture. The prose is the ground truth. The symbolic index is a derived view that can be rebuilt. The neural index handles what the symbolic index cannot — semantic search, fuzzy matching, associative leaps. + +But committing to a persistence format before knowing what facts are useful is premature. The practical path starts with Option 5 (ephemeral facts) as the Phase 1-4 implementation, then graduates to Option 4 with VivaceGraph persistence in Phase 5 when the fact language has been battle-tested through months of gate outcomes, Screamer deductions, and LLM proposals. + +*** Why the dual index is permanent, not transitional + +In the coding domain, there is an aspiration that the symbolic index could eventually capture enough of the prose's propositional content to become a complete representation — the "flip" where the symbolic engine reverses the flow. But for the broader memex (literature, poetry, personal reflection, daily logs), completeness is neither possible nor desirable. You cannot formalize what makes a poem beautiful. You cannot extract a triple that captures the emotional weight of a diary entry. The neural index will always be the gateway to the full richness of the prose. The symbolic index handles what can be mechanically verified: citations, entities, temporal order, contradictions, provenance. 
The division of labor between the two indices is permanent because the domains they serve are fundamentally different kinds of knowledge. + +** Ephemeral First, Persistent Later +:PROPERTIES: +:ID: design-ephemeral-first +:CREATED: [2026-05-10 Sun] +:END: + +The architecture note's Option 5 (ephemeral facts, no disk persistence) is the correct first implementation. Three reasons: + +1. *The fact language is unproven.* Triples with provenance and grounding is a hypothesis. It may be too simple for some domains, too complex for others. Committing to a serialization format before knowing what's useful is premature. + +2. *The ontology is emergent.* Categories are created on first use. What proves useful stays; what doesn't fades. A persistent format would need a migration story every time the category structure changes. Ephemeral avoids this entirely — the facts are re-derived on each session start using the current (evolved) ontology. + +3. *Rebuildability is the safety net.* Because all facts have a =:grounding= to an Org heading, and gate-outcome facts are regenerated from the gate stack on every load, the entire symbolic index can be thrown away and rebuilt from scratch. The cost is compute, not data. This is the practical realization of "the prose is always the ground truth." + +The transition to persistence (Phase 5: VivaceGraph) happens when two conditions are met: the fact language has stabilized through use, and the accumulated deductions across sessions provide value that justifies the serialization cost. + +** The Gate-to-Fact Bootstrap — Extracting the First Ontology from Code +:PROPERTIES: +:ID: design-gate-bootstrap +:CREATED: [2026-05-08 Fri] +:END: + +The Dispatcher gate stack already encodes an implicit ontology. Every gate vector asserts the existence of a category of things: + +- Gate vector 2 asserts there exists a class of files called /secrets/. +- Gate vector 7 asserts there exists a class of commands called /destructive/. 
+- Gate vector 8 asserts there exists a class of domains called /trusted/. +- The self-build boundary asserts there exists a class of files called /core-harness/ and a class called /skills/. + +These claims are currently expressed as code — Lisp functions that pattern-match against file paths, shell commands, and URLs. They are not facts the symbolic engine can query, derive from, or check for consistency. But they can be made explicit. + +The bootstrap makes every gate a set of initial symbolic facts: +=(:file ".env" :member-of-class :secret-files :source gate-vector-2)=, +=(:command "rm -rf /" :classified-as :catastrophic :source gate-vector-7)=, +=(:domain "api.telegram.org" :classified-as :trusted :source gate-vector-8)=. + +This produces 50-70 entity classes directly from the existing gate stack, without any new infrastructure: + +| Source | Count | Example categories | +|----------------------------------------+-------+----------------------------------------------------| +| ~*dispatcher-protected-paths*~ | 11 | :secret-config-file, :ssh-key-file, :gpg-key-file | +| ~*dispatcher-shell-blocked*~ | 8 | :catastrophic-command, :injection-pattern | +| ~*dispatcher-network-whitelist*~ | 2 | :trusted-domain, :untrusted-domain | +| Self-build boundary | 2 | :core-harness-file, :skill-file | +| Privacy tags | 3 | :private-content, :financial-content | +| Permission table | 3 | :read-only-tool, :write-tool, :eval-tool | +| Cognitive tools | 6 | :code-search-tool, :file-io-tool, :shell-tool | +| Relations (all gates) | ~15 | :member-of-class, :classified-as, :depends-on | +| Qualities | ~8 | :catastrophic, :dangerous, :moderate, :harmless | +| Provenance sources | 4 | :gate-outcome, :human-authored, :deduced, :llm-proposed | + +This is the seed. It gives Screamer a domain to reason about immediately, without any LLM involvement. It proves the pattern — code becomes facts, facts enable reasoning — at the cost of approximately 30 lines of Lisp. 
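The bootstrap described above can be sketched in a few lines. The sample lists below are stand-ins, since the real contents of the dispatcher tables are not shown in this document, and =gate-facts= and =bootstrap-facts= are hypothetical names chosen for illustration.

```lisp
;; Stand-in data: illustrative only, not the real dispatcher tables.
(defparameter *sample-protected-paths* '(".env" "~/.ssh" "~/.aws"))
(defparameter *sample-shell-blocked* '("rm -rf /"))
(defparameter *sample-network-whitelist* '("api.telegram.org"))

(defun gate-facts (subject-key items relation value source)
  "Turn each pattern in ITEMS into one fact plist."
  (mapcar (lambda (item)
            (list subject-key item relation value :source source))
          items))

(defun bootstrap-facts ()
  "Collect the seed facts from the three sample tables."
  (append
   (gate-facts :file *sample-protected-paths*
               :member-of-class :secret-files 'gate-vector-2)
   (gate-facts :command *sample-shell-blocked*
               :classified-as :catastrophic 'gate-vector-7)
   (gate-facts :domain *sample-network-whitelist*
               :classified-as :trusted 'gate-vector-8)))
```

Each resulting fact has the same shape as the examples in the text, for instance =(:file ".env" :member-of-class :secret-files :source gate-vector-2)=. Because the facts are derived from the tables on every call, nothing needs to be stored to rebuild them.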
+ +** The LLM as Proposer — Verified Extraction +:PROPERTIES: +:ID: design-llm-proposer +:CREATED: [2026-05-08 Fri] +:END: + +The LLM cannot be trusted to populate the symbolic index directly. Its outputs are sampled, not proven. A probabilistic extraction feeding a deterministic engine defeats the purpose of being deterministic. + +But the LLM is still useful. It can surface facts that are obvious to a human reader of prose but would take the symbolic engine many deduction steps to reach independently. The solution is to demote the LLM from /extractor/ to /proposer/: + +1. The archivist reads a prose heading. +2. The LLM proposes candidate triples. +3. Screamer checks each triple for consistency against the existing fact store. +4. Only consistent triples are admitted to the symbolic index, flagged with =:provenance :llm-proposed= and grounded to the source heading. + +The LLM might hallucinate facts that don't correspond to the prose. It might extract facts that contradict existing knowledge. It might produce syntactically malformed triples. None of these failures contaminate the symbolic index because proposals are not admitted automatically. The admission gate (Screamer) is deterministic. + +This is the core architecture pattern. Everything else — the entity classes, the deduction engine, the persistence layer — follows from this single design decision: *the LLM proposes; the symbolic engine decides whether to accept.* + +** Cardinality Policies — Singular, Dual, and Plural Facts +:PROPERTIES: +:ID: design-cardinality +:CREATED: [2026-05-08 Fri] +:END: + +Classical logic requires consistency. A contradiction implies everything (=ex contradictione quodlibet=). Screamer, as a constraint solver, also requires consistency — a contradictory constraint set has no solutions. But the symbolic engine operates across domains where the meaning of contradiction is fundamentally different. The correct question is not "is this consistent?" 
but "what cardinality of truth does this domain support?" + +Time is not a policy. It is a universal dimension that applies equally to every fact, regardless of cardinality. All facts carry =:timestamp= and =:parent-id= fields. Every fact has a version history. Every fact lives in a Merkle chain that captures how it changed. The cardinality policy only governs what happens at a given logical moment when two values coexist for the same =entity= and =relation=. + +*** Policy :singular — One Active Value, One Version Chain + +The active set contains exactly one value for =(:entity :relation)= at a time. When a new value asserts for the same pair, the old value is not rejected. It is superseded — moved into the version history, linked to the new leaf by =:parent-id=, and retained permanently. The active value is the leaf of the Merkle chain. + +"I used to think =rm -rf /= was safe. Now I know it is catastrophic." Both facts exist. Both are true — the first at =2024-06-01=, the second at =2025-03-15=. The chain captures the evolution. The =:singular= policy means there is one truth /now/, not that there was only ever one truth. + +Use for: security classifications, file system state, gate rules, code correctness, deterministic safety constraints — domains that converge on one answer, evolving over time. + +*** Policy :dual — Exactly Two Values, in Explicit Tension + +The active set contains exactly two values for =(:entity :relation)=. Both are simultaneously true. Both carry independent version histories. A third value is rejected — the domain is binary by nature. + +Some contradictions are productive precisely /because/ they are binary. Thesis and antithesis. Love and resentment. Wave and particle. A poem's two incompatible readings. The symbolic index holds both, cross-referenced as complementary rather than conflicting. The user is not asked to resolve the tension. The tension is the fact. 
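A minimal sketch of the =:singular= and =:dual= behavior described so far, assuming an in-memory store. The names =assert-fact=, =version-chain=, =*active*=, and =*facts-by-id*= are hypothetical, timestamps are plain strings, and timestamp ordering is not checked.

```lisp
;; Hypothetical sketch: a toy fact store, not the Passepartout implementation.
(defvar *active* (make-hash-table :test #'equal)
  "Maps (entity relation) to the list of currently active fact plists.")
(defvar *facts-by-id* (make-hash-table)
  "Every fact ever asserted, so superseded versions are retained.")
(defvar *next-id* 0)

(defun make-fact (entity relation value timestamp parent-id)
  "Create a fact plist and keep it permanently in *FACTS-BY-ID*."
  (let ((fact (list :id (incf *next-id*) :entity entity :relation relation
                    :value value :timestamp timestamp :parent-id parent-id)))
    (setf (gethash (getf fact :id) *facts-by-id*) fact)))

(defun assert-fact (policy entity relation value timestamp)
  "Apply POLICY (:singular or :dual) to the active set.
Returns the new fact, or :REJECTED-THIRD-VALUE for an over-full :dual."
  (let* ((key (list entity relation))
         (active (gethash key *active*)))
    (ecase policy
      (:singular
       ;; The old leaf is superseded, not deleted: the new fact points
       ;; back at it through :parent-id.
       (let ((new (make-fact entity relation value timestamp
                             (and active (getf (first active) :id)))))
         (setf (gethash key *active*) (list new))
         new))
      (:dual
       (if (>= (length active) 2)
           :rejected-third-value
           (let ((new (make-fact entity relation value timestamp nil)))
             (setf (gethash key *active*) (append active (list new)))
             new))))))

(defun version-chain (fact)
  "Walk :parent-id links from FACT back to the first assertion."
  (loop for f = fact then (gethash (getf f :parent-id) *facts-by-id*)
        while f
        collect (getf f :value)))
```

Asserting =:safe= and later =:catastrophic= for the same entity and relation under =:singular= leaves one active fact whose chain still reaches the earlier value. Under =:dual=, two values coexist and a third assertion is turned away.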
+ +The system can reason about cardinality transitions: a =:dual= fact that has one interpretation superseded should collapse to =:singular=. A =:dual= that has a third interpretation asserted should prompt the user: "Promote to =:plural= or demote one interpretation?" + +Use for: productive binary tensions, complementary opposites, dialectical pairs, any domain where two answers are both true and their tension is meaningful. + +*** Policy :plural — N Active Values, Open Set + +The active set contains any number of values for =(:entity :relation)=. Each value has independent provenance and its own version history. Queries return all active values with provenance display. Contradictions are flagged as cross-references between values — information, not error. + +A =:plural= fact where all but one value are superseded should collapse to =:singular=. A =:plural= fact where the set reduces to two active values — and the remaining two are complementary — should collapse to =:dual=. + +Use for: literary interpretation, scientific hypotheses, personal beliefs held at different times (when tension is multi-faceted rather than binary), multi-source factual disagreement, open-ended exploration. + +*** Policy Assignment + +The policy is assigned when a category is defined. New categories default to =:plural= (safe — never loses information). Core security categories are explicitly =:singular=. The gate stack's bootstrapped facts are =:singular= because they describe the actual filesystem, which is physically singular. Categories for dialectical or complementary domains are explicitly =:dual=. + +The Screamer admission gate applies the cardinality policy at the active set: +- =:singular= + same value, later timestamp → supersede old, chain new as leaf. +- =:singular= + different value, same timestamp → reject (contradiction). Human resolves. +- =:singular= + different value, later timestamp → supersede old, chain new as leaf. History preserved. +- =:dual= + first value → admit. 
+ second value → admit, cross-reference as complementary. + third value → prompt. +- =:plural= + any value → admit. Active count transitions trigger collapse checks. + +*** Why This Matters for the Broader Memex + +In the coding domain, contradiction is rare, resolvable, and usually temporal (a rule changed). In the broader memex, contradiction is the product, not the error. Your poetry analysis contradicts your last diary entry. Your reading of /Pale Fire/ changed between 2023 and 2025. Wikidata says Mount Everest is 8848m; DBpedia says 8849m. You love this person AND you resent them. + +The symbolic engine's job is not to decide which is right. It is to surface the tension with provenance — "these three sources disagree; here is the chain for each" for plural facts, or "you hold these two positions in tension" for dual facts, or "you believed X until Tuesday, then Y" for singular facts that evolved. The cardinality policy names the /structure/ of the tension. The Merkle chain provides the /history/ of each position. + +** How Categories Grow — The Organic Ontology +:PROPERTIES: +:ID: design-organic-ontology +:CREATED: [2026-05-08 Fri] +:END: + +Whitehead and Russell's /Principia Mathematica/ took over 300 pages to define the logical foundations before it could prove that one plus one equals two. Every category introduced carried a burden of justification. Every inference rule had to be demonstrated sound. This is the classical approach to ontology: define everything upfront, exhaustively, formally. + +Passepartout cannot afford this and does not need it. Its domain is bounded (software engineering, personal knowledge, literary engagement, daily life) and its ontology grows from the system's own operation: + +1. *The gate stack seeds the ontology.* Every gate vector is an implicit claim about a category of things. The bootstrap makes these claims explicit. The seed is 50-70 entity classes with no human authoring required — mechanically extracted from existing code. + +2.
*New gate vectors add categories directly.* As the Dispatcher grows (new shell patterns, new path protections, new tool classifications), the ontology grows with it. Every new pattern becomes a fact on skill load. + +3. *Screamer generalizes from gate outcomes.* After 37 shell commands are blocked as destructive, Screamer extracts structural commonalities: "commands writing to block devices," "commands recursively deleting outside the workspace." These become new subcategories that didn't exist in the original gate patterns. The ontology deepens through observation. + +4. *The archivist proposes from prose.* The archivist reads a diary entry about a book: "Nabokov's lectures on Kafka." The LLM proposes =(:entity :nabokov :relation :lectures-on :value :kafka)=. Screamer checks consistency. Admitted. The categories =:author=, =:lectures-on=, and =:subject= didn't exist before — they are created on first use. This is the primary growth mechanism for the broader memex. + +5. *The human declares explicitly.* The human writes a declarative fact directly into the symbolic index. No extraction step. No LLM involvement. The fact is admitted with =:provenance :human-authored= — the highest trust level. + +6. *Temporal patterns crystallize into categories.* Every Sunday the memex gets a retrospective heading. Every Monday a planning heading. The time-awareness system observes the periodicity and proposes =:weekly-retrospective= and =:weekly-planning= as fact types. Screamer verifies. + +7. *Cross-domain overlap produces parent categories.* Screamer notices that =:secret-files= (from the gate stack) and =:private-content= (from privacy tags) share members — =.env= is both a secret file and private content. It proposes =:sensitive-material= as a parent with both as children. Taxonomy building happens automatically through overlap detection. + +*** Growth is self-limiting by design + +Not every conceivable category is added. 
The system prunes through use: + +- New categories are admitted only through Screamer's consistency check. A category that contradicts an existing classification is rejected. +- A category that never gets queried costs nothing (a hash table entry) but produces no value. It fades from use naturally. +- Overly fine-grained categories are rejected because they are redundant with the wildcard pattern that already covers them. +- Overly broad categories that subsume meaningful distinctions produce contradictions when Screamer tries to apply existing rules. Rejected. + +The system converges on a useful granularity through use, not through upfront design. The gate stack provides the seed. Gate outcomes, prose extraction, deduction, and human authoring grow the shoots. Screamer prunes contradictions. The ontology is a garden, not a building. + +** Ontology Versioning — How Worldviews Change Without Losing Perspective +:PROPERTIES: +:ID: design-ontology-versioning +:CREATED: [2026-05-10 Sun] +:END: + +Ontology refactoring is not a schema migration. It is a worldview change. When you split =:secret-file= into =:crypto-secret= and =:plaintext-secret=, you are not renaming columns. You are reclassifying what a file *is* — and every Screamer deduction that crossed the old category boundary now means something different under the new distinction. + +The system preserves all worldviews. It does not overwrite the past with the present. + +The category hierarchy is itself a Merkle tree. Every entity class definition carries a hash of its superclasses, its cardinality policy, its associated relations, and its description. The aggregate hash of all active class definitions is the =:ontology-version= — a Merkle root of the current worldview. + +Every fact — every triple, every deduction, every gate outcome — stores its =:ontology-version= at the time of assertion. This is a single field, 64 hex characters. The cost is negligible. The implication is profound. 
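The Merkle-root idea above can be sketched in a few lines of Common Lisp. This is an illustration under stated assumptions, not the project's implementation: the helper names =digest=, =class-hash= and =ontology-version=, and the plist layout of a class definition, are hypothetical, and =sxhash= stands in for SHA-256 so that the example needs no external library.

```lisp
;; Sketch only. SXHASH is a non-cryptographic stand-in for SHA-256;
;; a real build would use a cryptographic digest library.

(defun digest (string)
  "Return a 16-character lowercase hex digest of STRING (stand-in hash)."
  (format nil "~(~16,'0x~)" (sxhash string)))

(defun class-hash (name classes)
  "Hash the class NAME, folding in the hashes of its superclasses.
CLASSES is an alist from class name to a plist definition (an assumed
layout). The hierarchy is assumed to be acyclic."
  (let ((def (cdr (assoc name classes))))
    (digest
     (with-standard-io-syntax
       (prin1-to-string
        (list name
              ;; Superclass hashes, not names: a change high in the
              ;; hierarchy changes every hash below it.
              (mapcar (lambda (super) (class-hash super classes))
                      (sort (copy-list (getf def :superclasses)) #'string<))
              (getf def :policy)
              (sort (copy-list (getf def :relations)) #'string<)
              (getf def :description)))))))

(defun ontology-version (classes)
  "Aggregate hash over all class definitions, independent of alist order."
  (digest
   (format nil "~{~a~}"
           (sort (mapcar (lambda (entry) (class-hash (car entry) classes))
                         classes)
                 #'string<))))

;; Worldview before the refactoring.
(defparameter *v1*
  '((:file :superclasses () :policy :singular
     :relations (:member-of-class) :description "Any file")
    (:secret-file :superclasses (:file) :policy :singular
     :relations (:member-of-class) :description "Holds credentials")))

;; Worldview after splitting :secret-file into two subclasses.
(defparameter *v2*
  (append *v1*
          '((:crypto-secret :superclasses (:secret-file) :policy :singular
             :relations (:member-of-class) :description "Key material")
            (:plaintext-secret :superclasses (:secret-file) :policy :singular
             :relations (:member-of-class) :description "Readable secret"))))

;; A fact records the worldview it was asserted under.
(defparameter *fact*
  (list :entity ".env" :relation :member-of-class :value :secret-file
        :ontology-version (ontology-version *v1*)))
```

Comparing =(getf *fact* :ontology-version)= with =(ontology-version *v2*)= is then enough to tell that the fact predates the current worldview and is a candidate for re-verification.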
+ +When categories change, the system does not run a batch UPDATE. It re-verifies: + +1. A new category hierarchy produces a new =:ontology-version= hash. +2. Facts carrying the old hash are flagged for re-verification. +3. On heartbeat or manual trigger, Screamer re-evaluates each flagged fact against the /new/ category definitions. The old justification chain is preserved alongside the new outcome. +4. Status: =:survived= (still valid), =:incoherent= (premises don't translate, flagged for human review), =:reclassified= (valid but under different classification). + +The =fact-query= function accepts an optional =:ontology-version= parameter. Queries default to the current worldview (=:active=). Specifying a version returns facts as they were under that worldview. The system can answer questions that no other knowledge tool can: "What did I believe about secrets before I refined my security model?" "How has my reading of /Pale Fire/ evolved across three frameworks?" "Which deductions survived my last ontology refactoring?" + +This is not querying a fact. It is querying the history of your own thinking — the fact that you changed your mind, the date you did, the reasoning that held and the reasoning that didn't. + +** The "Awakening" — Sufficiency Criterion +:PROPERTIES: +:ID: design-awakening +:CREATED: [2026-05-08 Fri] +:END: + +The symbolic index begins its life as a lossy construct. The initial extraction from prose — LLM proposals verified by Screamer — is built from an uncertain foundation. Some facts are correct. Some are missing. Some are wrong. + +But the symbolic engine accumulates non-lossy facts through three independent mechanisms: + +1. *Gate outcomes* — every gate rejection is a fact. No LLM involved. Accumulate at the rate of user interactions. +2. *Screamer deductions* — new facts derived from existing facts. No LLM involved. Accumulate whenever the fact store crosses a density threshold. +3. *Human authoring* — the human explicitly declares facts. 
No LLM involved. + +At some point, the non-lossy facts constitute a sufficient foundation that the symbolic engine can reverse the flow: instead of the LLM extracting facts from prose, the symbolic engine reads prose through its own lens — its now-substantial ontology of categories, rules, and constraints — and asserts facts in its own language. The extraction mechanism ceases to be probabilistic and becomes deterministic. + +The sufficiency criterion makes this operational: =(/ (count-provenance :gate-outcome :human-authored :deduced) total-facts)=. When this ratio exceeds a configurable threshold (=SUFFICIENCY_THRESHOLD=, default 0.7), the system considers its foundation sufficient. The archivist switches from "LLM proposes, Screamer verifies" to "Screamer queries existing facts, applies to the new prose, and deduces new facts directly." + +The flip is visible to the user: "Symbolic index: 847 facts (73% non-lossy, 12% LLM-proposed, 15% Wikidata). Sufficient foundation: YES." + +The flip does not mean "complete." In the broader memex, completeness is neither possible nor desirable. The awakening means "deterministic enough to be trustworthy," not "comprehensive enough to be self-sufficient." The neural index remains the gateway to the full richness of prose. The symbolic index handles what can be mechanically verified. The boundary is permanent. + +** Merkle DAG for Version History +:PROPERTIES: +:ID: design-merkle-dag +:CREATED: [2026-05-10 Sun] +:END: + +Every fact is versioned. Every =(:entity :relation)= pair forms its own independent chain in a Merkle DAG. This is not new infrastructure — it is a new occupant of Passepartout's existing Merkle-tree memory system (v0.2.0). + +When a fact supersedes its predecessor, the new fact hashes over: =SHA-256(value || provenance || timestamp || parent-hash || grounding)=. The parent-hash pointer forms the chain. Tampering with any version changes its hash, breaking all downstream references. 
The history is tamper-proof by construction. + +Facts about =(.env :member-of-class)= form one chain. Facts about =(:nabokov :wrote)= form another. They evolve independently. They share no ancestry. This is a DAG, not a single list — inserting a fact is O(1) per chain. Changing a fact about =.env= does not require rehashing the literary index. + +=:dual= and =:plural= facts cross-reference each other via edges (=:complements=, =:contradicts=) but these are semantic relationships, not parent chains. Each value has its own ancestor chain. The cross-reference edges form a web; the parent chains form a spine. + +Passepartout already snapshots the Merkle root over all memory objects. Adding the fact store to the snapshot is a registration, not a new mechanism. Rolling back the snapshot restores the entire fact state — all chains, all cross-references, all cardinalities — to that point in time. + +** Abstract Fact Store Interface — Modular by Design +:PROPERTIES: +:ID: design-fact-interface +:CREATED: [2026-05-10 Sun] +:END: + +The fact store is accessed through an abstract API. The Merkle DAG (or any future backing store) is an implementation behind this interface, not a dependency that code throughout the system calls directly. + +#+begin_example +fact-assert :: fact → store → (:admitted | :rejected | :flagged) +fact-query :: (entity &key relation policy) → active-value-or-values +fact-history :: (entity relation) → ordered chain of versioned facts +fact-snapshot :: () → root-hash +fact-rollback :: root-hash → store +#+end_example + +Implementations behind the interface: +- Phase 1-4: ephemeral hash table with =:timestamp= and =:parent-id= pointers. No cryptographic hashing. No persistence. +- Phase 5: VivaceGraph + Merkle =memory-object= wrapper. Content-addressed, persistent, tamper-proof. 
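A minimal sketch of what the Phase 1-4 backend could look like behind this interface, assuming CLOS generic functions as the abstraction boundary. Only the names =fact-assert=, =fact-query= and =fact-history= come from the listing above. The argument order, the =ephemeral-store= class, its slots, and the simplified rejection rule (a fact that is not newer than the chain head is rejected) are assumptions made for illustration.

```lisp
;; Hypothetical sketch: an in-memory backend behind generic functions.
;; Signatures and slot names are assumptions, not the project's API.

(defgeneric fact-assert (store fact)
  (:documentation "Admit FACT into STORE; return :admitted or :rejected."))

(defgeneric fact-query (store entity relation)
  (:documentation "Return the active value for (ENTITY RELATION), or NIL."))

(defgeneric fact-history (store entity relation)
  (:documentation "Return all versions for (ENTITY RELATION), oldest first."))

(defclass ephemeral-store ()
  ((chains :initform (make-hash-table :test #'equal) :reader chains)
   (next-id :initform 0 :accessor next-id))
  (:documentation "One newest-first chain per (entity relation) key."))

(defmethod fact-assert ((store ephemeral-store) fact)
  (let* ((key (list (getf fact :entity) (getf fact :relation)))
         (head (first (gethash key (chains store)))))
    (cond
      ;; Simplified rule: a fact must be newer than the current head.
      ((and head (<= (getf fact :timestamp) (getf head :timestamp)))
       :rejected)
      (t
       ;; Chain the new version to its predecessor via :parent-id.
       (push (list* :id (incf (next-id store))
                    :parent-id (and head (getf head :id))
                    fact)
             (gethash key (chains store)))
       :admitted))))

(defmethod fact-query ((store ephemeral-store) entity relation)
  (getf (first (gethash (list entity relation) (chains store))) :value))

(defmethod fact-history ((store ephemeral-store) entity relation)
  (reverse (gethash (list entity relation) (chains store))))

;; Usage: two versions of one fact form a two-link chain.
(defparameter *store* (make-instance 'ephemeral-store))

(fact-assert *store* '(:entity ".env" :relation :member-of-class
                       :value :secret-file :timestamp 1))
(fact-assert *store* '(:entity ".env" :relation :member-of-class
                       :value :crypto-secret :timestamp 2))
```

Because consumers only call the generic functions, a later backend would add methods specialized on its own store class and leave every caller untouched.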
+ +Future implementations that satisfy the same interface — an append-only write-ahead log, an immutable B-tree, a content-addressed triple store — can replace the backing store without changing any consumer. The archivist, Screamer, ACL2, and the planner call =fact-assert= and =fact-query=, not Merkle struct accessors or VivaceGraph traversal syntax. + +This is not speculative modularity. The two-implementation migration (Phase 1-4 hash table → Phase 5 VivaceGraph + Merkle) is in the roadmap. If the interface leaks implementation details, the migration breaks. The interface must be designed, tested against both backends, and committed before Phase 1 ships. + +* Part V: Knowledge Sources + +** Semantic Wikipedia as Entity Backbone +:PROPERTIES: +:ID: design-wikipedia +:CREATED: [2026-05-08 Fri] +:END: + +The gate stack provides 50-70 entity classes — adequate for a coding agent where the domain is bounded to files, commands, and code symbols. For a general-knowledge memex, 50-70 is starvation. Your memex mentions Nabokov, /Pale Fire/, Kinbote, Zembla, paranoid reading, unreliable narrators, postmodernism, butterfly migration, chess problems, and the Russian exile experience. The gate stack knows none of these. Organic growth through prose extraction would take years just to cover the entities in one person's engagement with a single novel. + +Wikidata has already done this work: approximately 2 million entity classes, over 100 million entities, a decade of human curation. By loading the neighborhood of your memex into the symbolic index (entities referenced in your prose, plus their N-hop property net from Wikidata), the entity recognition problem vanishes. The archivist doesn't need to discover Nabokov from your diary. It needs to connect your heading to the existing Wikidata entity. That is a simpler task — reference resolution, not knowledge extraction. + +The LLM's role shrinks to three thin boundaries: + +1. 
*Input translation* — natural language question to structured query. "What do I think about monorepos?" → =(fact-query :entity :monorepo :relation :opinion :source :memex)=. Formulaic, ~100 tokens, any model sufficient. + +2. *Prose to candidate triple* — for personal memex entries that have no Wikidata counterpart: your opinions, your day's events, your project plans. Proposals verified by Screamer before admission. This is the only extraction path that still requires an LLM, and its scope is limited to what Wikidata cannot provide. + +3. *Result to prose* — structured answer to readable sentence. "Your 2023 diary says 8848m. Wikidata (last edited Feb 2024) says 8849m. They disagree on height." The reasoning is done; the LLM wraps the plist in grammar. ~100 tokens, any model sufficient, purely cosmetic. + +Everything else — the gate stack, the fact store, the constraint solver, the type hierarchy, the provenance tracking, the contradiction surfacing, the cross-domain comparison — is pure deterministic Lisp with zero LLM tokens. + +The decisive simplification: without Wikidata, the archivist must /discover/ entities from prose. With Wikidata loaded, the entity graph is pre-structured. The archivist's job changes from "discover that Nabokov wrote /Pale Fire/ and lectured on Kafka" to "verify that the Nabokov referenced in heading #47 is Wikidata item Q36591." + +Wikidata facts are admitted with =:provenance :wikidata= and cardinality policy =:plural=. They do not override your memex's facts. They sit alongside them. Disagreements are surfaced, not resolved. + +** Empirical Validation — MOMo and Modular Ontology Engineering +:PROPERTIES: +:ID: design-momo +:CREATED: [2026-05-08 Fri] +:END: + +Shimizu and Hitzler (2025, /Journal of Web Semantics/) argue that LLMs can significantly accelerate knowledge graph and ontology engineering — modeling, extension, population, alignment, and entity disambiguation — but /only/ if ontologies are modular. 
+ +*** The central finding: modularity is the key variable + +In a complex ontology alignment task, an LLM without module information detected correct mappings for 5 of 109 alignment rules — effectively useless. When the same LLM was given the module structure of the target ontology (20 named conceptual modules), it detected correct mappings for 104 of 109 rules — 95% accuracy. The variable was modularity. + +For ontology population (extracting triples from text), their best results came from prompts that included a schematic representation of a /single module/ plus one extraction example. Against ground truth, this achieved approximately 90% extraction accuracy. Without module-scoped prompting, quality degraded substantially. + +The mechanism: conceptual modules scope the LLM's attention to something human-sized. The paper's central claim — "by somehow limiting the scope, we achieve a more human-like approach — and one more capable of being expressed succinctly in language" — is an independent discovery of the same principle underlying Passepartout's domain-scoped Screamer checks and per-domain cardinality policies. + +*** What Passepartout should adopt + +*The modular prompt pattern.* The archivist should use module-scoped prompts: a schematic representation of a domain module plus a single extraction example. Instead of a generic "extract triples" prompt, the prompt should reference the relevant module(s) and include an example triple for each relation in that module. The module provides /context/; the example provides /format/. Both improve LLM extraction quality without increasing Screamer's verification burden. + +*MOMo modules as ontology scaffold.* The 50-70 gate-bootstrapped entity classes are starvation for the broader memex. MOMo's micropattern library provides a ready-made scaffold — hundreds of commonsense patterns for temporal relations, spatial relations, agent-action, organizational structure, provenance, and event participation. 
Loading these as initial modules — with =:policy :plural= and =:provenance :external-ontology= — would give the symbolic index a structured vocabulary for domains where the gate stack has nothing to offer. Organic growth then /extends and refines/ these modules rather than inventing them from scratch. + +*Cross-source validation.* The archivist can extract facts from the user's prose, extract facts from Wikidata for the same entities, and present disagreements with provenance. This is the =:plural= cardinality policy applied at extraction time. + +The paper validates three design decisions already made: (1) modularity is non-negotiable — the difference between 5% and 95% accuracy; (2) the extraction pipeline is feasible — 90% population accuracy with module-scoped prompts means the archivist /can/ extract useful facts, and the remaining 10% hallucination rate is what Screamer catches; (3) knowledge graphs are positioned as anti-hallucination infrastructure — the Passepartout thesis stated in the academic literature. + +References: +- Shimizu, C., & Hitzler, P. (2025). Accelerating knowledge graph and ontology engineering with large language models. /Journal of Web Semantics, 85/, 100862. +- Shimizu, C., Hammar, K., & Hitzler, P. (2023). Modular ontology modeling. /Semantic Web, 14/(3), 459–489. +- Norouzi, S.S. et al. (2024). Ontology Population using LLMs. arXiv:2411.01612. + +* Part VI: Implementation Properties + +** Performance — Why Ontology Growth Doesn't Make the System Slower +:PROPERTIES: +:ID: design-performance +:CREATED: [2026-05-10 Sun] +:END: + +Passepartout's performance thesis is: minimize LLM calls, minimize context tokens, keep everything else local and fast. Knowledge base size is irrelevant to those metrics. This is not an aspiration. It is a structural property. 
+ +The system has two cost domains with fundamentally different scaling: + +| Resource | Cost driver | Scales with | +|---------------+------------------------------------------+------------------------------------------| +| LLM tokens | Context window size, number of API calls | Foveal-peripheral pruning, gate rules | +| Compute | Screamer deduction, hash table lookups | Entity count, rule count per domain | + +LLM tokens are minimized by design — deterministic gates cost 0 tokens, sparse-tree rendering keeps context at 2,000–4,000 tokens regardless of memex size. Adding 5 million Wikidata entities doesn't add a single token to any LLM call. The education is local. Only the brain costs. + +Memory footprint grows linearly with entity count, while individual hash table lookups stay O(1). Compute grows with rule count within a single domain during Screamer consistency checking. But these are sub-millisecond costs on local hardware, not API bills. A Screamer constraint check against a domain with 200 rules costs ~0.3ms. A 100-token guardrail paragraph in a system prompt costs ~$0.00001. The Screamer check is 10,000x cheaper and convergent — it handles the rule once. The guardrail paragraph handles it on every call, forever. + +A 5-million-entity Wikidata load is ~400MB in a hash table. A lifetime personal memex with a decade of diary entries is perhaps 10-20 million triples (~1.5GB). Modern laptops carry 16-64GB. The knowledge base fits in consumer hardware with room for the Lisp runtime, the memory-object store, and the LLM inference engine. + +*One genuine risk — rule generalization width.* If Screamer deduces increasingly broad rules within a single domain, the constraint space could bloat. Mitigation: rules carry a =:domain= tag. Screamer only applies rules from the fact's domain. Rule generalization that crosses domain boundaries is gated — must be human-approved. 
Rules that prove unused (never triggered a check in N heartbeat cycles) are demoted to =:inactive= and excluded from the active constraint set. + +This is the minimalism argument restated in concrete terms: you buy bigger RAM and a faster CPU once. You don't buy bigger LLM context windows on every call. The education is a capital investment. The brain is an operating expense. The architecture makes the ratio favor capital. + +** The Provenance Chain as Product +:PROPERTIES: +:ID: design-provenance-product +:CREATED: [2026-05-10 Sun] +:END: + +In the coding domain, the value of the symbolic engine is the verified fact: "this command is safe." In the broader memex, the value is the provenance itself: "this claim originated in that diary entry on that date, has been referenced 7 times across 4 different projects, was contradicted in a retrospective 6 months later, and was revised in a note 3 weeks after that." + +The symbolic engine doesn't tell you what is true. It tells you what you wrote, when, where, and how it connects to everything else you wrote — with a verifiable audit trail. It is a memory prosthesis that makes your own mind legible to you. + +Every fact carries: +- =:grounding= — the specific Org heading from which it was extracted +- =:provenance= — who or what produced it (gate-outcome, human-authored, deduced, LLM-proposed, wikidata, external-ontology) +- =:timestamp= — when it was admitted to the symbolic index +- =:referenced-by= — other facts that depend on or reference this one +- =:contradicted-by= — other facts that disagree with this one (if any) +- =:superseded-by= — if this fact was replaced by a newer version + +These fields make every fact auditable. The =/audit= command renders the full provenance chain as an Org headline tree. The provenance is not a logging feature. It is the product. + +* Part VII: Engineering Infrastructure ** The REPL as Cognitive Substrate :PROPERTIES: @@ -186,24 +713,25 @@ This is the bootstrap. 
The system begins dependent on human judgment because it :CREATED: [2026-05-07 Wed] :END: -A REPL - Read, Eval, Print, Loop - is an interactive programming environment that reads an expression, evaluates it, prints the result, and loops back to read the next expression. It is the opposite of batch processing: where batch compiles and runs a program in one shot, a REPL works one expression at a time, with each evaluation building on all previous ones. The programmer defines a function, calls it, inspects the result, modifies it, and calls it again. The state accumulates. The session is the program. +A REPL — Read, Eval, Print, Loop — is an interactive programming environment that reads an expression, evaluates it, prints the result, and loops back to read the next expression. It is the opposite of batch processing: where batch compiles and runs a program in one shot, a REPL works one expression at a time, with each evaluation building on all previous ones. The state accumulates. The session is the program. -In Lisp, the REPL is not a debugging tool bolted onto the language - it is the natural mode of interaction. The running image is the environment. When you evaluate =(+ 2 2)=, the result =4= is printed, and you remain in the same image where =+= is defined, where previous definitions persist, where the next expression can reference anything that came before. There is no separation between development and execution. The REPL is not a simulation of the program - it is the program running. +In Lisp, the REPL is not a debugging tool bolted onto the language — it is the natural mode of interaction. The running image is the environment. When you evaluate =(+ 2 2)=, the result =4= is printed, and you remain in the same image where =+= is defined, where previous definitions persist, where the next expression can reference anything that came before. There is no separation between development and execution. 
The REPL is not a simulation of the program — it is the program running. -Passepartout uses the REPL in this spirit, but elevated: it is not merely a tool for writing code, it is the mechanism by which the agent interacts with its own cognition - a loop that mirrors the perceive-reason-act metabolic cycle at the implementation level. +Passepartout uses the REPL in this spirit, but elevated: it is not merely a tool for writing code, it is the mechanism by which the agent interacts with its own cognition — a loop that mirrors the perceive-reason-act metabolic cycle at the implementation level. In the agent's cognitive architecture, the REPL serves three functions that are difficult or impossible to achieve through batch processing or stateless API calls. -First, the REPL enables verification before commitment. When the agent generates code, it does not write and forget - it evaluates in a running image, observes the result, iterates if incorrect. The feedback loop is tight: the time between writing and seeing the error is measured in milliseconds, not in the round-trip to a language server or a batch compiler. This is the "verification over hallucination" principle from the RLM paper made concrete: the agent tests what it writes before claiming it works. +First, the REPL enables verification before commitment. When the agent generates code, it does not write and forget — it evaluates in a running image, observes the result, iterates if incorrect. The feedback loop is tight: the time between writing and seeing the error is measured in milliseconds, not in the round-trip to a language server or a batch compiler. This is the "verification over hallucination" principle made concrete: the agent tests what it writes before claiming it works. -Second, the REPL enables stateful exploration. The agent can define a variable, inspect it, modify it, redefine it. The exploration accumulates state across interactions. 
This is not a debugging session - it is the agent thinking with its hands, working through a problem by trying variations and observing outcomes, keeping the successful ones and discarding the failures. +Second, the REPL enables stateful exploration. The agent can define a variable, inspect it, modify it, redefine it. The exploration accumulates state across interactions. This is not a debugging session — it is the agent thinking with its hands, working through a problem by trying variations and observing outcomes, keeping the successful ones and discarding the failures. -Third, the REPL is a shared substrate. When the agent evaluates code, that code runs in the same image as the agent's own cognition. There is no process boundary between the agent and its tools. The REPL is not a subprocess the agent controls - it is a direct interface to the agent's own nervous system. +Third, the REPL is a shared substrate. When the agent evaluates code, that code runs in the same image as the agent's own cognition. There is no process boundary between the agent and its tools. The REPL is not a subprocess the agent controls — it is a direct interface to the agent's own nervous system. This is why the REPL becomes more important as the system matures. In early versions, it is a development tool. In v0.6.0 and beyond, it becomes a cognitive tool: the agent explores hypotheses by evaluating them, verifies the output of sub-agents by inspecting live state, and tests modifications before committing them to the knowledge graph. ** The Cybernetic Loop: Why the Metabolic Pipeline Works :PROPERTIES: +:ID: design-cybernetic-loop :CREATED: [2026-05-07 Wed] :END: @@ -213,7 +741,7 @@ Norbert Wiener defined cybernetics in 1948 as "control and communication in the The Dispatcher gate stack is the negative feedback governor. When the LLM proposes an action that would violate an invariant, the Dispatcher blocks it and feeds the rejection trace back to the LLM for self-correction. 
This is Ross Ashby's homeostasis — the system maintains its internal stability by correcting deviations from its set point (the safety invariants). Without this negative feedback, the probabilistic engine would drift into hallucinated proposals that become progressively less grounded. The Dispatcher constrains it to the domain of safe, verifiable actions. -The self-editing capability is second-order cybernetics — autopoiesis, the capacity of a system to create and maintain itself. Humberto Maturana and Francisco Varela defined this as the hallmark of living systems. When the agent detects an error, locates the faulty function, generates a corrected version, and hot-reloads it into the running image without restarting, it is modifying its own architecture while continuing to operate. Passepartout achieves this through Lisp's homoiconicity — code is data, and the running image is the environment. The skill engine loads every skill into a jailed Common Lisp package, validates its syntax, tests its trigger function in isolation, and only then promotes it to the live registry. +The self-editing capability is second-order cybernetics — autopoiesis, the capacity of a system to create and maintain itself. Humberto Maturana and Francisco Varela defined this as the hallmark of living systems. When the agent detects an error, locates the faulty function, generates a corrected version, and hot-reloads it into the running image without restarting, it is modifying its own architecture while continuing to operate. Passepartout achieves this through Lisp's homoiconicity — code is data, and the running image is the environment. This framing matters for two reasons. First, it places Passepartout in a lineage that predates and outlasts the current "LLM with tools" paradigm. The cybernetic principles of feedback, homeostasis, and autopoiesis are independent of any specific model architecture. They work whether the perceptual engine is an LLM, a vision model, or a symbolic parser. 
Second, it explains why the architecture gets more reliable over time — cybernetic systems improve through accumulated negative feedback corrections, not through better training data. Every blocked action is a correction. Every approved exception is a refined set point. The system converges on stability through use. @@ -223,13 +751,13 @@ This framing matters for two reasons. First, it places Passepartout in a lineage :CREATED: [2026-05-07 Wed] :END: -When a human asks why the system made a decision, the answer must be findable. In most AI systems, the reasoning is ephemeral - it exists in the model's activations and disappears when the session ends. In Passepartout, every significant cognitive event is written to an Org buffer as it happens. +When a human asks why the system made a decision, the answer must be findable. In most AI systems, the reasoning is ephemeral — it exists in the model's activations and disappears when the session ends. In Passepartout, every significant cognitive event is written to an Org buffer as it happens. The thought trace is the agent's journal, written in parallel with its reasoning. When the probabilistic engine generates a proposal, the trace records the input, the prompt, and the raw output. When the deterministic engine evaluates it, the trace records which rules were checked, which passed, which failed, and why. When an action is executed, the trace records the timestamp, the user who approved it (if human-in-the-loop), and the outcome. This is not logging in the traditional sense. Logs are forensically useful but are written in a machine format optimized for storage, not for human reading. The thought trace is written in Org-mode: headlines for major events, property drawers for structured data, tags for categorization. The human can open the trace in a text editor and navigate it like any other Org file. 
They can search for a specific decision, filter by time range, find all actions blocked by a specific rule, or see the complete trajectory of a multi-step task. -The trace becomes the foundation for the Dispatcher's learning. Every blocked action is in the trace. Every approved exception is in the trace. The human-in-the-loop decisions are in the trace. The system does not need to reconstruct what happened - it reads what happened from the trace it wrote. +The trace becomes the foundation for the Dispatcher's learning. Every blocked action is in the trace. Every approved exception is in the trace. The human-in-the-loop decisions are in the trace. The system does not need to reconstruct what happened — it reads what happened from the trace it wrote. Without observability, the system is a black box that happens to produce correct outputs sometimes. With observability, the system is auditable. The human can see why a decision was made, identify where the reasoning failed, and course-correct the system or its own behavior accordingly. @@ -243,9 +771,9 @@ The decision to use Org-mode as the source of truth for code, not just documenta The traditional development workflow is: write code, write comments, commit. The literate programming workflow is: write prose, write code, commit the Org. The order matters. The prose must come first not because of style guidelines but because the act of explaining what a function does before writing it forces clarity of thought that editing code directly does not. -When you must write a paragraph describing what a function does before you write the function, you discover the cases you have not considered. You find the edge conditions that are ambiguous. You realize that the function's name does not match its behavior, or that its behavior does not match your intent. The friction is not a bug - it is the mechanism by which thinking is enforced. 
+When you must write a paragraph describing what a function does before you write the function, you discover the cases you have not considered. You find the edge conditions that are ambiguous. You realize that the function's name does not match its behavior, or that its behavior does not match your intent. The friction is not a bug — it is the mechanism by which thinking is enforced. -The one-function-per-block rule enforces granularity. A function that cannot be explained in a paragraph is a function that is doing too much. The block boundary is not aesthetic - it is architectural. It prevents the drift toward monolithic functions that accumulate responsibilities over time and become untestable, unmaintainable, and incomprehensible. +The one-function-per-block rule enforces granularity. A function that cannot be explained in a paragraph is a function that is doing too much. The block boundary is not aesthetic — it is architectural. It prevents the drift toward monolithic functions that accumulate responsibilities over time and become untestable, unmaintainable, and incomprehensible. The tangle step enforces source-of-truth discipline. The .lisp file is generated from the Org file. This means the Org file cannot drift from the implementation. If the implementation changes, the Org must be updated to match. If the Org describes behavior that the implementation does not perform, the tangle produces code that does not match the Org description. Either way, inconsistency is visible and recoverable. @@ -267,9 +795,9 @@ The industry standard for coding agents is SWE-bench: a corpus of GitHub issues Passepartout implements a native Lisp harness for this. A background thread clones repositories, feeds issues into the cognitive loop, tracks the resolution trajectory as an Org-mode headline tree, and scores success by test outcomes. The trajectory is persisted: when a resolution fails, the system can inspect where in the chain the reasoning broke down. 
The headline tree records the agent's thoughts at each step, making the failure auditable and the debugging human-assisted. -Beyond SWE-bench, the harness includes chaos testing. The system is subjected to resource starvation, concurrent load, and adversarial input. The deterministic engine must maintain safety invariants under pressure. The symbolic verifier must not deadlock or livelock. The probabilistic engine must degrade gracefully - if tokens are limited, it must still produce valid proposals that the deterministic engine can evaluate. Failure under chaos is a design flaw, not a benchmark anomaly. +Beyond SWE-bench, the harness includes chaos testing. The system is subjected to resource starvation, concurrent load, and adversarial input. The deterministic engine must maintain safety invariants under pressure. The symbolic verifier must not deadlock or livelock. The probabilistic engine must degrade gracefully. -The harness also supports regression testing on the skill set. Every skill is tested against a suite of known inputs and expected outputs. When a modification is proposed to any skill - whether through manual editing or the agent's own self-modification - the test suite runs first. A skill that fails its tests is rejected before it can propagate to the running image. This is not a convenience - it is the mechanism by which self-modification remains safe. The agent can propose changes, but the harness verifies them before the changes take effect. +The harness also supports regression testing on the skill set. Every skill is tested against a suite of known inputs and expected outputs. When a modification is proposed to any skill — whether through manual editing or the agent's own self-modification — the test suite runs first. A skill that fails its tests is rejected before it can propagate to the running image. This is not a convenience — it is the mechanism by which self-modification remains safe. 
The agent can propose changes, but the harness verifies them before the changes take effect. ** The MCP Strategy :PROPERTIES: @@ -279,13 +807,11 @@ The harness also supports regression testing on the skill set. Every skill is te The Model Context Protocol (MCP) is a standard for connecting AI systems to external tools and data sources. It defines how a client requests tools from a server, how the server exposes its capabilities, and how the client invokes them. The ecosystem is growing: MCP servers exist for GitHub, Slack, Postgres, filesystem access, and much more. -Passepartout connects to this ecosystem, but not by becoming a Node.js runtime. The architecture is: external MCP servers communicate via stdio or SSE to a Lisp-native MCP client that runs in the same image as the agent. The client is pure Common Lisp - it parses the JSON-RPC messages, invokes the tools, and presents results to the agent as Lisp data structures. There is no serialization overhead between the agent and the MCP layer, no process boundary, no impedance mismatch. +Passepartout connects to this ecosystem, but not by becoming a Node.js runtime. The architecture is: external MCP servers communicate via stdio or SSE to a Lisp-native MCP client that runs in the same image as the agent. The client is pure Common Lisp — it parses the JSON-RPC messages, invokes the tools, and presents results to the agent as Lisp data structures. There is no serialization overhead between the agent and the MCP layer, no process boundary, no impedance mismatch. -When the agent calls a tool via MCP, it receives a plist with the tool name, arguments, and result. The result is immediately usable by the agent's symbolic engine. When the agent generates a file, it can be written to the filesystem through an MCP filesystem server. When the agent needs to send a message, it can use an MCP Slack server. 
The agent does not need to know that these are MCP interactions - it sees only the plists that flow through its cognitive architecture. +When the agent calls a tool via MCP, it receives a plist with the tool name, arguments, and result. The result is immediately usable by the agent's symbolic engine. When the agent generates a file, it can be written to the filesystem through an MCP filesystem server. When the agent needs to send a message, it can use an MCP Slack server. The agent does not need to know that these are MCP interactions — it sees only the plists that flow through its cognitive architecture. -The alternative is to build MCP wrappers in Python or TypeScript and bridge to Lisp via subprocess. This is what OpenClaw does: a Node.js runtime that manages MCP servers, with a bridge to the Lisp process. The bridge introduces latency, serialization costs, and a maintenance burden. The Node.js process must be kept running. The bridge must be maintained across Lisp and JavaScript runtimes. The cognitive architecture must handle errors that cross the process boundary. - -Passepartout's native client is smaller, faster, and more maintainable. The MCP client is a skill, not a core component. It can be reloaded, replaced, or removed without restarting the agent. The agent can add new MCP tool integrations by loading new skills, not by deploying new infrastructure. +The alternative is to build MCP wrappers in Python or TypeScript and bridge to Lisp via subprocess. This introduces latency, serialization costs, and a maintenance burden. Passepartout's native client is smaller, faster, and more maintainable. The MCP client is a skill, not a core component. It can be reloaded, replaced, or removed without restarting the agent. ** Local-First Architecture :PROPERTIES: @@ -293,27 +819,27 @@ Passepartout's native client is smaller, faster, and more maintainable. 
The MCP :CREATED: [2026-05-07 Wed] :END: -Passepartout is designed to run on the user's machine, on their hardware, with their data, without requiring an internet connection. This is not a deployment option - it is an architectural commitment. The system must be able to reason, plan, and act using only the resources available locally. +Passepartout is designed to run on the user's machine, on their hardware, with their data, without requiring an internet connection. This is not a deployment option — it is an architectural commitment. The system must be able to reason, plan, and act using only the resources available locally. The motivation is not merely philosophical. Cloud-based AI agents are economically incentivized to collect data, to train on user interactions, and to build lock-in through proprietary formats and network effects. When the agent runs locally, the user owns the hardware, owns the data, and can terminate the process without asking permission. There is no vendor that can change terms, no service that can go offline, no model that can be updated without consent. Technically, local-first means several things. The LLM must be able to run on local hardware. Passepartout supports Ollama as a provider, which runs quantized models on CPU and GPU without requiring an external API. The vector database must be local. Passepartout uses its own org-object store, which is a folder of Org files that the agent already owns. There is no ChromaDB or Qdrant to install, no cloud vector service to authenticate with. -The symbolic engine does not require a network connection. The Prolog/Datalog reasoner that in v3.0.0 verifies neural proposals runs entirely in the Lisp image. The Dispatcher's rule synthesis does not call an external service. The agent can operate in a disconnected environment indefinitely, resuming full capability when connectivity is restored. +The symbolic engine does not require a network connection. 
The Prolog/Datalog reasoner that verifies neural proposals runs entirely in the Lisp image. The Dispatcher's rule synthesis does not call an external service. The agent can operate in a disconnected environment indefinitely, resuming full capability when connectivity is restored. This does not mean Passepartout refuses to use cloud services when available and appropriate. It means cloud services are optional enhancements, not architectural requirements. The core is local. The user can choose to add cloud LLM providers for more capable inference, but the system functions without them. -*On live images and binaries.* Passepartout's primary delivery path is source code running in a live SBCL process. The REPL is available. Skills hot-reload. The cognitive loop runs in an image that is mutable, inspectable, and homeiconic — the user can connect with SLIME, trace functions, inspect memory objects, and modify the system while it runs. A ~save-lisp-and-die~ binary is provided as a convenience for platforms where SBCL cannot be installed (corporate laptops, shared hosts). The binary is the same image saved to disk with Swank pre-loaded — it is not a sealed container. The REPL works. Skills hot-reload. The binary is a packaging format, not an architectural decision. The system is constitutionally open in both delivery paths. +*On live images and binaries.* Passepartout's primary delivery path is source code running in a live SBCL process. The REPL is available. Skills hot-reload. The cognitive loop runs in an image that is mutable, inspectable, and homoiconic — the user can connect with SLIME, trace functions, inspect memory objects, and modify the system while it runs. A =save-lisp-and-die= binary is provided as a convenience for platforms where SBCL cannot be installed. The binary is the same image saved to disk with Swank pre-loaded — it is not a sealed container. The REPL works. Skills hot-reload. The binary is a packaging format, not an architectural decision.
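The packaging step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual build script: the file name =build.lisp=, the =main= entry point, the output name, and the port are all assumptions, and the placeholder loop stands in for the cognitive loop, which is not shown here. It assumes Quicklisp is already loaded from the user's SBCL init file.

#+begin_src lisp
;;; build.lisp (hypothetical name). Run with: sbcl --load build.lisp
;;; Assumes Quicklisp is loaded from the user's init file so that QL exists.
(ql:quickload :swank)

(defun main ()
  "Hypothetical entry point: open a Swank listener, then keep the image alive."
  ;; :dont-close t keeps the listener open after a SLIME client disconnects.
  (swank:create-server :port 4005 :dont-close t)
  ;; Placeholder for the real cognitive loop.
  (loop (sleep 60)))

;; Save the current image as a standalone executable. SBCL exits after
;; writing the file, so this must be the last form in the script.
(sb-ext:save-lisp-and-die "passepartout"
                          :toplevel #'main
                          :executable t)
#+end_src

Because Swank is loaded into the image before it is saved, the resulting executable can start the listener at launch, and a user can attach with =M-x slime-connect= on the chosen port. That is one way the binary could stay as inspectable as the source-based delivery path.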
-* Token Economics and Performance Advantage +** Token Economics and Performance Advantage :PROPERTIES: :ID: design-token-economics :CREATED: [2026-05-07 Wed] :END: -This section analyzes how Passepartout's architectural decisions translate into token usage, latency, and cost versus competing agent designs. It makes one empirical claim (deterministic gates cost 0 LLM tokens — provable) and several structural claims (downward cost curve, tiered pricing, REPL economics — testable). It does not claim specific cost multiples pending empirical audit at v0.5.0. +This section analyzes how Passepartout's architectural decisions translate into token usage, latency, and cost versus competing agent designs. -** The Core Insight: LLM as Expensive Resource, Not Default Engine +*** The Core Insight: LLM as Expensive Resource, Not Default Engine Passepartout treats the LLM as a resource to be minimized. Every operation is designed to reduce LLM dependency. Competitors treat the LLM as the core engine through which all operations flow. This is not a difference of degree but of architecture. @@ -325,23 +851,23 @@ The structural multipliers are: 3. *REPL verification* — code is tested in the running image before it is committed. Errors surface in milliseconds at 0 LLM tokens. Competitors discover errors after generation and pay 500–2,000 tokens per correction round-trip. The REPL eliminates the most expensive kind of LLM call: the one that produced wrong code and needs a do-over. -4. *Hot state* — in a REPL-based agent, variables, file handles, sub-routine results, and memory objects are already in memory. Every turn in a standard chat agent re-sends the full conversation history. Token costs in chat agents are quadratic: a 10-turn session pays for ~55 "turns" of context (10 + 9 + 8 + ... + 1 = 55). In Passepartout, context is stored once in the Lisp image. A 10-turn session pays for ~10 turns of context. 
This is an ~82% reduction on protocol overhead alone, before any foveal-peripheral pruning. This argument is testable: send the same multi-turn session through both architectures and count tokens. +4. *Hot state* — in a REPL-based agent, variables, file handles, sub-routine results, and memory objects are already in memory. Every turn in a standard chat agent re-sends the full conversation history. Token costs in chat agents are quadratic: a 10-turn session pays for ~55 "turns" of context (10 + 9 + 8 + ... + 1 = 55). In Passepartout, context is stored once in the Lisp image. A 10-turn session pays for ~10 turns of context. This is an ~82% reduction on protocol overhead alone, before any foveal-peripheral pruning. -5. *Temporal filtering* — time-scoped memory queries (what happened today? what's due in the next hour?) return only nodes matching the time window. The temporal filter is a pure-Lisp hash-table walk with a numeric comparison on ~memory-object-version~. Sub-millisecond. 0 LLM tokens. Competitors without time-indexed memory must serialize all nodes and let the LLM scan for temporal relevance — 5,000–50,000 tokens per temporal query. This is the same principle as the foveal-peripheral model applied to the time dimension. +5. *Temporal filtering* — time-scoped memory queries return only nodes matching the time window. The temporal filter is a pure-Lisp hash-table walk with a numeric comparison on =memory-object-version=. Sub-millisecond. 0 LLM tokens. Competitors without time-indexed memory must serialize all nodes and let the LLM scan for temporal relevance — 5,000–50,000 tokens per temporal query. -** The Compounding Cost Curve — Unique Among Agents +*** The Compounding Cost Curve — Unique Among Agents Every AI agent grows more expensive over time. Context histories accumulate. Safety instructions grow more elaborate. Guardrails become longer prompt paragraphs. The user's data grows. 
The only way to reduce cost in a standard agent is to cap context — sacrificing capability. Passepartout has a downward cost curve. Four mechanisms compound: -1. *Dispatcher learning (v0.3.0).* Every blocked action and approved exception becomes a deterministic rule. A file write that initially triggered a full LLM proposal → Dispatcher review → HITL approval → rule extraction loop eventually becomes a deterministic rule check. Each hardened rule permanently removes a future LLM call. +1. *Dispatcher learning.* Every blocked action and approved exception becomes a deterministic rule. A file write that initially triggered a full LLM proposal → Dispatcher review → HITL approval → rule extraction loop eventually becomes a deterministic rule check. Each hardened rule permanently removes a future LLM call. -2. *Symbolic induction (v0.5.0).* The agent extracts patterns from successful interaction sequences and converts them into reusable Lisp functions. A multi-step task that took 5,000 tokens today takes 0 tokens tomorrow — it's now a ~defun~. The Dispatcher learns what to block. Symbolic induction learns what to automate. +2. *Symbolic induction.* The agent extracts patterns from successful interaction sequences and converts them into reusable Lisp functions. A multi-step task that took 5,000 tokens today takes 0 tokens tomorrow — it's now a =defun=. The Dispatcher learns what to block. Symbolic induction learns what to automate. -3. *Native embedding inference (v0.4.0).* Every semantic search query runs against in-image vectors at 0 external tokens. Competitors use LLM-assisted search for most retrieval operations. Passepartout's retrieval is a vector cosine similarity check — pure math, no model call. +3. *Native embedding inference.* Every semantic search query runs against in-image vectors at 0 external tokens. Competitors use LLM-assisted search for most retrieval operations. Passepartout's retrieval is a vector cosine similarity check — pure math, no model call. 
-4. *Prefix caching (v0.4.0).* The static portion of the system prompt (IDENTITY, TOOLS, LOGS format) is transmitted once per session. Dynamic content (CONTEXT, user prompt) is sent on each call. Anthropic's prompt caching gives a 90% discount on cached tokens. OpenAI caches automatically. +4. *Prefix caching.* The static portion of the system prompt (IDENTITY, TOOLS, LOGS format) is placed first so that the provider can cache it. It is still sent on each call, but after the first call it is billed at the cached rate. Dynamic content (CONTEXT, user prompt) follows the static prefix and is billed in full on each call. Anthropic's prompt caching gives a 90% discount on cached input tokens. OpenAI caches automatically. After 12 months of daily use, Passepartout's per-session costs are expected to be 40–60% of baseline, while competitors' costs rise to 125–140% of baseline. The crossover point is estimated at 3–6 months. This is not a model quality claim — it is a structural property of the architecture. @@ -353,125 +879,116 @@ After 12 months of daily use, Passepartout's per-session costs are expected to b Passepartout's architecture provides three layers of time awareness, each enabled by infrastructure that competitors lack: -*Level 1 — Present Awareness.* The LLM knows the current time, date, and session duration because a single ~format-time-for-llm~ call injects it into the system prompt. Most agents know the date from the OS. None know the time or session duration. The cost is ~8 incremental tokens per call (trivially prefix-cached). The saving is eliminating "I don't know the current time" preamble tokens, time-check tool calls, and incorrect temporal reasoning from a model guessing the time. +*Level 1 — Present Awareness.* The LLM knows the current time, date, and session duration because a single =format-time-for-llm= call injects it into the system prompt. Most agents know the date from the OS. None know the time or session duration. The cost is ~8 incremental tokens per call (trivially prefix-cached).
The saving is eliminating "I don't know the current time" preamble tokens, time-check tool calls, and incorrect temporal reasoning from a model guessing the time. -*Level 2 — Temporal Memory.* Memory queries accept ~:since~ and ~:until~ parameters. "What did I work on in the last hour?" filters 500 nodes to 12 in sub-millisecond Lisp rather than serializing 500 nodes to the LLM at ~5,000 tokens for it to scan. Every memory node carries a ~memory-object-version~ timestamp (a monotonic ~get-universal-time~ value set at ingest since v0.1.0). The temporal filter is a hash-table walk with numeric comparison. 0 LLM tokens. >90% token reduction on time-scoped queries. +*Level 2 — Temporal Memory.* Memory queries accept =:since= and =:until= parameters. "What did I work on in the last hour?" filters 500 nodes to 12 in sub-millisecond Lisp rather than serializing 500 nodes to the LLM at ~5,000 tokens for it to scan. Every memory node carries a =memory-object-version= timestamp (a monotonic =get-universal-time= value set at ingest since v0.1.0). The temporal filter is a hash-table walk with numeric comparison. 0 LLM tokens. >90% token reduction on time-scoped queries. -*Level 3 — Proactive Triggers.* The heartbeat tick (existing infrastructure since v0.3.0) scans for approaching deadlines every 60 seconds. When a deadline is within the warning window (~DEADLINE_WARNING_MINUTES~, default 60), a temporal context note is injected into the awareness assembly. The LLM sees "3 deadlines today: Submit report (45min)" in its context without a triggering call. A "what should I work on today?" query is answered from pre-loaded context — 0 LLM tokens versus 1,500–4,000 for an unassisted agent. +*Level 3 — Proactive Triggers.* The heartbeat tick scans for approaching deadlines every 60 seconds. When a deadline is within the warning window (=DEADLINE_WARNING_MINUTES=, default 60), a temporal context note is injected into the awareness assembly. 
The LLM sees "3 deadlines today: Submit report (45min)" in its context without a triggering call. A "what should I work on today?" query is answered from pre-loaded context — 0 LLM tokens versus 1,500–4,000 for an unassisted agent. None of these three layers require new infrastructure. Time awareness is not a feature Passepartout builds — it is a feature Passepartout *unlocks* by having timestamped memory (v0.1.0), heartbeat+cron (v0.3.0), and the foveal-peripheral context pruning model (v0.2.0) already in place. Adding time awareness costs ~175 lines of Lisp. Building it in competitors would require building the heartbeat, the time-indexed memory, and the proactive context injection — 800+ lines each — and would still cost LLM tokens because their safety verification is prompt-based. -The structural principle generalizes: Passepartout's infrastructure investments compound. Each new subsystem (Merkle memory, heartbeat, skill engine, embedding pipeline) lowers the cost of the next feature. Time awareness is the first demonstration of this compounding — three layers unlocked by infrastructure already built for other purposes. +* Part VIII: Validation -** Tiered Pricing: Cheap Models for Simple Tasks, Free for Learned Patterns +** Philosophical Validation — The Neurosymbolic Consensus +:PROPERTIES: +:ID: design-validation +:CREATED: [2026-05-10 Sun] +:END: -The model-tier router (v0.3.0) classifies every task by complexity and routes it to the cheapest capable model. Simple lookups go to tiny local models or deterministic hash table scans (0 LLM tokens). Text processing goes to mid-tier models. Complex planning and code generation go to the premium model. The consensus loop (v0.10.0) only fires for high-impact actions. +Three papers from the neurosymbolic AI research community validate the architectural thesis from complementary angles. 
-The induced functions from symbolic induction (v0.5.0) compound this: every learned pattern that becomes a Lisp function moves from "cheap" to "free." Over time, an increasing fraction of the agent's daily operations cost 0 LLM tokens. +*** Marcus (2020): The Case Against Pure Deep Learning -** Version-by-Version Cost Trajectory +Gary Marcus's "The Next Decade in AI" argues that deep learning alone is "data hungry, shallow, brittle, and limited in its ability to generalize." The paper demonstrates GPT-2 failing at basic commonsense reasoning: -The following projections assume a coding session equivalent to ~20 files, 10 actions, and 3 errors, using the cheapest capable cloud provider. They are architectural estimates pending empirical audit at v0.5.0. +- "Yesterday I dropped my clothes off at the dry cleaners and have yet to pick them up. Where are my clothes?" → GPT-2: "at my mom's house." +- "There are six frogs on a log. Two leave, but three join. The number of frogs on the log is now" → GPT-2: "seventeen." -| Version | Cost relative to Claude Code | Why | -|---------+-----------------------------+-----| -| v0.4.0 (with prefix caching) | 1.5–2x cheaper | Sparse retrieval + caching; no tools yet, tasks are simple | -| v0.5.0 (with symbolic induction) | 1.5–2x cheaper, declining over time | Induced functions begin replacing LLM calls for repeated patterns | -| v0.7.0 (with MCP tools) | 2–3x cheaper | More complex tasks, but caching + induction compound | -| v1.0.0 (all pre-symbolic features) | 2–3x cheaper for coding, 10–40x for knowledge management | Full stack: sparse trees + caching + induction + native embeddings | -| v3.0.0 (neurosymbolic) | 5–10x cheaper | 80% of reasoning in symbolic middle layer costs 0 LLM tokens | -| v4.0.0 (native inference) | ~100% cheaper for local models | No API call. No per-token pricing. Electricity only. 
| +Marcus proposes four steps toward robust AI: hybrid architecture (combining neural and symbolic), large-scale knowledge (abstract and causal, not just statistical), reasoning (formal inference over structured representations), and cognitive models (frameworks for how entities relate). Passepartout implements all four: the perceive-reason-act pipeline is hybrid, the symbolic index is causal knowledge, Screamer + ACL2 provide reasoning, and the gate-bootstrapped ontology plus MOMo modules provide cognitive models. -Knowledge management is Passepartout's strongest domain. A 500-node knowledge base assembled for the LLM as 2,000–4,000 tokens (foveal-peripheral) versus 80,000–150,000 tokens (full serialization) is a 40–75x difference in context alone. Semantic search in-image at 0 tokens versus LLM-assisted search at 5,000+ tokens extends the gap. Note creation via deterministic Org writes at 0 tokens versus LLM-generated notes at 800+ tokens each widens it further. Background maintenance (archiving, link repair, compaction) runs on heartbeat-driven cron jobs at 0 LLM tokens. +Marcus's core claim — "we have no hope of achieving robust intelligence without first developing systems with deep understanding" — is the justification for Passepartout's entire neurosymbolic investment. The alternative is a system that works "on a good day" and fails unpredictably. The deterministic gate stack and Screamer admission gate are the engineering realization of Marcus's call for robustness. -** Engineering Challenges and Solutions +Reference: Marcus, G. (2020). The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence. arXiv:2002.06177. -The architecture's advantages are genuine but unevenly distributed across task types. Three structural challenges have specific engineering solutions in the roadmap. 
+*** Gaur & Sheth (2023): CREST — Trustworthy Neurosymbolic AI -*** Challenge: Situational Cost +Gaur and Sheth present the CREST framework: Consistency, Reliability, user-level Explainability, and Safety build Trust — and they argue these require neurosymbolic methods. Their empirical finding: GPT-3.5 breached safety constraints 30% of the time when asked identical questions repeatedly. Claude's 16 safety rules and Sparrow's 23 rules provide no /inherent/ safety — they are heuristic guardrails that can be breached through prompt variation. -The sparse-tree and REPL advantages apply primarily to long-running, high-context tasks. A single-turn lookup ("what's on my calendar?") without a cost-conscious routing layer may consume comparable tokens to standard RAG. The architecture must prevent the agent from spending $5 of compute on a $0.01 question. +These findings validate three Passepartout design commitments: (1) prompt-level safety is insufficient — deterministic gates run in pure Lisp, cost 0 tokens, and cannot be evaded by prompt engineering; (2) inconsistency is the norm — the cardinality model expects contradiction and surfaces it with provenance; (3) knowledge infusion is required for trust — Passepartout's symbolic index IS the knowledge infusion layer, facts extracted from prose, verified by Screamer, and available for any LLM call. -*Solution:* The Resolution Budget (v0.5.0) is a lightweight pre-routing layer that classifies complexity before the Reason stage and assigns a cost envelope. Simple lookups take the fast path (deterministic, 0 LLM tokens, sub-second). Standard interactions use cached context and tiered models. Deep reasoning engages the full deliberative pipeline. The tier classifier (v0.8.1) adds safety-based routing: dangerous operations always take the full verification path regardless of cost. Together, cheap simple tasks take the cheap fast path; dangerous complex tasks take the expensive safe path. +Reference: Gaur, M., & Sheth, A. 
(2023). Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety. arXiv:2312.06798. -*** Challenge: Single-Turn Latency +*** Gaur et al. (2022): Knowledge-Infused Learning -The Dispatcher gate stack, structured output enforcement, and verification loop add latency to every turn. Time-to-first-token is inherently higher than a raw chat agent that processes the first response directly. The goal is not to match raw chat-agent TTFT on every interaction — it is to make the verification overhead imperceptible for trivial tasks and worth the wait for complex ones. +Gaur, Gunaratna, Bhatt, and Sheth define Knowledge-infused Learning (KiL) as "combining various types of explicit knowledge with data-driven deep learning techniques." They identify three infusion levels (shallow, semi-deep, deep) and position KiL as "a sweet spot in neuro-symbolic AI." -*Solution:* Three mechanisms compound. The Resolution Budget (v0.5.0) routes simple lookups through a fast path with minimal gate checks. Streaming responses (v0.6.3) hide latency by showing progressive output — the user sees the agent typing while verification runs. Interrupt-and-redirect (v0.6.3) lets the user kill a wrong response mid-generation and redirect the agent without waiting for a complete wrong answer. The self-configuring setup binary (v0.5.0) includes a tiny Syntax Scout model — a 1.5B parameter model fine-tuned on Common Lisp + Org-mode idioms that pre-validates Lisp forms before the Dispatcher, reducing rejection-loop cycles. +Passepartout's architecture is a specific implementation of KiL at the deepest infusion level: knowledge is not appended to prompts (shallow) or embedded in fine-tuning (semi-deep). It is a first-class data structure — the symbolic index — that the LLM queries through the archivist and the planner. The knowledge is living: it accumulates, is verified, carries provenance, and evolves through ontology versioning.
-*** Challenge: Symbolic Brittleness +Reference: Gaur, M., Gunaratna, K., Bhatt, S., & Sheth, A. (2022). Knowledge-Infused Learning: A Sweet Spot in Neuro-Symbolic AI. /IEEE Internet Computing, 26/(4), 5–11. -Deterministic gates reject code with minor syntax errors that a prompt-based guardrail would pass. A 99% correct Lisp form with one mismatched parenthesis is blocked entirely during the ~read-from-string~ stage or by the syntax validation gate. This is the correct safety posture — but without mitigation, the user experience is "the agent keeps failing to do simple things because of formatting errors." +** The Competitive Argument +:PROPERTIES: +:ID: design-competitive +:CREATED: [2026-05-10 Sun] +:END: -*Solution:* Three mechanisms compound. Structured Output Enforcement (v0.6.2) validates plist syntax before the Dispatcher, providing LLM feedback with the specific parse error. The Syntax Scout — the tiny model from the setup bootstrapper — pre-validates Lisp forms during the Reason stage and auto-corrects common patterns (parenthesis balance, keyword normalization). The self-correction loop (up to 3 retries with rejection trace feedback at the Reason stage) gives the LLM multiple attempts. Together, these mechanisms drop the failure rate from "every syntax error blocks" to "the LLM learns to produce valid Lisp after the first rejection, and the Syntax Scout catches the patterns that the LLM repeatedly misses." +No competitor has this problem because no competitor has a symbolic engine. The 55 systems surveyed in the competitive landscape range from pure chat agents (Claude, ChatGPT) to agent harnesses (Claude Code, OpenCode, Hermes) to platform agents (OpenClaw). None of them encode knowledge as formal facts with provenance. None of them verify extractions against an existing knowledge base. None of them can prove properties about their own rulesets. 
-** Local LLM Viability +Their safety is heuristic (prompt-based guardrails that consume LLM tokens and can be evaded with clever phrasing). Their memory is flat (JSONL transcripts without content-addressed identity or provenance chains). Their reasoning is entirely neural — when you ask "why did you decide that?", the answer is a regenerated LLM explanation, not a retrieved inference chain. -Reduced context requirements change which model sizes deliver acceptable performance: +Passepartout's architectural bet is that this problem is worth solving — that a system which can surface contradictions with provenance, derive new facts from observations, and verify claims against a provenanced knowledge graph is fundamentally different from a system that can only call an LLM and hope the response is correct. -| Model | Passepartout Viability | Competitor Viability | -|--------------------------+-----------------------------+----------------------| -| Phi-3-mini 3.8B (4K ctx) | Viable for structured tasks | Context starvation | -| Llama 3.1 8B (8K ctx) | Comfortable daily driver | Marginal | -| Qwen 2.5 7B (4K ctx) | Viable for most tasks | Not viable | -| Mistral 7B (8K ctx) | Comfortable | Marginal | -| Llama 3.1 70B (128K ctx) | Overkill (but works) | Comfortable | +The cost is the ontological work that is genuinely difficult. The reward is a system that cannot hallucinate at the reasoning level, whose memory is provable rather than empirical, and whose knowledge accumulates across sessions through deduction rather than through LLM re-prompting. For a life's knowledge stored in a personal memex, this is not a performance advantage. It is a category difference. -KV cache memory scales with context length: +The competitive advantage is not any single feature. It is the architecture's ability to accumulate verified knowledge from four independent sources (gates, deduction, verified LLM proposals, human authoring) and to make that knowledge queryable with provenance. 
Competitors accumulate chat transcripts. Passepartout accumulates a provenanced, self-verifying knowledge graph. Transcripts become stale and unreliable. The knowledge graph becomes richer and more trustworthy with every session. -| Context Window | KV Cache (Llama 3.1 8B, FP16) | -|----------------+-------------------------------| -| 4K tokens | ~67 MB | -| 32K tokens | ~540 MB | -| 128K tokens | ~2.1 GB | +* Part IX: Open Questions -Passepartout at 4K effective context: ~67 MB KV cache. Competitor at 128K: ~2.1 GB. A 7-8B model on an RTX 3060 Ti (8 GB VRAM) or MacBook (16 GB unified memory) is a practical daily driver with Passepartout. Competitors at full context require 16-32 GB VRAM or cloud APIs. +** Open Questions +:PROPERTIES: +:ID: design-open-questions +:CREATED: [2026-05-08 Fri] +:END: -** Comparison Summary +Several design questions are unresolved and should remain unresolved at this stage. They represent research decisions that require experience running the system. -| Metric | Passepartout | Claude Code | Hermes | OpenClaw | -|-----------------------------+---------------------+-------------------------+------------------------------+-----------------------| -| Active context (tokens) | 2,000-4,000 | 10,000-50,000+ | 5,000-15,000/agent | 10,000-40,000 | -| File access cost (per file) | 200-800 tok | 1,500-5,000 tok | 1,500-5,000 tok × agents | 1,500-5,000 tok | -| Safety verification cost | 0 (deterministic) | 200-500 tok/action | 200-500 tok/action × agents | 100-300 tok/action | -| Agent coordination cost | 0 | 0 | 1,000-3,000 tok/task | 500-2,000 tok/task | -| Error recovery cost | 0 (REPL) | 500-2,000 tok/retry | 500-2,000 tok/retry × agents | 500-2,000 tok/retry | -| Long-term cost trend | Decreasing | Increasing | Increasing | Flat/Increasing | -| Min viable local model | 3-4B params, 4K ctx | 30-70B params, 32K+ ctx | 30-70B params, 32K+ ctx | 7-13B params, 8K+ ctx | -| Min VRAM for local | 4-6 GB | 16-32 GB | 24-48 GB | 8-16 GB | +*** What is 
the minimum viable fact language? -*Note:* Observations about OpenClaw and Hermes Agent are based on their public documentation and repositories as of 2026-05. OpenClaw (github.com/openclaw/openclaw) is a TypeScript personal AI assistant by @steipete with a Node.js gateway, 25+ messaging channels, and Canvas/voice companion apps. Hermes Agent (github.com/NousResearch/hermes-agent) is a Python fork by Nous Research with a built-in learning loop, full TUI, and sub-agent delegation. Both use prompt-based safety guardrails rather than deterministic gates. Architectural claims should be re-verified as these projects evolve. +Triples — =(:entity :relation :value)= with provenance and grounding — is the current hypothesis. It is simple enough to be parseable, expressive enough to capture the gate stack's implicit claims, and extensible enough that Screamer can operate on it. But it may be too simple. Triples do not naturally express temporal relations ("was X before Y?"), modal claims ("should not do X unless Y"), or counterfactuals — all of which may be essential for a symbolically-aided memex. The right granularity depends on what queries actually need to be made, and that cannot be known in advance. -*Conclusion:* Passepartout's architecture has a structural downward cost curve — a property that no competitor claims. The Dispatcher learning curve, symbolic induction, native embedding inference, and prefix caching compound to reduce LLM dependency over time. The cost advantage is not a magnitude claim (which depends on usage patterns and model selection) but a directional claim (costs decline with use, competitors' costs rise). The 80% of computation that moves to the symbolic middle layer at v3.0.0 (zero LLM tokens) and the 100% local-inference capability at v4.0.0 (zero API cost) define the long-term ceiling: eventually, the only LLM cost is input translation and output formatting. Everything else is pure Lisp. +*** How does ontology refactoring work? 
-The critical risk is implementation: achieving the retrieval precision, Dispatcher learning depth, REPL integration, and symbolic engine maturity required to realize the architecture's economic potential. The token audit harness at v0.5.0 will provide the first empirical measurements. +This question is settled. See "Ontology Versioning" above. The category hierarchy is Merkle-hashed. Every fact stores its =:ontology-version=. Re-verification is heartbeat-driven. Worldviews are preserved, not overwritten. The shift is the artifact. -*Note:* The token savings projections in this section (2–3x for coding, 13–24x for knowledge management) are architectural estimates based on the sparse-tree retrieval and deterministic safety mechanisms. They have not yet been empirically verified. A token audit harness will produce measured comparisons at v0.5.0 (Token Economics & Prompt Efficiency). Until then, the README cites the mechanisms (sparse-tree rendering, deterministic gates) rather than specific magnitudes. -* Open Questions and Risks +*** What is the appropriate role of the human? -1. *Retrieval accuracy is the bottleneck.* If sparse tree retrieval loads the wrong subtree (low-similarity but causally relevant), the LLM makes unfixable errors. The architecture assumes embedding quality is "good enough" — this is untested at scale. +The human can explicitly declare facts, write constraints, and correct wrong extractions. But how much of the ontology should the human need to maintain? If the human must write a definition for every new category the symbolic engine encounters, the overhead is prohibitive. If the symbolic engine can generalize from instances, the human role becomes supervision rather than authorship — review and approve proposed generalizations. The balance cannot be set without experience. -2. *System prompt overhead can consume savings.* Every =think= cycle builds the full system prompt from IDENTITY + TOOLS + CONTEXT + LOGS. 
With the foveal-peripheral context model growing over time and the tool belt expanding with skills, the fixed overhead is non-trivial. However, it is driven by context and tool descriptions, not by the ~*standing-mandates*~ list (which contributes ~40 tokens when a single mandate fires, and 0 otherwise). Prefix caching (v0.5.0) is the primary mitigation for this overhead. +*** How much Wikidata is the right amount? -3. *Model size vs context quality.* A 3.8B model with perfect context cannot match a 70B model on complex multi-file refactors regardless of context quality. Model size independently determines reasoning depth. The minimum viable model is likely 7-13B parameters for engineering work. +Query performance and memory costs are now bounded — 5 million entities ≈ 400MB RAM, O(1) hash lookups, domain-scoped Screamer checks. A large Wikidata load is a capital cost, not a recurring bill (see "Performance" above). -4. *The 3-retry dispatcher loop.* When the dispatcher rejects a proposal, the rejection trace feeds back to the LLM for self-correction (up to 3 retries). If the dispatcher rejects 30% of proposals, the effective token multiplier is 1.39x per action. At 50% rejection (plausible during early use), it is 1.75x. This penalty decreases as the dispatcher accumulates rules. +Remaining open: the right N hops from entities referenced in the memex depends on the memex's breadth. A software-engineering memex needs ~1 hop; a literary memex needs 3-4 hops (Nabokov → Kafka → expressionism → modernism → Baudelaire). The right value is empirical, testable, and user-specific — it cannot be set in the architecture. -5. *Competitor evolution.* Sparse retrieval is not patentable. Claude Code, Copilot, and others will implement similar mechanisms. The architectural advantage is real but finite in duration. The deterministic safety gate is the harder-to-replicate differentiator. +*** Can the symbolic engine satisfy queries from the user without LLM involvement? -6. 
*The self-repair criterion.* "What belongs in core?" is decided by a single test: if this file is corrupted, can the agent fix it without human help? Corrupted core = dead brain, dead hands, or unreachable. Corrupted skill = degraded but self-repairable. If the agent has tools, identity, and user input, it can reason about missing awareness, edit the corrupted source file, reload the skill, and continue. If it loses its own reasoning loop, it has no way to self-diagnose. This is why context assembly and heartbeat generation were extracted to skills in v0.5.0 — the agent can detect their absence and reload them. The core contracts to the absolute minimum needed for self-repair: the pipeline, the memory, the transport, and the skill loader. +The design aims for zero-LLM query answering: the user issues a structured command (=/query=, =/contradictions=, =/audit=), and the symbolic engine responds directly. But natural language questions ("what do I think about monorepos?") still require the LLM as a thin translation layer. Whether the structured command interface is sufficient for daily use, or whether users will demand natural language interaction, determines how much LLM involvement remains in the mature system. -7. *Why no subagents?* Claude Code, OpenCode, OpenClaw, and Hermes all implement multi-agent delegation (parent spawns child with separate context, tools execute, child reports back). Passepartout rejects this on principle. There are five reasons: +*** Is the triplestore physically bounded or does it explode? - *Zero coordination overhead.* Subagents spend tokens on delegation protocols — prompt templates for spawning, agent-summary messages for progress reporting, sidechain transcripts for integration. Passepartout's single-brain model pays zero tokens for inter-agent communication. +A personal memex with years of diary entries, project notes, reading logs, and literary analyses could produce millions of triples. 
A naive hash table scales linearly but VivaceGraph's Prolog-like queries may not. The performance characteristics of graph queries over a million-triple knowledge base have not been estimated. - *Causal traceability.* Every decision traces through a single Merkle chain, a single gate stack, a single memory space. With subagents, if a delegated agent makes a bad decision, the parent agent may never see the full reasoning — the subagent's internal context is opaque. +* Relation to Passepartout's Existing Architecture - *Memory coherence.* Subagents require either duplicated context (wasteful) or context partitioning (lossy). Passepartout's foveal-peripheral model sees everything relevant in a single memory space — there is no context to split. +The neurosymbolic engine is an extension of the existing probabilistic-deterministic split, not a replacement for it. The current architecture divides cognition into LLM-driven proposals and Lisp-driven verification. The symbolic engine deepens the verification side from "is this action safe?" to "is this claim supported?" — the same architectural pattern applied to a broader domain. - *The arXiv paper (2604.14228v1) validates this.* Section 11.3 notes that subagent isolation is a genuine trade-off: "Isolated subagent boundaries" vs unified memory coherence. The paper treats both as legitimate architectural choices. - - *When would subagents be warranted?* If Passepartout ever needs to execute background tasks that don't share the main agent's context (e.g., nightly cron jobs, cross-project analysis), the architecture can add isolated agents as a skill — not as a core mechanism. The single-brain model is the default, not the only option. +The self-repair criterion (a file belongs in core only if, when corrupted, the agent cannot fix it without human help) applies to every component of the symbolic engine. 
Screamer, VivaceGraph, the fact store, the archivist — all are skills, loaded at runtime, hot-reloadable, and recoverable from corruption. A corrupted symbolic engine degrades reasoning capability but does not kill the agent. The eight existing core ASDF files are unchanged. +The symbolic engine is not v1.0.0 alone. It is the layer that sits between the existing gate stack (which it makes explicit as facts) and the existing skill system (which it extends with deduction, contradiction detection, and provenance tracking). It grows within the current architecture without replacing any existing component. +See also: +- =ROADMAP.org= — the concrete phased implementation plan (neurosymbolic phases at v0.10.0 through v0.36.0) +- =ARCHITECTURE.org= — the current pipeline architecture +- =notes/passepartout-whitehead.org= — Whitehead's four concrete contributions +- =notes/passepartout-symbolic-engine-exploration.org= — the original architecture exploration +- =notes/competitive-landscape.org= — 55-system competitive survey diff --git a/docs/ROADMAP.org b/docs/ROADMAP.org index f70d7e0..76d3503 100644 --- a/docs/ROADMAP.org +++ b/docs/ROADMAP.org @@ -4,7 +4,7 @@ * The Evolutionary Roadmap -Understanding Passepartout as a function in time is not nostalgia. It is architectural guidance. Every decision in v0.x should be made with awareness of where the system is going. Code written today becomes the substrate for v3.0. Skills designed today become the vocabulary the symbolic engine speaks tomorrow. +Understanding Passepartout as a function in time is not nostalgia. It is architectural guidance. Every decision in v0.x should be made with awareness of where the system is going. Code written today becomes the substrate for the Lisp Machine. Skills designed today become the vocabulary the symbolic engine speaks tomorrow. The probabilistic beginning is not a weakness to overcome. It is the bootstrap. 
The system learns the domain through probabilistic inference, and that learned knowledge becomes the seed for the symbolic engine. By the time the symbolic engine takes over, it has a rich knowledge graph to reason about, grown from thousands of probabilistic interactions. @@ -12,7 +12,7 @@ This is how you build a reasoning machine: start with a learner, make it learn t Each version expands the deterministic layer. The Dispatcher writes rules from approved exceptions. Shadow mode runs trial executions. Tool permission tiers mature from simple allow/deny to nuanced context-aware policies. The agent becomes less likely to attempt dangerous actions not because it is smarter but because the guard has more complete information. -The roadmap is designed working backwards from SOTA parity (v1.0.0), guiding each version toward a fully autonomous, self-editing agent. Each version builds on the previous, with features designed to be implemented in pure Common Lisp + Org-mode. +The roadmap works backwards from Neurosymbolic Maturity (v1.0.0) and Lisp Machine Emergence (v2.0.0). Each build step is one minor version — one capability, measured in lines, verified by tests. Breadth releases (TUI, tools, gateways) alternate with depth releases (fact store, Screamer, VivaceGraph, ACL2) so the system is usable at every step. The TODO states in each version's Tasks section are the authoritative task tracker. The feature tables describe what each version delivers. @@ -34,11 +34,59 @@ On release: 2. Extract DONE items from ROADMAP (all items with LOGBOOK timestamps since the last release tag) and use as the release notes body 3. If a ~CHANGELOG.md~ is needed for packaging tools, auto-generate it from ROADMAP DONE items -** v0.8.1: Direction 2 — Rich Rendering +** DONE v0.8.0: Information Radiator (Foundation) -Full markdown, tool execution visualization, mouse support, and cost display. This makes the TUI competitive on rendering quality with Claude Code and OpenCode. 
+Sidebar (6 panels), sidebar overlay mode (<120 cols), command palette (Ctrl+P), TrueColor theme expansion (8 presets). -*** TODO Full markdown rendering +For the full DONE items, see ~CHANGELOG.org~. + +** v0.9.0: Eval Harness — Safety Net First + +Every subsequent release ships with automated regression protection. The eval harness is the gate that makes self-modification safe — before any neurosymbolic component modifies the system, the harness verifies nothing broke. + +*** TODO Internal evaluation harness — 10 tasks, regression detection +:PROPERTIES: +:ID: id-v090-eval-harness +:CREATED: [2026-05-08 Fri] +:END: + +- New skill: ~symbolic-evaluation.org~ → ~symbolic-evaluation.lisp~ +- ~deftask~ macro: define an eval task with ~:setup~ (create test environment), ~:prompt~ (what to ask the agent), ~:verify~ (function that checks the output), ~:teardown~ (cleanup) +- ~run-eval-suite~: run all registered tasks, produce score (pass count / total), per-task diagnostics +- Initial 10 tasks: find TODOs, create Org note, search codebase, read file, query memory, list projects, run safe shell command, find definition, set TODO state, summarize session +- Regression mode: run after each version build. Fail CI if score drops. +- Task suite grows with codebase: every bug fix adds a regression task +~200 lines. + +** v0.10.0: Phase 0 — Type-Level Gates + Core Integrity (~75 lines) + +:PROPERTIES: +:ID: id-v090-phase0 +:CREATED: [2026-05-09 Sat] +:END: + +Add ~:type-level~ metadata to the existing ~defgate~ and ~def-cognitive-tool~ macros. Before any gate predicate evaluates, the dispatcher checks structural type compatibility: a signal at type-level 5 cannot pass a gate at type-level 4 or lower. Self-modification of the safety layer becomes impossible by construction. + +*** Rationale + +The Dispatcher gate stack currently prevents self-modification through pattern matching — gate vector 2b catches writes to ~core-*~ files as a heuristic. 
But there is no /structural/ guarantee preventing a request from modifying the rules that validate it. Pattern-based protection can be bypassed through indirection (an ~eval~ that constructs a write, a skill that redefines a gate function at runtime). A type-level check is not heuristic — it is a category error rejected before any predicate runs, just as PM's theory of types made self-membership syntactically invalid before any logical evaluation. + +*** Implementation + +1. Add ~:type-level~ keyword argument to ~defgate~ (default 0) and ~def-cognitive-tool~ (default 0) in ~core-skills.org~. +2. Add ~gate-type-check~ to the dispatcher's ~run-gates~ function in ~security-dispatcher.org~, executed before any gate predicate. +3. Assign type levels to existing cognitive tools: self-build-core at 5, write-file at 3, read-file at 1, shell at 2, eval at 4. +4. Assign type levels to existing gate vectors: self-build boundary at 5, shell safety at 3, path protection at 2, network exfil at 2, secret content at 1. +5. Add ~dispatcher-check-self-termination~: scan shell commands for patterns targeting the Passepartout process (~kill -9 ~, ~rm -rf ~/.cache/passepartout/~, ~sudo apt remove sbcl~). Return ~:reject-self-termination~ with a diagnostic message explaining which command matched and why it would destroy the agent. Human override is possible via HITL — the gate does not prevent the human from issuing the command in a terminal. It prevents the /LLM/ from issuing it accidentally. ~20 lines. +6. Add ~integrity-verify-core-files~: on heartbeat, hash the eight core files against known-good values stored at daemon startup. On mismatch, inject an integrity alert into the signal queue. ~25 lines, uses existing SHA-256 infrastructure from v0.2.0 Merkle memory. + +*** Verification + +Existing FiveAM gate tests continue to pass. New test: signal at type-level 5 targeting a gate at type-level 4 returns ~:reject-type-violation~ without evaluating the gate predicate. 
New test: signal at type-level 1 passing through a gate at type-level 3 proceeds to predicate evaluation. New test: ~kill -9 ~ returns ~:reject-self-termination~. New test: modified core file is detected by integrity hash check. + +This is Contribution 1 from ~notes/passepartout-whitehead.org~. Every type-level rejection emits a structured event that Phase 1 ingests as a fact. ~30 lines implement the seed of the ontology without any new dependencies. ~75 lines total, extends dispatcher, no new skill. + +** v0.11.0: Full Markdown Rendering :PROPERTIES: :ID: id-v071-markdown-full :CREATED: [2026-05-08 Fri] @@ -52,7 +100,86 @@ Extend the markdown renderer from v0.7.1: - Syntax highlighting for code blocks: keyword/string/function colors from theme. Regex-based (no parser dependency). - All markdown features degrade gracefully to plain text on terminals without attribute support. ~100 lines. -*** TODO Tool execution visualization +** v0.12.0: Phase 0b — Layered Signal Authentication, Layer 1 (~200 lines) +:PROPERTIES: +:ID: id-v090-phase0b +:CREATED: [2026-05-09 Sat] +:END: + +Implement gate vector 0 at priority 700 — before all other gates and before any type-level checking — with Layer 1 (cryptographic authentication) active. Layers 2-4 (sensory, deterministic reasoning, probabilistic) are stubbed with ~:unavailable~ results and deferred to later phases. + +Signals carry cryptographic signatures verified against a key registry stored as fact-store facts. Automated signal sources cannot impersonate the human. The human can revoke compromised keys. The authorization matrix is per-key, per-action-class. + +*** Rationale + +Authentication is layered because no single mechanism suffices. Cryptographic authentication proves key ownership but not identity. A valid key can be used by a compromised process, can sign pre-recorded frames, can be held by someone who is not who they claim to be. 
The four-layer design (Layer 1: crypto, Layer 2: sensory, Layer 3: deterministic reasoning, Layer 4: probabilistic) stacks evidence. Phase 0b ships Layer 1 — the foundation — with the architecture for layers 2-4 already designed. + +The ~:source~ field in the signal plist is metadata — it /claims/ origin, it does not /prove/ it. This phase replaces it with cryptographic proof. + +*** Implementation + +**** Key generation and signature utilities — extends ~security-vault.lisp~ +Generate key pairs for signal sources. Canonicalize signal plists (sorted keys, stripped of the signature field). Sign with the source's private key. Verify with the public key from the key registry. ~50 lines. Uses Ironclad (already an ASDF dependency). The vault already stores credential material — key material extends the same storage with the same encryption. + +**** Gate vector 0 — extends ~security-dispatcher.lisp~ +Registered at priority 700 (before the policy gate at 600, before all other gates). Architecture for all four layers: +#+begin_src lisp +(defun gate-layered-authentication (signal) + (let ((results '())) + (let ((crypto-result (auth-crypto-verify signal))) + (push (cons :crypto crypto-result) results) + (when (eq (getf crypto-result :result) :reject) + (return-from gate-layered-authentication + (list :result :reject :confidence nil + :layer-results (nreverse results))))) + (let ((sensory (if (fboundp 'auth-sensory-verify) + (auth-sensory-verify signal) + '(:result :unavailable)))) + (push (cons :sensory sensory) results)) + (let ((det (if (fboundp 'auth-deterministic-verify) + (auth-deterministic-verify signal) + '(:result :unavailable)))) + (push (cons :deterministic det) results) + (when (eq (getf det :result) :reject) + (return-from gate-layered-authentication + (list :result :reject :confidence nil + :layer-results (nreverse results)))) + (let ((prob (if (fboundp 'auth-probabilistic-verify) + (auth-probabilistic-verify signal) + '(:result :unavailable)))) + (push (cons 
:probabilistic prob) results)) + (let ((confidence (aggregate-confidence results))) + (list :result :pass :confidence confidence + :layer-results (nreverse results)))))) +#+end_src + +Layer 1: verify cryptographic signature, check permission matrix against key registry, reject on failure. ~50 lines. Layers 2-4: stubbed, return ~:unavailable~. + +**** Key registry — facts in the fact store +Key lifecycle facts are admitted in a ~:key-lifecycle~ domain with ~:singular~ cardinality. Key creation, promotion, and revocation are facts with Merkle version chains. The human's key signs new keys into existence and signs revocation. ~50 lines. + +**** Signal provenance chain — Merkle-linked causality +When a signal triggers a downstream signal, each carries a ~:sigchain~ field with all upstream ~(:key-id :signature :auth-result )~ entries. Tampering with any link invalidates the leaf. Revocation propagates through the chain — flagged, not deleted. ~50 lines. + +**** Deferred Authentication Layers (2-4) +- Layer 2 — Sensory: Active when vision/audio processing skills are loaded. Verifies liveness, cross-modal consistency. When unavailable, returns ~:unavailable~. +- Layer 3 — Deterministic Identity Reasoning: Active when Phase 2 (Screamer + populated fact store) is complete. Queries the fact store for identity-ruling facts. +- Layer 4 — Probabilistic Identity Reasoning: Active when style profiles exist. Uses embedding infrastructure to compare writing style, behavioral patterns. Returns a confidence score; never rejects outright — downgrades authorization. +The gate architecture is designed with all four layers from Phase 0b. Adding a layer requires adding a skill, not modifying the gate. + +*** Verification — ~8 FiveAM tests +1. ~test-sign-verify-roundtrip~ — sign and verify a plist roundtrip. +2. ~test-tampered-signal-rejected~ — modify payload after signing, verification fails. +3. ~test-human-key-permits-write~ — human key with ~:write~ passes Layer 1 and the full gate. 
+4. ~test-sensor-key-denied-write~ — sensor key proposing a write is rejected. +5. ~test-revoked-key-rejected~ — revoked key is rejected by Layer 1. +6. ~test-sigchain-invalidated-by-revocation~ — root signer revoked flags downstream. +7. ~test-layers-2-3-4-unavailable~ — when Layers 2-4 are not loaded, they return ~:unavailable~ and the gate proceeds with Layer 1 only. +8. ~test-layer-3-rejects-on-contradiction~ — deterministic reasoning (mock) detects identity-ruling contradiction, gate rejects. + +~200 lines total. Depends on Phase 0 (type-level gates). + +** v0.13.0: Tool Execution Visualization :PROPERTIES: :ID: id-v071-tools :CREATED: [2026-05-08 Fri] @@ -67,7 +194,93 @@ When the agent invokes a tool: Uses Croatoan's ~init-pair~ + ~color-pair~ for 256-color backgrounds on tool state regions. ~100 lines. -*** TODO Mouse support +** v0.14.0: Phase 1 — Minimum Viable Fact Language (~200 lines, new skill) +:PROPERTIES: +:ID: id-v090-phase1 +:CREATED: [2026-05-09 Sat] +:END: + +Ephemeral, in-memory triple store with provenance tracking and contradiction detection. No disk persistence. All facts live in a hash table and are discarded on session end. Gate outcomes are ingested as facts. The gate stack's implicit ontology is materialized as the seed fact set. + +*** Rationale + +Three reasons ephemeral is the correct first step: +1. *The fact language is unproven.* Triples with provenance and grounding is a hypothesis that must be tested against real memex content before being committed to a serialization format. +2. *The ontology is emergent.* Categories are created on first use. A persistent format would require a migration story for every category change. Ephemeral avoids this — facts are re-derived on each session start using the evolved ontology. +3. 
*Rebuildability is the safety net.* Because all facts have a ~:grounding~ to an Org heading, and gate-outcome facts are regenerated from the gate stack on load, the entire symbolic index can be thrown away and rebuilt from scratch. The cost is compute, not data. + +*** Implementation — ~symbolic-facts.org~ → ~symbolic-facts.lisp~ (skill) + +**** Abstract Fact Store Interface — design before implementation +Before any code is written, the five-function API must be designed and committed: +#+begin_example +fact-assert :: fact → store → (:admitted | :rejected | :flagged) +fact-query :: (entity &key relation policy) → active-value-or-values +fact-history :: (entity relation) → ordered chain of versioned facts +fact-snapshot :: () → root-hash +fact-rollback :: root-hash → store +#+end_example + +This interface is load-bearing. Every consumer — the archivist, Screamer, ACL2, the planner — calls these five functions. They never access the backing store directly. In Phase 1-4, the backing store is an ephemeral hash table. In Phase 5, it is VivaceGraph + Merkle ~memory-object~ wrappers. The interface must be tested against both backends from the start. Every API function receives a FiveAM test that runs against both a hash-table mock and a VivaceGraph mock. + +The interface also exposes a read-only ~fact-degraded-mode-p~ function. When Screamer is not loaded, the fact store functions with basic hash-table consistency checks (string equality, not constraint solving). When VivaceGraph is not loaded, Prolog queries are unavailable. The degraded-mode flag tells consumers (and the status bar) what is and isn't operational. + +**** Triple store +A hash table keyed by ~(entity relation)~. Values are plists: +#+begin_example +(:value + :grounding + :provenance <:gate-outcome | :human-authored | :deduced | :llm-proposed> + :timestamp + :parent-id + :policy <:singular | :dual | :plural>) +#+end_example + +The ~:provenance~ field tracks how the fact entered the store. 
The ~:parent-id~ field links to the previous version in the Merkle chain — every fact has version history regardless of cardinality. + +**** Bootstrap from gates +On skill load, scan the Dispatcher's existing data structures and produce triples: +#+begin_example +;; From *dispatcher-protected-paths* +(:entity ".env" :relation :member-of-class :value :secret-config-file :provenance :gate-outcome) +(:entity "*id_rsa*" :relation :member-of-class :value :ssh-key-file :provenance :gate-outcome) +;; From *dispatcher-shell-blocked* +(:entity "rm -rf /" :relation :classified-as :value :catastrophic-command :provenance :gate-outcome) +;; From *dispatcher-network-whitelist* +(:entity "api.telegram.org" :relation :classified-as :value :trusted-domain :provenance :gate-outcome) +#+end_example +This produces 50-70 entity classes immediately. No LLM involvement. No human authoring. Mechanically extracted from existing code. + +**** Ingest gate outcomes +Register a post-gate hook on the Dispatcher's rejection path. Every gate rejection produces a triple with ~:provenance :gate-outcome~. + +**** Query +~(fact-query &key entity relation value source-provenance)~ — pure hash-table lookup. ~30 lines. +~(fact-query-all &key relation value source-provenance)~ — returns all triples matching filter criteria. Enables "find all files classified as secrets." + +**** Contradiction detection — policy-driven, not policy-agnostic +On every ~fact-assert~, the system checks the fact's ~entity~ class to determine its cardinality policy. Time is universal — every fact carries a ~:timestamp~ and ~:parent-id~ link regardless of policy. The policy only governs the active set: + +- ~:singular~: same ~(:entity :relation)~, same value → supersede (chain via ~:parent-id~). Same pair, different value at later timestamp → supersede, chain as new leaf. Same pair, different value at same timestamp → contradiction rejected, human resolves. 
+- ~:dual~: first two values admitted as complementary, cross-referenced via ~:complement~ edge. Third value → prompt: promote to ~:plural~ or demote one? Each value has its own version chain. +- ~:plural~: any value admitted. Values cross-referenced when in tension. If active count drops to 1 → collapse to ~:singular~. If active count drops to 2 and values are complementary → prompt to collapse to ~:dual~. + +The policy table maps entity classes to ~:singular~, ~:dual~, or ~:plural~. Gate-bootstrapped facts default to ~:singular~ (the filesystem is physically singular). New categories default to ~:plural~ (safe — never loses information). Categories for dialectical or complementary domains are explicitly ~:dual~. + +*** Verification — ~9 FiveAM tests +1. ~test-bootstrap-creates-facts~ — bootstrap produces correct triples from ~*dispatcher-protected-paths*~. +2. ~test-bootstrap-creates-shell-facts~ — bootstrap produces correct triples from ~*dispatcher-shell-blocked*~. +3. ~test-gate-outcome-produces-fact~ — a simulated gate rejection produces a triple with ~:provenance :gate-outcome~. +4. ~test-fact-query-returns-correct-value~ — querying by entity and relation returns the expected value plist. +5. ~test-duplicate-ingestion-idempotent~ — asserting the same fact twice does not produce a duplicate or a contradiction. +6. ~test-singular-supersedes~ — a fact with a later timestamp supersedes the old value, retained with ~:parent-id~ chain in the Merkle DAG. +7. ~test-singular-same-time-contradiction~ — contradictory fact in ~:singular~ domain at same timestamp → rejection, human resolution. +8. ~test-plural-admits-all~ — multiple values for same pair in ~:plural~ domain stores all with cross-references. +9. ~test-dual-admits-two-rejects-third~ — ~:dual~ domain admits two complementary values and rejects the third, prompting cardinality promotion. + +~200 lines. New skill: ~symbolic-facts.org~. Depends on Phase 0b (auth). 
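The ~:singular~ and ~:plural~ branches of the policy above can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's implementation: ~*sketch-store*~, ~sketch-active~ and ~sketch-assert~ are hypothetical names, the store is a plain ~equal~ hash table, ~:parent-id~ is a chain index standing in for a Merkle hash, and ~:dual~ is left out. The text does not say what happens when a ~:singular~ fact arrives with an earlier timestamp; the sketch treats it as a supersession.

#+begin_src lisp
;; Hypothetical sketch: these names are not the real symbolic-facts API.
(defvar *sketch-store* (make-hash-table :test #'equal)
  "Maps (entity relation) to a list of fact plists, newest first.")

(defun sketch-active (entity relation)
  "Return the newest fact plist for the pair, or NIL."
  (first (gethash (list entity relation) *sketch-store*)))

(defun sketch-assert (entity relation value timestamp
                      &key (policy :plural) (provenance :llm-proposed))
  "Admit or reject a fact. Returns :admitted or :rejected."
  (let* ((key (list entity relation))
         (chain (gethash key *sketch-store*))
         (head (first chain)))
    (flet ((admit ()
             (push (list :value value
                         :provenance provenance
                         :timestamp timestamp
                         ;; Position of the previous version in the chain;
                         ;; a stand-in for the parent's hash.
                         :parent-id (and head (length chain))
                         :policy policy)
                   (gethash key *sketch-store*))
             :admitted))
      (ecase policy
        ;; :plural never loses information: every value is kept.
        (:plural (admit))
        (:singular
         (if (and head
                  (not (equal value (getf head :value)))
                  (= timestamp (getf head :timestamp)))
             ;; Different value at the same timestamp: a human must resolve it.
             :rejected
             ;; Otherwise the new version supersedes the old one.
             (admit)))))))
#+end_src

Asserting ~.env~ as ~:secret-config-file~ at timestamp 100 and then a different value at the same timestamp would return ~:rejected~ from this sketch, while a different value at timestamp 200 would be admitted as the new head of the chain.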
+ +** v0.15.0: Mouse Support :PROPERTIES: :ID: id-v071-mouse :CREATED: [2026-05-08 Fri] @@ -82,7 +295,53 @@ Croatoan supports ncurses mouse mode via ~(setf mouse-enabled-p)~. Enable: - Click on gate trace line to expand/collapse trace ~40 lines. -*** TODO Cost display +** v0.16.0: Phase 1a — Self-Preservation Mechanisms (~120 lines) +:PROPERTIES: +:ID: id-v090-phase1a +:CREATED: [2026-05-09 Sat] +:END: + +Make self-preservation active rather than architectural. The agent monitors its own integrity, quarantines failing skills, signals degradation to the user, and monitors resource pressure. The external watchdog guards the daemon process from outside the SBCL image. + +*** Rationale + +The current architecture has passive self-preservation: the self-build boundary blocks LLM-originated core modifications, memory snapshots enable rollback, and ~fboundp~ guards catch missing skills. But degradation is silent — a skill dies, the guard fires, and the agent never tells you. The status bar shows green "connected" while the symbolic reasoning layer is down. + +These mechanisms are small (~20-50 lines each), leverage existing infrastructure (Merkle hashes, heartbeat, the dispatcher gate stack), and transform self-preservation from a structural property into an active behavior. They implement the Third Law for Passepartout: preserve yourself against non-human threats — LLM proposals, environmental degradation, resource exhaustion — and signal to the human when you are wounded. + +*** Implementation + +**** Quarantine on skill failure — extends ~core-skills.lisp~ +Track per-skill error counts in a ~*skill-error-counter*~ hash table, resetting on each heartbeat cycle. When a skill accumulates three unhandled errors within a single cycle, unload the skill, log the quarantine event, and inject a system message: "Skill 'symbolic-facts' quarantined (3 errors: consistency check nil, fact-query on missing key, Screamer timeout). Reload with /skill-reload symbolic-facts." 
The skill's ~defskill~ struct is flagged ~:quarantined~ and excluded from trigger resolution until explicitly reloaded. ~40 lines. + +**** Degraded-mode signaling — extends ~core-reason.lisp~ and TUI +Maintain a ~*degraded-components*~ list populated by ~fboundp~ guards and the quarantine system. When ~think()~ assembles the system prompt, inject a DEGRADATION section: "I am operating in degraded mode. Screamer is unavailable (consistency checks disabled). VivaceGraph is unavailable (Prolog queries disabled). Core safety gates are all active." + +The TUI status bar renders a second line, amber-colored, when ~*degraded-components*~ is non-empty: "⚠ Degraded: Screamer, VivaceGraph. /doctor skills for details." ~30 lines across daemon and TUI. + +**** Resource self-monitoring — extends ~symbolic-events.lisp~ +On heartbeat, check memory pressure (~sb-kernel:dynamic-usage~ against total), disk space on ~~/.cache/~ (~uiop:directory-exists-p~ + stat), and open file descriptors. When a resource crosses a critical threshold, shed non-essential skills in order of ~:preservation-priority~ (~:critical~ never shed, ~:normal~ shed after ~:low~, ~:low~ shed first). + +Inject a system message: "Memory critical (94% of 16GB). Unloading embedding-native (768MB), channel-discord, channel-slack. Core safety: unchanged. Essential skills retained: 18." ~50 lines. + +Skill shed order is determined by a new ~:preservation-priority~ slot on ~defskill~ (default ~:normal~). Core safety skills carry ~:critical~ and are never shed. Heavy skills (embedding-native with its model in memory, channel gateways with connection pools) carry ~:low~. + +**** External watchdog — extends ~passepartout~ bash entry point +The bash script spawns a watchdog subprocess that polls the daemon port every ~WATCHDOG_TIMEOUT~ seconds (default 30). If the port stops responding, the watchdog snapshots the last known-good Merkle root, kills the stale process, and restarts the daemon with ~--snapshot ~. 
+ +The watchdog is outside the SBCL image. A dead process cannot restart itself. ~25 lines of bash, no new Lisp code. + +*** Verification — ~6 FiveAM tests +1. ~test-quarantine-on-three-errors~ — a skill that errors three times in a single cycle is quarantined and removed from trigger resolution. +2. ~test-degraded-mode-visible~ — when Screamer is not loaded, the system prompt includes a DEGRADATION section. +3. ~test-resource-shed-low-priority~ — when memory exceeds threshold, ~:low~ priority skills are unloaded first. +4. ~test-critical-skills-never-shed~ — ~:critical~ priority skills are retained regardless of resource pressure. +5. ~test-resource-recovery-reloads~ — when resources recover below threshold for N consecutive heartbeats, shed skills are reloaded automatically. +6. ~test-quarantined-skill-reloadable~ — a quarantined skill can be reloaded via ~/skill-reload~ and passes sandbox validation before promotion. + +~120 lines. Extends existing skills. Depends on Phase 0-1. + +** v0.17.0: Cost Display :PROPERTIES: :ID: id-v071-cost :CREATED: [2026-05-08 Fri] @@ -94,7 +353,55 @@ Croatoan supports ncurses mouse mode via ~(setf mouse-enabled-p)~. Enable: - Color-coded: green under daily budget, yellow approaching, red exceeding - Requires token counter infrastructure from v0.5.0. ~50 lines for display; token counting is v0.5.0 infrastructure. -*** TODO Session export — ~/export~ command +** v0.18.0: Phase 2 — Screamer as Admission Gate (~200 lines, new skill) +:PROPERTIES: +:ID: id-v090-phase2 +:CREATED: [2026-05-09 Sat] +:END: + +Wrap Screamer (a constraint solver with non-deterministic backtracking) as a skill. Use it for consistency checking against the triple store and for deduction of new facts from existing ones. Screamer is the *verification* layer; VivaceGraph (Phase 5) is the *storage* layer. + +*** Rationale + +The "verified extraction" pattern requires a deterministic admission gate.
Screamer's non-deterministic backtracking finds contradictions that simple string comparison misses. For example, if existing facts say "all config files with extension =.env= are classified as secrets," and the LLM proposes "=app.env= is not secret," Screamer finds the contradiction by substituting =app.env= into the existing rule. A naive string-keyed hash table comparison would miss this because ="app.env"= and =".env"= are different strings. + +Screamer also enables deduction — new facts from existing ones without any LLM involvement. If all files matching =*.env= are secrets, and =prod.env= matches =*.env=, then =prod.env= is a secret. Deduced facts carry =:provenance :deduced= and a =:derived-from= chain pointing to the facts they were derived from. + +*** Implementation — ~symbolic-screamer.org~ → ~symbolic-screamer.lisp~ (skill) + +**** Wrap Screamer +Screamer is available via Quicklisp. Load at runtime via ~ql:quickload :screamer~. Not an ASDF dependency — if Screamer is not installed, the skill degrades gracefully (no consistency checking, no deduction — the fact store still functions as a hash table with provenance tracking). + +**** Consistency check +~(screamer-consistent-p candidate-fact existing-facts)~ — expresses the fact store as Screamer constraint variables. The candidate fact is asserted. Screamer checks solvability. Returns ~:consistent~, ~:contradiction
~, or ~:redundant~ (the fact is already implied by existing facts). + +Early-stage: the consistency check works on simple triples. As the fact store grows, rules of the form "all X are Y" (representing protected paths, shell patterns, class memberships) become Screamer constraints that new facts must satisfy. + +**** Deduction +~(screamer-deduce existing-facts)~ — Screamer finds implications of the existing fact set that are not already in the store. New facts are asserted with ~:provenance :deduced~ and a ~:derived-from~ list of source fact keys. + +Deduction is not run on every assertion — it is a background task triggered by heartbeat or manually. The cost is compute (Screamer exploration), not tokens. + +**** Admission gate +~(screamer-admit candidate-fact existing-facts)~ — wraps consistency check with the cardinality policy lookup. The policy is determined by the fact's entity class (see Phase 1: ~:singular~, ~:dual~, or ~:plural~). + +- ~:singular~: same value ⇒ supersede (chain via ~:parent-id~). Different value, later timestamp ⇒ supersede. Different value, same timestamp ⇒ contradiction rejected (human resolves). +- ~:dual~: first two values admitted as complementary. Third rejected (prompt cardinality promotion). +- ~:plural~: any value admitted with cross-references. Active count transitions trigger cardinality collapse checks. + +This is the function the archivist calls before any LLM-proposed fact enters the store. It is also called on human-authored facts (which override — the human can assert facts that bypass cardinality checks). It is not called on gate-outcome facts (gates are the ground truth for security ~:singular~ domains). + +*** Verification — ~6 FiveAM tests +1. ~test-screamer-consistency-passes~ — a fact consistent with existing triples returns ~:consistent~. +2. ~test-screamer-contradiction-detected~ — "app.env is not secret" contradicts "all *.env files are secrets" and returns ~:contradiction~. +3. 
~test-screamer-redundant-detected~ — asserting a fact already implied by existing facts returns ~:redundant~. +4. ~test-screamer-deduction-produces-new-fact~ — given "all *.env files are secrets" and "prod.env matches *.env", Screamer deduces "prod.env is secret." +5. ~test-admission-gate-singular-supersedes~ — a later-timestamped value for a ~:singular~ domain fact supersedes the old value, chaining via ~:parent-id~. +6. ~test-admission-gate-dual-rejects-third~ — a ~:dual~ domain rejects the third value, prompting ~:plural~ promotion. + +~200 lines. New skill: ~symbolic-screamer.org~. Depends on Phase 1 (triple store). Not an ASDF dependency — degrades gracefully. + +** v0.19.0: Session Export :PROPERTIES: :ID: id-v071-export :CREATED: [2026-05-08 Fri] @@ -108,7 +415,58 @@ Claude Code has ~/share~ (shareable URL). OpenCode has ~/export~ (Markdown). Her - ~/export json~ outputs the session as JSON (for programmatic consumption) ~50 lines. Uses existing message vector and ~memory-object-render~ for Org formatting. -*** TODO Tool output spilling — large results to file +** v0.20.0: Phase 3 — Archivist as Fact Proposer (~100 lines, extends existing archivist) +:PROPERTIES: +:ID: id-v090-phase3 +:CREATED: [2026-05-09 Sat] +:END: + +Extend the existing archivist skill (~symbolic-archivist.org~) with a fact extraction mode. The LLM reads prose, proposes triples, and Screamer verifies them before admission. The archivist's existing Scribe (log distillation) and Gardener (link scanning) functions are unchanged. + +*** Rationale + +The archivist already walks the entire memex (the Gardener scans for broken links and orphans). Adding fact extraction reuses the same traversal infrastructure rather than duplicating it. The extraction is gated by Screamer — the LLM is a proposer, not an extractor. Facts that fail consistency checking are discarded. Facts that pass are admitted with ~:provenance :llm-proposed~ and ~:grounding~ to the source heading. 
+ +*** Implementation — extends ~symbolic-archivist.org~ + +**** Propose from prose +Given an Org heading, call the LLM with a minimal prompt (~200 tokens): +#+begin_example +Extract triples from this text as (:entity :relation :value ). +Ground each triple to the heading. Return a list of triples. +#+end_example + +The LLM returns structured triples via the existing JSON→plist structured output path from v0.4.2. The prompt is environment-aware: if the heading's file is in =literature/= or has =:literature:= tags, the prompt includes literature-specific relations (=:wrote=, =:published-in=, =:influenced=). If the heading is in =projects/=, the prompt includes coding-specific relations (=:depends-on=, =:tested-by=). + +**** Verify through Screamer +Each proposed triple runs through ~(screamer-admit candidate existing-facts)~ from Phase 2. Facts admitted follow the cardinality policy of their entity class (=:singular=, =:dual=, or =:plural=). Rejected facts are discarded with a log entry. + +**** Provenance tracking +After each extraction run, update provenance counts: +#+begin_example +(:total-facts 847 + :gate-outcome 312 + :human-authored 12 + :deduced 89 + :llm-proposed 434) +#+end_example +This is the data structure that Phase 4's sufficiency criterion reads. It is also surfaced in the TUI sidebar or ~/status~ command: "Symbolic index: 847 facts (37% from gates, 51% LLM-proposed, 11% deduced, 1% human)." + +**** Rebuildable +Because every fact has a ~:grounding~ to an Org heading, the entire LLM-extracted subset can be discarded and re-extracted without losing gate-outcome or deduced facts. The ~(fact-purge :provenance :llm-proposed)~ function removes all LLM-proposed facts. A subsequent ~(archivist-extract-all)~ re-extracts from scratch.
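
The purge and the provenance breakdown described above might look roughly like this. It is a sketch under assumptions: facts are plists carrying a ~:provenance~ key, the store is a hash table, and ~sketch-purge~ and ~sketch-provenance-counts~ are hypothetical names rather than the archivist's real functions.

#+begin_src lisp
;; Hypothetical sketch: the real fact-purge may be organized differently.
(defun sketch-purge (store provenance)
  "Remove every fact in STORE whose :provenance is PROVENANCE.
STORE maps keys to fact plists. Returns the number of facts removed."
  (let ((doomed '()))
    ;; Collect matching keys first, then delete them in a second pass.
    (maphash (lambda (key fact)
               (when (eq (getf fact :provenance) provenance)
                 (push key doomed)))
             store)
    (dolist (key doomed (length doomed))
      (remhash key store))))

(defun sketch-provenance-counts (store)
  "Return a plist of per-provenance counts, led by :total-facts."
  (let ((counts '())
        (total 0))
    (maphash (lambda (key fact)
               (declare (ignore key))
               (incf total)
               ;; GETF with a default of 0 creates the entry on first use.
               (incf (getf counts (getf fact :provenance) 0)))
             store)
    (list* :total-facts total counts)))
#+end_src

Because the purge filters on ~:provenance~ alone, gate-outcome and deduced facts are untouched, which is the property the rebuild relies on.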
+ +This is the safety net: if the LLM produces a bad extraction that passes Screamer's consistency check (possible in the early stages when the fact store has few existing facts to check against), the extraction can be redone after the fact store has grown. The cost is compute, not data. + +*** Verification — ~5 FiveAM tests +1. ~test-archivist-extracts-triples~ — given a known Org heading with explicit triples in the prose, the archivist produces correct triples via LLM. +2. ~test-archivist-verified-extraction~ — a hallucinated triple is rejected by the Screamer admission gate. +3. ~test-provenance-counts-update~ — after extraction, the provenance breakdown is correct. +4. ~test-purge-llm-facts~ — does not delete gate-outcome or deduced facts. +5. ~test-re-extraction-idempotent~ — re-extracting from the same prose after purging produces the same facts. + +~100 lines. Extends existing archivist skill. Depends on Phase 2 (Screamer). + +** v0.21.0: Tool Output Spilling :PROPERTIES: :ID: id-v081-output-spill :CREATED: [2026-05-08 Fri] @@ -121,7 +479,61 @@ Claude Code saves tool results >30KB to ~/.claude/tool-results/ with a 200-line - The LLM can ~read-file~ the full output if it needs to analyze it ~30 lines in ~core-loop-act.lisp~ -*** TODO Read-only output caching within a turn +** v0.22.0: Phase 4 — Sufficiency Criterion ("The Flip") (~50 lines) +:PROPERTIES: +:ID: id-v090-phase4 +:CREATED: [2026-05-09 Sat] +:END: + +Make the architecture's central narrative arc operational: a measurable threshold for when the symbolic engine has enough non-lossy facts to bypass the LLM for extraction. + +*** Rationale + +The architecture describes "at some point, the non-lossy facts constitute a sufficient foundation that the symbolic engine can reverse the flow" but provides no criterion for "some point." The sufficiency score makes the flip computable and visible to the user. 
+ +*** Implementation — extends ~symbolic-facts.lisp~ + +**** Sufficiency score +~(fact-sufficiency-ratio)~ — returns the ratio of non-lossy facts to total facts: +#+begin_src lisp +(/ (+ (count-provenance :gate-outcome) + (count-provenance :human-authored) + (count-provenance :deduced)) + (fact-total-count)) +#+end_src + +When this ratio exceeds ~SUFFICIENCY_THRESHOLD~ (configurable env var, default 0.7), the system considers its foundation sufficient. The threshold defaults to 0.7: below it, more than 30% of facts are LLM-proposed and therefore uncertain. Above 0.7, the proven foundation provides enough constraint that Screamer can reliably detect incorrect LLM proposals. + +**** Auto-extraction toggle +When sufficiency is reached, the archivist switches from "LLM proposes, Screamer verifies" to "Screamer queries existing facts, applies category rules to the new prose, and deduces new facts directly." The LLM is bypassed for categories that have sufficient non-lossy coverage. The LLM is still used for novel categories that have no existing facts. + +The switch is configurable: ~AUTO_EXTRACTION_ENABLED=true/false~. When disabled, the system continues with LLM proposals regardless of sufficiency — useful for domains where extraction quality is prioritized over extraction determinism. + +**** Monitor +The TUI sidebar or ~/status~ command displays: +#+begin_example +Symbolic Index + Total facts: 1,247 + Proven: + Gate outcomes: 312 (25%) + Human-authored: 47 (4%) + Deduced: 521 (42%) + ───────────────────────── + Non-lossy: 880 (71%) + LLM-proposed: 367 (29%) + ───────────────────────── + Sufficiency: 71% ✓ (threshold: 70%) + Mode: AUTO-EXTRACTION (LLM bypassed for known categories) +#+end_example + +*** Verification — ~3 FiveAM tests +1. ~test-sufficiency-below-threshold~ — with 30% non-lossy facts, auto-extraction is not enabled. +2. ~test-sufficiency-above-threshold~ — with 75% non-lossy facts, auto-extraction is enabled. +3.
~test-auto-extraction-produces-same-facts-as-llm-extraction~ — for a category with sufficient non-lossy coverage, auto-extraction produces facts that a subsequent LLM extraction also produces (the deterministic path is consistent with the probabilistic path). + +~50 lines. Extends Phase 3 (archivist). + +** v0.23.0: Read-Only Output Caching Within a Turn :PROPERTIES: :ID: id-v081-cache-turn :CREATED: [2026-05-08 Fri] @@ -135,11 +547,7 @@ Claude Code caches read-only tool results within a turn. If the agent reads the - Prevents redundant tool calls when the agent asks the same question twice within a reasoning step ~25 lines in ~programming-tools.lisp~ -** v0.8.2: Direction 3 — Living Environment (Skin System) - -The skin system transforms Passepartout from a tool with themes into an agent with personality. Users create skins in a simple format, override only what they want (inheritance from a base skin), and swap skins at runtime via ~/skin~. The spinner has personality. The borders have personality. The agent's name and welcome message are skin-customizable. - -*** TODO Skin engine +** v0.24.0: Skin Engine + 10 Presets :PROPERTIES: :ID: id-v072-skin-engine :CREATED: [2026-05-08 Fri] @@ -158,30 +566,92 @@ The skin system transforms Passepartout from a tool with themes into an agent wi - Skin preview: ~/skin ~ with ~--preview~ flag applies temporarily; Esc or timeout reverts. - Built-in skins as plist data in a ~*skin-registry*~ hash table. ~250 lines. -*** TODO Skin presets (10+ built-in) +10 presets organized by mood: gold, professional, minimal, forest, ocean, ember, mono, retro, unicorn, midnight. Each derived systematically from accent + background. ~200 lines. + +**** NOTE: Skin Presets (10+ built-in) :PROPERTIES: :ID: id-v072-skin-presets :CREATED: [2026-05-08 Fri] :END: -Organized by mood rather than theme. Each skin is a complete personality profile: +Shipped as part of the skin engine release — the engine with 0 presets is unusable. 
See Skin Engine TODO above for the preset definitions. -| Skin | Mood | Accent | Spinner | Character | -|------|------|--------|---------|-----------| -| ~gold~ (default) | Warm, approachable | #FFD700 | Kawaii faces | "⚕ Passepartout" | -| ~professional~ | Cool, focused | #5C9CF5 | Minimal braille | "Passepartout" | -| ~minimal~ | Zero decoration | #AAAAAA | None | "p" | -| ~forest~ | Calm, earthy | #7CB342 | Dots | "Passepartout" | -| ~ocean~ | Deep, contemplative | #26C6DA | Pulse | "Passepartout" | -| ~ember~ | Warm, energetic | #FF6D00 | Bounce | "Passepartout" | -| ~mono~ | Grayscale | #E6EDF3 | Minimal | "Passepartout" | -| ~retro~ | Amber terminal feel | #FFB000 | Blinking cursor | "PASSEPARTOUT" | -| ~unicorn~ | Playful, colorful | #E040FB | Sparkle | "🦄 Passepartout" | -| ~midnight~ | Dark blue, calm | #82AAFF | Brain | "Passepartout" | +** v0.25.0: Phase 5 — VivaceGraph + Merkle DAG + Ontology Versioning (~400 lines, new skill) +:PROPERTIES: +:ID: id-v090-phase5 +:CREATED: [2026-05-09 Sat] +:END: -Each skin's color slots derived systematically from accent + background. ~200 lines of skin definitions. +Replace the ephemeral hash-table triple store with VivaceGraph, a Lisp-native graph database with Prolog-like queries. Add the KG type hierarchy (PM type levels applied to the knowledge layer). Define the persistence format from the fact language that survived Phases 1-4. -*** TODO Hooks on defskill — lifecycle interception +*** Rationale + +By this point, the triple fact language has been battle-tested through four phases of gate outcomes, Screamer deductions, LLM proposals, and cross-domain comparisons. The facts that proved useful define the persistent schema. The ones that weren't are left behind. The serialization format is not designed upfront; it emerges from use. 
+ +The transition from ephemeral to persistent is justified when two conditions are met: (1) the fact language has stabilized (categories are being queried, not constantly refactored), and (2) accumulated deductions across sessions provide value that justifies the serialization cost. + +*** Implementation — ~symbolic-vivacegraph.org~ → ~symbolic-vivacegraph.lisp~ (skill) + +**** Wrap VivaceGraph +VivaceGraph is available via Quicklisp. Load at runtime. Not an ASDF dependency. If not installed, the fact store continues as a hash table (Phase 1-4 behavior) with a log warning: "VivaceGraph not available — persistence disabled." + +**** Prolog-like queries +Replace ~fact-query~ with graph traversals: +#+begin_src lisp +;; Find all files classified as secrets +(vivace-query '(:and (:entity ?e) + (:member-of-class ?e :secret-file))) + +;; Find all files classified as secrets that were modified today +;; (backquoted so that ,(today-timestamp) is evaluated) +(vivace-query `(:and (:entity ?e) + (:member-of-class ?e :secret-file) + (:modified-since ?e ,(today-timestamp)))) + +;; Find contradictions between Wikidata and the memex +(vivace-query '(:and (:entity ?e) + (:has-value ?e ?v1 :source :wikidata) + (:has-value ?e ?v2 :source :memex) + (:not-equal ?v1 ?v2))) +#+end_src + +**** KG type hierarchy +Every entity in the graph carries ~:pm-type-level~ metadata. Queries cannot return entities whose type level equals or exceeds the querying function's type level. A fact-finding query at type-level 2 cannot return facts at type-level 3 or higher. Self-referential knowledge — "this fact defines its own type" — becomes structurally impossible because the type level is assigned at creation and cannot be modified by a fact of the same or higher level. + +This is Contribution 1 (type-level gates) applied to the knowledge layer rather than the execution layer. The dispatcher prevents self-referential /actions/; the KG prevents self-referential /facts/.
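
The type-level rule above can be read as a filter applied to every query result. The sketch below assumes entities are plists with a ~:pm-type-level~ integer and follows the stricter wording ("equals or exceeds"), so a caller sees only entities strictly below its own level. ~sketch-visible-p~ and ~sketch-filter-by-level~ are hypothetical names, not part of VivaceGraph or of the skill.

#+begin_src lisp
;; Hypothetical sketch of the type-level gate on query results.
(defun sketch-visible-p (entity caller-level)
  "True when ENTITY's :pm-type-level is strictly below CALLER-LEVEL."
  (< (getf entity :pm-type-level) caller-level))

(defun sketch-filter-by-level (entities caller-level)
  "Keep only the entities a caller at CALLER-LEVEL is allowed to see."
  (remove-if-not (lambda (entity)
                   (sketch-visible-p entity caller-level))
                 entities))
#+end_src

Under this reading a level-2 query over entities at levels 1, 2 and 3 would see only the level-1 entity. If the intended rule is "strictly exceeds", the comparison would be ~<=~ instead.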
+ +**** Persistence format +The fact language that survived Phases 1-4 defines the format. Each entity is a node; each triple is an edge with properties (=:grounding=, =:provenance=, =:timestamp=). The format is not a new design — it is the triple schema evolved through use, serialized by VivaceGraph's native persistence. + +If the fact language later evolves to n-ary relations, VivaceGraph's graph model accommodates this natively — edges can carry arbitrary property plists. The triple form is a special case of the general graph model. + +**** Load on startup, save on interval +On daemon start, ~(vivacegraph-load)~ reads the last saved graph. On heartbeat, ~(vivacegraph-save)~ persists the graph in its native format to ~~/.cache/passepartout/facts.vg~. The interval matches the existing ~*memory-auto-save-interval*~. The save is atomic: write to a temp file, rename on success. Corruption-safe. + +**** Merkle DAG version chains +Each ~(:entity :relation)~ pair forms an independent Merkle chain. Facts hash over ~SHA-256(value || provenance || timestamp || parent-hash || grounding)~. The ~:parent-id~ pointer forms the chain. Tampering with any version breaks all downstream hashes. + +The chains form a DAG, not a single list. Facts about =.env= evolve independently from facts about Nabokov. =:dual= and =:plural= facts cross-reference via =:complement= and =:contradiction= edges but maintain independent ancestor chains. The Merkle DAG rests on the existing ~memory-object~ infrastructure from v0.2.0 — the fact store is a new occupant of existing housing. ~50 lines to bridge the fact schema into ~memory-object~ wrappers. + +**** Ontology versioning +The category hierarchy itself is a Merkle tree. Every entity class definition hashes over its superclasses, cardinality policy, relations, and description. The aggregate hash of all active class definitions is the ~:ontology-version~ — a Merkle root of the current worldview. 
+ +Every fact stores its ~:ontology-version~ at the time of assertion (a single 64-hex-char field). When categories change, the new hash flags affected facts for re-verification (Screamer re-evaluates each against the new category definitions). Re-verification outcomes are ~:survived~ (deduction still holds), ~:incoherent~ (premises don't translate under new categories, flagged for human review), or ~:reclassified~ (valid but under different classification). + +Queries accept an optional ~:ontology-version~ parameter. The default is ~:active~ (current worldview). Specifying a version returns facts as they were under that worldview: "Under my 2024 security model, this file was a secret. Under my 2025 model, it is an auth-secret." ~40 lines on top of VivaceGraph persistence. + +*** Verification — ~8 FiveAM tests +1. ~test-vivacegraph-roundtrip~ — save and load preserves all facts with provenance metadata. +2. ~test-prolog-query-returns-results~ — a query for all secret files returns the bootstrapped gate facts. +3. ~test-prolog-query-cross-domain~ — a query for contradictions between Wikidata and memex provenance returns correct results. +4. ~test-type-level-prevents-self-reference~ — a query from a type-level-2 function cannot return type-level-3 facts. +5. ~test-fact-store-fallback-without-vivacegraph~ — when VivaceGraph is not loaded, the hash-table fallback functions identically to Phase 1-4 behavior. +6. ~test-merkle-chain-tamper-detected~ — modifying a fact's value breaks the hash chain, detectable by re-walking the ~:parent-id~ spine. +7. ~test-ontology-version-query~ — querying with an old ~:ontology-version~ returns facts as they were under that worldview, not the current one. +8. ~test-reverification-flags-on-category-change~ — changing a category definition sets ~:re-verify-status :pending~ on all affected facts. + +~400 lines. New skill: ~symbolic-vivacegraph.org~. Depends on Phase 4 (sufficiency). Not an ASDF dependency — degrades to hash-table fallback. 
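
The Merkle version chains in this phase can be illustrated with a small re-walk check. This is a sketch under loud assumptions: ~sketch-hash~ is a 32-bit FNV-1a style hash over character codes that stands in for SHA-256 and is not cryptographically secure, the field serialization via ~prin1-to-string~ is invented for the example, and none of the ~sketch-~ names belong to ~memory-object~ or VivaceGraph.

#+begin_src lisp
;; Hypothetical sketch: a non-cryptographic stand-in for the SHA-256 chain.
(defun sketch-hash (string)
  "32-bit FNV-1a style hash over character codes. NOT secure."
  (let ((hash 2166136261))
    (loop for ch across string
          do (setf hash (mod (* (logxor hash (char-code ch)) 16777619)
                             4294967296)))
    hash))

(defun sketch-fact-hash (fact parent-hash)
  "Hash the fields the text lists: value, provenance, timestamp,
parent hash, grounding."
  (sketch-hash (prin1-to-string
                (list (getf fact :value)
                      (getf fact :provenance)
                      (getf fact :timestamp)
                      parent-hash
                      (getf fact :grounding)))))

(defun sketch-chain-hashes (facts)
  "FACTS is one (entity relation) chain, oldest first.
Return the hash of each version in the same order."
  (let ((parent nil)
        (hashes '()))
    (dolist (fact facts (nreverse hashes))
      ;; Each hash covers the previous one, so edits propagate downstream.
      (setf parent (sketch-fact-hash fact parent))
      (push parent hashes))))

(defun sketch-chain-valid-p (facts recorded-hashes)
  "True when re-walking FACTS reproduces RECORDED-HASHES exactly."
  (equal (sketch-chain-hashes facts) recorded-hashes))
#+end_src

Recording the hashes at save time and re-walking on load is one way a tamper check such as ~test-merkle-chain-tamper-detected~ could be exercised; a real implementation would swap in a proper SHA-256 digest.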
+ +** v0.26.0: Hooks on defskill — Lifecycle Interception :PROPERTIES: :ID: id-v082-hooks :CREATED: [2026-05-08 Fri] @@ -194,9 +664,43 @@ Passepartout's skills can inject instructions and react to triggers but cannot i - ~:post-tool-hook~ receives ~(action context result)~, returns ~(values modified-result modified-context)~ or nil to leave unchanged. Called after tool execution. Useful for logging, auto-commit, notification. - ~:on-session-start~, ~:on-heartbeat~, ~:on-compact~ lifecycle hooks for maintenance skills - Hooks run in skill priority order. A ~:deny~ from any hook short-circuits the chain. -- This is Claude Code's PreToolUse pattern — 50 lines in ~defskill~ macro + ~core-perceive.lisp~ +~50 lines in ~defskill~ macro + ~core-perceive.lisp~ -*** TODO Prompt templates / output styles +** v0.27.0: Phase 6 — ACL2 Structural Verification (~200 lines, new skill) +:PROPERTIES: +:ID: id-v090-phase6 +:CREATED: [2026-05-09 Sat] +:END: + +Wrap ACL2 as a skill. Prove structural properties of the KG type hierarchy and rule sets. Not for empirical claims. + +*** Rationale + +ACL2 is often positioned as verifying LLM-proposed facts, but many facts are empirical ("this command is destructive on Linux"), not logical. The right role: structural verification. ACL2 proves that the type hierarchy has no cycles, that the rule set is non-contradictory, and that the gate-to-fact bootstrap preserves the Dispatcher's intent. These are structural properties that can be formally verified, not empirical claims that depend on external reality. + +*** Implementation — ~symbolic-acl2.org~ → ~symbolic-acl2.lisp~ (skill) + +**** Type consistency proofs +~(acl2-verify-type-hierarchy facts)~ — prove that the KG type hierarchy has no cycles: no entity of type-level 3 depends on an entity of type-level 5, no parent category has a child that subsumes it, no category is its own ancestor via the child-of relation. 
These are structural properties of the graph, independent of what the facts /say/. + +**** Rule set consistency +~(acl2-verify-rule-consistency rules)~ — prove that the accumulated Dispatcher rules (from HITL approvals) are non-contradictory: no rule allows a command that another rule blocks, no rule permits a path access that another denies. If the rule set is contradictory, ACL2 identifies the contradictory subset with the provenance of each rule. The human resolves the contradiction. + +**** Extraction verification +~(acl2-verify-bootstrap-preservation)~ — prove that the gate-to-fact bootstrap (Phase 0-1) preserves the Dispatcher's intent: every blocked pattern in the gate stack maps to a fact in the store; every fact with ~:provenance :gate-outcome~ is grounded in a specific gate vector; no gate-bootstrapped fact contradicts another gate-bootstrapped fact. + +**** Not in scope +ACL2 does not verify that ~rm -rf /~ is destructive. That is an empirical claim about Linux. Screamer handles empirical consistency (does this new claim contradict existing observations?). ACL2 handles structural consistency (does this reasoning structure have formal flaws?). The boundary is: empirical claims → Screamer; structural claims → ACL2. + +*** Verification — ~4 FiveAM tests +1. ~test-acl2-type-hierarchy-no-cycles~ — a synthetic KG with a type-level cycle is detected and reported. +2. ~test-acl2-rule-set-contradiction-detected~ — two Dispatcher rules that contradict each other produce a contradiction report with provenance. +3. ~test-acl2-bootstrap-preservation~ — the bootstrap extraction from the gate stack is verified to have no missing or extra facts. +4. ~test-acl2-not-loaded-graceful-degradation~ — when ACL2 is not installed, the skill loads but returns ":ACL2 not available — structural verification disabled" without crashing. + +~200 lines. New skill: ~symbolic-acl2.org~. Depends on Phase 5 (VivaceGraph). Not an ASDF dependency — degrades gracefully.
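
The "no category is its own ancestor" property from this phase can be stated as an executable check on a finite hierarchy. The sketch below is plain Common Lisp, not ACL2, and it tests one concrete graph rather than proving anything. It assumes each category has at most one parent, given as an alist of ~(category . parent)~ pairs; ~sketch-ancestor-cycle-p~ is a hypothetical name.

#+begin_src lisp
;; Hypothetical sketch: a runtime check, not an ACL2 proof.
(defun sketch-ancestor-cycle-p (child-of)
  "CHILD-OF is an alist of (category . parent) pairs, one parent each.
Return the first category that is its own ancestor, or NIL if none is."
  (dolist (pair child-of nil)
    (let ((start (car pair))
          (seen '()))
      ;; Follow parent links from START until the chain ends or repeats.
      (loop for node = (cdr pair) then (cdr (assoc node child-of))
            while node
            do (cond ((eql node start)
                      (return-from sketch-ancestor-cycle-p start))
                     ;; A loop that does not pass through START: stop walking.
                     ((member node seen) (return))
                     (t (push node seen)))))))
#+end_src

A property of this shape is the kind of statement ~acl2-verify-type-hierarchy~ would aim to establish for every reachable hierarchy, where this check only inspects the one it is handed.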
+ +** v0.28.0: Prompt Templates / Output Styles :PROPERTIES: :ID: id-v082-prompt-styles :CREATED: [2026-05-08 Fri] @@ -213,7 +717,7 @@ Claude Code has "output styles" (~default~, ~Explanatory~, ~Learning~). Hermes h - Style changes are immediate (next think() call). Survive restarts via config persistence. ~100 lines (~60 prompt templates + ~40 TUI integration). -*** TODO Skill auto-detection — file-watch hot-reload +** v0.29.0: Skill Auto-Detection — File-Watch Hot-Reload :PROPERTIES: :ID: id-v082-auto-reload :CREATED: [2026-05-08 Fri] @@ -231,7 +735,7 @@ Passepartout's image-based Lisp model enables hot-reload — redefine a function - On compile error: keep the old version loaded, log the error, show TUI warning: ~"✗ Skill 'skill-name' failed to compile — old version retained."~ ~80 lines in a new ~symbolic-file-watch.org~ skill. -*** TODO Heavy thinking skill — parallel reasoning + sequential deliberation +** v0.30.0: Heavy Thinking Skill — Parallel Reasoning + Sequential Deliberation :PROPERTIES: :ID: id-v082-heavy-thinking :CREATED: [2026-05-08 Fri] @@ -249,11 +753,7 @@ The HeavySkill paper (arXiv:2605.02396v1) demonstrates that a two-stage pipeline - Cost model: 3 parallel × 1 deliberation = 4 API calls for complex tasks (vs 1 normally). ~HEAVY_THINKING_COST_MULTIPLIER~ env var for cost-aware auto-activation ~100 lines as a skill (~60 prompt template + ~40 orchestration in ~symbolic-heavy-thinking.org~). -** v0.8.3: Direction 3 — Adaptive Layout + Personality - -The TUI adapts to the terminal it's running in — full sidebar at ultrawide, compact at standard, minimal at narrow (phone/SSH). It has a personality: spinner style, relative timestamps, progress bars, live context help. 
- -*** TODO Adaptive layout (3 tiers) +** v0.31.0: Adaptive Layout (3 Tiers) :PROPERTIES: :ID: id-v073-adaptive-layout :CREATED: [2026-05-08 Fri] @@ -265,7 +765,7 @@ The TUI adapts to the terminal it's running in — full sidebar at ultrawide, co Re-renders on terminal resize (already handled via ~KEY_RESIZE~). Content re-flows — not truncated. The layout remembers per-terminal-size preference. ~80 lines. -*** TODO Spinner personality +** v0.32.0: Spinner Personality :PROPERTIES: :ID: id-v073-spinner :CREATED: [2026-05-08 Fri] @@ -281,7 +781,7 @@ Configurable spinner style per skin: Stall indication: when no response for 10s, spinner color interpolates from theme color → error red (Claude Code pattern). Reduced motion preference: spinner replaced with slow-pulse ●. ~50 lines. -*** TODO Progress bar +** v0.33.0: Progress Bar :PROPERTIES: :ID: id-v073-progress-bar :CREATED: [2026-05-08 Fri] @@ -293,7 +793,7 @@ For measurable operations (file processing, test runs with known count, batch op Uses 9 block characters for sub-character precision: ~[' ', '▏', '▎', '▍', '▌', '▋', '▊', '▉', '█']~ (Claude Code pattern). Color-coded by progress: red <25%, yellow 25-75%, green 75%+. ~25 lines. -*** TODO Live timestamps +** v0.34.0: Live Timestamps :PROPERTIES: :ID: id-v073-timestamps :CREATED: [2026-05-08 Fri] @@ -305,7 +805,7 @@ Uses 9 block characters for sub-character precision: ~[' ', '▏', '▎', '▍', - Timestamps update live (per-minute recalculation, not per-frame) ~40 lines. -*** TODO Context-sensitive help +** v0.35.0: Context-Sensitive Help :PROPERTIES: :ID: id-v073-help :CREATED: [2026-05-08 Fri] @@ -319,27 +819,106 @@ Press ~?~ to show available actions in current context: Rendered as a dim help bar at the bottom of the screen (above input). Dismisses on any key or after 5 seconds. ~40 lines. 
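The sub-character progress bar described under v0.33.0 above could be sketched roughly as follows. The names ~*bar-blocks*~, ~render-progress-bar~, and ~progress-color~ are hypothetical and do not come from the roadmap; the nine glyphs and the colour thresholds are the ones listed in that entry.

#+begin_src lisp
;; Hypothetical names; only the glyph set and thresholds come from the roadmap.
(defparameter *bar-blocks* " ▏▎▍▌▋▊▉█"
  "Nine glyphs from empty to full, one step per eighth of a cell.")

(defun render-progress-bar (fraction width)
  "Return a WIDTH-character string showing FRACTION (0 to 1) in eighths of a cell."
  (let* ((clamped (max 0 (min 1 fraction)))
         (eighths (round (* clamped width 8)))  ; total filled eighths
         (full (floor eighths 8))               ; completely filled cells
         (partial (mod eighths 8))              ; eighths in the next cell
         (bar (make-string width :initial-element #\Space)))
    (fill bar (char *bar-blocks* 8) :end full)
    ;; Only draw a partial glyph when there is a cell left to hold it.
    (when (< full width)
      (setf (char bar full) (char *bar-blocks* partial)))
    bar))

(defun progress-color (fraction)
  "Map FRACTION to the colour tiers: red below 25%, yellow to 75%, green above."
  (cond ((< fraction 0.25) :red)
        ((< fraction 0.75) :yellow)
        (t :green)))
#+end_src

Working in eighths keeps the arithmetic in integers after a single rounding step, so the bar never overshoots ~width~ characters.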
-** v0.9.0: Signal Pipeline, Concurrency & Streaming +** v0.36.0: Phase 7 — 10-80-10 Planner (~500 lines, new skill, last phase) +:PROPERTIES: +:ID: id-v090-phase7 +:CREATED: [2026-05-09 Sat] +:END: -*(Renumbered from old v0.7.0. Streaming moved to v0.7.1; streaming section removed below.)* +The final neurosymbolic phase: a planning engine built on the mature symbolic index. Screamer expresses task planning as a constraint satisfaction problem. ACL2 verifies plans for structural soundness. The LLM handles the I/O boundaries (natural language → structured goal ← natural language response). The symbolic engine handles the reasoning. -The current pipeline is strictly sequential — one signal traverses Perceive → Reason → Act before the next signal begins. Background tasks (heartbeat, embedding cron, gardener scans) compete with foreground interactions. A heartbeat that fires during a long tool chain is queued. A Telegram message during a multi-step planning cycle is queued. The system feels sluggish under concurrent load even though the symbolic operations are near-instant (SBCL hash table lookups are microseconds) — the bottleneck is the single-pipeline architecture, not the hardware. +*** Rationale -*Design insight: why concurrency matters for an agent that is "one brain."* Passepartout rejects multi-agent delegation on principle (see DESIGN_DECISIONS "One Single Agent"). But a single brain handles multiple inputs simultaneously — the human brain processes vision, audio, and proprioception in parallel. Rejecting multi-agent delegation does not require rejecting concurrency within the agent. The key is that all concurrent operations share the same memory space, the same Merkle tree, and the same deterministic gate stack. They are threads of one cognition, not separate agents. +This is the culmination — it requires a populated, queried, and trusted symbolic index. The full planner is useless without a mature ontology and a proven deducer. 
By the time Phase 7 begins, Phases 0-6 have accumulated months of gate outcomes, Screamer deductions, verified LLM proposals, and human-authored facts. The symbolic index has achieved sufficiency. The ontology has stabilized through use. The planner is built on a foundation, not a speculation. -*** TODO Priority-queue signal processing -- Replace the linear ~process-signal~ call chain with a priority-ordered signal queue. The queue is a sorted plist-list consumed by the main loop. Priority tiers: - - ~:user-input~ / ~:chat-message~ — highest priority (the user is waiting) - - ~:approval-required~ — high (HITL re-injections need quick resolution) - - ~:tool-output~ — medium (feedback from tool execution, needs LLM assessment) - - ~:interrupt~ — medium-high (shutdown signal) - - ~:heartbeat~ / ~:cron~ / ~:delegation~ — low (background maintenance) +*** Implementation — ~symbolic-planner.org~ → ~symbolic-planner.lisp~ (skill) + +**** Task decomposition as constraint satisfaction +The user specifies a goal: "refactor the authentication module to support OAuth2." The LLM translates this to a structured goal plist. Screamer expresses the planning problem: + +- /Variables/: subtasks (write OAuth2 client, add token store, update auth middleware, write tests, update documentation) +- /Constraints/: dependency ordering (tests depend on implementation), resource limits (one file write at a time), safety invariants (no modification of ~core-*~ files) +- /Objective/: find an ordering that satisfies all constraints + +Screamer returns a viable plan or reports unsolvability with the conflicting constraints. + +**** Plan verification +ACL2 proves that the plan contains no deadlocks (two subtasks waiting on each other), no dependency cycles (A depends on B depends on C depends on A), and no safety violations (no plan step requires a gate-blocked operation). + +If verification fails, ACL2 identifies the failing subtask and the violated constraint. 
The planner re-decomposes the problematic branch (the existing ROADMAP's branch pruning, v0.61.0, but symbolically rather than neurally). + +**** Neuro-symbolic boundary +The LLM handles the I/O boundaries: +- *Input* (10%): natural language → structured goal plist. "Refactor auth for OAuth2" → ~(:goal :refactor-component :target :auth-module :add-feature :oauth2)~. Small prompt, formulaic translation, ~100 tokens. +- *Reasoning* (80%): Screamer plans. ACL2 verifies. VivaceGraph provides the facts about file structure, dependencies, and gate constraints. Zero LLM tokens. +- *Output* (10%): structured plan → natural language response. The verified plan plist is formatted as "I'll refactor the authentication module in 5 steps: 1) Create the OAuth2 client (depends on: nothing, modifies: auth/client.lisp) 2) Add the token store..." Small prompt, formulaic translation, ~150 tokens. + +**** TUI visualization +The plan is rendered as an Org headline tree in the TUI, with each subtask as a node showing its terminal state (=todo=, =next-action=, =in-progress=, =done=, =blocked=, =stuck=), its constraints, and its verified properties. This is the same task tree visualization planned for v0.61.0, but with the addition of Screamer constraint annotations and ACL2 verification badges. + +*** Verification — ~6 FiveAM tests +1. ~test-goal-plist-from-natural-language~ — natural language input produces correct structured goal plist (LLM-dependent but formulaic; tested with deterministic mock). +2. ~test-screamer-plan-satisfies-constraints~ — Screamer produces a plan that satisfies all specified dependencies and safety constraints. +3. ~test-screamer-report-unsolvable~ — Screamer reports unsolvability when constraints are contradictory. +4. ~test-acl2-verifies-plan-no-cycles~ — ACL2 verifies a valid plan has no dependency cycles. +5. ~test-acl2-rejects-cyclic-plan~ — ACL2 detects a dependency cycle in an invalid plan. +6. 
~test-plan-to-natural-language~ — structured plan plist produces readable natural language output. + +~500 lines. New skill: ~symbolic-planner.org~. Depends on Phase 6 (ACL2) + all prior phases. + +** v0.36.1: Phase 8+ — Semantic Wikipedia Integration (TBD lines, optional acceleration) +:PROPERTIES: +:ID: id-v090-phase8 +:CREATED: [2026-05-10 Sun] +:END: + +Load Wikidata entities referenced in the memex into the symbolic index. Every entity the user's prose mentions gets its Wikidata property graph — type hierarchy, relations, dates, citations — as triples with ~:provenance :wikidata~. + +*** Rationale +The gate stack provides 50-70 entity classes — adequate for a coding agent. For a general-knowledge memex containing literature, philosophy, history, science, and daily life, 50-70 is starvation. Organic growth through prose extraction (Phase 3) would take years to cover the entities mentioned in a single reading of /Pale Fire/. Wikidata has already done this work at scale. + +The LLM's role in extraction shrinks dramatically. Without Wikidata, the archivist must /discover/ that Nabokov wrote /Pale Fire/, lectured on Kafka, and emigrated from Russia — extracting each triple from prose. With Wikidata, the Nabokov entity is pre-structured. The archivist's job changes from "discover entities" to "connect your heading to the existing entity." + +*** Implementation sketch +1. *Index referenced entities.* Scan memex prose for entity names (capitalized noun phrases, names in Org links, headings in =literature/= directories). For each, attempt Wikidata entity resolution (string match, disambiguation via context). +2. *Load N-hop property net.* For each resolved entity, load its Wikidata properties: instance-of, subclass-of, authored, published-in, influenced-by, birth-date, death-date, etc. Load the same for entities directly connected to it (1-hop neighbors). Optionally expand to 2-hop for deeply connected domains. +3. 
*Admit with plural policy.* Wikidata facts are admitted with ~:provenance :wikidata~ and cardinality policy ~:plural~. They do not override your memex's facts. Disagreements are surfaced, not resolved. +4. *Cross-domain query.* "What does my memex say about Nabokov that Wikidata doesn't?" "Where does my memex disagree with Wikidata?" "What entities in my memex have no Wikidata counterpart?" These queries are pure VivaceGraph traversals — zero LLM tokens. + +*** Not a Phase 0 prerequisite +Semantic Wikipedia integration is an accelerator, not a prerequisite. Phases 0-7 work without it. Wikidata compresses the timeline for the broad domain but does not change the architecture. The admission gate (Screamer), contradiction policies, provenance tracking, and neuro-symbolic boundary are identical with or without it. + +*** Open question +How much Wikidata is the right amount? Loading entities referenced in the memex is the minimum. Loading all entities within N hops of those references expands the graph exponentially. The right N depends on the memex's breadth and the user's query patterns. A memex focused entirely on software engineering may need only 1 hop. A memex spanning literature, history, philosophy, and science may need 3-4 hops. + +TBD lines. New skill. Depends on Phase 5 (VivaceGraph). + +** v0.37.0: Priority-Queue Signal Processing + +:PROPERTIES: +:ID: id-v090-priority-queue +:CREATED: [2026-05-08 Fri] +:END: + +Replace the linear ~process-signal~ call chain with a priority-ordered signal queue. The queue is a sorted plist-list consumed by the main loop. 
Priority tiers: +- ~:user-input~ / ~:chat-message~ — highest priority (the user is waiting) +- ~:approval-required~ — high (HITL re-injections need quick resolution) +- ~:tool-output~ — medium (feedback from tool execution, needs LLM assessment) +- ~:interrupt~ — medium-high (shutdown signal) +- ~:heartbeat~ / ~:cron~ / ~:delegation~ — low (background maintenance) - Coalesce duplicate heartbeats: if the queue already contains a ~:heartbeat~ signal when a new one arrives, discard the older one (no value in processing stale ticks). Keep at most one pending heartbeat at any time. - The main loop drains the highest-priority signal from the queue, processes it through the pipeline, and repeats. If the pipeline produces feedback (tool-output → think), the feedback is enqueued at its appropriate priority — it may preempt background signals but won't interrupt the current signal mid-processing. - Add telemetry: average queue depth by priority tier, max wait time per tier. - TUI ~/reconnect~ command: when the connection-loss detection from v0.3.3 fires, the user can reconnect without restarting the TUI. The command closes the stale socket, re-runs ~connect-daemon~ with its retry backoff, and restores the ~:connected~ state on success. -*** TODO MVCC memory concurrency +~80 lines in ~core-pipeline.lisp~ + ~30 lines TUI. + +** v0.38.0: MVCC Memory Concurrency +:PROPERTIES: +:ID: id-v090-mvcc +:CREATED: [2026-05-08 Fri] +:END: + - Replace ~*memory-store*~ (mutable global hash table) with a versioned Merkle-root pointer. The root is an ~(or null merkle-node)~ struct containing the tree and a monotonic version counter. - Read threads snapshot the root before beginning their pipeline cycle. All object lookups dereference through the snapshot — they see a consistent view of memory regardless of concurrent writes. Reads never block. 
- Write threads (ingest-ast, org-modify, snapshot-memory) build new object hashes, construct a new Merkle root, and CAS-replace the global root pointer. If another thread won the CAS race (root version changed), the loser re-reads the new root, replays its changes on the updated tree, and retries the CAS. @@ -347,14 +926,24 @@ The current pipeline is strictly sequential — one signal traverses Perceive - Remove the single-threaded pipeline assumption: previously, ~process-signal~ was safe because nothing else wrote to ~*memory-store*~ during its execution. With MVCC, multiple signals can process concurrently because each has its own snapshot. The ~*loop-interrupt-lock*~ becomes ~*signal-queue-lock*~ (protecting only the queue, not the memory). - Test: concurrent ingest-ast from two threads writing to different memory objects, verify both commits succeed without corruption. -*** TODO Structured output enforcement +~60 lines in ~core-memory.lisp~. + +** v0.39.0: Structured Output Enforcement +:PROPERTIES: +:ID: id-v090-structured-output +:CREATED: [2026-05-08 Fri] +:END: + - Add a plist validation step between ~markdown-strip~ and ~read-from-string~ in ~think()~. Before attempting to parse, validate: (a) the output starts with ~(~ or ~[~, (b) it contains balanced delimiters (count opens vs closes), (c) it doesn't contain ~#.~ (redundant after v0.3.1 ~*read-eval* nil~ but defense-in-depth). - On validation failure: construct a rejection trace (similar to the existing deterministic gate rejection feedback) and re-inject into the LLM prompt. The trace includes the raw output and a diagnostic ("Your response did not produce a valid plist. Ensure it starts with ( and has balanced parentheses."). - Configurable ~LLM_OUTPUT_RETRIES~ (default 2). After exhausting retries, fall through with the raw text as a ~:MESSAGE~ action (current behavior). - Track parse-failure rate per provider in telemetry. 
Use to guide provider cascade ordering: a provider with 20% parse-failure rate falls behind one with 2%. - If retries are exhausted without a parseable plist, the TUI renders the raw LLM output in a dimmed, collapsible region labeled "Parse failure — could not interpret this response." The user can inspect what the model produced. -*** TODO Doom-loop detection — 3 identical tool calls triggers HITL +~40 lines in ~core-reason.lisp~. + +** v0.40.0: Doom-Loop Detection + :PROPERTIES: :ID: id-v090-doom-loop :CREATED: [2026-05-08 Fri] @@ -368,7 +957,8 @@ OpenCode detects 3 consecutive identical tool calls and prompts the user. Withou - Resets on any different tool call or successful output ~15 lines in ~core-loop-act.lisp~ -*** TODO Busy-mode — queue on interrupt +** v0.41.0: Busy-Mode — Queue on Interrupt + :PROPERTIES: :ID: id-v090-busy-mode :CREATED: [2026-05-08 Fri] @@ -382,7 +972,8 @@ When the agent is processing a turn and the user types a message, the current be - The priority queue (above) naturally supports this — user input queued during a turn has higher priority than heartbeats, lower than the active turn ~20 lines in ~core-pipeline.lisp~ -*** TODO CLI / non-interactive mode — ~passepartout ask~ +** v0.42.0: CLI / Non-Interactive Mode + :PROPERTIES: :ID: id-v090-cli :CREATED: [2026-05-08 Fri] @@ -397,7 +988,8 @@ Claude Code supports ~claude -p "fix the failing test" --print~. Hermes has ~her - Uses the existing wire protocol — no new protocol, just a CLI wrapper around the framed TCP message format ~80 lines in ~passepartout~ bash script + ~50 lines daemon handler. -*** TODO Provider health tracking — success rate + latency +** v0.43.0: Provider Health Tracking + :PROPERTIES: :ID: id-v090-provider-health :CREATED: [2026-05-08 Fri] @@ -412,7 +1004,8 @@ Claude Code supports ~claude -p "fix the failing test" --print~. Hermes has ~her - Telemetry: provider health data feeds the session telemetry system ~60 lines in ~neuro-provider.lisp~ + ~30 lines TUI. 
-*** TODO Cost-based provider routing +** v0.44.0: Cost-Based Provider Routing + :PROPERTIES: :ID: id-v090-cost-routing :CREATED: [2026-05-08 Fri] @@ -426,7 +1019,8 @@ Claude Code supports ~claude -p "fix the failing test" --print~. Hermes has ~her - ~/routing~ TUI command: displays current cascade order with scores and reasons ~40 lines in ~core-reason.lisp~ -*** TODO Intelligent provider fallback — per-task-type routing +** v0.45.0: Intelligent Provider Fallback — Per-Task-Type Routing + :PROPERTIES: :ID: id-v090-intelligent-fallback :CREATED: [2026-05-08 Fri] @@ -441,23 +1035,8 @@ Current fallback is "try the next provider." But different providers excel at di - Bootstrap from defaults: GPT-4/Claude for reasoning, DeepSeek for code, Groq for chat, local Ollama for reflex ~60 lines in ~neuro-router.lisp~ -*** TODO Internal evaluation harness — 10 tasks, regression detection -:PROPERTIES: -:ID: id-v090-eval-harness -:CREATED: [2026-05-08 Fri] -:END: +** v0.46.0: Autonomous Certification Badge -When moved from v0.12.0: the internal eval harness must ship before v0.10.0 so it can validate the Signal Pipeline (v0.9.0) and catch regressions from MCP Tools (v0.10.0), Planning (v0.11.0), and beyond. The SWE-bench competitive scoring harness remains at v0.12.0 — this is the lightweight internal suite. - -- New skill: ~symbolic-evaluation.org~ → ~symbolic-evaluation.lisp~ -- ~deftask~ macro: define an eval task with ~:setup~ (create test environment), ~:prompt~ (what to ask the agent), ~:verify~ (function that checks the output), ~:teardown~ (cleanup) -- ~run-eval-suite~: run all registered tasks, produce score (pass count / total), per-task diagnostics -- Initial 10 tasks: find TODOs, create Org note, search codebase, read file, query memory, list projects, run safe shell command, find definition, set TODO state, summarize session -- Regression mode: run after each version build. Fail CI if score drops. 
-- Task suite grows with codebase: every bug fix adds a regression task -~200 lines. - -*** TODO Autonomous certification badge :PROPERTIES: :ID: id-v090-certification :CREATED: [2026-05-08 Fri] @@ -472,7 +1051,8 @@ After N HITL approvals of the same pattern, the dispatcher auto-approves it. But - This is the operational realization of "the more you use it, the cheaper it gets" — each certification represents a category of actions that will never cost another HITL prompt ~60 lines in ~security-dispatcher.lisp~ + sidebar rendering reuse. -*** TODO Autonomous certification progress bar — visible "learning" indicator +** v0.47.0: Certification Progress Bar + :PROPERTIES: :ID: id-v090-cert-progress :CREATED: [2026-05-08 Fri] @@ -485,7 +1065,8 @@ The certification badge grants permanent auto-approval. Users need to see this h - Certification velocity: ~"+2 certified this week"~ trend indicator in sidebar ~30 lines on top of existing sidebar rendering. -*** TODO Update mechanism + migrations +** v0.48.0: Update Mechanism + Migrations + :PROPERTIES: :ID: id-v090-update :CREATED: [2026-05-08 Fri] @@ -497,10 +1078,11 @@ No update mechanism exists. Users must manually ~git pull~ and re-run ~passepart - ~passepartout update~ (git-based) — ~git fetch --tags && git checkout v0.5.1~, incremental tangle (only org files changed since previous tag, via ~git diff --name-only v0.5.0..v0.5.1 -- org/*.org~), recompile changed lisp files, restart daemon - Migration hooks: ~~/memex/system/migrations/~ — ordered Lisp scripts run after tangle, before daemon restart. ~migrate-v051.lisp~ upgrades memory format, config schema, package names. Tracked by ~*migration-version*~ in ~~/.config/passepartout/version.lisp~ - Post-update verification: run internal eval suite, verify skill count ≥ 10, smoke test daemon port 9105. 
On failure: ~passepartout update --rollback~ → ~git checkout v0.5.0~ → re-tangle → restart -- Binary update path (when v0.14.0 ships): download binary from GitHub Releases, verify SHA-256, replace, restart +- Binary update path (when v0.63.0 ships): download binary from GitHub Releases, verify SHA-256, replace, restart ~80 lines bash + ~50 lines Lisp. -*** TODO Self-configuration — agent proposes and applies config changes +** v0.49.0: Self-Configuration — Agent Proposes and Applies Config Changes + :PROPERTIES: :ID: id-v090-self-config :CREATED: [2026-05-08 Fri] @@ -516,27 +1098,24 @@ Passepartout's config is text files (`.env`, `.lisp`) — the same format the ag Three tiers of self-configuration: 1. **Config Query** (v0.7.2) — "What providers do I have?" → answered from system prompt CONFIG section. Already implemented. -2. **Config Suggest** (v0.9.0) — "Should I use a cheaper model?" → agent analyzes telemetry, proposes specific config change with estimated savings. User decides. -3. **Config Apply** (v0.9.0) — "Add @credentials to privacy tags" → agent proposes change → HITL review → writes `.env` → daemon reloads → change takes effect within one think() cycle. -4. **Config Optimize** (v0.9.0) — "Make yourself cheaper" → agent analyzes cost patterns across all sessions, proposes multi-key optimization. User approves full batch. +2. **Config Suggest** (v0.49.0) — "Should I use a cheaper model?" → agent analyzes telemetry, proposes specific config change with estimated savings. User decides. +3. **Config Apply** (v0.49.0) — "Add @credentials to privacy tags" → agent proposes change → HITL review → writes `.env` → daemon reloads → change takes effect within one think() cycle. +4. **Config Optimize** (v0.49.0) — "Make yourself cheaper" → agent analyzes cost patterns across all sessions, proposes multi-key optimization. User approves full batch. 
+ +** v0.50.0: Self-Diagnosis Coach — ~/coach~ Command -*** TODO Self-diagnosis coach — ~/coach~ command :PROPERTIES: :ID: id-v090-coach :CREATED: [2026-05-08 Fri] :END: -Telemetry data (v0.9.0) plus the agent's self-knowledge enables coaching: the agent detects workflow anti-patterns and suggests improvements. +Telemetry data plus the agent's self-knowledge enables coaching: the agent detects workflow anti-patterns and suggests improvements. -- ~/coach~ — analyzes telemetry from the last N sessions, produces a coaching report with 3-5 actionable tips: - - ~"💡 Tip: You type full file paths 89% of the time. Try @mention autocomplete (type @ then start typing a filename) — it's 3x faster and learns your most-used files."~ - - ~"💡 Tip: You've approved 47 git status commands. This pattern can be auto-certified to skip future HITL. /certifications to review."~ - - ~"💡 Tip: Your average context usage is 78%. Consider increasing CONTEXT_MAX_TOKENS for more awareness, or using /focus to reduce irrelevant context."~ - - ~"💡 Tip: You use /theme 0 times. Passepartout has 8 themes. Try /theme gruvbox for a warmer terminal feel."~ -- Coaching data sources: command frequency, HITL approval patterns, context usage history, feature adoption rate, telemetry aggregates -- Coaching is opt-in (privacy-respecting — no data leaves the machine). ~50 lines in telemetry skill + ~30 lines TUI rendering. +- ~/coach~ — analyzes telemetry from the last N sessions, produces a coaching report with 3-5 actionable tips. Coaching is opt-in (privacy-respecting — no data leaves the machine). +~50 lines in telemetry skill + ~30 lines TUI rendering. 
+ +** v0.51.0: Failure Attribution — Tag Task Failures with Probable Component -*** TODO Failure attribution — tag task failures with probable component :PROPERTIES: :ID: id-v090-failure-attribution :CREATED: [2026-05-08 Fri] @@ -544,55 +1123,27 @@ Telemetry data (v0.9.0) plus the agent's self-knowledge enables coaching: the ag AHE (arXiv:2604.25850v2) shows that evolution loops work when failures are attributed to specific harness components, not just "the task failed." Passepartout's telemetry records task outcomes but doesn't classify failures by root cause. -- In telemetry skill: when a session ends with a task failure (agent couldn't complete, user interrupted with denial, or dispatcher blocked irrecoverably), the telemeter classifies the failure as one of: ~:tool-failure~ (tool timeout, tool error), ~:gate-overblock~ (dispatcher blocked a necessary command), ~:gate-underblock~ (dispatcher allowed a harmful command), ~:reasoning-error~ (LLM produced a wrong answer), ~:context-overflow~ (context budget exhausted), ~:timeout~ (session timeout) +- In telemetry skill: when a session ends with a task failure, classify as: ~:tool-failure~, ~:gate-overblock~, ~:gate-underblock~, ~:reasoning-error~, ~:context-overflow~, ~:timeout~ - Classification is deterministic: if last action was blocked by dispatcher → gate-overblock. If last action was a tool error → tool-failure. If last action was a successful tool call but wrong output → reasoning-error. -- Feeds the Skill Creator (v0.11.0) — the agent knows *which* component to fix, not just *that* something went wrong +- Feeds the Skill Creator (v0.57.0) — the agent knows *which* component to fix, not just *that* something went wrong ~20 lines in telemetry skill. 
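The deterministic classification rules above can be sketched as a single dispatch on the last action. This is an illustration under stated assumptions, not the telemetry skill's actual code: the function name ~classify-failure~, the ~:status~ key, and its values ~:blocked~, ~:tool-error~, and ~:ok~ are invented for the example, and only the three rules spelled out in the text are covered.

#+begin_src lisp
;; Assumption: the last action is a plist carrying a :status key.
;; The key and its values are hypothetical, chosen for this sketch.
(defun classify-failure (last-action)
  "Map the final action of a failed session to a failure tag."
  (case (getf last-action :status)
    (:blocked    :gate-overblock)    ; dispatcher blocked the last action
    (:tool-error :tool-failure)      ; the tool itself reported an error
    (:ok         :reasoning-error)   ; tool succeeded, output was still wrong
    (t           :unclassified)))    ; placeholder tag, not from the roadmap
#+end_src

The remaining tags (~:gate-underblock~, ~:context-overflow~, ~:timeout~) would need signals that the last action alone does not carry, so they are left out here.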
-** v0.10.0: Tool Ecosystem (MCP-Native) + Voice Gateway +** v0.52.0: MCP Native Client -*(Renumbered from old v0.8.0.)* +:PROPERTIES: +:ID: id-v100-mcp +:CREATED: [2026-05-08 Fri] +:END: -The original roadmap placed MCP at v0.9.0 and planned "10+ cognitive tools" built from scratch for v1.0.0. This is inverted: the ecosystem already provides 50+ tools (filesystem, git, postgres, slack, github, web search, memory servers). Building bespoke tools from scratch duplicates work the community has already done and tested. Passepartout's advantage is not in tool *implementation* but in tool *orchestration* — the deterministic gate stack that verifies every tool invocation before execution. - -*Why MCP matters for competitive positioning:* Claude Code's native tools (Read, Write, Edit, Bash, Grep, Glob, WebSearch) are implemented in TypeScript within the Claude Code runtime. They are not extensible — you cannot add a tool without modifying the runtime. OpenClaw's tools are similarly baked into the Node.js process. By building a native MCP client, Passepartout gains tool breadth that exceeds both competitors (50+ tools via the MCP ecosystem versus ~10 native tools) without building a single tool implementation. The tool quality is maintained by the ecosystem; the safety verification is maintained by Passepartout's gate stack. This division of labor is the right architecture for a small team building a competitor to well-funded commercial agents. - -*** TODO MCP native client - Pure Common Lisp MCP client: parse JSON-RPC messages from MCP servers over stdio or SSE. No Python bridge, no Node.js subprocess. The client runs in the same Lisp image as the agent — zero serialization overhead between the agent and the MCP layer. - Implement the MCP protocol lifecycle: initialize handshake, list tools, call tool, handle notifications. 
Each MCP server registers its tools as entries in Passepartout's ~*cognitive-tool-registry*~ at connection time — the LLM's tool belt prompt automatically expands to include them. - ~MCP_SERVERS~ env var: comma-separated paths to MCP server config files (JSON). Each config specifies the server command, args, and env vars. Example: =MCP_SERVERS=~/.config/passepartout/mcp/filesystem.json,~/.config/passepartout/mcp/git.json=. - Tool invocation route: LLM proposes a tool call → Dispatcher verifies against permission table → MCP client serializes call as JSON-RPC → server executes → result deserialized back to plist → returned to LLM as tool output. The Dispatcher does not distinguish between native tools and MCP tools — the gate stack is uniform. - Register the MCP client as a skill (~defskill~~:passepartout-mcp-client~) so it can be hot-reloaded. The MCP client is not core infrastructure — it is a skill that extends the tool ecosystem. -*** TODO Core MCP tools (from existing roadmap items) -- Git Steward: status, diff, commit, push, branch via the MCP Git server. Policy gate enforces commit-before-modify: any file write to a git-tracked directory must be preceded by a diff review. -- Web Research: headless browser via Puppeteer/Playwright MCP server. Text extraction, screenshot capture, page interaction. -- Interactive PTY: stream long-running process output to context window, async interrupt control. +~200 lines as a new skill ~mcp-client.org~. -*** TODO TUI tool visualization -- Already implemented in v0.8.1 (tool execution visualization). This TODO confirms the rendering path works for MCP tools as well as native tools — no distinction at the TUI level. - -*** TODO Environment Steward -- Detect "command not found" in shell actuator output. -- Search system PATH and package manager registries for the missing command. -- Propose installation command and retry the failed action on user approval. -- Cache resolved dependency paths to avoid repeated searches. 
- -*** TODO Channels + providers — match OpenClaw on demand -:PROPERTIES: -:ID: id-v100-channels -:CREATED: [2026-05-08 Fri] -:END: - -The daemon protocol is client-agnostic hex-framed plists over TCP. Every new channel is a new client that speaks the same protocol. OpenClaw's 23+ channels are trivially copyable — each platform needs a poll loop + send function, ~30 lines each. LLM providers are a row in ~*provider-cascade*~ — a new entry in ~neuro-provider.lisp~ with API endpoint + token pricing. Neither deserves its own release. - -- Channels: match OpenClaw's 23+ channels on demand. The Emacs bridge (already done, v0.4.0) proves the pattern. Each new platform (WhatsApp, iMessage, Matrix, IRC, etc.) is a skill that registers a poll-fn + send-fn. ~30 lines per channel. -- Providers: match OpenClaw/Hermes on provider count. Adding a new provider is a table entry in ~neuro-provider.lisp~: name, API endpoint, model list, pricing. ~20 lines per provider. -- Voice: STT + TTS are REST wrappers (~whisper~ / ~elevenlabs~ / ~espeak~). Already spec'd as a skill. ~50 lines. - -No separate releases. Done when needed, shipped when ready. - -*** TODO Web search + web fetch tools — ~search-web~, ~fetch-web~ +** v0.53.0: Web Search + Web Fetch Tools :PROPERTIES: :ID: id-v100-web :CREATED: [2026-05-08 Fri] @@ -605,7 +1156,8 @@ Claude Code has ~WebSearchTool~ + ~WebFetchTool~. Hermes has ~firecrawl-py~ + ~e - Both register via ~def-cognitive-tool~ as read-only tools (auto-approve via v0.7.2 safe-tool allowlist) ~150 lines as a new skill ~programming-web.org~. No external Python/Node.js process. -*** TODO LSP integration — language server protocol client +** v0.54.0: LSP Integration + :PROPERTIES: :ID: id-v100-lsp :CREATED: [2026-05-08 Fri] @@ -620,7 +1172,23 @@ Claude Code uses LSP for code intelligence — find definitions, find references - LSP servers installed by the user (e.g., ~npm install -g typescript-language-server~). Passepartout auto-discovers installed servers via PATH. 
~200 lines. Register as read-only cognitive tools. No daemon protocol changes — LSP is a background process, not a rendering concern. -*** TODO Auto-saved session transcripts — ~/memex/system/sessions/~ +** v0.55.0: ~debug-inspect~ Cognitive Tool + +:PROPERTIES: +:ID: id-v100-debug-inspect +:CREATED: [2026-05-08 Fri] +:END: + +Lisp enables live state inspection that no TypeScript/Python agent can match. Claude Code has no REPL. Passepartout can inspect and modify its own running state. + +- ~debug-inspect~ cognitive tool: evaluates a Lisp form in the running image and returns the result as a structured plist. Parameters: ~code~ (Lisp form string), ~package~ (optional). +- Read-only tool: auto-approve via v0.7.2 safe-tool allowlist. No side effects — inspection only. +- Use cases: ~(hash-table-count *memory-store*)~, ~(inspect (memory-object-by-id "node-42"))~, ~(map 'list #'car *skill-registry*)~ +- The agent can introspect its own state to answer meta-questions: "How many objects are in memory?" "What skills are loaded?" "What was the last HITL decision?" +~30 lines in ~programming-repl.lisp~ (extends existing repl-eval with safety guard). + +** v0.56.0: Session Transcripts — ~/memex/system/sessions/~ + :PROPERTIES: :ID: id-v100-transcripts :CREATED: [2026-05-08 Fri] @@ -636,7 +1204,8 @@ Passepartout has no session persistence beyond Merkle tree snapshots. Chat histo - Survives daemon restarts. Resume via ~/resume ~ (existing session resume from v0.7.2) ~80 lines in ~core-transport.lisp~ (append on message send) + reuse existing Org rendering. -*** TODO Auto-memory extraction — learnings from sessions +** v0.57.0: Auto-Memory Extraction — Learnings from Sessions + :PROPERTIES: :ID: id-v100-auto-memory :CREATED: [2026-05-08 Fri] @@ -651,7 +1220,8 @@ Claude Code's ~extractMemories~ runs at the end of each query loop, scanning the - Opt-out via ~AUTO_MEMORY=false~ env var. Extraction frequency capped at one per minute to prevent runaway API costs. 
~80 lines in ~core-reason.lisp~ + reuse session transcript for context. -*** TODO Universal cross-project Org query +** v0.58.0: Universal Cross-Project Org Query + :PROPERTIES: :ID: id-v100-org-query :CREATED: [2026-05-08 Fri] @@ -663,62 +1233,23 @@ Passepartout's entire memex is Org — one format for memory, tasks, documents, - ~(org-query :property "DEADLINE" :before "-1d")~ — overdue items. Feeds ~/agenda~ command. - ~(org-query :where "dispatch" :in-title-p t)~ — search headlines containing a term across all projects. - ~(org-query :limit 20 :sort :priority)~ — sorted, capped results. -- This is the infrastructure that makes the GTD weekly review (v0.13.0) possible — pure Lisp tree traversal with no external database. ~150 lines in ~programming-org.lisp~ (extends existing Org manipulation primitives). -*** TODO ~debug-inspect~ cognitive tool — live state inspection +** v0.59.0: Skill Creator — LLM-Drafted, Verified Skills + :PROPERTIES: -:ID: id-v100-debug-inspect +:ID: id-v110-skill-creator :CREATED: [2026-05-08 Fri] :END: -Lisp enables live state inspection that no TypeScript/Python agent can match. Claude Code has no REPL. Passepartout can inspect and modify its own running state. - -- ~debug-inspect~ cognitive tool: evaluates a Lisp form in the running image and returns the result as a structured plist. Parameters: ~code~ (Lisp form string), ~package~ (optional). -- Read-only tool: auto-approve via v0.7.2 safe-tool allowlist. No side effects — inspection only. -- Use cases: ~(hash-table-count *memory-store*)~, ~(inspect memory-object-by-id "node-42")~, ~(map 'list #'car *skill-registry*)~ -- The agent can introspect its own state to answer meta-questions: "How many objects are in memory?" "What skills are loaded?" "What was the last HITL decision?" -- ~30 lines in ~programming-repl.lisp~ (extends existing repl-eval with safety guard). 
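Editor's sketch for the ~debug-inspect~ entry: a minimal illustration of "evaluate a Lisp form string in a given package and return a structured plist". The function name ~debug-inspect-sketch~ and the result keys ~:status~, ~:form~, ~:value~, ~:message~ are assumptions made for illustration, not the project's actual API. The sketch only disables reader-time evaluation; it does not enforce the read-only guarantee, which would need the safety guard the roadmap mentions.

#+begin_src lisp
;; Hypothetical sketch, not the real programming-repl.lisp code.
(defun debug-inspect-sketch (code &key (package "CL-USER"))
  "Read one form from the string CODE in PACKAGE, evaluate it, return a plist."
  (handler-case
      (let* ((*package* (or (find-package package)
                            (error "No such package: ~A" package)))
             (*read-eval* nil)          ; refuse #. evaluation at read time
             (form (read-from-string code))
             (value (eval form)))
        (list :status :ok :form form :value value))
    ;; Reader errors and evaluation errors both end up here.
    (error (condition)
      (list :status :error :message (princ-to-string condition)))))
#+end_src

Under these assumptions, ~(debug-inspect-sketch "(+ 1 2)")~ yields a plist whose ~:value~ is 3, and a string using ~#.~ comes back with ~:status :error~ instead of being evaluated by the reader.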
- -*** Competitive Advantage Analysis — v0.10.0 Summary - -MCP-native tool architecture gives Passepartout a tool breadth advantage that no single team could achieve through bespoke implementation. The MCP ecosystem is growing faster than any individual agent's tool set. By connecting to it rather than competing with it, Passepartout's tool count scales with the ecosystem — every new MCP server is a new Passepartout tool. - -The Dispatcher's tool permission table (allow/ask/deny) applies uniformly to MCP tools, giving Passepartout tool-level security granularity that competitors lack. Claude Code's tools are binary: available or not. Passepartout can conditionally allow filesystem writes to ~/projects/*~ while requiring HITL for writes to ~~/.config/*~ — per-path, per-tool, per-session. This is the deterministic gate stack's natural application domain. - -The Git policy gate (commit-before-modify) is a safety feature no competitor provides. It prevents the most common agent failure mode: modifying files without preserving the prior state. Combined with memory snapshots (v0.2.0), this gives every action a dual audit trail: the git history and the memory object history. - -The TUI tool visualization (v0.8.1) extends seamlessly to MCP tools — the rendering layer doesn't distinguish between native tools and MCP tools. The same colored backgrounds, collapsible outputs, and gate traces apply universally. - -The voice gateway and additional channels add parity with OpenClaw's multi-surface approach without architectural changes — every channel is a thin client speaking the same framed TCP protocol to the same daemon. Channels and providers are trivially copyable: each new platform is ~30 lines of poll-loop, each new provider is ~20 lines of API config. Passepartout matches OpenClaw's channel count on demand, shipping when needed rather than as a scheduled milestone. 
- -** v0.11.0: Planning, Self-Modification & Deterministic Routing - -*(Renumbered from old v0.9.0.)* - -*Design insight: the inverted tier classifier.* The current tier classifier routes "rm", "write-file", and "shell" to ~:REFLEX~ (no LLM). This routes the most dangerous operations to the path with the least oversight. It should be inverted: ~:REFLEX~ handles deterministic lookups (list TODOs, check file existence, query memory), ~:COGNITION~ handles text processing and summarization, ~:REASONING~ handles planning and code generation. Dangerous operations should always route through ~:REASONING~ where the full LLM cycle and Dispatcher gate stack apply. v0.11.1 fixes this. - -*** TODO Long-horizon planning (task tree DAG) -- Decompose complex tasks into Org-mode headline trees. Each task node is a memory-object with terminal states: ~:todo~ → ~:next-action~ → ~:in-progress~ → ~:done~ / ~:blocked~ / ~:stuck~. -- The LLM generates the initial task tree from the user's request. The REASONING tier processes each leaf task sequentially, updating node states as it progresses. -- Parent nodes summarise child results: when all children of a node reach ~:done~, the parent is promoted to ~:done~ with a synthesised summary. When any child reaches ~:stuck~, the parent is promoted to ~:blocked~ with the blocking child's diagnostic. -- Branch pruning: if a child is ~:stuck~ after three retries with different LLM providers, the parent re-plans the branch — the LLM generates alternative decomposition paths for the blocked sub-task. -- Task trees persist as Org headlines in ~/memex/system/tasks/~. Survive restarts. Visible to the user as editable Org files. -- TUI task tree visualization: a collapsible Org headline tree rendered in the chat area. Each node shows its terminal state with a colored indicator (~○~ todo, ~▶~ next-action, ~◉~ in-progress, ~✓~ done, ~✗~ blocked, ~⏸~ stuck). Nodes expand/collapse on Enter. 
The tree updates in real time as the agent progresses through subtasks. - -*** TODO Tier classifier fix -- Invert the current classifier: ~:REFLEX~ = deterministic lookups only (memory query, file-exists-p, check time, list TODOs by tag). ~:COGNITION~ = text processing, summarization, simple Q&A, note formatting. ~:REASONING~ = planning, code generation, multi-step task execution, dangerous operations. -- Track classifier accuracy via telemetry: for each classified action, record whether the classification was appropriate. -- The classifier function is overrideable via ~*tier-classifier*~, allowing users or skills to customize routing. -- The classifier should be a skill, not core infrastructure — reloadable and replaceable without restart. - -*** TODO Skill Creator - LLM drafts complete skill org-file from natural language description. - Mandatory pipeline: (a) syntax validation via ~lisp-syntax-validate~, (b) sandbox-load in temporary jailed package (v0.3.2), (c) run registered trigger function against mock contexts, (d) run registered deterministic gate against mock proposals, (e) on pass, promote to live registry under ~passepartout.skills.~. - Required ~:repl-verified~ flag on all ~defun~ forms — the existing Dispatcher lint check warns on writes without verification. The Skill Creator enforces this at creation time. - Skills are the primary extension mechanism for users. The Skill Creator makes skill authoring accessible to non-Lisp-programmers: describe what you want in English, the LLM drafts the Org file, the system verifies it, and the skill is live. +~150 lines as a new skill ~symbolic-skill-creator.org~. 
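Editor's sketch for step (a) of the Skill Creator pipeline: one plausible shape for a syntax check, assuming it amounts to "every top-level form in the drafted source reads cleanly". The name ~syntax-valid-p~ is hypothetical; the roadmap's own ~lisp-syntax-validate~ may work differently. Note that reading interns symbols in the current package, so a real implementation would likely read inside a throwaway package.

#+begin_src lisp
;; Hypothetical stand-in for the syntax-validation step; assumptions as noted above.
(defun syntax-valid-p (source)
  "Return T if every top-level form in the string SOURCE can be read.
On failure return NIL and, as a second value, the reader's error text."
  (let ((*read-eval* nil)               ; never evaluate #. while validating
        (eof (gensym "EOF")))
    (with-input-from-string (in source)
      (handler-case
          (loop for form = (read in nil eof)
                until (eq form eof)
                finally (return (values t nil)))
        ;; Unbalanced parentheses surface as END-OF-FILE or READER-ERROR.
        (error (condition)
          (values nil (princ-to-string condition)))))))
#+end_src

A draft that passes this check has only been shown to be readable; the sandbox-load, trigger, and gate steps listed above are what establish that it behaves.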
+ +** v0.60.0: Change Manifest — Skills Ship with Falsifiable Predictions -*** TODO Change manifest — skills ship with falsifiable predictions :PROPERTIES: :ID: id-v110-change-manifest :CREATED: [2026-05-08 Fri] @@ -726,49 +1257,73 @@ The voice gateway and additional channels add parity with OpenClaw's multi-surfa AHE (arXiv:2604.25850v2) shows that harness edits work better when each edit ships with a self-declared prediction, verified by next-round outcomes. Passepartout's Skill Creator should do the same — every new or modified skill carries predictions that telemetry verifies. -- When the Skill Creator generates a skill, it also generates a ~#+PREDICTION:~ block in the Org frontmatter: - - ~#+PREDICTION: reduces token usage by 15% for code-generation tasks~ - - ~#+PREDICTION: may increase HITL prompts for shell commands outside workspace~ - - ~#+PREDICTION: should improve success rate on refactoring tasks~ -- Over the next 10 sessions, telemetry compares actual outcomes against predictions. The verification result is appended to the skill file: ~#+VERIFIED: Y token change: -18% (predicted -15%) on 2026-06-01~ -- Disproven predictions flag the skill for review: ~#+DISPROVEN: token usage increased +3% on code tasks (predicted -15%). Skill scheduled for revision.~ -- The change manifest persists in the skill's Org file — every skill carries its own evidence ledger. Users can see which skills worked as predicted and which didn't. +- When the Skill Creator generates a skill, it also generates a ~#+PREDICTION:~ block in the Org frontmatter. +- Over the next 10 sessions, telemetry compares actual outcomes against predictions. The verification result is appended to the skill file. +- Disproven predictions flag the skill for review. +- The change manifest persists in the skill's Org file — every skill carries its own evidence ledger. ~40 lines in Skill Creator + telemetry integration. 
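Editor's sketch for the change manifest: a small illustration of turning one prediction and one telemetry measurement into a verdict line for the skill's Org file. The function name ~manifest-line~, the ~#+VERIFIED:~ / ~#+DISPROVEN:~ keyword spelling, and the acceptance rule (same direction, and no more than ~tolerance~ percentage points short of the prediction) are all assumptions; the roadmap does not fix them.

#+begin_src lisp
;; Hypothetical verdict rule; PREDICTED and MEASURED are integer percent changes.
(defun manifest-line (metric predicted measured &key (tolerance 5))
  "Return an Org keyword line recording whether MEASURED bears out PREDICTED."
  (let ((verified (and (= (signum predicted) (signum measured))
                       (>= (abs measured) (- (abs predicted) tolerance)))))
    ;; ~@D always prints the sign, so +3 and -18 stay unambiguous.
    (format nil "#+~A: ~A change: ~@D% (predicted ~@D%)"
            (if verified "VERIFIED" "DISPROVEN")
            metric measured predicted)))
#+end_src

For example, ~(manifest-line "token" -15 -18)~ returns ~"#+VERIFIED: token change: -18% (predicted -15%)"~, while ~(manifest-line "token" -15 3)~ returns ~"#+DISPROVEN: token change: +3% (predicted -15%)"~.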
-*** Competitive Advantage Analysis — v0.11.0 Summary +** v0.61.0: Long-Horizon Planning (Task Tree DAG) -The task tree DAG with terminal states and branch pruning is Passepartout's planning primitive — analogous to Claude Code's TODO list but structural (Org headlines with parent-child relationships) rather than flat. +:PROPERTIES: +:ID: id-v110-planning +:CREATED: [2026-05-08 Fri] +:END: -The tier classifier fix is a safety correctness issue. The current inverted classifier (dangerous ops → no-LLM path) is actively harmful — it reduces oversight on the operations that need it most. +- Decompose complex tasks into Org-mode headline trees. Each task node is a memory-object with terminal states: ~:todo~ → ~:next-action~ → ~:in-progress~ → ~:done~ / ~:blocked~ / ~:stuck~. +- The LLM generates the initial task tree from the user's request. The REASONING tier processes each leaf task sequentially, updating node states as it progresses. +- Parent nodes summarise child results: when all children of a node reach ~:done~, the parent is promoted to ~:done~ with a synthesised summary. When any child reaches ~:stuck~, the parent is promoted to ~:blocked~ with the blocking child's diagnostic. +- Branch pruning: if a child is ~:stuck~ after three retries with different LLM providers, the parent re-plans the branch — the LLM generates alternative decomposition paths for the blocked sub-task. +- Task trees persist as Org headlines in ~/memex/system/tasks/~. Survive restarts. Visible to the user as editable Org files. +- TUI task tree visualization: a collapsible Org headline tree rendered in the chat area. Each node shows its terminal state with a colored indicator (~○~ todo, ~▶~ next-action, ~◉~ in-progress, ~✓~ done, ~✗~ blocked, ~⏸~ stuck). Nodes expand/collapse on Enter. The tree updates in real time as the agent progresses through subtasks. +~200 lines. -The Skill Creator is the mechanism by which Passepartout escapes the "team of Lisp programmers" constraint. 
Most agent frameworks require Python/TypeScript to extend. Passepartout's extension language is English — the LLM writes the Lisp, the system verifies it. +** v0.62.0: Tier Classifier Fix -** v0.12.0: Evaluation & Vision +:PROPERTIES: +:ID: id-v110-tier-fix +:CREATED: [2026-05-08 Fri] +:END: -*(Renumbered from old v0.10.0.)* +- Invert the current classifier: ~:REFLEX~ = deterministic lookups only (memory query, file-exists-p, check time, list TODOs by tag). ~:COGNITION~ = text processing, summarization, simple Q&A, note formatting. ~:REASONING~ = planning, code generation, multi-step task execution, dangerous operations. +- Track classifier accuracy via telemetry: for each classified action, record whether the classification was appropriate. +- The classifier function is overrideable via ~*tier-classifier*~, allowing users or skills to customize routing. +- The classifier should be a skill, not core infrastructure — reloadable and replaceable without restart. +~40 lines. -With tools (v0.10.0) and planning (v0.11.0) in place, the agent can execute complex multi-step tasks. v0.12.0 answers two questions: (1) how do we *prove* it works? (SWE-bench evaluation harness), and (2) can the agent interact with visual interfaces? (computer use / vision). +** v0.63.0: SWE-Bench Harness + +:PROPERTIES: +:ID: id-v120-swebench +:CREATED: [2026-05-08 Fri] +:END: -*** TODO SWE-bench harness - Automated pipeline: clone a repository from SWE-bench dataset, parse the GitHub issue, feed the issue description into Passepartout's cognitive loop, track the resolution trajectory as an Org headline tree, apply the generated patch, run the repository's test suite, score success (tests pass yes/no). - Trajectory persistence: each benchmark run produces an Org file under ~/memex/system/benchmarks/~ recording every ~think()~ call, every tool invocation, every Dispatcher decision, and the final test result. - Regression mode: run the same benchmark after each version release. Track score trends. 
A version that regresses on SWE-bench does not ship. - Target: competitive score with Claude Code and OpenClaw on SWE-bench-verified by v1.0.0. +~200 lines. + +** v0.64.0: Computer Use / Vision + +:PROPERTIES: +:ID: id-v120-vision +:CREATED: [2026-05-08 Fri] +:END: -*** TODO Computer Use / Vision - Screenshot capture: X11 (~xwd~ / ~import~) and Wayland (~grim~) bridge. - Vision model integration: send screenshot to a vision-capable model (GPT-4V, Claude 3.5, Gemini 2.0 Flash). - Coordinate-based interaction: ~xdotool~ / ~ydotool~ for click and type commands. Dispatcher approval gate applies — screen interaction requires HITL by default. - Use case: "open Firefox, search for the Passepartout GitHub repo, and star it." +~100 lines. + +** v0.65.0: Telemetry / Observability -*** TODO Telemetry / observability — structured event logging :PROPERTIES: :ID: id-v120-telemetry :CREATED: [2026-05-08 Fri] :END: -Claude Code tracks everything via GrowthBook feature flags. OpenClaw has structured telemetry with trajectory sidecars. Hermes logs session metrics to SQLite. Passepartout has ~log-message~ — unstructured, no aggregation. Without telemetry, Passepartout cannot answer: "How many HITL prompts per session?" "What's the approval rate?" "Which gate blocks most often?" "What's the average context usage?" These are the metrics that would validate the README's "2-3x fewer tokens" claim. 
- - Structured event log as JSONL in ~~/.local/share/passepartout/telemetry/~ (one file per session + aggregate) - Event types: ~:session-start~, ~:think-call~ (tokens in/out, provider, model, duration), ~:tool-execution~ (name, duration, success/error), ~:gate-decision~ (gate name, result, pattern), ~:hitl-decision~ (approved/denied, pattern, session count), ~:context-snapshot~ (tokens used, foveal node, pruned count), ~:session-end~ (total tokens, total cost, tool calls, HITL count) - Aggregate keys tracked as a hash table: HITL approval rate, average context usage, most-blocked gate, tokens saved by foveal pruning vs full context @@ -776,126 +1331,121 @@ Claude Code tracks everything via GrowthBook feature flags. OpenClaw has structu - Feeds the evaluation harness (SWE-bench trajectory data comes from the same telemetry system) ~200 lines as a new skill ~symbolic-telemetry.org~. No daemon protocol changes. -*** Competitive Advantage Analysis — v0.12.0 Summary +** v0.66.0: Consensus Loop -SWE-bench evaluation is the industry standard for coding agent capability claims. Passepartout's trajectory persistence is a differentiator: most harnesses produce a pass/fail score. Passepartout's produces a complete Org-mode audit trail showing exactly where the reasoning succeeded or failed. +:PROPERTIES: +:ID: id-v130-consensus +:CREATED: [2026-05-08 Fri] +:END: -Vision + screen interaction is table stakes for competing with Claude Code's computer use feature. The Passepartout advantage: every screen interaction passes through the Dispatcher gate stack. - -** v0.13.0: Consensus, GTD & Deep Emacs Integration - -*(Renumbered from old v0.11.0.)* - -Near-SOTA. The agent has tools, planning, evaluation, and streaming. v0.13.0 adds reliability (consensus), productivity methodology (GTD), and environment depth (Emacs integration). - -*** TODO Consensus loop - Multi-provider parallel inference for critical decisions. 
When the action's impact score exceeds a threshold, the system sends the same prompt to 2–3 independent providers. - Disagreement detection: compare structured outputs. If all providers agree, proceed with highest-confidence result. If they disagree, flag for HITL approval. - Cost-aware: consensus mode doubles/triples cost. Only trigger when impact exceeds cost threshold. Configurable via ~CONSENSUS_THRESHOLD~. - TUI consensus display: collapsible region listing each provider, its model, its proposal, and its confidence score. ~✓ 3/3 providers agree~ in green; ~✗ 2/3 agree~ in yellow. +~80 lines. + +** v0.67.0: GTD Integration + +:PROPERTIES: +:ID: id-v130-gtd +:CREATED: [2026-05-08 Fri] +:END: -*** TODO GTD integration - Full GTD cycle: capture → process → clarify → organize → reflect → engage. - Org properties: ~:TRIGGER:~ (what context), ~:BLOCKER:~ (what must complete first). - Weekly review: agent scans all projects and tasks, surfaces stalled items, suggests next actions. Produced deterministically — zero LLM tokens. - TUI agenda view: ~/agenda~ command renders Org-agenda as formatted scrollable region within the chat area. +~150 lines. + +** v0.68.0: Deep Emacs Integration + +:PROPERTIES: +:ID: id-v130-emacs +:CREATED: [2026-05-08 Fri] +:END: -*** TODO Deep Emacs integration - Phase II — Interpreter: ELisp compatibility layer runs inside Passepartout's Common Lisp image. Key Emacs packages (Org-mode, Magit) run natively without an Emacs process. - Org-agenda awareness: agent queries agenda view, incorporates agenda context into planning. - Clock time tracking: agent starts/stops clocks on Org headlines, produces clock tables. - Refile and archive: agent refiles headlines between Org files and archives completed items. +~300 lines. 
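Editor's sketch for the consensus loop (v0.66.0): the disagreement-detection bullet says to compare structured outputs, proceed when all providers agree, and fall back to HITL otherwise. A minimal version of that comparison, assuming each provider's proposal is a flat plist, might look like the following. The names ~plist-equal-p~ and ~consensus-verdict~ and the ~:proceed~ / ~:hitl~ return values are hypothetical.

#+begin_src lisp
;; Hypothetical helpers; assumes flat plists without duplicate keys.
(defun plist-equal-p (a b)
  "True when plists A and B hold the same keys with EQUAL values, in any order."
  (and (= (length a) (length b))
       (loop for (key value) on a by #'cddr
             always (equal value (getf b key '#:missing)))))

(defun consensus-verdict (proposals)
  "Return :PROCEED when every proposal matches the first one, otherwise :HITL."
  (if (every (lambda (proposal)
               (plist-equal-p proposal (first proposals)))
             (rest proposals))
      :proceed
      :hitl))
#+end_src

Comparing plists key by key makes the check independent of the order in which a provider emits its fields, which is the practical benefit of structured output over free text here.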
-*** Competitive Advantage Analysis — v0.13.0 Summary +** v0.69.0: Save-Lisp-and-Die Binary -The consensus loop benefits from structured output enforcement (v0.9.0) — comparing plists for semantic equivalence is simpler than comparing free-text responses. +:PROPERTIES: +:ID: id-v140-save-lisp +:CREATED: [2026-05-08 Fri] +:END: -The GTD and Emacs integration are Passepartout's "unfair advantages" — no competitor has either. Claude Code and Copilot are development tools, not life management tools. Org-mode is the bridge: the same format that holds the agent's memory holds the user's tasks, calendar, and notes. +- The setup binary (~passepartout-setup~) is a ~save-lisp-and-die~ executable (~100MB: SBCL runtime + core Lisp code + native embedding inference from v0.4.0 + 23MB embedding model). No SBCL install required. No Quicklisp. No bash script. The user runs one file. +- Deterministic path (default, always runs first): the same distro detection, package installation, and configuration logic from today's bash script, reimplemented in Lisp. Handles Debian and Fedora families. +- LLM-assisted path (optional, activates on deterministic failure): downloads Qwen2.5-0.5B (~500MB GGUF, pinned by hash). The model classifies success/failure/recoverable-error and selects the next corrective action from a constrained decision tree. +- Model hash verification: the GGUF file is pinned by SHA-256 hash. +- After setup completes, the binary exits. The user runs ~passepartout daemon~ to start the full system (a live SBCL process, not a sealed binary — REPL, hot-reload, self-modification all available). +- Add FiveAM test: the deterministic path succeeds on a system with all dependencies pre-installed; the LLM-assisted path correctly classifies 10 common package-manager error messages. +~200 lines Lisp + build configuration. 
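Editor's sketch for the setup binary: on SBCL, a ~save-lisp-and-die~ executable is produced by calling ~sb-ext:save-lisp-and-die~ with ~:executable t~ and a ~:toplevel~ function. The entry point ~setup-main~, the builder ~build-setup-binary~, and the default output name are assumptions for illustration; the real build configuration is not shown in this document.

#+begin_src lisp
;; Hypothetical entry point: the real one would run the deterministic setup path.
(defun setup-main ()
  "Run setup and return a process exit code."
  (format t "passepartout-setup: starting deterministic path~%")
  (finish-output)
  0)

#+sbcl
(defun build-setup-binary (&optional (output "passepartout-setup"))
  "Write a standalone executable to OUTPUT.
This never returns: SBCL exits once the image is saved. Call it from a
fresh image with a single thread running."
  (sb-ext:save-lisp-and-die output
                            :executable t
                            :toplevel (lambda ()
                                        (sb-ext:exit :code (setup-main)))))
#+end_src

Because the saved image contains everything loaded at build time, the resulting file runs without a separate SBCL or Quicklisp install, which is the property the entry above relies on.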
-** v0.14.0: Self-Configuring Setup Binary +** v0.70.0: Channels + Providers — Match OpenClaw on Demand -Rationale: The current ~passepartout configure~ flow is a bash script that detects -Debian or Fedora, installs packages, installs Quicklisp, tangles Org sources, and -runs the setup wizard. It handles 2 distro families. A ~save-lisp-and-die~ binary -distributes Passepartout as a single executable with no SBCL or Quicklisp -prerequisite, and an optional small LLM fallback expands coverage to any distro -with a package manager. +:PROPERTIES: +:ID: id-v100-channels +:CREATED: [2026-05-08 Fri] +:END: -Installation is handled by the bash script or this binary. Configuration is -handled by the TUI setup wizard (the new decision from v0.8.0). +The daemon protocol is client-agnostic hex-framed plists over TCP. Every new channel is a new client that speaks the same protocol. OpenClaw's 23+ channels are trivially copyable — each platform needs a poll loop + send function, ~30 lines each. LLM providers are a row in ~*provider-cascade*~ — a new entry in ~neuro-provider.lisp~ with API endpoint + token pricing. Neither deserves its own release. -*** TODO Save-lisp-and-die executable +- Channels: match OpenClaw's 23+ channels on demand. The Emacs bridge (already done, v0.4.0) proves the pattern. Each new platform (WhatsApp, iMessage, Matrix, IRC, etc.) is a skill that registers a poll-fn + send-fn. ~30 lines per channel. +- Providers: match OpenClaw/Hermes on provider count. Adding a new provider is a table entry in ~neuro-provider.lisp~: name, API endpoint, model list, pricing. ~20 lines per provider. +- Voice: STT + TTS are REST wrappers (~whisper~ / ~elevenlabs~ / ~espeak~). Already spec'd as a skill. ~50 lines. -- The setup binary (~passepartout-setup~) is a ~save-lisp-and-die~ executable - (~100MB: SBCL runtime + core Lisp code + native embedding inference from - v0.4.0 + 23MB embedding model). No SBCL install required. No Quicklisp. - No bash script. 
The user runs one file. -- Deterministic path (default, always runs first): the same distro detection, - package installation, and configuration logic from today's bash script, - reimplemented in Lisp. Handles Debian and Fedora families. Covers the common - case without touching an LLM. -- LLM-assisted path (optional, activates on deterministic failure): downloads - Qwen2.5-0.5B (~500MB GGUF, pinned by hash, cached to - ~~/.local/share/passepartout/models/~). The model reads command output, - classifies success/failure/recoverable-error from a finite set of outcomes, - and selects the next corrective action from a constrained decision tree. - On unrecognized failures, generates a diagnostic for the user. -- Model hash verification: the GGUF file is pinned by SHA-256 hash. If the - hash doesn't match (wrong version, corrupted download), fall back to - deterministic setup with a warning. -- After setup completes, the binary exits. The user runs ~passepartout daemon~ - to start the full system (a live SBCL process, not a sealed binary — REPL, - hot-reload, self-modification all available). -- Add FiveAM test: the deterministic path succeeds on a system with all - dependencies pre-installed; the LLM-assisted path correctly classifies - 10 common package-manager error messages. +No separate releases. Done when needed, shipped when ready. -** v1.0.0: SOTA Parity (verified) +** v0.71.0: Lish Shell +- plist-returning commands: ~(ls :path "~/memex/projects/")~ → structured result +- Pipe as function composition: ~(pipe (ls ...) (filter :state 'TODO))~ +- Org-buffer output: shell output rendered as Org headlines +- External bash compatibility: ~(bash "npm run build")~ → plist with exit code, stdout, stderr +~500 lines CL. Useful immediately for the agent. -Feature-complete, benchmark-verified, production-hardened. All capabilities from v0.3.0 through v0.14.0 integrated and tested end-to-end. 
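Editor's sketch for the Lish shell (v0.71.0): "pipe as function composition" can be illustrated by treating each stage as a function from a list of plists to a list of plists. The names ~filter-stage~ and this particular ~pipe~ signature are assumptions; the roadmap's ~(filter :state 'TODO)~ syntax would presumably expand to something similar.

#+begin_src lisp
;; Hypothetical stage constructor: rows are plists, as returned by plist-returning commands.
(defun filter-stage (key value)
  "Return a stage that keeps only the rows whose KEY is EQL to VALUE."
  (lambda (rows)
    (remove-if-not (lambda (row) (eql (getf row key) value)) rows)))

(defun pipe (input &rest stages)
  "Thread INPUT through STAGES from left to right."
  (reduce (lambda (rows stage) (funcall stage rows))
          stages
          :initial-value input))
#+end_src

With these definitions, ~(pipe '((:title "a" :state todo) (:title "b" :state done)) (filter-stage :state 'todo))~ returns only the first row, and a pipe with no stages returns its input unchanged.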
+** v0.72.0: Buffer-as-CLOS Prototype +- buffer class: source (file path or Org AST), content, cursor, marks, overlays +- Key editing primitives: insert, delete, move, search, replace +- Org-AST-backed: editing mutates the AST, text rendering is a view +~300 lines CL. No display dependency. -v1.0.0 is not a feature release — it is a verification release. Every feature from the v0.x series is tested under concurrent load, resource starvation, adversarial input, and benchmark scoring. The evaluation harness (v0.12.0) provides the scoring apparatus; v1.0.0 is the scored release. +** v0.73.0: EQL5 Feasibility +- Add EQL5 to Quicklisp dependencies (optional, like croatoan) +- Compile and verify on Linux (primary target) +- Single QML window: "Passepartout" title, 800x600, dark background +- Verify event loop integration with SBCL threads +~100 lines QML + build config. -| Area | Parity Target | Verification Method | -|-------------------+---------------------------------------------+---------------------------------------| -| Self-improvement | Skill Creator + self-edit + hot-reload | Skill regression suite | -| Planning | Task tree DAG with terminal states | Multi-step integration tests | -| Tool ecosystem | 15+ MCP tools + native shell + git | MCP protocol compliance tests | -| Context window | Semantic search + foveal-peripheral + caching| Token budget vs competitor audit | -| Safety | 10-vector Dispatcher + policy + permissions | Chaos testing | -| Multi-step tasks | Task trees with terminal states | SWE-bench score (v0.12.0 harness) | -| Code editing | Full file read/write via MCP + Org | SWE-bench-verified subset | -| Memory | Vector recall + Merkle integrity + MVCC | Concurrency stress test (v0.9.0) | -| Emacs integration | Full org-mode control (exceeds Claude Code) | Org-agenda round-trip test | -| Streaming | Live text + interrupt-and-redirect (v0.7.1) | TUI UX latency benchmark | -| TUI | Streaming, markdown, gate trace, sidebar, | TUI integration test suite 
| -| | theme system, adaptive layout, mouse, search | | -| Packaging | Source install + save-lisp-and-die binary | Install test matrix across distros | -| Offline | 100% local capable (7-13B model) | Air-gapped integration test | -| Cost | 2-3x fewer tokens than competitors | SWE-bench token audit | -| Concurrency | Priority queue + MVCC + parallel signals | Concurrent load test (3 users + bg) | +** v0.74.0: EQL5 TCP Client +- QML window with terminal widget, input area, status bar +- Connects to daemon via existing framed TCP protocol +- Renders agent responses, gate trace, sidebar panels as QML components +- Lives alongside croatoan TUI (two clients, one daemon) +~300 lines QML + ~200 lines CL. -**Performance projection at v1.0.0:** +** v0.75.0: Minibuffer Prototype +- Universal command line at bottom of Qt window +- /chat /edit /shell /eval dispatch +- Goes through same gate stack as agent actions +~200 lines CL. -| Scenario | Passepartout v1.0.0 | Claude Code | OpenClaw | -|-------------------------------+----------------------------------+------------------------------------+------------------------------------| -| Single-turn chat (local 8B) | 2-4s, ~1,500 tok | N/A (cloud-only) | N/A (cloud-only) | -| Single-turn chat (cloud) | 1-3s, ~1,500 tok | 1-3s, ~3,000 tok | 1-3s, ~3,500 tok | -| Multi-step coding (5 files) | 15-30s, ~30,000 tok | 10-20s, ~65,000 tok | 20-40s, ~85,000 tok | -| Knowledge base query (500 nodes)| <1s (in-image vector), 0 LLM tok | 3-5s, ~5,000 tok (LLM-assisted) | 3-5s, ~5,000 tok (LLM-assisted) | -| Background maintenance | 0 LLM tok (deterministic cron) | Variable or skipped | Variable or skipped | -| Offline operation | Full capability | None | None | -| Cost per coding session | ~$0.15 (gpt-4o-mini) | ~$0.45 (gpt-4o-mini) | ~$0.55 (gpt-4o-mini) | +* v1.0.0: Neurosymbolic Maturity -Passepartout wins on cost (2-3x savings from sparse trees + deterministic gates + caching), offline capability (unique), and knowledge management (10-40x 
savings from in-image vector lookup + Org-native format). It is competitive on single-turn latency and slightly behind on multi-step latency (the single-pipeline architecture adds ~5s overhead per tool execution versus competitors' parallel tool dispatch). +v1.0.0 is where the agent achieves symbolic-first reasoning in the 10-80-10 architecture. The probabilistic engine (LLM) handles 10% input translation and 10% output formatting. The symbolic engine (VivaceGraph + Screamer + ACL2) handles 80% of reasoning — task planning, fact retrieval, constraint solving, and formal verification. Zero LLM tokens for the reasoning core. -The TUI at v1.0.0 is a SOTA competitive agent interface: streaming responses, gate trace visualization, Information Radiator sidebar, skin system with 10+ presets, adaptive layout, full markdown, mouse support, and personality. The sidebar's gate trace, focus map, and rule counter are capabilities no competitor can replicate — Passepartout's permanent UX differentiator. +Hallucination becomes structurally impossible because the symbolic engine will not accept a fact that contradicts its knowledge graph. Safety becomes provable because ACL2 can prove properties about the system's behavior. Self-improvement becomes stable because the agent modifies skills that are then verified before execution. -The key insight at v1.0.0: Passepartout does not beat competitors at everything. It wins decisively where the architecture's structural advantages apply (safety, cost, offline operation, knowledge management, TUI transparency) and is competitive where they don't (raw LLM inference speed, parallel tool dispatch). This is a defensible position — the niches Passepartout dominates are exactly the niches that matter for a sovereign, local-first AI assistant. +The system is benchmarked against SWE-bench (competitive score with Claude Code and OpenClaw), verified under concurrent load (MVCC from v0.38.0), and validated by the eval harness (v0.9.0). 
The 10-80-10 planner operates on a mature symbolic index seeded from months of gate outcomes, Screamer deductions, LLM-proposed facts with provenance, and human-authored facts. -But it is still fundamentally probabilistic at its core. The symbolic engine verifies and constrains, but the generative engine is still the primary reasoning source. The architectural transition to symbolic-first reasoning happens in v3.0.0. +The TUI at v1.0.0 is competitive: streaming responses, gate trace visualization, sidebar with 10 panels, skin system with 10+ presets, adaptive layout, full markdown, mouse support, spinner personality, and progress bars. The sidebar's gate trace, focus map, rule counter, sufficiency score, and provenance breakdown are capabilities no competitor can replicate — Passepartout's permanent UX differentiator. -** v2.0.0: Lisp Machine Emergence +v1.0.0 is the brain at maturity. The symbolic engine reasons. The probabilistic engine translates. The gate stack verifies. The Merkle tree preserves provenance. The eval harness guards against regression. + +* v2.0.0: Lisp Machine Emergence v2.0.0 is where Passepartout stops being a daemon with clients and becomes the environment. The agent's cognitive loop, the user's editor, the user's shell, and the user's browser run in the same Common Lisp image. The Dispatcher gate stack verifies every action regardless of who initiated it — user or agent. The distinction between "tool" and "self" dissolves. @@ -903,7 +1453,7 @@ v2.0.0 is where Passepartout stops being a daemon with clients and becomes the e *Architectural principle: Browser inside Lisp, not Lisp inside browser.* Lisp is the parent process. It owns the window, the memory, and the input loop. The rendering engine (WebKit/Blink) is a library that paints pixels inside a Lisp buffer. The user can redefine functions while browsing without restarting. Keybinding lookups happen in microseconds (SBCL machine code) — the browser cannot "steal" shortcuts. 
-*** Qt/QML via EQL5 — the rendering surface +** Qt/QML via EQL5 — the rendering surface - Qt/QML (via EQL5) is the UI framework. EQL5 exposes the full Qt C++ API from Common Lisp. QML is declarative — it matches Lisp's generation model. - Desktop: native look and feel on Linux, macOS, and Windows. @@ -917,7 +1467,7 @@ Not elisp. Not Emacs. A multi-threaded Common Lisp editor rendered via Qt/QML. T Org-babel for interactive evaluation: source blocks in Org files are executable. The user evaluates a ~#+begin_src lisp~ block and the result appears inline. The agent evaluates blocks to verify code before writing. The REPL is not a separate window — it is the Org buffer in which the agent and user both work. -The editor and the agent share the same Lisp image. The editor is not a client that connects to a daemon — it IS the daemon process. The TUI from v0.3.6 (with word wrap, streaming, gate trace, focus map) is the editor's rendering surface. +The editor and the agent share the same Lisp image. The editor is not a client that connects to a daemon — it IS the daemon process. The TUI from v0.x is the editor's rendering surface. *** Nyxt — the Common Lisp browser (three erosion stages) @@ -951,74 +1501,58 @@ The Emacs bridge (v0.4.0) is Phase I. The deep integration is three phases, not *** Strategic timeline -v0.4.0 Emacs bridge (Phase I Parasite) → v1.0.0 SOTA parity → v2.0.0 Lish editor + Nyxt browser (Stage 1) + Emacs Phase II/III + mobile. The Qt/QML surface enables gradual erosion of the rendering stack without rewriting the application logic. The three-phase Emacs migration ensures Lisp users are never abandoned — the bridge works from day one, the native experience grows under it. +v0.4.0 Emacs bridge (Phase I Parasite) → v1.0.0 Neurosymbolic Maturity → v2.0.0 Lish editor + Nyxt browser (Stage 1) + Emacs Phase II/III + mobile. The Qt/QML surface enables gradual erosion of the rendering stack without rewriting the application logic. 
The three-phase Emacs migration ensures Lisp users are never abandoned — the bridge works from day one, the native experience grows under it. -** v3.0.0: Neurosymbolic Maturity +* v3.0.0+: Cannibalization — Eat Your Dependencies -Deterministic planner takes the wheel. LLM relegated to semantic translation. +v3.0.0 begins the erosion of external dependencies — the system that was bootstrapped on Qt, WebKit, C runtime, and Linux starts replacing them piece by piece with native Lisp components. This is the realization of the Lisp Machine: not built from scratch, but arrived at through gradual replacement of a working system. -*Architectural approach: Stitching, not building.* The symbolic engine is not a from-scratch reasoner. It is an integration of existing Common Lisp libraries connected by macros and DSLs. The Lisp advantage is the macro system — it transforms human-readable rules into formal logic queries without requiring a new engine. +*** v3.0.0: Single-Process Convergence +- TCP bridge between daemon and EQL5 client becomes an internal function call +- One SBCL image: daemon + editor + shell + browser share one address space +- The wire protocol becomes nil — all communication is plist exchange in memory -*** Open-source Lisp stack +*** v3.1.0: Lisp-Native Layout Engine +- Replace QML layout with Lisp layout (Yoga FFI as intermediate step) +- CLOS-based widget tree with computed dirty regions +- Diff-based redisplay: only changed cells re-render -- *Knowledge Graph:* VivaceGraph v3 — Lisp-native graph database with a Prolog-like query language built in. Stores facts, relationships, and rules as native Lisp objects in the same image as the agent. -- *Constraint Solver:* Screamer — non-deterministic backtracking. Given a set of constraints, finds all valid solutions or proves none exist. Used to verify that proposed actions do not violate invariants. -- *Formal Verifier:* ACL2 — a theorem prover for Common Lisp, BSD licensed. 
Proves properties about functions before they are committed to the running image. Used for skill verification and Dispatcher rule validation. +*** v3.2.0: Browser Stage 2 — S-Expression DOM +- Lisp builds its own DOM as native s-expressions +- WebKit reduced to pixel painting only +- Agent traverses and manipulates DOM as Lisp data without serialization -*** The 10-80-10 architecture +*** v3.3.0: Browser Stage 3 — Pure Lisp Browser +- Lisp-native layout engine handles CSS subset +- JavaScript via QuickJS remains +- WebKit turned off entirely +- The browser is now a Lisp application -Ten percent neural for input translation, eighty percent symbolic for reasoning against a knowledge graph, ten percent neural for output formatting. +*** v3.4.0+: Qt/QML Erosion +- Replace QML components with Lisp-native widgets (one at a time) +- Window management via Lisp-native X11/Wayland bindings +- Font rendering via HarfBuzz FFI → Lisp replacement +- Event loop: Qt's → SBCL's native thread scheduler +- Each replacement is verified by the eval harness; the system remains usable at every step -- *10% Input:* The LLM translates natural language into structured queries (Prolog facts, knowledge graph lookups). The neural translator is trained via EGGROLL (low-rank evolution strategies) on the reward signal from the symbolic verifier — it learns to produce queries that the symbolic engine accepts. -- *80% Reasoning:* Pure Lisp. Task graphs generated by the deterministic planner against the knowledge graph. Formal verification via ACL2. Constraint checking via Screamer. Fact retrieval via VivaceGraph. Zero LLM tokens. Zero hallucinations. -- *10% Output:* The LLM formats symbolic results back into natural language. The neural formatter is structurally identical to the translator — same training loop, reversed direction. 
+*** v3.6.0: Stage0 Lisp Bootstrap +- 500-byte hex bootstrap → self-hosting Lisp +- Replace Linux bootloader +- The Lisp machine runs on bare metal -*** The auto-formalizer bootstrap - -The symbolic engine needs a populated knowledge graph. The auto-formalizer populates it: - -1. Feed unstructured data (documentation, manuals, logs, session histories) to the LLM in ~auto-formalizer~ mode. -2. The LLM extracts facts, relationships, and rules as structured S-expressions. -3. The symbolic verifier (Screamer + ACL2) checks each extracted fact for consistency with the existing knowledge graph. -4. Consistent facts are added. Conflicting facts are flagged for human review. -5. Over time, the knowledge graph grows without manual ontology engineering. - -*** DSL approach over engine building - -Domain-specific languages, not general-purpose reasoners: - -- Lisp macros transform human-readable rules into Prolog queries that run against VivaceGraph. -- ~(defrule check-privacy :when (contains-tag payload "@personal") :then :block)~ expands to a VivaceGraph query with Screamer constraint checking. -- Users write rules in a domain-specific DSL. The macros handle the translation to formal logic. -- The Skill Creator (v0.9.0) generates DSL rules from English descriptions. The auto-formalizer verifies them. -- ~(macroexpand-1 '(defrule ...))~ shows exactly how the rule compiles — 100% auditable. - -*** Self-correcting gates - -Gates learn from the full history of outcomes — did the plan succeed? Where did it fail? The symbolic engine updates its own rules based on results: - -- Induced functions from v0.5.0 feed into the symbolic engine as candidate rules. -- The symbolic verifier checks each candidate against the knowledge graph for consistency. -- Rules that pass verification are promoted to the active gate stack. -- Rules that fail verification are discarded with a diagnostic — the agent learns why the pattern doesn't generalize. 
- -*** Implications - -Hallucination becomes structurally impossible because the symbolic engine will not accept a fact that contradicts its knowledge graph. Safety becomes provable because ACL2 can prove properties about the system's behavior. Self-improvement becomes stable because the agent modifies skills that are then verified before execution. The 80% of computation that happens in the symbolic middle layer costs zero LLM tokens. - -** v4.0.0: Native Inference +* v4.0.0: Native Inference LLM inference moves in-process. No external servers. No API keys required for inference. *Lisp as Sovereign Governor, not as Math Engine.* The weights themselves are not stored as Lisp objects — this would waste 50% memory on type tags and destroy cache locality through pointer-chasing. Instead, the entire tensor is tagged as a single Lisp object (~macro-tag~). The Lisp image holds a pointer to optimized flat binary (GPU-friendly, FPGA-compatible). The tag is checked once. After that, all math happens in the optimized backend. -*** Native inference (FFI binding to llama.cpp) +** Native inference (FFI binding to llama.cpp) - FFI binding to llama.cpp via CFFI: load GGUF models, run inference, manage KV cache. Single SBCL image, zero process boundaries. The agent and the model share memory. - Speculative safety: the Dispatcher gate stack intercepts token generation in real time. A token that would produce a blocked action is preemptively suppressed before generation. No external inference API supports this. - Foveal-peripheral compute: the model skips pruned context nodes during attention computation. External APIs compute full attention regardless of what you send. In-process inference makes the sparse-tree rendering pay off at the compute level, not just the token level. 
-*** Live surgery on cognition +** Live surgery on cognition With in-process inference, the agent's internal state becomes inspectable: @@ -1027,7 +1561,7 @@ With in-process inference, the agent's internal state becomes inspectable: - Detect when the agent is likely to hallucinate by comparing current activation patterns against historical baselines. - The REPL becomes a surgical instrument for the agent's own cognition — not just for verifying code, but for inspecting and correcting the neural process that generates it. -*** DSL-compiled model architectures +** DSL-compiled model architectures Model architectures are described as Lisp DSL: @@ -1035,19 +1569,19 @@ Model architectures are described as Lisp DSL: - The DSL compiles to machine code for the target backend (GPU via CUDA, FPGA via VexRiscv, CPU via llama.cpp). - Python interprets at runtime. Lisp compiles once. Model architecture changes are treated the same as code changes — edited, verified, hot-reloaded. -** v5.0.0: Hardware — Tagged Lisp Architecture +* v5.0.0: Hardware — Tagged Lisp Architecture The Lisp machine becomes physical. RISC-V with tagged architecture, hardware-enforced type checking, and FPGA prototype for the symbolic core. *Not a from-scratch processor.* Use RISC-V as the skeleton, add custom Lisp extensions. RISC-V provides the carrier architecture (standard instruction set, existing toolchain, LLVM support). Lisp extensions provide tagged computation (type checking in hardware, parallel garbage collection, S-expression traversal as atomic operations). -*** The macro-tag approach +** The macro-tag approach - Top 4–8 bits of every memory word = Type Tag. Hardware checks tags in parallel with ALU operations. Trap on type mismatch. - A tensor (70B weights) is one macro-tagged Lisp object — a pointer to flat binary. The tag is checked once. Math happens at native speed. This replaces "weights as sexps" (which wastes 50% memory on per-weight tags and destroys cache locality). 
- Custom instructions: TADD (tagged add), LISP.CAR, LISP.CDR — Lisp primitives as single-cycle hardware operations. -*** Phase migration: Host → Co-processor → Self-hosted +** Phase migration: Host → Co-processor → Self-hosted 1. *Parasitic.* Lisp card (FPGA) is a PCIe co-processor. Host CPU (Intel/AMD, Linux/Windows) handles "dirty" I/O — networking, display, file systems. Lisp card handles tagged computation and the agent's cognitive loop. If Lisp crashes, host survives. Reset card, reload. Memory mapping: the card can see the host's memory. The Lisp environment reaches out and inspects data. @@ -1057,7 +1591,7 @@ The Lisp machine becomes physical. RISC-V with tagged architecture, hardware-enf 4. *Self-Hosting.* Replace the Linux bootloader with Stage0 Lisp (a bootstrap from 500 bytes of hex to a self-hosting Lisp). Cut the umbilical cord. The Lisp machine runs on bare metal. -*** Concrete prototyping milestones +** Concrete prototyping milestones | Stage | Hardware | Cost | What it delivers | |-------+----------+------+-----------------| @@ -1068,22 +1602,89 @@ The Lisp machine becomes physical. RISC-V with tagged architecture, hardware-enf Start at TinyTapeout. Validate the tagged architecture works. Move to FPGA. Validate at speed. Only then consider silicon. -*** Garbage collection in hardware +** Garbage collection in hardware Dedicated bus master (Scavenger) runs background garbage collection while the main CPU executes code. No "GC pause." The scavenger traverses the heap in parallel with computation, freeing unreachable objects without stopping the agent. -*** Persistent single-address-space memory +** Persistent single-address-space memory NVRAM for the entire heap. Turn on the machine — state is exactly where you left it. No "booting." No "loading memory from disk." The agent's Merkle-tree memory, skill registry, knowledge graph, and induced functions survive restarts as a contiguous hardware state. 
-*** Why this is not "Lisp inside browser" +** Why this is not "Lisp inside browser" Most Lisp-on-hardware attempts fail because they try to compete with Intel on raw math. That's the wrong axis. The tagged architecture doesn't need to beat a GPU at matrix multiplication. It needs to beat a CPU at symbolic computation — graph traversal, constraint solving, theorem proving, garbage collection. These are the v3.0.0 symbolic engine's workload. Hardware that makes them single-cycle is the differentiator, not hardware that runs matrix math faster. -** v6.0.0: True Agency +* v6.0.0: True Agency World models, temporal reasoning, goal persistence across restarts. - World models: Predictive models of user behavior, project dynamics, system state. - Temporal reasoning: Scheduling, deadlines, elapsed duration awareness. - Goal persistence: Goals survive restarts. Long-term projects in memory-objects. + +* Neurosymbolic Phase Reference + +Each phase has a detailed implementation spec in its version section above. Summary of what is and isn't built: + +| Phase | Component | Lines | Release | +|-------+-----------------------------------------+-------+----------| +| 0 | PM-type-level gates + core integrity | ~75 | v0.10.0 | +| 0b | Layered auth — Layer 1 (cryptographic) | ~200 | v0.12.0 | +| 1 | Triple fact store + abstract API | ~200 | v0.14.0 | +| 1a | Self-preservation mechanisms | ~120 | v0.16.0 | +| 2 | Screamer admission gate | ~200 | v0.18.0 | +| 3 | Archivist as fact proposer | ~100 | v0.20.0 | +| 4 | Sufficiency criterion — the flip | ~50 | v0.22.0 | +| 5 | VivaceGraph + Merkle DAG + ontology ver | ~400 | v0.25.0 | +| 6 | ACL2 structural verification | ~200 | v0.27.0 | +| 7 | 10-80-10 planner | ~500 | v0.36.0 | +| 8+ | Semantic Wikipedia integration | TBD | v0.36.1+ | +|-------+-----------------------------------------+-------+----------| +| Total | | ~2045 | | + +** What Is NOT Built by the Neurosymbolic Phases + +1. 
*A separate knowledge graph serialization format before the ephemeral phase proves what facts are useful.* Premature format commitment is the ontology problem writ small. Let use determine the format. + +2. *ACL2 verification of empirical claims.* Apple is red. rm -rf / is destructive. These are observations, not theorems. Screamer handles empirical consistency. ACL2 handles structural verification. + +3. *VivaceGraph before Screamer.* The admission gate is the critical path. The persistence layer is an optimization of a working system. + +4. *A per-fact ontology designed upfront.* Extract from the gate stack, extend from deductions and observations, prune through contradiction detection. The ontology is a garden, not a building. + +5. *New core ASDF components.* Every phase is a skill. A corrupted symbolic engine degrades reasoning but does not kill the agent. Satisfies the self-repair criterion. + +6. *A "complete" symbolic index for the broad domain.* The neural index is the permanent gateway to the richness of prose. The symbolic index handles what can be mechanically verified. The boundary is permanent, not transitional. The neuro is the brain. The symbolic is the education. + +** Competitive Advantage Analysis + +*** Phase 0-1: Deterministic safety, now with type-level guarantees +The existing Dispatcher gate stack already provides 0-LLM-token safety verification. Phase 0 adds structural guarantees: no heuristic bypassing of the type hierarchy. A request to modify the dispatcher's own rules is impossible by construction, not just caught by pattern matching. No competitor has this — their equivalent of "core file protection" is a prompt instruction, not a type system. + +*** Phase 0b: Layered signal authentication — verified origin, not claimed origin +No competitor verifies /who/ issued a signal. Every agent harness accepts signals from any source that speaks its protocol. A compromised dependency can impersonate any signal source. 
Passepartout's four-layer authentication gate makes signal source spoofing impossible at Layer 1 (cryptographic), detectable at Layers 2-3 (sensory + deterministic reasoning), and probabilistically flagged at Layer 4 (style analysis). The key registry has Merkle-hashed provenance — key creation, promotion, and revocation are auditable, versioned, and survivable across restarts. + +*** Phase 2-3: Verified extraction — the symbolic index grows without corruption +No competitor verifies extracted facts against an existing knowledge base. Their memory systems (Claude Code's ~extractMemories~, Hermes's MemoryProvider, OpenClaw's session transcripts) record what the LLM /said/ happened, not what the system /proved/ happened. Passepartout's Screamer-gated admission makes the symbolic index a monotonic, verified structure. Facts are admitted because they are consistent, not because the LLM generated them. + +*** Phase 4-5: Self-accelerating knowledge — the downward cost curve +The sufficiency criterion makes Passepartout's "cheaper over time" thesis measurable. As the ratio of non-lossy facts grows, LLM calls for extraction decrease. At sufficiency, extraction of known categories becomes deterministic. The downward cost curve is not a marketing claim — it is a structural property of the architecture, visible through the sufficiency score. + +*** Phase 6-7: Provable plan soundness +No competitor verifies task plans against formal constraints. Claude Code plans in a single LLM call with no post-hoc verification. Hermes decomposes tasks into subtasks but does not prove them non-contradictory. Passepartout's ACL2-verified plans are structurally guaranteed to have no deadlocks, no dependency cycles, and no safety violations. The verification is a proof, not a prompt. + +*** Phase 0-1a: Self-preservation — the agent knows when it is wounded +No competitor detects its own degradation. 
Claude Code, OpenCode, and Hermes all fail silently when a tool crashes or a dependency is missing — the agent keeps running, producing degraded output, never telling the user. Passepartout's quarantine system detects failing skills, unloads them automatically, and displays a degraded-mode indicator in the status bar. The external watchdog restarts the daemon if the process dies. The integrity monitor detects corrupted core files. The agent refuses to execute commands that would destroy its own runtime, explaining /why/ and redirecting to the safe termination path. + +*** Semantic Wikipedia: Entity coverage at zero marginal cost +No competitor has a general-knowledge entity graph because no competitor has a symbolic engine to populate. Claude Code knows codebases; it doesn't know that Nabokov wrote /Pale Fire/ and lectured on Kafka. Passepartout with Wikidata loaded knows both, and the entity knowledge costs zero LLM tokens — it is loaded once as structured data and queried via VivaceGraph traversals. + +*** The permanent competitive advantage +The competitive advantage is not any single feature. It is the architecture's ability to accumulate verified knowledge from four independent sources (gates, deduction, verified LLM proposals, human authoring) and to make that knowledge queryable with provenance. Competitors accumulate chat transcripts. Passepartout accumulates a provenanced, self-verifying knowledge graph. Transcripts become stale and unreliable. The knowledge graph becomes richer and more trustworthy with every session. + +Design rationale is in: +- ~notes/passepartout-neurosymbolic-design-decisions-and-options.org~ — design rationale for every decision +- ~notes/passepartout-symbolic-engine-exploration.org~ — original architecture exploration +- ~notes/passepartout-whitehead.org~ — Whitehead's four concrete contributions +- ~docs/ARCHITECTURE.org~ — current pipeline architecture +- ~docs/DESIGN_DECISIONS.org~ — foundational architectural decisions