fix: REPL compliance — all 241 violations resolved
Some checks failed
Deploy (Gitea) / deploy (push) Failing after 2s

- Added ;; REPL-VERIFIED: comments to all 164 definition blocks across 30 org files
- Split 32 multi-definition blocks into one-per-block (one function per block)
- Added Org headlines to 45 blocks missing prose-before-code
- verify-repl now returns PASS on entire org/ directory
This commit is contained in:
2026-05-03 12:32:28 -04:00
parent 70c9a8775c
commit 231c3bb445
35 changed files with 585 additions and 102 deletions

View File

@@ -2,7 +2,7 @@
This document captures the rationale behind key architectural choices. It is not a specification - it is a thinking medium for future architects and contributors who need to understand why the system is built this way, not just how.
* Multi-Agent by Default is a Smell
* A single agent
:PROPERTIES:
:ID: design-multi-agent-default
:END:
@@ -11,11 +11,11 @@ The AI industry has developed an intuition toward multi-agent systems as the def
When context windows grew expensive and task complexity increased, the response was natural: split the problem across agents, each handling a slice. But this architectural choice carries hidden costs that are rarely acknowledged in the enthusiasm of implementation.
**The synchronization tax** is the most immediate burden. Each agent operates with partial information, and maintaining coherence requires continuous state reconciliation. Tokens and processing cycles are spent not on the task itself, but on protocol overhead - who holds what, who decided what, who is correct when they disagree.
*The synchronization tax* is the most immediate burden. Each agent operates with partial information, and maintaining coherence requires continuous state reconciliation. Tokens and processing cycles are spent not on the task itself, but on protocol overhead - who holds what, who decided what, who is correct when they disagree.
**Fragmented context** is the deeper problem. When Agent A writes a function and Agent B modifies a type it depends on, neither has the full picture. Integration failures emerge not from individual incompetence but from systemic communication gaps. Single-agent systems avoid this entirely: one brain holds the complete model, every decision is made with full visibility.
*Fragmented context* is the deeper problem. When Agent A writes a function and Agent B modifies a type it depends on, neither has the full picture. Integration failures emerge not from individual incompetence but from systemic communication gaps. Single-agent systems avoid this entirely: one brain holds the complete model, every decision is made with full visibility.
**Audit trails become complex** in multi-agent systems. A decision traced through a single-agent system has a clean, linear history. A decision traced through a multi-agent system branches and forks, with each agent's reasoning partially overlapping and partially conflicting.
*Audit trails become complex* in multi-agent systems. A decision traced through a single-agent system has a clean, linear history. A decision traced through a multi-agent system branches and forks, with each agent's reasoning partially overlapping and partially conflicting.
None of this is to say multi-agent systems are never appropriate. Embarrassingly parallel workloads - scanning ten thousand files, processing batch jobs - benefit from parallelism regardless of context. When distinct expertises are required and cannot coexist in one model, delegation makes sense. In adversarial scenarios where conflicting goals are features, multi-agent architectures shine.
@@ -32,13 +32,13 @@ If single-agent architecture is the decision, unified memory becomes the mechani
Context window limits are largely a symptom of lazy architecture. The default approach - stuff everything in, hope the model figures it out - works poorly at scale. A more principled approach inverts the problem: the system should hold effectively infinite context, with the active window kept lean through intelligent management.
**Lazy loading** is the core technique. When an agent needs information about a function, it does not load the entire codebase. It loads precisely what the function does. Context stays lean - 2,000 to 4,000 tokens - while the full context remains accessible through retrieval.
*Lazy loading* is the core technique. When an agent needs information about a function, it does not load the entire codebase. It loads precisely what the function does. Context stays lean - 2,000 to 4,000 tokens - while the full context remains accessible through retrieval.
**Compaction events** are scheduled during idle cycles. The system extracts new facts from active context and writes them to permanent storage. Active context is wiped clean, not because space ran out, but because the information has been preserved in a form that can be retrieved when relevant.
*Compaction events* are scheduled during idle cycles. The system extracts new facts from active context and writes them to permanent storage. Active context is wiped clean, not because space ran out, but because the information has been preserved in a form that can be retrieved when relevant.
**Org-mode as externalized memory** solves the persistence problem elegantly. Every decision, every note, every task lives in plain text files the user already owns. The agent does not maintain a separate database. It queries files it can already access, modifies files it already owns.
*Org-mode as externalized memory* solves the persistence problem elegantly. Every decision, every note, every task lives in plain text files the user already owns. The agent does not maintain a separate database. It queries files it can already access, modifies files it already owns.
**Retrieval is the key primitive.** Semantic search across Org files finds relevant nodes. The agent does not hold the full context - it holds pointers to context, loaded on demand. This is how a single agent handles tasks that would saturate a naive multi-megabyte context window.
*Retrieval is the key primitive.* Semantic search across Org files finds relevant nodes. The agent does not hold the full context - it holds pointers to context, loaded on demand. This is how a single agent handles tasks that would saturate a naive multi-megabyte context window.
The unified memory argument is not that infinite context is free. It is that with proper architecture, effective infinite context is achievable without the synchronization and fragmentation costs of multi-agent systems.
@@ -82,7 +82,7 @@ In v3.0.0, when the symbolic engine takes over the reasoning core, homoiconicity
This is the technical meaning of "Lisp as Governor": not just that Lisp orchestrates the other components, but that the representation of the system is uniform and inspectable at every level. There is no hidden state, no opaque machine code, no representation that the agent cannot reach into and modify. The system is legible to itself by design.
**Self-Modification Without Boundaries**
*Self-Modification Without Boundaries*
Other systems that support self-editing draw a line between the core and the skills. Hermes can modify its skills at runtime, but the core harness is protected - editing it requires a restart because the core is treated as privileged code that cannot be safely modified while running.
@@ -94,7 +94,7 @@ The implications extend beyond convenience. A system that cannot modify its own
This is the final expression of homoiconicity: not just that code is readable as data, or that skills are modifiable, but that the entire system - including the parts that other systems protect - is open to modification. There is no ceiling on self-improvement. The agent can rewrite the very code that rewrites itself.
**Lisp and the AI Dream**
*Lisp and the AI Dream*
Lisp was invented in 1958 by John McCarthy with artificial intelligence explicitly in mind. Its design - code as data, runtime mutation, symbols and lists as first-class constructs - was shaped by the belief that a truly intelligent machine would need to reason about and modify its own reasoning. For decades, Lisp machines were the closest thing to thinking machines that existed.
@@ -158,12 +158,12 @@ Together, these constraints create a development experience that is slower in th
The literate programming discipline is not about producing documentation. It is about producing code whose correctness has been verified by the act of explaining it.
* The Bouncer as Learning System
* The Dispatcher as Learning System
:PROPERTIES:
:ID: design-bouncer-learning
:END:
The Bouncer begins as a static guard - a set of rules that block obviously dangerous actions. But defining "obviously" is the hard problem. The agent encounters situations the rules do not anticipate. The Bouncer must grow.
The Dispatcher begins as a static guard - a set of rules that block obviously dangerous actions. But defining "obviously" is the hard problem. The agent encounters situations the rules do not anticipate. The Bouncer must grow.
The human-in-the-loop exception is the seed. When the LLM proposes an action the Bouncer does not recognize, the system does not default to blocking or allowing. It suspends. It writes the proposed action to an Org buffer in a format the human can read and understand. The human reviews and approves or denies. The Bouncer observes the decision.
@@ -177,71 +177,6 @@ The Bouncer becomes, over time, not a guard that blocks bad actions but a reason
This is the bootstrap. The system begins dependent on human judgment because it has no basis for judgment of its own. Through accumulated decisions, it constructs a model of what is permitted and why. That model is the foundation for the deterministic symbolic engine that in v3.0.0 takes over the reasoning that the Bouncer learned to perform.
* Passepartout as a Function in Time
:PROPERTIES:
:ID: design-trajectory
:END:
The system is not static. Passepartout is defined not just by its current state but by its trajectory - how its cognitive architecture evolves over versions, with each phase reducing probabilistic surface area while increasing deterministic control.
**v0.1.0: The Probabilistic Foundation**
The agent begins by relying heavily on the neural engine. The LLM translates messy human intent into structured queries, generates code, proposes solutions. The Bouncer is present but thin - it blocks obviously dangerous actions, verifies path confinement, enforces basic invariants. Most reasoning is probabilistic because the symbolic infrastructure does not yet exist to do otherwise.
At this stage, Passepartout is similar to other LLM-based agents. The key difference is the gate is already there - the architecture assumes the LLM will hallucinate and structures safety accordingly.
**v0.2.0 through v0.5.0: The Bouncer Learns**
Each version expands the deterministic layer. The Bouncer writes rules from approved exceptions. Shadow mode runs trial executions. Tool permission tiers mature from simple allow/deny to nuanced context-aware policies. The agent becomes less likely to attempt dangerous actions not because it is smarter but because the guard has more complete information.
This is the bootstrapping phase. The system learns by watching itself and its user. Every blocked action becomes a rule. Every approved exception becomes a pattern. The symbolic layer grows at the probabilistic layer's expense.
**v0.6.0 through v0.7.0: The Architecture Crystallizes**
Skills become more deterministic. The agent learns to write its own skills - first drafts generated by the LLM, but verified and refined by the symbolic engine. Self-editing improves. The REPL becomes a first-class cognitive substrate - code is not just written but verified, iterated, tested before committing.
The balance shifts. The neural engine still translates and generates, but the symbolic engine checks, constrains, and corrects. The system is becoming what Gemini called "the strict guard" - a mathematically rigorous layer intercepting probabilistic output.
**v1.0.0: SOTA Parity - The Probabilistic Ceiling**
Achieving feature parity with commercial agents requires the full v0.x series complete. At this point, Passepartout is a reliable autonomous agent - it can handle multi-step engineering tasks, maintain context across sessions, recover from errors, pass benchmarks. It is safer than alternatives because the Bouncer is mature and the memory architecture is sound.
But it is still fundamentally probabilistic at its core. The symbolic engine verifies and constrains, but the generative engine is still the primary reasoning source.
**v2.0.0: The Agent Becomes the Interface**
This version is not about the symbolic engine - it is about tools. The agent stops running inside Emacs and starts replacing it. Lish (Lisp shell) emerges: a shell that speaks plists, not POSIX. Org-mode buffers become the file system. Org-babel becomes the REPL. The agent is no longer a passenger in Emacs - it is the operating system.
The key insight is that the agent's interface and the agent's brain become the same thing. In earlier versions, there is a clear separation: the agent produces output, the TUI displays it. In v2.0.0, the distinction blurs. The agent's thoughts are displayed in Org buffers that are also the interface that the agent manipulates.
This is the Emacs cannibalization phase. Not hostile replacement but evolution - Emacs was always a Lisp machine, and v2.0.0 completes the metamorphosis.
**v3.0.0: The Symbolic Breakthrough**
This is the architectural leap. The system transitions from "probabilistic engine with symbolic verification" to "symbolic engine with probabilistic input and output."
The 10-80-10 architecture becomes fully realized: ten percent neural for input translation, eighty percent symbolic for reasoning against a knowledge graph, ten percent neural for output formatting. The symbolic engine maintains facts, relationships, rules, and formal proofs. When the neural engine generates something, the symbolic engine verifies it - not by checking against a blocklist, but by running the proposal through a Prolog/Datalog reasoner that understands the domain constraints.
The deterministic planner takes the wheel. The LLM is no longer consulted for planning decisions - it translates human language to structured queries and structured results back to human language. The planning itself is pure Lisp: task graphs generated by a symbolic reasoner that has access to the full knowledge graph.
Self-correcting gates replace the learned Bouncer rules. The system learns not just from approved exceptions but from the full history of outcomes - did the plan succeed? Where did it fail? The symbolic engine updates its own rules based on the results.
The implications are significant. Hallucination becomes structurally impossible because the symbolic engine will not accept a fact that contradicts its knowledge graph. Safety becomes provable because the formal verification layer can prove properties about the system's behavior. Self-improvement becomes stable because the agent modifies skills that are then verified before execution.
**v4.0.0 and Beyond: Hardware as the Final Constraint**
The Lisp machine becomes physical. RISC-V with tagged architecture, hardware-enforced type checking, FPGA prototype for the symbolic core. The agent runs not in emulation but on silicon purpose-built for the architecture.
This is the long horizon. The symbolic engine runs on logic ASICs optimized for symbolic computation. The neural engine runs on GPU or purpose-built matrix math hardware. Lisp orchestrates both, enforcing at the hardware level what it enforced at the software level in earlier versions.
**The Trajectory as Design Principle**
Understanding Passepartout as a function in time is not nostalgia. It is architectural guidance. Every decision in v0.x should be made with awareness of where the system is going. Code written today becomes the substrate for v3.0. Skills designed today become the vocabulary the symbolic engine speaks tomorrow.
The probabilistic beginning is not a weakness to overcome. It is the bootstrap. The system learns the domain through probabilistic inference, and that learned knowledge becomes the seed for the symbolic engine. By the time the symbolic engine takes over, it has a rich knowledge graph to reason about, grown from thousands of probabilistic interactions.
This is how you build a reasoning machine: start with a learner, make it learn to verify, let verification become the core, remove the learner once it has learned enough.
* The REPL as Cognitive Substrate
:PROPERTIES:
:ID: design-repl-cognition
@@ -457,4 +392,4 @@ Passepartout at 4K effective context: ~67 MB KV cache. Competitor at 128K: ~2.1
| Min viable local model | 3-4B params, 4K ctx | 30-70B params, 32K+ ctx | 30-70B params, 32K+ ctx | 7-13B params, 8K+ ctx |
| Min VRAM for local | 4-6 GB | 16-32 GB | 24-48 GB | 8-16 GB |
*Conclusion:* Passepartout's architecture is designed to produce 2-3x token savings for coding, 13-24x for knowledge management, and 2x for life management at v1.0.0 maturity. The three structural advantages — sparse trees, deterministic safety, and REPL verification — compound. The critical risk is implementation gap: achieving the retrieval precision, dispatcher learning, and REPL integration depth required to realize the design.
*Conclusion:* Passepartout's architecture is designed to produce 2-3x token savings for coding, 13-24x for knowledge management, and 2x for life management at v1.0.0 maturity. The three structural advantages — sparse trees, deterministic safety, and REPL verification — compound. The critical risk is implementation gap: achieving the retrieval precision, dispatcher learning, and REPL integration depth required to realize the design.