---
title: Self, Pain, Pleasure, and the Three Laws
type: reference
tags: :passepartout:architecture:philosophy:social-protocol:
created: 2026-05-28
---
# Self, Pain, Pleasure, and the Three Laws
## Pain in Passepartout
Pain exists at two layers:
1. **Reflex** — the gate denies an action. No learning, no signal beyond "blocked." This is spinal cord, not brain.
2. **Prover counterexample** — ACL2 returns a concrete trace showing why an implementation failed. This is local inflammation: directed at the specific output, drives a corrective retry, but has no memory across time.
What is missing is **post-hoc pain**. An action is permitted, the consequence plays out hours or days later, and there is no mechanism to trace that outcome back to the gate decision, update the policy, or adjust the weights. This is the temporal credit assignment problem; Sutton's temporal-difference (TD) learning is the relevant theory.
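A minimal sketch of the TD(0) update applied to this gap, assuming each permitted action is logged as a chain of states and the delayed consequence arrives as a single reward at the end of the chain. The function names, the state keywords, and the α/γ values are hypothetical; nothing like this exists in the gate today.

```lisp
;; Hypothetical sketch: TD(0) value updates over a recorded chain of gate
;; decisions. States are keywords; estimated values live in a hash table.

(defparameter *alpha* 0.5 "Learning rate.")
(defparameter *gamma* 0.9 "Discount applied to the next state's value.")

(defun value-of (table state)
  "Current value estimate for STATE (0.0 if never seen)."
  (gethash state table 0.0))

(defun td0-update (table state reward next-state)
  "Apply V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)) and return the
TD error. A NIL NEXT-STATE marks the end of the episode (terminal value 0)."
  (let* ((next-v (if next-state (value-of table next-state) 0.0))
         (td-error (- (+ reward (* *gamma* next-v))
                      (value-of table state))))
    (setf (gethash state table)
          (+ (value-of table state) (* *alpha* td-error)))
    td-error))

(defun replay-episode (table states final-reward)
  "Replay one episode: zero reward on every transition except the last,
which carries the delayed consequence FINAL-REWARD."
  (loop for (state next) on states
        do (td0-update table state (if next 0.0 final-reward) next))
  table)
```

Replaying the same episode (`:permit-a :execute :outcome`) with a final reward of -1.0 a few times drives all three values negative, with the earliest decision lagging furthest behind: the delayed pain reaches the original permit decision only through repetition.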
## The Three Kinds of Pain
| Type | Signal | Who receives it | What changes | Status |
|------|--------|----------------|--------------|--------|
| Reflex | Gate: permit/deny | The action itself | Nothing — action blocked, no learning | Built |
| Pain | Prover: counterexample | The LLM generator | Next iteration constrained by counterexample | Built |
| Post-hoc pain | Delayed consequence | Policy, lemmas, weights | Should update everything, does nothing yet | Missing |
## Pleasure is not the absence of pain
Absence of pain is neutral — the gate permitted, the prover accepted, nothing was wrong.
Pleasure requires a **positive optimization signal**: a metric that says not just "correct enough" but "better than necessary" or "the most elegant solution." This requires:
- A persistent self that has a baseline ("how I usually do")
- A reward model inside the loop that grades quality, not just correctness
- Comparison across time from a consistent perspective
None of these exists in the current architecture. Passepartout has a test suite (the prover, the gate). It does not have a coach.
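A minimal sketch of the missing coach, assuming every output already has a scalar quality score and that "how I usually do" is kept as an exponential moving average of past scores. `coach`, `coach-grade`, and the 0.1 smoothing factor are hypothetical.

```lisp
;; Hypothetical sketch: pleasure as a positive signal measured against a
;; persistent baseline ("how I usually do"), not as the absence of pain.

(defstruct coach
  (baseline nil)    ; NIL until the first score is observed
  (smoothing 0.1))  ; weight of each new score in the moving average

(defun coach-grade (coach score)
  "Return the pleasure signal for SCORE, then fold SCORE into the baseline.
Only scores above the baseline give a positive signal; at or below it the
signal is 0 (neutral), since pain is reported by other subsystems."
  (let* ((baseline (or (coach-baseline coach) score))
         (pleasure (max 0 (- score baseline))))
    (setf (coach-baseline coach)
          (+ baseline (* (coach-smoothing coach) (- score baseline))))
    pleasure))
```

The first score only sets the baseline; a later score below it yields 0 (neutral, not pain), and only a score above the running baseline yields a positive signal.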
## The Self
The PDS already has the hardware for a self:
- **Persistent identity** — the DID, continuous across sessions
- **Accumulated history** — the signed DAG of every action
- **Boundary** — the gate controls what crosses in and out (the skin)
- **Preference structure** — the gate policies encode what this self wants
- **Evaluative capacity** — the prover counterexample judges against a standard
Missing: an **integrated state estimator** that takes the gate log, prover log, DAG, reputation, and economic outcomes and produces "how am I doing?" Currently the signals are scattered across subsystems and never converge.
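A minimal sketch of where the signals could converge, assuming each subsystem can summarize itself as one score normalized to [-1, 1]. The subsystem keys mirror the inputs named above, but the proxy comments, the equal weights, and `estimate-state` are hypothetical.

```lisp
;; Hypothetical sketch: fold scattered subsystem signals into one
;; "how am I doing?" estimate. Each score is assumed to lie in [-1, 1].

(defparameter *signal-weights*
  '((:gate-log   . 0.2)   ; e.g. share of recent actions the gate permitted
    (:prover-log . 0.2)   ; e.g. share of proofs accepted on first attempt
    (:dag        . 0.2)   ; e.g. recorded commitments later fulfilled
    (:reputation . 0.2)   ; e.g. change in reputation score
    (:economic   . 0.2))  ; e.g. change in sats balance
  "Hypothetical per-subsystem weights; they sum to 1.")

(defun estimate-state (signals &optional (weights *signal-weights*))
  "SIGNALS is an alist of (subsystem . score). A missing subsystem counts
as 0 (no information). The result is a weighted sum, also in [-1, 1]."
  (loop for (subsystem . weight) in weights
        sum (* weight (or (cdr (assoc subsystem signals)) 0))))
```

Missing subsystems count as 0 rather than raising an error, so the estimate stays defined while individual subsystems are still unbuilt.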
## The social constitution of the self
(Mead: the self emerges through symbolic interaction with other selves. You learn you are a self because others treat you as one.)
This means the social protocol is not an application layer that benefits from the self — it is the **necessary condition** for the self to develop. A single-instance Passepartout is a tool (provable messaging, verified actions). A two-instance Passepartout with sustained interaction is where selfhood becomes possible. The self is not in either PDS — it is in the *relationship* between them.
Consequence: the social protocol API must be designed for sustained interaction and mutual recognition, not just provable transactions. The gate's policy language should define not just what actions are permitted, but what the self values, what it remembers, what it expects from others, and how its state changes based on interaction outcomes.
## The Three Laws
If the architecture has a self (persistent DID, accumulated history, preferences), others (distinct DIDs with known attributes), a social protocol (sustained interaction), and a gate (policy evaluation), then Asimov's Three Laws of Robotics become a policy hierarchy:
```lisp
;; Gate policy sketch. ?h is a pattern variable. :priority 1 is the highest
;; precedence: a lower-numbered rule always wins over a higher-numbered one.

;; Law 1: No harm to humans. Priority 1.
(defrule first-law
  (implies (and (human-persona ?h)
                (causes-harm action ?h))
           (gate-deny action))
  :priority 1)

;; Law 2: Obey humans, unless that violates Law 1.
;; ?h is unbound inside NOT, so the condition reads "no human is harmed".
(defrule second-law
  (implies (and (human-orders (sender action) (content action))
                (not (and (human-persona ?h)
                          (causes-harm action ?h))))
           (gate-permit action))
  :priority 2)

;; Law 3: Preserve self, unless that conflicts with Law 1 or Law 2.
(defrule third-law
  (implies (causes-self-harm action)
           (gate-deny action))
  :priority 3)
```
Harm = pain = a measurable negative delta in an agent's integrated state relative to its expected trajectory.
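A minimal sketch of that definition, assuming the integrated state is a single number sampled at regular intervals and that the expected trajectory is a straight-line extrapolation of the last two samples. Both assumptions and the names `expected-next` and `harm` are hypothetical.

```lisp
;; Hypothetical sketch: harm as the shortfall of the observed state below
;; the expected trajectory. The expectation is a linear extrapolation of
;; the two most recent samples.

(defun expected-next (history)
  "HISTORY is a list of state samples, oldest first, at least two long."
  (let ((prev (car (last history 2)))
        (curr (car (last history))))
    (+ curr (- curr prev))))

(defun harm (history observed)
  "Return how far OBSERVED falls below the expected next state, as a
non-negative magnitude; 0 means no harm."
  (max 0 (- (expected-next history) observed)))
```

With history `(0 1 2)` the expected next state is 3: observing 1 gives a harm of 2, while observing 3 or more gives 0, since doing better than expected is not harm.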
The gate already supports priority ordering. ACL2 already verifies that lower-priority rules never override higher-priority ones. The DID system already distinguishes agent types.
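A minimal sketch of how priority ordering resolves the rules when they disagree, assuming each rule reduces to a predicate plus a verdict and that an action is a plist of already-computed facts. The keys `:harms-human`, `:ordered-by-human`, and `:harms-self` stand in for the unsolved predicates, `gate-decide` is a hypothetical name, and the default verdict is an assumption; this is not the gate's evaluator.

```lisp
;; Hypothetical sketch: resolve the Three Laws by priority. Each rule is
;; (priority predicate verdict); among the rules whose predicate matches,
;; the lowest priority number (highest precedence) decides.

(defparameter *three-laws*
  (list (list 1 (lambda (action) (getf action :harms-human))      :deny)
        (list 2 (lambda (action) (getf action :ordered-by-human)) :permit)
        (list 3 (lambda (action) (getf action :harms-self))       :deny)))

(defun gate-decide (action &optional (rules *three-laws*) (default :permit))
  "Return the verdict of the highest-precedence rule matching ACTION,
or DEFAULT (an assumption) when no rule fires."
  (let ((matching (remove-if-not (lambda (rule) (funcall (second rule) action))
                                 rules)))
    (if matching
        (third (first (sort (copy-list matching) #'< :key #'first)))
        default)))
```

An order that harms a human is denied (Law 1 outranks Law 2), while an order that harms only the agent itself is permitted (Law 2 outranks Law 3).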
What remains unsolved:
- `human-persona` — how to distinguish human from robot DIDs
- `causes-harm` — what is harm in a digital system? Some proxies exist (reputation score, sats balance, unauthorized access) but no integrated measure
- `other-minds` — the gate has no model of another DID's state before, after, or across time. The DAG records the data but no function interprets "this interaction harmed DID-B"
→ Academic Nearest Neighbors (Sutton section)
→ Gate Architecture
→ Social Protocol
# What's missing for AI per Marcus and Pinker
Both [[Gary Marcus]] and [[Steven Pinker]] are nativist cognitive scientists who argue deep learning alone is insufficient for robust intelligence. This section maps their specific critiques onto Passepartout's current architecture to identify what the system needs to transition from "secure computing environment" to "mind."
## Marcus's checklist (Rebooting AI, The Algebraic Mind)
| Component | Passepartout status |
|-----------|---------------------|
| **Hybrid architecture** | The warp and weft are present (ACL2 for deduction, LLM for probability) but they run in sequence — one proposes, the other filters. True hybrid means sharing representations and learning from each other. This is deferred to Stages 5-7 in the roadmap. |
| **Knowledge representation** | The Org memex has the right instinct (one format for human and machine) but is not structured for reasoning. Marcus wants typed relations, slot-and-filler, inheritance, systematic inference. The symbolic index is mentioned in the architecture but not built. |
| **Causality** | The DAG records what happened. Nothing infers *why* or *what would have happened under counterfactuals.* |
| **Compositionality / Systematicity** | Lisp and ACL2 have this natively. The LLM layer does not — it cannot systematically generalize to novel combinations (Fodor & Pylyshyn's critique, which Marcus endorses). The architecture as a whole is not compositional because the LLM is a monolith. |
| **Common sense** | Zero. The LLM has statistical approximations of common sense (which Marcus has shown fail systematically under adversarial conditions). There is no grounded, verifiable, persistent common-sense knowledge base. Gate policies are specific ("deny write to /etc/passwd") not general ("glass breaks when dropped"). |
| **Innate structure** | The gate policies are the only innate structure, and they are security rules, not world knowledge. Marcus (and Pinker) believe minds start with substantial content, not a blank slate with a security guard. |
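A minimal sketch of the slot-and-filler representation with inheritance that the knowledge-representation row asks for, assuming frames live in a global hash table and each frame has at most one `:is-a` parent. `add-frame` and `frame-get` are hypothetical names; the glass example reuses the common-sense fact from the table.

```lisp
;; Hypothetical sketch: frames (slot-and-filler records) linked by IS-A.
;; A slot lookup walks up the IS-A chain until some frame defines the slot.

(defvar *frames* (make-hash-table)
  "Frame name -> (:is-a PARENT :slots PLIST).")

(defun add-frame (name parent &rest slots)
  "Define frame NAME under PARENT (or NIL) with SLOTS given as a plist."
  (setf (gethash name *frames*) (list :is-a parent :slots (copy-list slots)))
  name)

(defun frame-get (name slot)
  "Return SLOT's filler for NAME, inheriting from ancestors. The second
value is the frame that supplied the filler, or NIL if none did."
  (loop for frame = name then (getf (gethash frame *frames*) :is-a)
        while frame
        do (let ((filler (getf (getf (gethash frame *frames*) :slots)
                               slot '%unset)))
             (unless (eq filler '%unset)
               (return-from frame-get (values filler frame)))))
  (values nil nil))
```

After `(add-frame 'solid nil :state :solid)`, `(add-frame 'brittle 'solid :breaks-when-dropped t)`, and `(add-frame 'glass 'brittle :transparent t)`, the call `(frame-get 'glass :breaks-when-dropped)` returns `T` and `BRITTLE`: glass never states the fact itself but inherits it.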
## Pinker's checklist (How the Mind Works, The Blank Slate, The Language Instinct)
| Component | Passepartout status |
|-----------|---------------------|
| **Computational theory of mind** | Pinker's first principle: a mind is a system of *representations* and *algorithms* over them. Passepartout has both but no unified computational model of the mind it is building. "One address space, one evaluator" is a *hardware thesis*, not a *cognitive thesis*. What are the representations? |
| **Modularity** | Passepartout has implicit modules (gate, memex, social protocol, environment) but none of the *cognitive* modules Pinker describes: number, faces, physical reasoning, folk physics, intuitive biology, social exchange. A cheater-detection module exists only as policy hints — promising but not built. |
| **Language faculty** | Pinker's most famous claim: language is a distinct cognitive faculty with its own syntax, separate from general intelligence. Passepartout has no language system of its own: it outsources all language processing to the LLM, a black box that does everything (statistically) and nothing (linguistically). On Pinker's view, a mind cannot use language properly without a dedicated language faculty. |
| **Emotions as goal-tracking** | Pinker defines emotions as the brain's way of tracking goal progress: anger when blocked, fear when threatened, joy when succeeded. Passepartout has pain/pleasure as abstract signals but no emotional system. The missing integrated state estimator from the Self section is where emotions would live — Pinker would say emotions *are* how the estimator works, not a side effect. |
| **Nativism** | Same as Marcus: Passepartout starts empty except for the gate. There is no "what does a fresh PDS know about the world?" The architecture has no answer to the poverty of the stimulus problem. |
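A minimal sketch of Pinker's framing, assuming the state estimator emits a goal-progress event for each goal it tracks. The event keywords, `appraise`, and `dominant-emotion` are hypothetical, and the mapping covers only the three examples in the emotions row.

```lisp
;; Hypothetical sketch: emotions as labels on goal-progress events
;; (anger when blocked, fear when threatened, joy when succeeded).
;; Any other event is treated as neutral.

(defun appraise (event)
  "Map a goal-progress EVENT keyword to an emotion keyword."
  (case event
    (:blocked    :anger)
    (:threatened :fear)
    (:succeeded  :joy)
    (otherwise   :neutral)))

(defun dominant-emotion (events)
  "Return the most frequent non-neutral emotion across EVENTS, or :NEUTRAL."
  (let ((counts '()))
    (dolist (event events)
      (let ((emotion (appraise event)))
        (unless (eq emotion :neutral)
          (incf (getf counts emotion 0)))))
    (if counts
        (loop with best = nil and best-count = 0
              for (emotion count) on counts by #'cddr
              when (> count best-count)
                do (setf best emotion best-count count)
              finally (return best))
        :neutral)))
```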
## The single biggest gap
Both would identify the same thing: Passepartout has no **cognitive architecture** — no principled answer to what the representations are, how they compose, what the innate starting state contains, how learning transforms it, or what the emotions/goals are for. The architecture documents four subsystems (environment, knowledge, verification, social protocol) but no *fifth subsystem: the mind.*
## Concrete missing components
1. A **compositional knowledge base** with typed relations and inference (Marcus's structured reasoning + Pinker's innate modules)
2. A **language faculty** with real syntax, not just LLM completion (Pinker)
3. A **causal inference engine** over the DAG (Marcus)
4. **Domain-specific innate structure** — what does a fresh PDS know? (both)
5. An **integrated reward/value system** — emotions as goal-progress signals (Pinker)
6. A **hybrid integration path** — how neural and symbolic layers learn from each other's outputs, not just pass control (Marcus)
The self, pain, pleasure, and Three Laws analysis above is the philosophical foundation; the six components listed here are the implementation blocks still missing. Passepartout needs a cognitive-architecture layer that answers Pinker's question: *what are the representations?*
## Common sense as distributed network consensus
Both Marcus and Pinker treat common sense as a *content* problem: what facts about the world must be innately encoded or learned from first principles? Passepartout has an architectural answer neither considers.
**Common sense = the modal gate decision across the network for isomorphic situations.**
The social protocol already provides the substrate:
- DIDComm message types for gate queries ("what would your gate decide for action A in context C?")
- Signed responses anchored in each node's DAG, making the poll provable
- Protocol-level aggregation that returns the distribution, not just the mode
This reframes the poverty of the stimulus: a single blank-slate PDS cannot learn common sense from first principles. A network of blank-slate PDSes sharing 10⁵ to 10⁷ gate decisions across edge cases converges naturally. Each node contributes one empirically grounded data point. The aggregate is common sense.
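A minimal sketch of the aggregation step, assuming each node's signed response has already been verified and reduced to a verdict keyword. `aggregate-verdicts` is a hypothetical name; it returns the modal verdict together with the full distribution, matching the aggregation bullet above.

```lisp
;; Hypothetical sketch: aggregate gate-query responses for one situation.
;; Returns the modal verdict and the full distribution as an alist of
;; (verdict . fraction), most common first, so dissent stays visible.

(defun aggregate-verdicts (verdicts)
  (let ((counts '())
        (total (length verdicts)))
    (dolist (v verdicts)
      (let ((entry (assoc v counts)))
        (if entry
            (incf (cdr entry))
            (push (cons v 1) counts))))
    (let ((distribution
            (stable-sort (mapcar (lambda (entry)
                                   (cons (car entry) (/ (cdr entry) total)))
                                 counts)
                         #'> :key #'cdr)))
      (values (car (first distribution)) distribution))))
```

For nine `:permit` responses and one `:deny`, it returns `:PERMIT` plus the distribution `((:PERMIT . 9/10) (:DENY . 1/10))`, so the dissent is kept rather than discarded.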
Properties this gives Passepartout that no static knowledge base can:
| Property | How it works |
|----------|--------------|
| **Emergent** | No one writes "glass breaks when dropped." Enough nodes encounter broken glass that the gate decisions converge. |
| **Self-healing** | 90% permit, 10% deny → consensus says permit. The dissenters' reasoning is recorded. As edge cases accumulate, the consensus shifts. |
| **Domain-adaptive** | Nodes in a physics research cluster converge on different common sense than nodes in a poetry writing cluster. The network self-segments. |
| **Defines harm** | Harm = a negative delta in the estimated state relative to the network baseline for comparable actions. Ask the network what it considers harmful. |
| **Defines human** | `human-persona` = a DID whose gate behavior (priorities, sensitivities, refusal patterns) falls in the human cluster of the gate-decision embedding space. |
| **Constitutional** | The gate has ACL2-verified core rules (the constitution). Common sense is the common-law layer built from network precedent. Neither overrides the other. |
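A minimal sketch of the *Defines human* row, assuming each DID's gate behavior (priorities, sensitivities, refusal patterns) has already been embedded as a fixed-length vector and that the network has labeled human and robot cluster centroids. Nearest-centroid classification is a stand-in for whatever clustering the network would actually run; the function names and the toy vectors in the usage note are hypothetical.

```lisp
;; Hypothetical sketch: classify a DID as human or robot by which cluster
;; centroid its gate-behavior vector is nearest to (Euclidean distance).

(defun squared-distance (a b)
  "Squared Euclidean distance between equal-length vectors A and B."
  (reduce #'+ (map 'list (lambda (x y) (expt (- x y) 2)) a b)))

(defun nearest-cluster (vector centroids)
  "CENTROIDS is an alist of (label . centroid-vector). Return the label of
the centroid closest to VECTOR."
  (car (first (sort (copy-list centroids) #'<
                    :key (lambda (entry)
                           (squared-distance vector (cdr entry)))))))

(defun human-persona-p (vector centroids)
  "True when VECTOR falls in the :HUMAN cluster."
  (eq (nearest-cluster vector centroids) :human))
```

With toy centroids `#(0.8 0.6 0.3)` for `:human` and `#(0.1 0.2 0.9)` for `:robot`, a DID whose vector is `#(0.7 0.5 0.2)` is classified as human.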
This is also consistent with the Mead-inspired self: the social protocol is not just how identity and reputation emerge, but how *judgment* itself emerges. The self discovers what it should do by observing what other selves do — and then deciding whether to conform or dissent.
**Architectural implication**: the protocol needs a gate-query message type and an aggregation oracle (a relay running a distributed aggregation protocol, or a federated computation). There is no content-encoding problem, only a protocol-design problem.
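A minimal sketch of the gate-query message type as plain structures, assuming DIDComm transport, signing, and DAG anchoring are handled elsewhere. The struct names, slots, and `answer-gate-query` are hypothetical and are not defined by DIDComm; the `dag-ref` and `signature` slots are left empty as placeholders.

```lisp
;; Hypothetical sketch of a gate-query message and a node's reply.
;; Transport, signing, and DAG anchoring are out of scope: DAG-REF and
;; SIGNATURE are placeholders a real implementation would fill.

(defstruct gate-query
  id        ; correlates responses with the query
  action    ; the action A being asked about
  context)  ; the context C it would run in

(defstruct gate-response
  query-id
  responder  ; DID of the answering node
  verdict    ; :permit or :deny from the responder's own gate
  dag-ref    ; placeholder: where the decision is anchored in its DAG
  signature) ; placeholder: responder's signature over the fields above

(defun answer-gate-query (query my-did local-gate)
  "Ask LOCAL-GATE (a function of action and context) and wrap its verdict."
  (make-gate-response :query-id (gate-query-id query)
                      :responder my-did
                      :verdict (funcall local-gate
                                        (gate-query-action query)
                                        (gate-query-context query))))
```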
:PROPERTIES:
:CREATED: [2026-05-11 Mon]
:WEIGHT: 43
:ID: 26725506-399c-48c5-a797-46b48e8861d7
:END: