per-domain flip: knowledge types and fastest acquisition

- Sufficiency flip is per-domain, not global. Poetry never flips.
- Three knowledge types: structural (published rules), empirical (observations), performance (profiling data)
- Fastest acquisition: active sandboxed probing, contrastive queries to human (not waiting for HITL to accumulate), ontology transfer from related domains, benchmark harness
- Codified domain: flip within days (hours LLM + hours expert review)
- Uncodified learnable domain: flip within weeks (probe + real use)
- Never-flip domains: system is honest, LLM handles 100%
2026-05-21 18:55:11 +00:00
parent b5d59c3360
commit 2d0d6d478a
1 changed files with 71 additions and 0 deletions
--- a/ideas/passepartout-economics.org
+++ b/ideas/passepartout-economics.org
@@ -779,6 +779,77 @@ whether the bootstrap succeeds. A patient operator with a
 destination as an operator with bottomless API credits — the
 second arrives faster, but both arrive.

+** The per-domain sufficiency flip
+
+The sufficiency flip is not a single event. It happens
+independently for each domain, and some domains never flip.
+The flip point is determined by the kind of knowledge the
+domain requires.
+
+*** Knowledge types required for a flip
+
+| Domain | Knowledge required | Can it flip? | How fast? |
+|--------|-------------------|--------------|-----------|
+| Shell safety, path rules | Structural only: the deployment's config, shell semantics | Yes | Immediately, ingested from config |
+| Healthcare compliance | Structural (HIPAA text) + empirical (human reviews of edge cases) | Yes | Weeks — one pass of LLM translation + human review cycle |
+| Codebase refactoring | Structural (dependency graph, API surface) + empirical (test suite, build results) + performance (latency, throughput) | Yes | Months — depends on how many build cycles the system has observed |
+| Microcode optimization | Structural (RISC-V ISA, core topology) + performance (profiling data) | Yes | Weeks — after enough benchmark runs to characterize hardware |
+| Poetry, creative writing | Neither structural rules nor empirical ground truth (beauty is subjective) | **Never** | N/A — the gate stack cannot verify aesthetic quality |
+| Novel scientific discovery | Structural (known laws) + empirical (experiments) | Eventually | Years — requires experimental data the system must gather through instruments |
+
+*** The fastest acquisition strategy per domain
+
+The goal: reach the flip point with the fewest calendar days
+and the fewest human hours.
+
+| Knowledge type | Acquisition strategy | Calendar time | Human time |
+|----------------|---------------------|---------------|------------|
+| **Structural** (published rules, configs, specs) | LLM translation of source documents + ACL2 consistency verification + one-shot human review | Hours for the LLM pass, days for human review | Days — one domain expert reviewing the output |
+| **Structural** (unpublished — your deployment, your codebase) | Automated scanning — the system walks your filesystem, reads your configs, builds the dependency graph | Minutes to hours | Zero — fully automated |
+| **Empirical** (what happens when X?) | **Active probing**: the system does not wait for user interactions. It probes its own environment: it runs shell commands in a sandbox to verify gate rules, executes test suites to verify the dependency graph, and measures what happens when it pushes boundaries. | Hours to days | Zero (automated sandboxed probing) |
+| **Empirical** (what does the human prefer?) | **Contrastive queries** — instead of waiting for HITL approvals to accumulate, the system asks targeted questions: "Which of these two interpretations of the regulation is correct? This one or this one?" Each question produces a fact. | Days — batch of targeted questions answered in one session | Hours — one review session with the domain expert |
+| **Performance** (latency, throughput) | **Benchmark harness** — the system runs its own workload at varying parameters, records timing data, stores it as facts with `:provenance :benchmark` | Hours — automated sweep | Zero |
+| **Transfer** from related domains | **Ontology alignment** — if the system already knows compliance for GDPR, it can ask Screamer: "Which GDPR rules have the same structure as HIPAA rules? Use those as seed hypotheses, flag for human review." The transferred rules flip faster because Screamer starts from a consistent foundation, not from scratch. | Days — Screamer finds alignments automatically, human reviews the suggestions | Hours — review the cross-domain alignment suggestions |
+
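A minimal sketch of the benchmark-harness row above, assuming a Python host. The `Fact` record, the `sweep` function, and the toy sorting workload are hypothetical names chosen for illustration; only the idea of tagging each measurement with a `benchmark` provenance comes from the table.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One timing observation (hypothetical record shape)."""
    subject: str
    parameter: int
    seconds: float
    provenance: str  # mirrors `:provenance :benchmark` from the table


def _time_once(workload, parameter):
    # Wall-clock duration of a single run of the workload.
    start = time.perf_counter()
    workload(parameter)
    return time.perf_counter() - start


def sweep(subject, workload, parameters, repeats=3):
    """Run the workload at each parameter value and keep the best of `repeats` timings."""
    facts = []
    for parameter in parameters:
        best = min(_time_once(workload, parameter) for _ in range(repeats))
        facts.append(Fact(subject, parameter, best, "benchmark"))
    return facts


# Toy workload (an assumption): sort a reversed range of the given size.
facts = sweep("sort", lambda n: sorted(range(n, 0, -1)), [1_000, 10_000, 100_000])
for fact in facts:
    print(fact)
```

Taking the minimum over a few repeats is one common way to reduce scheduling noise; a real harness might record every run instead.
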
+*** The fastest path to flip any domain
+
+1. **Ingest all published text** for the domain (laws, specs, configs)
+   via LLM translation. One pass. Hours.
+
+2. **Run the benchmark harness** to measure the system's own performance
+   in the domain. One sweep. Hours.
+
+3. **Run active sandboxed probes** — test each gate rule against
+   synthetic inputs to verify it behaves as expected. Automated.
+
+4. **Generate contrastive queries.** Screamer identifies the 5%
+   of rules where the LLM's translation is most uncertain (it contradicts
+   a transferred rule, has no precedent, or admits multiple valid
+   interpretations). Present these as targeted either/or questions to
+   the human domain expert in a single session.
+
+5. **Start serving real interactions.** Every gate outcome generates
+   an empirical fact that Screamer feeds back into the rule set.
+   The empirical loop tightens from the first real interaction.
+
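Step 4 can be sketched as a selection over scored rules. This is an illustration under stated assumptions: the `Rule` record, its `uncertainty` score, and both function names are hypothetical, and how Screamer would actually compute uncertainty is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A translated rule with two candidate readings (hypothetical shape)."""
    rule_id: str
    reading_a: str
    reading_b: str
    uncertainty: float  # assumed score in [0, 1]; higher means less certain


def select_contrastive(rules, percent=5):
    """Return the most uncertain `percent` of rules, and at least one if any exist."""
    if not rules:
        return []
    # Integer ceiling of len(rules) * percent / 100, avoiding float rounding.
    k = max(1, (len(rules) * percent + 99) // 100)
    return sorted(rules, key=lambda r: r.uncertainty, reverse=True)[:k]


def render_query(rule):
    """Format one either/or question for the domain expert."""
    return (
        f"Rule {rule.rule_id}: which interpretation is correct?\n"
        f"  (a) {rule.reading_a}\n"
        f"  (b) {rule.reading_b}"
    )


# Synthetic rule set (an assumption): 40 rules with increasing uncertainty.
rules = [
    Rule(f"R{i}", f"reading A of R{i}", f"reading B of R{i}", uncertainty=i / 40)
    for i in range(40)
]
selected = select_contrastive(rules)
queries = [render_query(rule) for rule in selected]
print("\n\n".join(queries))
```

Each answered question would become one fact, so the size of the batch bounds the expert's review session.
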
+For a codified domain (healthcare compliance, financial regulation,
+industrial safety): flip within days of completing steps 1-4. The only
+bottleneck is the domain expert's review session in step 4, which takes
+a few hours of human time.
+
+For an uncodified but learnable domain (codebase refactoring):
+flip within weeks of step 3 (active sandboxed probes) + step 5 (real
+interactions). No LLM translation is needed, because the knowledge
+comes from the system probing its own environment.
+
+For a domain that can never flip (poetry, aesthetics): the system
+never reaches sufficiency. It never claims to. The TUI shows
+"Symbolic index: 0 facts. This domain has no codifiable rules."
+The LLM handles 100% of poetry interactions. The gate stack
+only checks for safety (no shell commands, no file deletions)
+and passes through everything else to the LLM. The system is
+honest about its frontier.
+
 Large refactoring projects (extract module, rename API, split monolith)
 are the hardest test for any AI agent. Current approaches (Claude Code,
 Copilot) handle them probabilistically — every step costs tokens, and