From 2d0d6d478afd3992f9915895090badf1a4ead141 Mon Sep 17 00:00:00 2001
From: Hermes
Date: Thu, 21 May 2026 18:55:11 +0000
Subject: [PATCH] per-domain flip: knowledge types and fastest acquisition

- Sufficiency flip is per-domain, not global. Poetry never flips.
- Three knowledge types: structural (published rules), empirical
  (observations), performance (profiling data)
- Fastest acquisition: active sandboxed probing, contrastive queries
  to human (not waiting for HITL to accumulate), ontology transfer
  from related domains, benchmark harness
- Codified domain: flip within days (hours LLM + hours expert review)
- Uncodified learnable domain: flip within weeks (probe + real use)
- Never-flip domains: system is honest, LLM handles 100%
---
 ideas/passepartout-economics.org | 71 ++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/ideas/passepartout-economics.org b/ideas/passepartout-economics.org
index 0c8cec8..1a80de8 100644
--- a/ideas/passepartout-economics.org
+++ b/ideas/passepartout-economics.org
@@ -779,6 +779,77 @@ whether the bootstrap succeeds. A patient operator with a destination
 as an operator with bottomless API credits — the second arrives
 faster, but both arrive.
 
+** The per-domain sufficiency flip
+
+The sufficiency flip is not a single event. It happens
+independently for each domain, and some domains never flip.
+The flip point is determined by the kind of knowledge the
+domain requires.
+
+*** Knowledge types required for a flip
+
+| Domain | Knowledge required | Can it flip? | How fast? |
+|--------|-------------------|--------------|-----------|
+| Shell safety, path rules | Structural only — the deployment's config, shell semantics | Yes | Instant — ingested from config |
+| Healthcare compliance | Structural (HIPAA text) + empirical (human reviews of edge cases) | Yes | Weeks — one pass of LLM translation + human review cycle |
+| Codebase refactoring | Structural (dependency graph, API surface) + empirical (test suite, build results) + performance (latency, throughput) | Yes | Months — depends on how many build cycles the system has observed |
+| Microcode optimization | Structural (RISC-V ISA, core topology) + performance (profiling data) | Yes | Weeks — after enough benchmark runs to characterize hardware |
+| Poetry, creative writing | Neither structural rules nor empirical ground truth (beauty is subjective) | **Never** | N/A — the gate stack cannot verify aesthetic quality |
+| Novel scientific discovery | Structural (known laws) + empirical (experiments) | Eventually | Years — requires experimental data the system must gather through instruments |
+
+*** The fastest acquisition strategy per domain
+
+The goal: reach the flip point with the fewest calendar days
+and the fewest human hours.
+
+| Knowledge type | Acquisition strategy | Calendar time | Human time |
+|----------------|---------------------|---------------|------------|
+| **Structural** (published rules, configs, specs) | LLM translation of source documents + ACL2 consistency verification + one-shot human review | Hours for the LLM pass, days for human review | Days — one domain expert reviewing the output |
+| **Structural** (unpublished — your deployment, your codebase) | Automated scanning — the system walks your filesystem, reads your configs, builds the dependency graph | Minutes to hours | Zero — fully automated |
+| **Empirical** (what happens when X?) | **Active probing** — the system does not wait for user interactions. It probes its own environment: runs shell commands in sandbox to verify gate rules, executes test suites to verify dependency graph, measures what happens when it pushes boundaries. | Hours to days | Zero — automated sandboxed probing |
+| **Empirical** (what does the human prefer?) | **Contrastive queries** — instead of waiting for HITL approvals to accumulate, the system asks targeted questions: "Which of these two interpretations of the regulation is correct? This one or this one?" Each question produces a fact. | Days — batch of targeted questions answered in one session | Hours — one review session with the domain expert |
+| **Performance** (latency, throughput) | **Benchmark harness** — the system runs its own workload at varying parameters, records timing data, stores it as facts with `:provenance :benchmark` | Hours — automated sweep | Zero |
+| **Transfer** from related domains | **Ontology alignment** — if the system already knows compliance for GDPR, it can ask Screamer: "Which GDPR rules have the same structure as HIPAA rules? Use those as seed hypotheses, flag for human review." The transferred rules flip faster because Screamer starts from a consistent foundation, not from scratch. | Days — Screamer finds alignments automatically, human reviews the suggestions | Hours — review the cross-domain alignment suggestions |
+
+*** The fastest path to flip any domain
+
+1. **Ingest all published text** for the domain (laws, specs, configs)
+   via LLM translation. One pass. Hours.
+
+2. **Run the benchmark harness** to measure the system's own performance
+   in the domain. One sweep. Hours.
+
+3. **Run active sandboxed probes** — test each gate rule against
+   synthetic inputs to verify it behaves as expected. Automated.
+
+4. **Generate contrastive queries** — Screamer identifies the 5%
+   of rules where the LLM's translation is most uncertain (contradicts
+   a transferred rule, has no precedent, has multiple valid
+   interpretations). Present these as yes/no questions to the human
+   domain expert in a single session.
+
+5. **Start serving real interactions.** Every gate outcome generates
+   an empirical fact that Screamer feeds back into the rule set.
+   The empirical loop tightens from the first real interaction.
+
+For a codified domain (healthcare compliance, financial regulation,
+industrial safety): flip within days of steps 1-4. The only bottleneck
+is the domain expert's review session in step 4 — a few hours of
+human time.
+
+For an uncodified but learnable domain (codebase refactoring):
+flip within weeks of step 3 (sandboxed probes) + step 5 (real
+interactions). No LLM translation needed — the knowledge comes
+from the system probing its own environment.
+
+For a domain that can never flip (poetry, aesthetics): the system
+never reaches sufficiency. It never claims to. The TUI shows
+"Symbolic index: 0 facts. This domain has no codifiable rules."
+The LLM handles 100% of poetry interactions. The gate stack
+only checks for safety (no shell commands, no file deletions)
+and passes through everything else to the LLM. The system is
+honest about its frontier.
+
 Large refactoring projects (extract module, rename API, split
 monolith) are the hardest test for any AI agent. Current approaches
 (Claude Code, Copilot) handle them probabilistically — every step costs tokens, and