hermes-brain/ideas/evaluation-harness.org
Hermes 3e32ea9959 Promote entire passepartout-economics/ to ideas/ root
All 31 files from ideas/passepartout-economics/ promoted to ideas/ root.
- Subfolder's passepartout-economics.org (42-line index) renamed to
  triad-index.org to avoid collision with root-level full doc
- index.org removed (redundant — triad-index.org replaces it)
- Root-level passepartout-economics.org: stripped file:passepartout-economics/
  prefix from all cross-references (now simple file:foo.org links)
- compliance-framework-mapping.org: same prefix cleanup
- All internal file: links within the economics docs already used simple
  names (no prefix) — they resolve correctly from ideas/ root
2026-05-23 06:09:08 +00:00

:PROPERTIES:
:ID: 45258a2d-1675-562c-9024-5d1eb2f1ea56
:END:
#+title: Evaluation Harness as Certification Service
#+filetags: :passepartout:revenue:certification:evaluation:regression:
The accumulated regression suite — thousands of edge cases from every deployed instance, every bug fix, every regulatory change — becomes the most comprehensive test of autonomous agent correctness.
- Service :: "Run our 10,000-task suite against your AI agent and get a Merkle-verified score."
- Target :: AI labs proving their agents' capabilities, and enterprise procurement teams requiring independent verification.
- Price :: $50K-$200K per certification.
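
One way to read "Merkle-verified score" is sketched below in Python. This is an illustration under assumptions, not the service's actual design: the result record layout (=task_id=, =passed=), the helper names (=leaf_hash=, =merkle_root=, =certify=), and the choice of SHA-256 with leaf and node prefixes are all hypothetical. Each task result is hashed into a leaf, the leaves are folded pairwise into a single root, and the root is published alongside the pass rate so that any later change to an individual result is detectable.

#+begin_src python
import hashlib
import json


def leaf_hash(result: dict) -> bytes:
    """Hash one task result (hypothetical record layout)."""
    # Canonical JSON so the same result always produces the same leaf.
    payload = json.dumps(result, sort_keys=True, separators=(",", ":")).encode()
    # The 0x00 prefix keeps leaf hashes distinct from interior-node hashes.
    return hashlib.sha256(b"\x00" + payload).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise until a single root remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last node when a level has odd size.
            level.append(level[-1])
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def certify(results: list[dict]) -> dict:
    """Return the pass rate together with a commitment to every result."""
    if not results:
        raise ValueError("no results to certify")
    passed = sum(1 for r in results if r["passed"])
    root = merkle_root([leaf_hash(r) for r in results])
    return {"score": passed / len(results), "merkle_root": root.hex()}


# Toy run: 8 tasks, of which tasks 0 and 4 fail.
results = [{"task_id": i, "passed": i % 4 != 0} for i in range(8)]
report = certify(results)
print(report["score"])  # 0.75
#+end_src

Under this sketch, anyone holding the full result set can recompute the root and compare it with the published one. Flipping a single task's outcome changes the root.
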
The regression suite grows with every deployment, making the certification increasingly valuable over time. The earliest entrant holds the largest suite simply because it has been accumulating cases the longest. This is the [[file:collective-regression-suite.org][collective regression suite]] mechanism in action.
At the stated price range, 10 certifications in year one would yield $500K-$2M.
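
The year-one figure is the certification count multiplied by each end of the price range. A minimal sketch of that arithmetic, where the function name =revenue_range= is hypothetical and the inputs are the numbers quoted in this note:

#+begin_src python
def revenue_range(certifications: int, price_low: int, price_high: int) -> tuple[int, int]:
    """Return (low, high) revenue in dollars for a number of certifications."""
    return certifications * price_low, certifications * price_high


# 10 certifications at $50K-$200K each.
low, high = revenue_range(10, 50_000, 200_000)
print(f"${low:,} to ${high:,}")  # $500,000 to $2,000,000
#+end_src
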
Long-term endpoint: this becomes the UL (Underwriters Laboratories) certification for AI, a third-party verification nobody can ignore; see [[file:verification-monopoly.org][the verification monopoly]]. The certification relies on a [[file:verification-appliance.org][verification appliance]] to run the tests in a trusted environment, which creates [[file:infrastructure-lock-in.org][infrastructure lock-in]] as certification history accumulates on the platform. Together, these dynamics form powerful [[file:moats.org][moats]].