Files
hermes-brain/ideas/evaluation-harness.org
Hermes cc3976fb7f ideas: editorial sweep — atomization, interlinking, restructuring
- Split competitive-analysis-2026-05.org → TOC + 9 competitor files in
  ideas/competitors/. Dropped date from filename. All competitor UUIDs
  generated, TOC keeps original UUID for backlink continuity.
- Deleted passepartout-economics.org archive (replaced by 27-node KB).
- Inlined 5 'See also' blocks into natural prose (compliance-index,
  first-mover-window, revenue-table, orders-of-magnitude-time,
  native-org-knowledge-base).
- Linked 7 orphan compliance pages back to compliance index + finished
  truncated sentences.
- Linked all 14 Agora requirement docs from topic-relevant pages
  (identity→lisp-machine-security, infrastructure→compute-marketplace,
  social-space→growth-strategy, exchange→agora-contracts, etc.).
- Linked ai-industry-impact from investment-thesis, sufficiency-flip,
  verification-appliance, effects-growth-flywheel (up from 1 to 10+ pages).
- Fixed CREATED timestamps to use git commit dates instead of today.
- Made all links absolute from root (no port inheritance).
- Removed stale agora/docs/ duplicate content.
2026-05-24 16:25:55 +00:00

19 lines
1.5 KiB
Org Mode

:PROPERTIES:
:CREATED: [2026-05-24 Sun]
:ID: 45258a2d-1675-562c-9024-5d1eb2f1ea56
:END:
#+title: Evaluation Harness as Certification Service
#+filetags: :passepartout:revenue:certification:evaluation:regression:
The accumulated regression suite — thousands of edge cases from every deployed instance, every bug fix, every regulatory change — becomes the most comprehensive test of autonomous agent correctness.
**Service:** "Run our 10,000-task suite against your AI agent and get a Merkle-verified score."
**Target:** AI labs proving their agents' capabilities, enterprise procurement requiring independent verification.
**Price:** $50K-$200K per certification.
The regression suite grows with every deployment, making the certification increasingly valuable over time. The early player's suite is the largest because they started first. This is the [[id:a5d59d12-b23e-58d6-a81b-9b8b06556949][collective regression suite]] mechanism in action.
10 certifications in year one = $500K-$2M.
Long-term endpoint: this becomes the UL certification for AI — a third-party verification nobody can ignore. [[id:827bc546-e887-5b7c-9b65-6392beaf0920][The verification monopoly]]. The certification relies on a [[id:84a537b4-4256-50c8-91f5-dd5b4538418f][verification appliance]] to run the tests in a trusted environment, creating [[id:2f783eb4-638e-5afa-9b59-6224d086a712][infrastructure lock-in]] as certification history accumulates on the platform. These dynamics form powerful [[id:aa6d062e-a520-5d14-8773-00687ed9c689][moats]].