hermes-brain/ideas/passepartout-economics/evaluation-harness.org
Hermes 303e8c6306 Convert cross-references from [[id:uuid]] to [[file:name.org]]
2026-05-21 19:40:54 +00:00


Evaluation Harness as Certification Service

The accumulated regression suite (thousands of edge cases drawn from every deployed instance, every bug fix, and every regulatory change) becomes the most comprehensive available test of autonomous agent correctness.

Service: "Run our 10,000-task suite against your AI agent and get a Merkle-verified score." Target: AI labs proving their agents' capabilities, enterprise procurement requiring independent verification. Price: $50K-$200K per certification.
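What "Merkle-verified" could mean in practice can be sketched as follows. This is a minimal illustration, not a specification of the service: the record layout, the function names (leaf_hash, merkle_root, certify), the choice of SHA-256, and the rule of duplicating the last node on odd levels are all assumptions made for the example. Each per-task result is hashed into a leaf, the leaves are folded into a single root, and the root is published alongside the score so that any change to a single result changes the root.

```python
import hashlib
import json


def leaf_hash(task_id: str, passed: bool) -> bytes:
    """Hash one task result (hypothetical record layout)."""
    record = json.dumps({"task": task_id, "passed": passed}, sort_keys=True)
    # A 0x00 prefix keeps leaf hashes distinct from interior-node hashes.
    return hashlib.sha256(b"\x00" + record.encode("utf-8")).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise until a single root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def certify(results: dict[str, bool]) -> dict:
    """Return the pass rate plus a root committing to every task result."""
    # Sort by task id so the root does not depend on execution order.
    leaves = [leaf_hash(task, ok) for task, ok in sorted(results.items())]
    return {
        "score": sum(results.values()) / len(results),
        "tasks": len(results),
        "merkle_root": merkle_root(leaves).hex(),
    }


if __name__ == "__main__":
    demo = {"task-0001": True, "task-0002": False, "task-0003": True}
    print(certify(demo))
```

A certificate built this way lets the certified party later prove the outcome of any individual task with a short inclusion proof, without the certifier having to disclose the rest of the suite.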

The regression suite grows with every deployment, making the certification increasingly valuable over time. An early entrant's suite stays the largest because it has been accumulating cases the longest.

Ten certifications in year one at $50K-$200K each = $500K-$2M.

Long-term endpoint: this becomes the UL certification for AI, a third-party verification nobody can ignore. In effect, a verification monopoly.

See also: Verification appliance, Infrastructure lock-in, Moats