gbrain: sync converted org-mode brain files

2026-05-29 03:00:48 +00:00
parent af132f7e88
commit 2f1aacd39c
6 changed files with 388 additions and 1 deletions
--- a/concepts/academic-nearest-neighbors.org
+++ b/concepts/academic-nearest-neighbors.org
@@ -57,6 +57,12 @@ Schafmeister is aligned with Passepartout on the "why Lisp" question — interac

 Both prove the viability of the autonomous loop concept but use the weakest possible verifiers (execution and empirical metrics).

+**The Bitter Lesson / Temporal Credit Assignment (Sutton)**
+
+| Researcher | Institution | System | Match | Divergence |
+|------------|-------------|--------|-------|------------|
+| Richard Sutton | Alberta / Keen Technologies | TD learning, eligibility traces, Alberta Plan | The fundamental problem in verification — *an action was checked, but the consequence plays out hours later; was the action correct?* — is the same problem TD learning solves in RL: assigning credit to actions based on delayed outcomes. Sutton's temporal credit assignment work is the theory you would need to extend Passepartout from per-action gates to trajectory-level verification. His Bitter Lesson (scale beats engineered knowledge at sufficient compute) is the most commonly cited argument against the symbolic verification approach Passepartout bets on. | The Bitter Lesson is not anti-knowledge — it says methods that improve with more computation eventually dominate. Passepartout's gate is a deliberately small engineered knowledge system that *won't* benefit from more compute (the ACL2 lemmas don't get more correct with more hardware). That's acceptable because the gate is a narrow bottleneck (permit/deny). The LLM layer inside the gate *does* benefit from scale. The architecture already respects the Bitter Lesson by placing the scalable piece where scale helps and the non-scalable piece where deductive certainty matters. Sutton's Alberta Plan (world model + reward + learning algorithm) parallels Passepartout's Stage 6 (world model + gate + verified fine-tuning), but Sutton's agents learn by pure reward while Passepartout's learn by reward constrained by verified policy. Sutton would likely argue that a learned safety policy at scale would outcompete the gate. Passepartout's bet is that access control, message authentication, and compliance should never be probabilistic, even at infinite scale.
+
 **Integrate-Symbolic-Into-Neural (Garcez)**

 | Researcher | Institution | System | Match | Divergence |