passepartout/skills/org-skill-embedding.org


SKILL: Vector Embedding (Universal Literate Note)

Overview

The Vector Embedding skill gives the org-agent text vectorization and semantic search. It keeps provider-specific embedding API calls and the vector math built on them out of the core kernel.

Phase A: Demand (PRD)

1. Purpose

Provide a standardized interface for converting text into vector representations and performing similarity searches.

2. User Needs

  • Text Vectorization: Convert Org-mode content into high-dimensional vectors.
  • Similarity Search: Find semantically related nodes in the Object Store.
  • Provider Agnosticism: Support multiple embedding models (Gemini, OpenAI, etc.).

3. Success Criteria

  • Successfully retrieve embeddings from a configured provider.
  • Perform cosine similarity calculations between vectors.
  • Register as a hot-reloadable skill.

Phase B: Blueprint (PROTOCOL)

1. Architectural Intent

Move embedding retrieval (calls to a remote neural provider) and vector math out of `core.lisp` and `probabilistic.lisp` into a dedicated skill.

2. Semantic Interfaces

(defun get-embedding (text)
  "Retrieves a vector representation of text via the configured neural provider.")

(defun cosine-similarity (v1 v2)
  "Calculates the cosine similarity of two vectors (1 = same direction, 0 = orthogonal, -1 = opposite).")

(defun find-most-similar (query-vector top-k)
  "Returns up to top-k (score . object) pairs from the store, most similar first.")
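The formula below is for reference. It shows the quantity `cosine-similarity` computes for two n-dimensional vectors: the standard cosine of the angle between them. It is a similarity, not a distance. It is 1 for vectors pointing the same way, 0 for orthogonal ones, and -1 for opposite ones. If a distance is needed, 1 minus the similarity is the usual choice. The formula is undefined when either norm is zero, which is why the implementation needs an explicit fallback for that case.

```latex
\[
\cos(\mathbf{u},\mathbf{v})
  = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}
  = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^{2}}\;\sqrt{\sum_{i=1}^{n} v_i^{2}}}
  \in [-1, 1],
\qquad
d_{\cos}(\mathbf{u},\mathbf{v}) = 1 - \cos(\mathbf{u},\mathbf{v})
\]
```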

Phase D: Build (Implementation)

Vector Operations

(in-package :org-agent)

(defun get-embedding (text)
  "Retrieves a vector representation of text via the configured neural provider.
Returns a list of floats, or NIL on failure."
  (let* ((auth (get-provider-auth :gemini))
         (api-key (getf auth :api-key))
         (endpoint "https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent"))
    (unless api-key
      (harness-log "EMBEDDING ERROR: No API key for :gemini")
      (return-from get-embedding nil))
    (let* ((url (format nil "~a?key=~a" endpoint api-key))
           (headers '(("Content-Type" . "application/json")))
           ;; Request body: {"model": ..., "content": {"parts": [{"text": ...}]}}.
           ;; PARTS is wrapped in a vector so cl-json emits a JSON array,
           ;; not a single object.
           (body (cl-json:encode-json-to-string
                  (list (cons :model "models/text-embedding-004")
                        (cons :content
                              (list (cons :parts
                                          (vector (list (cons :text text))))))))))
      (handler-case
          (let* ((response (dex:post url :headers headers :content body))
                 ;; cl-json decodes JSON objects into alists with keyword keys,
                 ;; so embedding.values is reached with ASSOC, not GETF.
                 (json (cl-json:decode-json-from-string response))
                 (embedding (cdr (assoc :values (cdr (assoc :embedding json))))))
            embedding)
        (error (c)
          (harness-log "EMBEDDING FAILURE: ~a" c)
          nil)))))
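By default, cl-json's `decode-json-from-string` returns JSON objects as association lists keyed by keywords (for example `:embedding`), not as property lists. A `getf`-based lookup therefore finds nothing. The sketch below isolates the lookup in a hypothetical helper, `extract-embedding-values`, and runs it on a hand-built alist shaped like a decoded `embedContent` response. The sample alist is illustrative, not captured API output.

```lisp
;; Hypothetical helper (not part of the skill): pull embedding.values out of
;; a response decoded with cl-json's default alist representation.
(defun extract-embedding-values (json)
  "Return the list of floats under embedding.values, or NIL if a key is missing."
  (let ((embedding (cdr (assoc :embedding json))))
    (cdr (assoc :values embedding))))

;; Illustrative alist mirroring {"embedding": {"values": [0.1, -0.2, 0.3]}}.
(defparameter *sample-response*
  '((:embedding . ((:values . (0.1 -0.2 0.3))))))

;; (extract-embedding-values *sample-response*) returns the three floats;
;; an error payload without :embedding yields NIL instead of signalling.
```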

(defun dot-product (v1 v2) 
  "Calculates the dot product of two numerical vectors."
  (reduce #'+ (mapcar #'* v1 v2)))

(defun magnitude (v) 
  "Calculates the Euclidean magnitude of a numerical vector."
  (sqrt (reduce #'+ (mapcar (lambda (x) (* x x)) v))))
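The list-based helpers above use `mapcar`, which allocates an intermediate list and stops at the shorter argument. If two vectors of different dimensions are compared, for example embeddings from two different models, the helpers return a number instead of failing. The following is a minimal alternative sketch. The names `vec-dot` and `vec-magnitude` are hypothetical, and the helpers accept any sequence type. It accumulates in place and signals an error on a dimension mismatch.

```lisp
;; Hypothetical replacements for DOT-PRODUCT / MAGNITUDE (illustration only).
(defun vec-dot (v1 v2)
  "Dot product of two numeric sequences; errors if their lengths differ."
  (unless (= (length v1) (length v2))
    (error "Dimension mismatch: ~d vs ~d" (length v1) (length v2)))
  (let ((sum 0))
    ;; MAP NIL walks both sequences in step without building a result list.
    (map nil (lambda (x y) (incf sum (* x y))) v1 v2)
    sum))

(defun vec-magnitude (v)
  "Euclidean norm of a numeric sequence, computed as the root of V.V."
  (sqrt (vec-dot v v)))
```

Because `map nil` accepts lists and vectors alike, stored embeddings could later move to specialized float arrays without changing the callers.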

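(defun cosine-similarity (v1 v2)
  "Calculates the cosine similarity of two vectors: 1 for the same direction,
0 for orthogonal vectors, -1 for opposite ones. Returns 0 if either vector
has zero magnitude."
  (let ((m1 (magnitude v1))
        (m2 (magnitude v2)))
    (if (or (zerop m1) (zerop m2))
        0
        (/ (dot-product v1 v2) (* m1 m2)))))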

(defun find-most-similar (query-vector top-k)
  "Returns up to TOP-K (score . object) pairs from *object-store*, highest cosine similarity first. Objects without a vector are skipped."
  (let ((similarities nil))
    (maphash (lambda (id obj) 
               (declare (ignore id))
               (let ((vec (org-object-vector obj))) 
                 (when vec 
                   (push (cons (cosine-similarity query-vector vec) obj) similarities)))) 
             *object-store*)
    (let ((sorted (sort similarities #'> :key #'car))) 
      (subseq sorted 0 (min top-k (length sorted))))))
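`find-most-similar` recomputes the magnitude of every stored vector on every query. One common alternative is to normalize vectors to unit length once, when they are stored. After that, cosine similarity reduces to a plain dot product. The sketch below shows this approach against a stand-in store. The store is a plain hash table mapping ids to unit vectors, not the kernel's `*object-store*`. The names `normalize-vector`, `unit-dot`, and `top-k-by-dot` are hypothetical.

```lisp
;; Hypothetical pre-normalization sketch; the store layout is an assumption.
(defun normalize-vector (v)
  "Return V scaled to unit length as a list, or NIL for a zero vector."
  (let ((m (sqrt (reduce #'+ (map 'list (lambda (x) (* x x)) v)))))
    (unless (zerop m)
      (map 'list (lambda (x) (/ x m)) v))))

(defun unit-dot (u v)
  "Dot product; equals cosine similarity when U and V are unit vectors."
  (reduce #'+ (map 'list #'* u v)))

(defun top-k-by-dot (query-unit store top-k)
  "Return up to TOP-K (score . id) pairs from STORE, highest score first.
STORE maps ids to vectors already passed through NORMALIZE-VECTOR."
  (let ((scored '()))
    (maphash (lambda (id unit-vec)
               (push (cons (unit-dot query-unit unit-vec) id) scored))
             store)
    ;; SORT is destructive, so only its return value is used.
    (let ((sorted (sort scored #'> :key #'car)))
      (subseq sorted 0 (min top-k (length sorted))))))
```

The trade-off is one normalization per stored object in exchange for skipping two square roots per comparison. Whether that matters depends on store size and query rate.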

Registration

(defskill :skill-embedding
  :priority 50
  :trigger (lambda (ctx) (eq (getf (getf ctx :payload) :sensor) :embedding-request))
  :probabilistic nil
  :deterministic (lambda (action ctx)
                   (declare (ignore ctx))
                   (case (getf action :action)
                     (:get-embedding (get-embedding (getf action :text)))
                     (:similarity (cosine-similarity (getf action :v1) (getf action :v2)))
                     (t action))))
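Both the trigger and the deterministic handler read plists. Below is a minimal sketch of the shapes they expect. It reproduces the trigger test as a standalone predicate so it runs without `defskill`, which the kernel defines and which is not shown here. The name `embedding-request-p` and the example plists are hypothetical, and `:other-sensor` is a placeholder, not a real sensor.

```lisp
;; Hypothetical standalone copy of the skill's :trigger test.
(defun embedding-request-p (ctx)
  "True when CTX carries an :embedding-request sensor tag in its payload."
  (eq (getf (getf ctx :payload) :sensor) :embedding-request))

;; Example context that fires the trigger: the sensor tag lives under :payload.
(defparameter *example-ctx*
  (list :payload (list :sensor :embedding-request)))

;; Example action the handler would route to its :similarity branch.
(defparameter *example-action*
  (list :action :similarity :v1 '(1 0 0) :v2 '(0 1 0)))
```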