:PROPERTIES: :ID: llm-gateway-skill :CREATED: [2026-04-09 Thu] :EDITED: [2026-04-11 Sat] :END: #+TITLE: SKILL: Unified LLM Gateway (Universal Literate Note) #+STARTUP: content #+FILETAGS: :llm:gateway:infrastructure:psf: #+DEPENDS_ON: id:credentials-vault-skill * Overview The *Unified LLM Gateway* is the single sensory and reasoning interface for all neural backends. It consolidates the previously fragmented provider skills into a high-integrity dispatch layer, standardizing credential management, error handling, and payload formatting. * Phase A: Demand (PRD) :PROPERTIES: :STATUS: SIGNED :END: ** 1. Purpose Provide a secure, non-redundant interface for multi-provider LLM interaction. ** 2. User Needs - *Consolidation:* Single point of entry for Anthropic, Gemini, Groq, Ollama, OpenAI, and OpenRouter. - *Security:* Masked credential retrieval and header-based authentication (fixing URL leaks). - *Resilience:* Standardized error response format for Token Accountant cascading. - *Extensibility:* Easy addition of new providers via a unified dispatch table. * Phase B: Blueprint (PROTOCOL) :PROPERTIES: :STATUS: SIGNED :END: ** 1. Architectural Intent The gateway utilizes a functional dispatch pattern. A single entry point, `execute-llm-request`, resolves the provider-specific nuances (URLs, headers, JSON structures) while exposing a uniform interface to the harness. ** 2. Semantic Interfaces #+begin_src lisp (defun execute-llm-request (prompt system-prompt &key provider model) "Executes a neural request. Returns (:status :success :content ...) or (:status :error :message ...).") #+end_src * Phase C: Success (QUALITY) :PROPERTIES: :STATUS: SIGNED :END: ** 1. Success Criteria - [ ] *Credential Safety:* API keys are never logged or hardcoded. - [ ] *Header Integrity:* Correct headers (x-api-key, Bearer) for each provider. - [ ] *Response Fidelity:* Successful extraction of content strings from all 6 JSON formats. - [ ] *Resilience:* Standardized error return on timeout or 4xx/5xx responses. ** 2. TDD Plan Verification will occur via `tests/llm-gateway-tests.lisp` using the FiveAM framework. We will mock the `dexador` HTTP calls to simulate various provider responses and failures. * Phase D: Build (Implementation) ** Package Context #+begin_src lisp :tangle ../src/llm-gateway.lisp (in-package :org-agent) #+end_src ** Nested Extraction Helper (get-nested) A robust utility to navigate deeply nested JSON alists produced by `cl-json`, handling both objects and arrays. #+begin_src lisp :tangle ../src/llm-gateway.lisp (defun get-nested (alist &rest keys) "Recursively extracts nested values from an alist, handling both objects and arrays." (let ((val alist)) (dolist (k keys) ;; If val is an array (a list where the first element is a list but NOT a pair), ;; descend into the first element. (when (and (listp val) (listp (car val)) (not (keywordp (caar val)))) (setf val (car val))) (let ((pair (assoc k val))) (if pair (setf val (cdr pair)) (return-from get-nested nil)))) val)) #+end_src ** Unified Request Executor (execute-llm-request) This is the primary actuator for neural reasoning. It handles the specific JSON payload formats and HTTP headers required by each provider. It retrieves secrets from the [[file:org-skill-credentials-vault.org][Credentials Vault]], ensuring that API keys are masked in all diagnostic output. #+begin_src lisp :tangle ../src/llm-gateway.lisp (defun execute-llm-request (prompt system-prompt &key provider model) "Unified entry point for all LLM providers." (let ((api-key (vault-get-secret provider :type :api-key)) (full-prompt (format nil "~a~%~%Prompt: ~a" system-prompt prompt))) (harness-log "PROBABILISTIC ENGINE: Requesting ~a (Model: ~a) [Key: ~a]" provider (or model "default") (vault-mask-string api-key)) (case provider (:gemini-web (let ((res (uiop:symbol-call :org-agent.skills.org-skill-web-research :ask-gemini-web full-prompt))) (if res (list :status :success :content res) (list :status :error :message "Web Research Failure")))) (:ollama (let* ((host (or (uiop:getenv "OLLAMA_HOST") "localhost:11434")) (url (format nil "http://~a/api/generate" host)) (body (cl-json:encode-json-to-string `((model . ,(or model "llama3")) (prompt . ,full-prompt) (stream . :false))))) (handler-case (let* ((response (dex:post url :headers '(("Content-Type" . "application/json")) :content body :connect-timeout 5 :read-timeout 60)) (json (cl-json:decode-json-from-string response))) (list :status :success :content (cdr (assoc :response json)))) (error (c) (list :status :error :message (format nil "Ollama Failure: ~a" c)))))) (t ;; Cloud Providers (Anthropic, Gemini API, Groq, OpenAI, OpenRouter) (when (or (null api-key) (string= api-key "")) (return-from execute-llm-request (list :status :error :message (format nil "API Key missing for ~a" provider)))) (let* ((endpoint (case provider (:anthropic "https://api.anthropic.com/v1/messages") (:gemini-api (format nil "https://generativelanguage.googleapis.com/v1/models/~a:generateContent" (or model "gemini-1.5-flash-latest"))) (:groq "https://api.groq.com/openai/v1/chat/completions") (:openai "https://api.openai.com/v1/chat/completions") (:openrouter "https://openrouter.ai/api/v1/chat/completions"))) (headers (case provider (:anthropic `(("Content-Type" . "application/json") ("x-api-key" . ,api-key) ("anthropic-version" . "2023-06-01"))) (:gemini-api `(("Content-Type" . "application/json") ("x-goog-api-key" . ,api-key))) (:openrouter `(("Content-Type" . "application/json") ("Authorization" . ,(format nil "Bearer ~a" api-key)) ("HTTP-Referer" . "https://github.com/amr/org-agent") ("X-Title" . "org-agent Sovereign Kernel"))) (t `(("Content-Type" . "application/json") ("Authorization" . ,(format nil "Bearer ~a" api-key)))))) (body (case provider (:anthropic (cl-json:encode-json-to-string `((model . ,(or model "claude-3-5-sonnet-20240620")) (max_tokens . 4096) (system . ,system-prompt) (messages . (( (role . "user") (content . ,prompt) )))))) (:gemini-api (cl-json:encode-json-to-string `((contents . (((parts . (((text . ,full-prompt)))))))))) (t (cl-json:encode-json-to-string `((model . ,(or model (case provider (:groq "llama-3.3-70b-versatile") (:openai "gpt-4o") (t "openrouter/auto")))) (messages . (( (role . "system") (content . ,system-prompt) ) ( (role . "user") (content . ,prompt) ))))))))) (handler-case (let* ((response (dex:post endpoint :headers headers :content body :connect-timeout 10 :read-timeout 30)) (json (cl-json:decode-json-from-string response))) (let ((content (case provider (:anthropic (get-nested json :content :text)) (:gemini-api (get-nested json :candidates :parts :text)) (t (get-nested json :choices :message :content))))) (if content (list :status :success :content content) (list :status :error :message (format nil "Failed to parse ~a response structure." provider))))) (error (c) (list :status :error :message (format nil "LLM Gateway Failure (~a): ~a" provider c))))))))) #+end_src ** Cognitive Tools The `:ask-llm` tool exposes the gateway's power to Probabilistic Engine, allowing it to explicitly request reasoning from a specific provider when the default cascade is insufficient. ** Registration: Tool Register the unified gateway as a cognitive tool. #+begin_src lisp :tangle ../src/llm-gateway.lisp (def-cognitive-tool :ask-llm "Queries an LLM provider via the unified gateway." ((:prompt :type :string :description "The user prompt.") (:system-prompt :type :string :description "The system instructions.") (:provider :type :keyword :description "The provider (e.g., :gemini-api, :anthropic, :groq, :openai, :openrouter, :ollama, :gemini-web).") (:model :type :string :description "Optional specific model ID.")) :body (lambda (args) (execute-llm-request (getf args :prompt) (or (getf args :system-prompt) "You are a helpful assistant.") :provider (getf args :provider) :model (getf args :model)))) #+end_src Register each supported provider with the harness's neural registry. #+begin_src lisp :tangle ../src/llm-gateway.lisp (dolist (p '(:anthropic :gemini-api :gemini-web :groq :ollama :openai :openrouter)) (org-agent:register-probabilistic-backend p (lambda (prompt system-prompt &key model) (execute-llm-request prompt system-prompt :provider p :model model)))) #+end_src ** Registration: Skill Define the foundational skill entry for the gateway. #+begin_src lisp :tangle ../src/llm-gateway.lisp (defskill :skill-llm-gateway :priority 150 ; Higher than individual old skills :trigger (lambda (context) (declare (ignore context)) nil) :probabilistic (lambda (context) (declare (ignore context)) nil) :deterministic (lambda (action context) (declare (ignore context)) action)) #+end_src * Phase E: Chaos (Verification) ** 1. Unit Tests (FiveAM) Verification is performed in `tests/llm-gateway-tests.lisp` by mocking the `dex:post` client. ** 2. Chaos Scenarios - *Scenario A (Key Exhaustion):* Use the `chaos` skill to temporarily clear an API key and verify the `token-accountant` successfully falls back to the next healthy provider. - *Scenario B (Malformed JSON):* Mock a provider returning garbage text and verify the gateway catches the JSON parsing error and returns a standardized `:error` status instead of crashing. * Phase F: Memory (RCA) - *[2026-04-09 Thu]:* Refactored 6 providers into this unified gateway to solve the URL key-leakage security vulnerability and reduce boilerplate by 60%. - *[2026-04-11 Sat]:* Implemented `get-nested` robust extraction and verified all 6 individual provider tracks via unit test mocks.