passepartout/org/system-model-router.org
Amr Gharbeia 908936d4d3 rename gateway-* → system-model-* + gateway-messaging, de-ollama, add system-model-explorer
- Rename gateway-provider → system-model-provider (generic :local provider, no hardcoded ollama)
- Rename gateway-llm → system-model (model-request dispatcher)
- Rename system-embedding-gateway → system-model-embedding
- Rename gateway-manager → gateway-messaging (public api renamed to messaging-*)
- Add system-model-explorer (model discovery via OpenRouter API, cached, per-slot recommendations)
- Fix skill loader export: replace prefix-matching with fbound/boundp-based export (20 skills now export)
- Add model-router to skill-loader exclusion list (loaded via CLI)
- De-ollama: remove hardcoded assumed-available patterns from provider pipeline
- Default cascade: cloud-only (openrouter, openai, groq, gemini, deepseek, nvidia, anthropic)
- Env example: add LOCAL_BASE_URL, fix cascade order
- All org files updated with architectural prose (literate programming)
2026-05-04 09:58:59 -04:00


SKILL: Model Router (org-skill-model-router.org)

Overview: Quadrant-Based Model Routing

The Model Router implements the four-quadrant cognitive architecture for LLM model selection. Each signal passes through a pipeline of three filters (privacy, quadrant, and complexity) before a model is chosen.

The routing pipeline for every probabilistic signal:

all backends → privacy filter → quadrant → complexity classifier (foreground only) → per-slot cascade → model

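  • Privacy filter skips every backend not listed in *local-backends* when the signal text contains a privacy tag such as @personal.
  • Quadrant marks the signal as background when its sensor is :heartbeat, :delegation, :tool-output, or :loop-error, and as foreground otherwise.
  • Complexity classifier assigns foreground signals to one of three slots: :code, :plan, or :chat. Background signals skip this step.
  • Per-slot cascade maps the backend currently being tried to a model name. The order in which backends are tried comes from the core's provider cascade; the slot cascade only says which model each backend should use.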

The model selector function is registered into the core *model-selector* hook at load time. The core iterates providers, calling the selector for each one.
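The core loop lives outside this file. As a rough sketch, assuming the core treats both :skip and nil as "do not try this backend" (the Cascade Lookup section states this for nil), the iteration might look like the following. cascade-call-sketch and its call-fn argument are hypothetical stand-ins for backend-cascade-call and the real provider request.

```lisp
;; Hypothetical sketch: the real loop is backend-cascade-call in the core.
(defun cascade-call-sketch (providers selector call-fn context)
  "Try each backend in PROVIDERS in order and return the first non-NIL result.
SELECTOR is called as (funcall SELECTOR backend context) and returns a
model name, :SKIP, or NIL.  CALL-FN is called as
(funcall CALL-FN backend model context) and returns NIL on failure."
  (dolist (backend providers nil)
    (let ((model (funcall selector backend context)))
      ;; Assumption: :skip and nil both mean "do not try this backend".
      (unless (or (null model) (eq model :skip))
        (let ((result (funcall call-fn backend model context)))
          (when result
            (return result)))))))

;; Usage: only :ollama gets a model, so it is the only backend called.
(cascade-call-sketch '(:openrouter :ollama)
                     (lambda (backend ctx)
                       (declare (ignore ctx))
                       (if (eq backend :ollama) "qwen2.5:14b" :skip))
                     (lambda (backend model ctx)
                       (declare (ignore ctx))
                       (list backend model))
                     nil)
;; => (:OLLAMA "qwen2.5:14b")
```

Keeping the selector a pure function of (backend, context) lets the core own retry order and failure handling, while the router only answers "which model, if any, for this backend".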

Implementation

Package Context

(in-package :passepartout)

Configuration: Per-Slot Cascades

Four env-configurable cascade variables: one for each foreground slot (:code, :plan, :chat) and one for background tasks. Each cascade is a list of (provider-keyword . "model-name") pairs; the first entry whose provider matches the current backend is used.

Example: MODEL_CASCADE_CODE='((:ollama . "deepseek-coder:6.7b") (:openrouter . "claude-sonnet"))'
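To show what the example value becomes, here is a minimal sketch of the parsing step, mirroring the read-from-string call in model-router-init. parse-cascade-string is a hypothetical name used only for this illustration.

```lisp
;; Hypothetical helper mirroring parse-cascade in model-router-init:
;; READ with *read-eval* disabled so "#." cannot execute code.
(defun parse-cascade-string (str)
  "Return the Lisp form read from STR, or NIL for a NIL or empty string."
  (when (and str (plusp (length str)))
    (let ((*read-eval* nil))
      (read-from-string str))))

(defparameter *example-code-cascade*
  (parse-cascade-string
   "((:ollama . \"deepseek-coder:6.7b\") (:openrouter . \"claude-sonnet\"))"))

;; Each entry is a (provider-keyword . "model-name") cons.
(cdr (assoc :openrouter *example-code-cascade*)) ; => "claude-sonnet"
```

With *read-eval* bound to nil, an env value such as "#.(run-something)" signals a reader-error instead of evaluating.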

model-cascade-code

The cascade for :code tasks (code generation, refactoring, bug fixing). Format: ((:ollama . "model-name") ...). Configured via MODEL_CASCADE_CODE.

;; REPL-VERIFIED: 2026-05-03T14:00:00

(defvar *model-cascade-code* nil
  "Cascade for :code tasks: ((:ollama . \"model\") ...)")

model-cascade-plan

Cascade for planning and architecture tasks. Configured via MODEL_CASCADE_PLAN.

;; REPL-VERIFIED: 2026-05-03T14:00:00

(defvar *model-cascade-plan* nil
  "Cascade for :plan tasks.")

model-cascade-chat

Cascade for general conversation and simple Q&A. Configured via MODEL_CASCADE_CHAT.

;; REPL-VERIFIED: 2026-05-03T14:00:00

(defvar *model-cascade-chat* nil
  "Cascade for :chat tasks.")

model-cascade-background

Cascade for background tasks: signals whose sensor is :heartbeat, :delegation, :tool-output, or :loop-error (e.g., heartbeat scraping, delegation processing). Configured via MODEL_CASCADE_BACKGROUND.

;; REPL-VERIFIED: 2026-05-03T14:00:00

(defvar *model-cascade-background* nil
  "Cascade for background tasks (heartbeat, delegation).")

local-backends

List of backend keywords considered local for privacy routing. Content tagged with @personal is sent only to these backends. The list can be overridden with LOCAL_BACKENDS, a comma-separated list such as ollama,llama-cpp.

;; REPL-VERIFIED: 2026-05-03T14:00:00

(defvar *local-backends* '(:ollama :llama-cpp)
  "Backend keywords considered local (privacy-safe).")
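A sketch of how a LOCAL_BACKENDS value such as "ollama, llama-cpp" could become this keyword list. model-router-init does this with uiop:split-string. The hypothetical parse-local-backends below uses only standard Common Lisp so it runs without ASDF, and it also drops blank entries. The "vllm" backend in the usage line is illustrative only.

```lisp
;; Hypothetical helper mirroring the LOCAL_BACKENDS handling in
;; model-router-init, written in plain Common Lisp (no uiop needed).
(defun parse-local-backends (env)
  "Turn a comma-separated string such as \"ollama, llama-cpp\" into a list
of keywords.  Blank entries are dropped; NIL or a blank string yields the
default (:OLLAMA :LLAMA-CPP)."
  (let ((names (when env
                 (loop with start = 0
                       for comma = (position #\, env :start start)
                       for name = (string-trim " " (subseq env start comma))
                       unless (zerop (length name))
                         collect name
                       while comma
                       do (setf start (1+ comma))))))
    (if names
        (mapcar (lambda (name) (intern (string-upcase name) :keyword)) names)
        (list :ollama :llama-cpp))))

(parse-local-backends "ollama, llama-cpp, vllm") ; => (:OLLAMA :LLAMA-CPP :VLLM)
```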

Complexity Classifier

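Keyword-based heuristic that assigns signal text to a complexity slot. Keywords are matched as substrings of the lowercased text, and the first matching group wins, so text mentioning both "write" and "plan" is classified :code. Pluggable: set *complexity-classifier* to override.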

;; REPL-VERIFIED: 2026-05-03T14:00:00

(defun model-classify-complexity (text)
  "Classify TEXT into :code, :plan, or :chat."
  (let ((lower (string-downcase text)))
    (cond
      ((or (search "defun" lower) (search "defmacro" lower)
           (search "write" lower) (search "refactor" lower)
           (search "fix " lower) (search "implement" lower)
           (search "code" lower)
           (search "#+begin_src" lower))
       :code)
      ((or (search "plan" lower) (search "roadmap" lower)
           (search "strategy" lower) (search "design" lower)
           (search "architecture" lower))
       :plan)
      (t :chat))))
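Because matching is by substring, words like "decode" or "rewrite" land in the :code slot. The following is a hypothetical alternative that matches whole words, the kind of function one might install through the *complexity-classifier* hook mentioned above. *slot-keywords*, split-words, and classify-by-words are illustrative names, not part of passepartout; the keyword lists are copied from model-classify-complexity.

```lisp
;; Hypothetical whole-word classifier; same keywords and slot priority
;; as model-classify-complexity.
(defparameter *slot-keywords*
  '((:code "defun" "defmacro" "write" "refactor" "fix" "implement" "code")
    (:plan "plan" "roadmap" "strategy" "design" "architecture"))
  "Slots in priority order, each followed by its trigger words.")

(defun split-words (text)
  "Split TEXT into lowercase words of alphanumerics and hyphens."
  (let ((words '())
        (current '()))
    (flet ((flush ()
             ;; Turn the characters collected so far into one word.
             (when current
               (push (coerce (nreverse current) 'string) words)
               (setf current '()))))
      (loop for ch across (string-downcase text)
            do (if (or (alphanumericp ch) (char= ch #\-))
                   (push ch current)
                   (flush)))
      (flush))
    (nreverse words)))

(defun classify-by-words (text)
  "Return :CODE for Org source blocks, else the first slot in
*SLOT-KEYWORDS* with a trigger word in TEXT, else :CHAT."
  (if (search "#+begin_src" (string-downcase text))
      :code
      (let ((words (split-words text)))
        (or (car (find-if (lambda (entry)
                            (intersection (cdr entry) words :test #'string=))
                          *slot-keywords*))
            :chat))))

(classify-by-words "how do I decode base64?") ; => :CHAT
```

The substring version classifies the same question as :code because "decode" contains "code". Whole-word matching trades that false positive for missing inflected forms such as "refactoring".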

Cascade Lookup

The core iterates each backend in *provider-cascade* and calls the model selector for each one. This function matches the current backend against the per-slot cascade list. It returns the first (provider . model) entry whose provider matches (names compared case-insensitively), or nil if the backend has no entry in that slot's cascade, in which case the core skips to the next provider.

;; REPL-VERIFIED: 2026-05-03T14:00:00

(defun model-cascade-find (cascade backend)
  "Find first (PROVIDER . MODEL) in CASCADE matching BACKEND."
  (assoc backend cascade
         :test (lambda (a b) (string-equal (string a) (string b)))))
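A standalone usage example of the lookup. cascade-find is a copy of model-cascade-find under a different name so the snippet runs outside the passepartout package, and *demo-cascade* is made-up data. It shows first-match-wins, case-insensitive name comparison, and the nil result for a backend with no entry.

```lisp
;; Copy of model-cascade-find, renamed so the example runs on its own.
(defun cascade-find (cascade backend)
  (assoc backend cascade
         :test (lambda (a b) (string-equal (string a) (string b)))))

;; Illustrative cascade; :ollama appears twice on purpose.
(defparameter *demo-cascade*
  '((:ollama . "deepseek-coder:6.7b")
    (:openrouter . "claude-sonnet")
    (:ollama . "qwen2.5:14b")))

(cascade-find *demo-cascade* :ollama)       ; => (:OLLAMA . "deepseek-coder:6.7b")  first match wins
(cascade-find *demo-cascade* "openrouter")  ; => (:OPENROUTER . "claude-sonnet")  case-insensitive
(cascade-find *demo-cascade* :groq)         ; => NIL  no entry for this backend
```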

Model Selector

The main routing function, registered into *model-selector* at init time and called once per backend by backend-cascade-call. Returns a model name string; :skip if the backend must not be tried (e.g., the privacy filter rejected it); or nil if the applicable cascade has no entry for the backend.

Filter order: privacy → quadrant → complexity → cascade.

;; REPL-VERIFIED: 2026-05-03T14:00:00

(defun model-select (backend context)
  "Select model for BACKEND given CONTEXT signal.
Returns a model name string, :skip, or nil when the applicable cascade
has no entry for BACKEND."
  (let* ((payload (getf context :payload))
         (text (or (getf payload :text) ""))
         (sensor (getf payload :sensor))
         (has-personal (and (boundp '*dispatcher-privacy-tags*)
                            (some (lambda (tag) (search tag text))
                                  (symbol-value '*dispatcher-privacy-tags*))))
         (is-local (member backend *local-backends*)))
    ;; Privacy: skip cloud backends for personal content
    (when (and has-personal (not is-local))
      (log-message "MODEL-ROUTER: Skipping ~a (personal content)" backend)
      (return-from model-select :skip))
    ;; Quadrant: background sensors use the background cascade, matched
    ;; against BACKEND like the foreground slots
    (if (member sensor '(:heartbeat :delegation :tool-output :loop-error))
        (cdr (model-cascade-find
              (or *model-cascade-background* '((:ollama . "phi-2")))
              backend))
        ;; Foreground: classify complexity (through *complexity-classifier*
        ;; when one is bound), then use the slot cascade
        (let* ((classify (if (and (boundp '*complexity-classifier*)
                                  (symbol-value '*complexity-classifier*))
                             (symbol-value '*complexity-classifier*)
                             #'model-classify-complexity))
               (slot (funcall classify text))
               (cascade (case slot
                          (:code *model-cascade-code*)
                          (:plan *model-cascade-plan*)
                          (t *model-cascade-chat*))))
          (cdr (model-cascade-find
                (or cascade '((:ollama . "qwen2.5:14b"))) backend))))))
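The filter order can be exercised in isolation with a simplified, hypothetical select-sketch. Privacy tags, local backends, cascades, and the classifier are passed as arguments instead of read from special variables, logging and the built-in fallback cascades are left out, and this sketch matches every cascade, background included, against the backend being tried. The context plist shape follows the getf calls above; the cascade contents in the usage are illustrative.

```lisp
;; Hypothetical, simplified model of the selector's filter order.
(defparameter *background-sensors*
  '(:heartbeat :delegation :tool-output :loop-error))

(defun select-sketch (backend context
                      &key privacy-tags local-backends cascades classify)
  "CASCADES is a plist mapping :code/:plan/:chat/:background to an alist
of (backend . model).  CLASSIFY maps text to :code, :plan, or :chat."
  (let* ((payload (getf context :payload))
         (text (or (getf payload :text) ""))
         (sensor (getf payload :sensor)))
    (cond
      ;; 1. Privacy: tagged content never reaches non-local backends.
      ((and (some (lambda (tag) (search tag text)) privacy-tags)
            (not (member backend local-backends)))
       :skip)
      ;; 2. Quadrant: background sensors use the background cascade.
      ((member sensor *background-sensors*)
       (cdr (assoc backend (getf cascades :background))))
      ;; 3-4. Foreground: classify, then look up the slot's cascade.
      (t
       (cdr (assoc backend (getf cascades (funcall classify text))))))))

;; Usage: personal content is refused by a cloud backend.
(select-sketch :openrouter '(:payload (:text "@personal diary"))
               :privacy-tags '("@personal") :local-backends '(:ollama)
               :cascades '(:chat ((:openrouter . "claude-sonnet")))
               :classify (lambda (text) (declare (ignore text)) :chat))
;; => :SKIP
```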

Initialization

Reads cascade configuration from environment variables and registers model-select into the core *model-selector* hook.

;; REPL-VERIFIED: 2026-05-03T14:00:00

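(defun model-router-init ()
  "Read env vars and wire model-select into *model-selector*."
  (flet ((parse-cascade (str)
           ;; *read-eval* nil blocks "#." code execution; a malformed
           ;; value is logged and treated as unset instead of aborting load.
           (when (and str (> (length str) 0))
             (handler-case
                 (let ((*read-eval* nil))
                   (read-from-string str))
               (error (e)
                 (log-message "MODEL-ROUTER: Ignoring bad cascade ~s: ~a" str e)
                 nil)))))
    (setf *model-cascade-code* (parse-cascade (uiop:getenv "MODEL_CASCADE_CODE"))
          *model-cascade-plan* (parse-cascade (uiop:getenv "MODEL_CASCADE_PLAN"))
          *model-cascade-chat* (parse-cascade (uiop:getenv "MODEL_CASCADE_CHAT"))
          *model-cascade-background* (parse-cascade (uiop:getenv "MODEL_CASCADE_BACKGROUND"))
          ;; LOCAL_BACKENDS is comma-separated; blank entries are dropped and
          ;; an unset or empty value falls back to the defaults.
          *local-backends* (let* ((env (uiop:getenv "LOCAL_BACKENDS"))
                                  (names (when env
                                           (remove "" (mapcar (lambda (s) (string-trim " " s))
                                                              (uiop:split-string env :separator '(#\,)))
                                                   :test #'string=))))
                             (if names
                                 (mapcar (lambda (s) (intern (string-upcase s) :keyword)) names)
                                 '(:ollama :llama-cpp)))))
    (setf *model-selector* #'model-select)
    (log-message "MODEL-ROUTER: Initialized, selector=~a" *model-selector*))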
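parse-cascade accepts any form that reads successfully, so a value such as (:ollama "x") would be stored even though it is not a list of pairs. A stricter variant, sketched here under the hypothetical names valid-cascade-p and safe-parse-cascade, also checks the shape and returns nil for unreadable or wrongly shaped input. It uses the standard warn in place of the project's log-message so it runs standalone.

```lisp
;; Hypothetical stricter parser; not part of passepartout.
(defun valid-cascade-p (form)
  "True when FORM is a proper list of (keyword . string) conses."
  (do ((tail form (cdr tail)))
      ((atom tail) (null tail))      ; proper lists end in NIL
    (let ((entry (car tail)))
      (unless (and (consp entry)
                   (keywordp (car entry))
                   (stringp (cdr entry)))
        (return nil)))))

(defun safe-parse-cascade (str)
  "Read STR with *read-eval* off; return the cascade or NIL with a warning."
  (when (and str (plusp (length str)))
    (handler-case
        (let ((form (let ((*read-eval* nil))
                      (read-from-string str))))
          (if (valid-cascade-p form)
              form
              (progn (warn "Ignoring malformed cascade: ~s" str) nil)))
      ;; Unbalanced parens (end-of-file) and "#." (reader-error) land here.
      (error (e)
        (warn "Could not read cascade ~s: ~a" str e)
        nil))))

(safe-parse-cascade "((:openrouter . \"claude-sonnet\"))")
;; => ((:OPENROUTER . "claude-sonnet"))
```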

Skill Registration

The model router is an observer skill: its trigger always returns nil, and it has no deterministic gate. All work happens at load time via model-router-init, which reads env vars and registers into the core *model-selector* hook. The defskill call exists only to register metadata (priority, name) for telemetry and lifecycle management.

(defskill :passepartout-model-router
  :priority 250
  :trigger (lambda (ctx) (declare (ignore ctx)) nil))

Auto-Init

(model-router-init)