v0.4.0: semantic retrieval — REPL TDD + literate prose
Some checks failed
Deploy (Gitea) / deploy (push) Failing after 3s
RED proofs (pre-v0.4.0):
- SEMANTIC_SCORE never appears in context output (foveal-vector = nil)
- Context suite: 9/0 (no trigram test)
- SHA-256 hashing default: cryptographically blind to similarity

GREEN proofs (v0.4.0):
- Trigram 'authentication' vs 'authenticate' → 0.80 similarity
- Trigram 'authentication' vs 'banana' → 0.00 similarity
- Default provider: :trigram (lexical overlap, zero dependencies)
- Context suite: 12/0 (new test-semantic-retrieval-trigram)
- SHA-256 preserved as explicit :sha256 provider (integrity-only)

Prose:
- system-model-embedding.org: explains why SHA-256 is blind (avalanche property) and why trigrams capture lexical overlap (shared 'aut', 'uth', 'the', 'hen', ...). Documents :trigram, :sha256, :local, :openai backends.
- core-context.org: documents the one-line foveal-vector wiring fix and how it activates the dormant semantic retrieval path. Explains the full pipeline: trigram embed → memory-object-vector → context-awareness-assemble → context-object-render → cosine similarity.
@@ -24,6 +24,14 @@ A naive implementation that serializes every ~org-object~ to text would produce
The semantic threshold is configurable via the ~CONTEXT_SEMANTIC_THRESHOLD~ environment variable (default 0.75). Lower values include more peripherally related content; higher values restrict promotion to tightly related content.
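As a rough illustration of how such a threshold might be read and applied, here is a minimal Python sketch; it is not the project's own code. Only the variable name ~CONTEXT_SEMANTIC_THRESHOLD~ and the default 0.75 come from the text above. The helper names ~semantic_threshold~ and ~is_promoted~ are hypothetical, and both the fallback on unparsable values and the inclusive comparison are assumptions.

```python
import os

DEFAULT_THRESHOLD = 0.75  # default stated in the documentation


def semantic_threshold(env=None):
    """Read CONTEXT_SEMANTIC_THRESHOLD from a mapping (hypothetical helper).

    `env` defaults to the process environment; passing a dict makes the
    function easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env
    raw = env.get("CONTEXT_SEMANTIC_THRESHOLD")
    if raw is None:
        return DEFAULT_THRESHOLD
    try:
        return float(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_THRESHOLD


def is_promoted(similarity, threshold):
    """Assumption: a node is promoted when similarity >= threshold."""
    return similarity >= threshold
```

With the default in place, a similarity of 0.80 would be promoted and 0.60 would not; lowering the variable to 0.5 would admit both.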
** Semantic Retrieval Activation (v0.4.0)
In v0.3.0, the infrastructure for semantic retrieval was already in place (the cosine similarity calculation, the semantic threshold check, and the embedding pipeline), but ~:foveal-vector~ was never passed to ~context-object-render~. It was always ~nil~, so ~(if (and foveal-vector obj-vector ...) ...)~ always took the ~0.0~ branch, and every peripheral node had similarity zero regardless of content overlap.
The fix is a one-line wiring change: ~context-awareness-assemble~ now extracts the foveal node's embedding vector via ~(memory-object-vector (memory-object-get foveal-id))~ and passes it as the ~:foveal-vector~ keyword argument to ~context-object-render~. This activates the entire semantic retrieval path: nodes whose cosine similarity to the foveal node exceeds the semantic threshold are promoted to full-content rendering.
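The effect of the missing argument can be sketched in Python. This is a simplified model under stated assumptions, not the project's Lisp: ~render~ and ~assemble~ are hypothetical stand-ins for ~context-object-render~ and ~context-awareness-assemble~, vectors are plain lists, the detail labels "full" and "skeletal" are illustrative, and the ~pass_foveal_vector~ flag exists only to contrast the old and new behavior.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def render(obj_vector, foveal_vector=None, threshold=0.75):
    """Stand-in for context-object-render: returns (similarity, detail)."""
    if foveal_vector is not None and obj_vector is not None:
        similarity = cosine(foveal_vector, obj_vector)
    else:
        similarity = 0.0  # the branch always taken when no vector is passed
    detail = "full" if similarity >= threshold else "skeletal"
    return similarity, detail


def assemble(vectors, foveal_id, pass_foveal_vector=True):
    """Stand-in for context-awareness-assemble over a dict of id -> vector."""
    # The wiring step: look up the foveal node's vector and hand it down.
    foveal_vector = vectors.get(foveal_id) if pass_foveal_vector else None
    return {
        node_id: render(vec, foveal_vector=foveal_vector)
        for node_id, vec in vectors.items()
        if node_id != foveal_id
    }
```

Calling ~assemble~ with ~pass_foveal_vector=False~ reproduces the dormant behavior, where every peripheral node scores 0.0; with the vector passed, nodes close to the foveal vector cross the threshold and are rendered in full.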
The effectiveness of this depends on the embedding backend. The default ~:trigram~ backend (the v0.4.0 replacement for ~:hashing~/SHA-256) captures lexical overlap: if two nodes share enough character trigrams, their cosine similarity exceeds the threshold and the peripheral node is promoted to full-content rendering. This gives the context model similarity boosting driven by lexical overlap, with zero LLM tokens and zero external dependencies.
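To make the trigram idea concrete, here is a small Python sketch of cosine similarity over character-trigram counts. It is one plausible variant, not the ~:trigram~ backend itself: the function names are hypothetical, and the choices of lowercasing, no padding, and raw counts as weights are assumptions, so the scores need not match the backend's.

```python
import math
from collections import Counter


def trigrams(text):
    """Count overlapping character trigrams of the lowercased text."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def trigram_cosine(a, b):
    """Cosine similarity between the trigram count vectors of two strings."""
    ca, cb = trigrams(a), trigrams(b)
    # Counter returns 0 for missing keys, so unshared trigrams add nothing.
    dot = sum(n * cb[g] for g, n in ca.items())
    na = math.sqrt(sum(n * n for n in ca.values()))
    nb = math.sqrt(sum(n * n for n in cb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

In this unpadded variant, 'authentication' (12 distinct trigrams) and 'authenticate' (10) share nine, giving 9/sqrt(12*10), roughly 0.82, which clears the default threshold of 0.75. 'authentication' and 'banana' share none and score 0.0. A hash such as SHA-256 would give no comparable signal, since a small change in input changes the whole digest.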
** Contract
1. (context-awareness-assemble &optional signal): produces a skeletal