feat(arch): finalize Universal Literate Note transition for all projects and skills

2026-03-31 16:14:37 -04:00
parent 1712b1e4a9
commit 70be8ab93e
79 changed files with 1606 additions and 417 deletions
--- a/projects/token-optimization/docs/budget-50.org
+++ b/projects/token-optimization/docs/budget-50.org
@@ -0,0 +1,112 @@
+#+TITLE: Token Optimization - $50 Monthly Budget
+#+author: Amero Garcia
+#+created: [2026-03-16 Mon 14:28]
+#+DATE: 2026-03-04
+#+FILETAGS: :budget:constraints:optimization:
+
+* Budget: $50/Month
+
+** Budget Breakdown
+
+| Tier    | Provider      | Allocation | Est. Tokens/Month | Use Case                       |
+|---------|---------------|------------|-------------------|--------------------------------|
+| FREE    | Google Gemini | $0         | ~9M               | 90-95% of work                 |
+| CHEAP   | OpenRouter    | $20        | ~6M               | Fallback, complex tasks        |
+| PREMIUM | Claude/GPT-4o | $25        | ~500K             | Critical decisions             |
+| BUFFER  | Various       | $5         | n/a               | Emergencies, overruns, testing |
+
+** Daily Free Allowance
+
+- *Google Gemini:* 300K tokens/day = 9M/month = *$0*
+- This covers 90-95% of expected workload
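The free allowance can be tracked per day before any request spills into a paid tier. A minimal sketch, assuming the 300K tokens/day figure above and a 30-day month; `pick_provider` and the fallback to OpenRouter are illustrative only, not an existing API:

```python
GEMINI_DAILY_TOKENS = 300_000  # free allowance assumed in this plan
DAYS_PER_MONTH = 30            # assumption used for the ~9M/month figure


def pick_provider(tokens_used_today: int, request_tokens: int) -> str:
    """Stay on free Gemini while today's allowance lasts, else fall back to OpenRouter."""
    if tokens_used_today + request_tokens <= GEMINI_DAILY_TOKENS:
        return "gemini"
    return "openrouter"


# 300K/day over a 30-day month gives the ~9M/month quoted above.
assert GEMINI_DAILY_TOKENS * DAYS_PER_MONTH == 9_000_000
```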
+
+** Paid Allocation ($45 + $5 Buffer)
+
+- *$20 → OpenRouter* (Qwen, Mistral, Llama)
+  - ~6M tokens at $0.003/1K
+  - Use when: Gemini is rate-limited or a different model is needed
+
+- *$25 → Premium models* (Claude, GPT-4o)
+  - ~500K tokens at $0.05/1K average
+  - Use when: Architecture decisions, critical code review, final validation
+
+- *$5 → Buffer*
+  - Handle overruns
+  - Emergency access
+  - Testing new models
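The token estimates follow from dividing each allocation by its blended price per 1K tokens. A minimal sketch, assuming the $0.003/1K and $0.05/1K averages above hold for the whole month; `tokens_for_budget` is a hypothetical helper:

```python
from decimal import Decimal


def tokens_for_budget(budget_usd: str, price_per_1k_usd: str) -> int:
    """Tokens a budget buys at a flat price per 1K tokens, rounded down."""
    # Decimal avoids binary floating-point error on currency values.
    return int(Decimal(budget_usd) / Decimal(price_per_1k_usd) * 1000)


allocations = {
    "openrouter": ("20", "0.003"),  # assumed blended OpenRouter price
    "premium": ("25", "0.05"),      # assumed Claude/GPT-4o average
}

for name, (budget, price) in allocations.items():
    print(f"{name}: ~{tokens_for_budget(budget, price):,} tokens")

# OpenRouter + premium + buffer should account for the full $50.
assert Decimal("20") + Decimal("25") + Decimal("5") == Decimal("50")
```

At $0.003/1K, $20 buys roughly 6.7M tokens, so the ~6M figure is a conservative round-down; $25 at $0.05/1K is exactly 500K.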
+
+** Hard Limits
+
+| Provider | Monthly Cap | Alert At |
+|----------|-------------|----------|
+| OpenRouter | $20 | $16 (80%) |
+| Premium | $25 | $20 (80%) |
+| Total | $50 | $45 (90%) |
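The hard limits can be checked mechanically each session. A sketch, assuming "Alert At" means spend greater than or equal to the threshold and that buffer spend counts toward the $50 total; `Limit`, `check_limits`, and the bucket names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    cap: float       # monthly hard cap, USD
    alert_at: float  # spend that triggers an alert, USD


# Values from the Hard Limits table.
LIMITS = {
    "openrouter": Limit(cap=20.0, alert_at=16.0),
    "premium": Limit(cap=25.0, alert_at=20.0),
    "total": Limit(cap=50.0, alert_at=45.0),
}


def check_limits(openrouter: float, premium: float, buffer: float = 0.0) -> dict[str, str]:
    """Classify month-to-date spend per bucket as 'ok', 'alert', or 'capped'."""
    spend = {
        "openrouter": openrouter,
        "premium": premium,
        "total": openrouter + premium + buffer,  # buffer counts toward the total
    }
    status = {}
    for name, limit in LIMITS.items():
        if spend[name] >= limit.cap:
            status[name] = "capped"  # stop paid calls for this bucket
        elif spend[name] >= limit.alert_at:
            status[name] = "alert"
        else:
            status[name] = "ok"
    return status


print(check_limits(openrouter=17.0, premium=10.0))
# {'openrouter': 'alert', 'premium': 'ok', 'total': 'ok'}
```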
+
+** Daily Tracking
+
+Target: *Monitor consumption every session*
+
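+```python
+def tracking_actions(daily_cost: float, weekly_cost: float) -> list[str]:
+    """Return the actions the daily/weekly spend thresholds call for (USD)."""
+    actions = []
+    if daily_cost > 1.50:
+        actions += ["Switch to Gemini only", "Defer premium tasks"]
+    if weekly_cost > 12.00:
+        actions += ["Review usage patterns", "Find optimization opportunities"]
+    return actions
+```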
+
+** Emergency Protocol
+
+If total spend approaches the $50 limit (the $45 alert) before month end:
+1. Halt all paid API calls
+2. Switch to Gemini-only mode
+3. Queue premium tasks for next month
+4. Consider local inference setup
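One way to enforce this protocol is a gate in front of every model call. A minimal sketch, assuming "approaching $50" means the $45 total alert from the Hard Limits table; `BudgetGuard`, its route names, and the in-memory queue are hypothetical, and a real queue would need persistence to survive into next month:

```python
from collections import deque

EMERGENCY_THRESHOLD = 45.0  # USD; assumed to be the 90% total alert


class BudgetGuard:
    """Hypothetical gate applying the emergency protocol to each request."""

    def __init__(self) -> None:
        self.month_spend = 0.0
        self.deferred = deque()  # premium tasks queued for next month

    @property
    def emergency(self) -> bool:
        return self.month_spend >= EMERGENCY_THRESHOLD

    def record(self, cost: float) -> None:
        self.month_spend += cost

    def route(self, task: str, wants_premium: bool) -> str:
        if not self.emergency:
            return "paid" if wants_premium else "gemini"
        if wants_premium:
            self.deferred.append(task)  # step 3: queue for next month
            return "deferred"
        return "gemini"  # steps 1-2: halt paid calls, Gemini only
```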
+
+** Cost-Per-Task Guidelines
+
+| Task Type | Max Cost | Preferred Model |
+|-----------|----------|-----------------|
+| Quick lookup | $0.00 | Gemini |
+| Code review | $0.01 | Gemini/OpenRouter |
+| Feature design | $0.05 | OpenRouter |
+| Architecture review | $0.10 | Claude/GPT-4o |
+| Emergency debug | $0.20 | Best available |
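The table can double as a pre-flight check before each call. A sketch, assuming a task's cost is estimated as tokens / 1000 × price per 1K before the request is sent; the task keys, `estimate_cost`, and `plan_task` are hypothetical:

```python
# Max cost (USD) and preferred model per task type, from the table above.
TASK_RULES = {
    "quick_lookup": (0.00, "Gemini"),
    "code_review": (0.01, "Gemini/OpenRouter"),
    "feature_design": (0.05, "OpenRouter"),
    "architecture_review": (0.10, "Claude/GPT-4o"),
    "emergency_debug": (0.20, "Best available"),
}


def estimate_cost(tokens: int, price_per_1k_usd: float) -> float:
    """Rough pre-call cost estimate: tokens / 1000 * price per 1K tokens."""
    return tokens / 1000 * price_per_1k_usd


def plan_task(task_type: str, estimated_cost: float) -> str:
    """Return the preferred model, or raise if the estimate exceeds the task's cap."""
    max_cost, model = TASK_RULES[task_type]
    if estimated_cost > max_cost:
        raise ValueError(
            f"{task_type}: estimated ${estimated_cost:.3f} exceeds cap ${max_cost:.2f}"
        )
    return model


# 1,500 tokens at $0.05/1K is about $0.075, under the $0.10 cap.
print(plan_task("architecture_review", estimate_cost(1_500, 0.05)))
```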
+
+** Optimization Imperative
+
+On a $50/month budget, there is no room for waste:
+- ❌ No speculative queries
+- ❌ No "just curious" premium calls
+- ❌ No repeated similar prompts
+- ✅ Always use Gemini first
+- ✅ Batch similar requests
+- ✅ Cache embeddings locally
+- ✅ Summarize long contexts
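For "Cache embeddings locally" and "No repeated similar prompts", a content-hash cache avoids paying twice for the same input. A minimal in-memory sketch; `embed_fn` stands in for whatever embedding call is actually used (hypothetical), and only exact repeats are caught, not merely similar prompts:

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """In-memory cache so an identical (model, text) pair is embedded only once."""

    def __init__(self, embed_fn: Callable[[str, str], list[float]]) -> None:
        self._embed_fn = embed_fn  # hypothetical: (model, text) -> vector
        self._store: dict[str, list[float]] = {}
        self.misses = 0  # each miss is a paid (or rate-limited) call

    @staticmethod
    def _key(model: str, text: str) -> str:
        # Hash model + text so keys stay short and stable across runs.
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, model: str, text: str) -> list[float]:
        key = self._key(model, text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(model, text)
        return self._store[key]
```

Persisting the store to disk (for example with the standard-library `shelve` module) would let the cache survive between sessions.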
+
+** Monthly Review
+
+1. Compare actual vs. projected usage
+2. Adjust model routing rules
+3. Identify expensive query patterns
+4. Plan next month's allocation
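Step 1 can be scripted as a per-provider comparison. A sketch, assuming projected and actual month-end spend are available in USD per provider; the `review` function and its row format are hypothetical:

```python
def review(
    projected: dict[str, float], actual: dict[str, float]
) -> list[tuple[str, float, float]]:
    """Rows of (provider, actual minus projected USD, percent of projection used)."""
    rows = []
    for provider, plan in projected.items():
        spent = actual.get(provider, 0.0)
        # A $0 plan (e.g. free Gemini) has no meaningful percentage.
        pct_used = spent / plan * 100 if plan else float("nan")
        rows.append((provider, spent - plan, pct_used))
    # Largest overrun first, so the most expensive patterns are reviewed first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


for provider, delta, pct in review(
    {"openrouter": 20.0, "premium": 25.0},
    {"openrouter": 22.0, "premium": 15.0},
):
    print(f"{provider}: {delta:+.2f} USD ({pct:.0f}% of plan)")
```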
+
+** Break-Even Analysis
+
+At $50/month = $600/year:
+- *Option A:* Continue APIs (flexible, managed)
+- *Option B:* Local inference (~$800 hardware, $0 ongoing)
+  - Break-even: 16 months
+  - Risk: Hardware failure, maintenance
+
+*Recommendation:* Stay with APIs until spend reaches $100+/month, then evaluate hardware.
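The break-even figure is hardware cost divided by monthly savings: $800 / $50 = 16 months. A sketch; the `monthly_local_usd` parameter (electricity, maintenance) is an added assumption, since the plan above treats ongoing local cost as $0:

```python
import math


def break_even_months(hardware_usd: float, monthly_api_usd: float,
                      monthly_local_usd: float = 0.0) -> float:
    """Months until a one-off hardware purchase beats paying for APIs."""
    monthly_savings = monthly_api_usd - monthly_local_usd
    if monthly_savings <= 0:
        return math.inf  # local inference never pays for itself
    return hardware_usd / monthly_savings


print(break_even_months(800, 50))   # 16.0
print(break_even_months(800, 100))  # 8.0
```

At the $100/month trigger in the recommendation, the same $800 of hardware would break even in 8 months.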
+
+** Questions for Human Partner
+
+1. Is $50 firm or flexible in emergencies?
+2. What happens if we hit the limit in the middle of a critical task?
+3. Which premium model is preferred: Claude, GPT-4o, or both?
+4. Should I track and report costs per project?
+5. Any tasks that are "unlimited budget" critical?