feat(arch): finalize Universal Literate Note transition for all projects and skills

This commit is contained in:
2026-03-31 16:14:37 -04:00
parent 1712b1e4a9
commit 70be8ab93e
79 changed files with 1606 additions and 417 deletions

View File

@@ -0,0 +1,112 @@
#+TITLE: Token Optimization - $50 Monthly Budget
#+author: Amero Garcia
#+created: [2026-03-16 Mon 14:28]
#+DATE: 2026-03-04
#+FILETAGS: :budget:constraints:optimization
* Budget: $50/Month
** Budget Breakdown
| Tier | Provider | Allocation | Tokens Est. | Use Case |
|------|----------|-----------|-------------|----------|
| FREE | Google Gemini | $0 | ~9M/month | 90% of work |
| CHEAP | OpenRouter | $20 | ~6M tokens | Fallback, complex tasks |
| PREMIUM | Claude/GPT-4o | $25 | ~500K tokens | Critical decisions |
| BUFFER | Various | $5 | Emergency | Overruns, testing |
** Daily Free Allowance
- *Google Gemini:* 300K tokens/day = 9M/month = *$0*
- This covers 90-95% of expected workload
** Paid Tier Allocation ($45)
- *$20 → OpenRouter* (Qwen, Mistral, Llama)
- ~6M tokens at $0.003/1K
- Use when: Gemini rate limited, need different model
- *$25 → Premium models* (Claude, GPT-4o)
- ~500K tokens at $0.05/1K average
- Use when: Architecture decisions, critical code review, final validation
- *$5 → Buffer*
- Handle overruns
- Emergency access
- Testing new models
** Hard Limits
| Provider | Monthly Cap | Alert At |
|----------|-------------|----------|
| OpenRouter | $20 | $16 (80%) |
| Premium | $25 | $20 (80%) |
| Total | $50 | $45 (90%) |
** Daily Tracking
Target: *Monitor consumption every session*
```
IF daily_cost > $1.50:
→ Switch to Gemini only
→ Defer premium tasks
IF weekly_cost > $12:
→ Review usage patterns
→ Find optimization opportunities
```
** Emergency Protocol
If approaching $50 limit before month end:
1. Halt all paid API calls
2. Switch to Gemini-only mode
3. Queue premium tasks for next month
4. Consider local inference setup
** Cost-Per-Task Guidelines
| Task Type | Max Cost | Preferred Model |
|-----------|----------|-----------------|
| Quick lookup | $0.00 | Gemini |
| Code review | $0.01 | Gemini/OpenRouter |
| Feature design | $0.05 | OpenRouter |
| Architecture review | $0.10 | Claude/GPT-4o |
| Emergency debug | $0.20 | Best available |
** Optimization Imperative
With $50/month, waste is not affordable:
- ❌ No speculative queries
- ❌ No "just curious" premium calls
- ❌ No repeated similar prompts
- ✅ Always use Gemini first
- ✅ Batch similar requests
- ✅ Cache embeddings locally
- ✅ Summarize long contexts
** Monthly Review
1. Compare actual vs. projected usage
2. Adjust model routing rules
3. Identify expensive query patterns
4. Plan next month's allocation
** Break-Even Analysis
At $50/month = $600/year:
- *Option A:* Continue APIs (flexible, managed)
- *Option B:* Local inference (~$800 hardware, $0 ongoing)
- Break-even: 16 months
- Risk: Hardware failure, maintenance
*Recommendation:* Stick with APIs until $100+/month, then evaluate hardware.
** Questions for Human Partner
1. Is $50 firm or flexible in emergencies?
2. What happens if we hit limit mid-critical-task?
3. Preference for which premium model? (Claude vs GPT-4 vs both)
4. Should I track and report costs per project?
5. Any tasks that are "unlimited budget" critical?