
#+TITLE: Token Optimization - $50 Monthly Budget
#+author: Amero Garcia
#+created: [2026-03-16 Mon 14:28]
#+DATE: 2026-03-04
#+FILETAGS: :budget:constraints:optimization:

* Budget: $50/Month

** Budget Breakdown

| Tier | Provider | Allocation | Est. Tokens/Month | Use Case |
|------|----------|------------|-------------------|----------|
| FREE | Google Gemini | $0 | ~9M | 90% of work |
| CHEAP | OpenRouter | $20 | ~6M | Fallback, complex tasks |
| PREMIUM | Claude/GPT-4o | $25 | ~500K | Critical decisions |
| BUFFER | Various | $5 | n/a | Overruns, emergency access, testing |

** Daily Free Allowance

- *Google Gemini:* 300K tokens/day = 9M/month = *$0*
- This covers 90-95% of expected workload

** Paid Tier Allocation ($45)

- *$20 → OpenRouter* (Qwen, Mistral, Llama)
  - ~6M tokens at $0.003/1K
  - Use when: Gemini is rate-limited, or a task needs a different model

- *$25 → Premium models* (Claude, GPT-4o)
  - ~500K tokens at $0.05/1K average
  - Use when: Architecture decisions, critical code review, final validation

- *$5 → Buffer*
  - Handle overruns
  - Emergency access
  - Testing new models
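
The token estimates above follow from dividing each allocation by its per-1K price. A minimal sketch of that arithmetic, assuming flat per-token pricing and a 30-day month (the function name ~tokens_for_budget~ is illustrative, not part of any provider API):

```python
def tokens_for_budget(dollars: float, price_per_1k: float) -> int:
    """Return how many tokens a dollar amount buys at a flat per-1K-token price."""
    return int(round(dollars / price_per_1k * 1000))


# Paid tiers, using the prices quoted in this section.
openrouter_tokens = tokens_for_budget(20, 0.003)  # roughly 6.67M
premium_tokens = tokens_for_budget(25, 0.05)      # 500K

# Free tier: 300K tokens/day over an assumed 30-day month.
gemini_free_tokens = 300_000 * 30                 # 9M
```

At $0.003/1K, $20 buys slightly more than 6.6M tokens, so the ~6M figure is a conservative rounding.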

** Hard Limits

| Provider | Monthly Cap | Alert At |
|----------|-------------|----------|
| OpenRouter | $20 | $16 (80%) |
| Premium | $25 | $20 (80%) |
| Total | $50 | $45 (90%) |
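
The caps and alert thresholds in this table could be checked with a small lookup. This is only a sketch: the dictionary keys, the ~cap_status~ name, and the "ok"/"alert"/"halt" labels are assumptions made for illustration.

```python
# (monthly cap in dollars, alert threshold in dollars), copied from the table above.
CAPS = {
    "openrouter": (20.0, 16.0),
    "premium": (25.0, 20.0),
    "total": (50.0, 45.0),
}


def cap_status(name: str, spent: float) -> str:
    """Classify month-to-date spend against a cap: 'ok', 'alert', or 'halt'."""
    cap, alert_at = CAPS[name]
    if spent >= cap:
        return "halt"
    if spent >= alert_at:
        return "alert"
    return "ok"
```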

** Daily Tracking

Target: *Monitor consumption every session*

```
IF daily_cost > $1.50:
  → Switch to Gemini only
  → Defer premium tasks

IF weekly_cost > $12:
  → Review usage patterns
  → Find optimization opportunities
```
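
The same rules can be sketched as runnable Python. The action names are hypothetical labels, and how daily and weekly costs are measured is left to whatever tracking is already in place.

```python
DAILY_LIMIT = 1.50    # dollars per day
WEEKLY_LIMIT = 12.00  # dollars per week


def tracking_actions(daily_cost: float, weekly_cost: float) -> list[str]:
    """Return the actions triggered by the daily and weekly thresholds."""
    actions = []
    if daily_cost > DAILY_LIMIT:
        actions += ["switch_to_gemini_only", "defer_premium_tasks"]
    if weekly_cost > WEEKLY_LIMIT:
        actions += ["review_usage_patterns", "find_optimization_opportunities"]
    return actions
```

Both checks are independent, so a single expensive day late in a heavy week triggers all four actions.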

** Emergency Protocol

If total spend approaches the $50 limit before month end (for example, on reaching the $45 alert threshold):
1. Halt all paid API calls
2. Switch to Gemini-only mode
3. Queue premium tasks for next month
4. Consider local inference setup
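
Steps 1 to 3 can be sketched as a small routing switch. The class and return labels are hypothetical, and treating every paid task as deferrable is an assumption; step 4 is a planning decision and is not modeled.

```python
from collections import deque


class EmergencyMode:
    """Route tasks normally until activated, then defer anything that needs a paid API."""

    def __init__(self) -> None:
        self.active = False
        self.deferred = deque()  # tasks queued for next month

    def submit(self, task: str, needs_paid_api: bool) -> str:
        if needs_paid_api and self.active:
            # Steps 1 and 3: no paid calls; hold the task for next month.
            self.deferred.append(task)
            return "queued"
        if needs_paid_api:
            return "paid"
        # Step 2: everything else runs on the free Gemini tier.
        return "gemini"
```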

** Cost-Per-Task Guidelines

| Task Type | Max Cost | Preferred Model |
|-----------|----------|-----------------|
| Quick lookup | $0.00 | Gemini |
| Code review | $0.01 | Gemini/OpenRouter |
| Feature design | $0.05 | OpenRouter |
| Architecture review | $0.10 | Claude/GPT-4o |
| Emergency debug | $0.20 | Best available |
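
One way to apply these guidelines is to check an estimated cost against the per-task maximum before sending a request. A minimal sketch, assuming the cost estimate comes from elsewhere; the key names and ~within_guideline~ are illustrative.

```python
# task type -> (max cost in dollars, preferred model), copied from the table above.
TASK_GUIDELINES = {
    "quick_lookup": (0.00, "Gemini"),
    "code_review": (0.01, "Gemini/OpenRouter"),
    "feature_design": (0.05, "OpenRouter"),
    "architecture_review": (0.10, "Claude/GPT-4o"),
    "emergency_debug": (0.20, "Best available"),
}


def within_guideline(task_type: str, estimated_cost: float) -> bool:
    """Return True if the estimated cost does not exceed the task's maximum."""
    max_cost, _preferred_model = TASK_GUIDELINES[task_type]
    return estimated_cost <= max_cost
```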

** Optimization Imperative

With $50/month, waste is not affordable:
- ❌ No speculative queries
- ❌ No "just curious" premium calls
- ❌ No repeated similar prompts
- ✅ Always use Gemini first
- ✅ Batch similar requests
- ✅ Cache embeddings locally
- ✅ Summarize long contexts
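
The "no repeated similar prompts" rule can be partly enforced with a local response cache. The sketch below only catches prompts that match after lowercasing and whitespace normalization; truly similar (paraphrased) prompts would need embedding comparison, which is not shown. ~call_model~ is a hypothetical callable standing in for whichever API client is in use.

```python
import hashlib
from typing import Callable


class PromptCache:
    """Cache model responses keyed by a normalized prompt hash."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        # Lowercase and collapse whitespace so trivial variations share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_call(self, prompt: str, call_model: Callable[[str], str]) -> str:
        key = self._key(prompt)
        if key not in self._store:
            # Only pay for the call on a cache miss.
            self._store[key] = call_model(prompt)
        return self._store[key]
```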

** Monthly Review

1. Compare actual vs. projected usage
2. Adjust model routing rules
3. Identify expensive query patterns
4. Plan next month's allocation

** Break-Even Analysis

At $50/month ($600/year):
- *Option A:* Continue APIs (flexible, managed)
- *Option B:* Local inference (~$800 hardware, $0 ongoing)
  - Break-even: 16 months ($800 / $50 per month)
  - Risk: Hardware failure, maintenance

*Recommendation:* Stick with APIs until spend exceeds $100/month, then evaluate hardware.
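
The break-even figure is the hardware cost divided by the monthly API spend it replaces. A minimal sketch, taking the $0 ongoing cost for local inference stated above at face value (the function name is illustrative):

```python
def break_even_months(hardware_cost: float, monthly_api_spend: float) -> float:
    """Months until a one-time hardware purchase equals cumulative API spend."""
    return hardware_cost / monthly_api_spend


at_current_budget = break_even_months(800, 50)   # 16.0 months
at_double_budget = break_even_months(800, 100)   # 8.0 months
```

At $100/month the same hardware would pay for itself in 8 months, which is why that level is the suggested point to re-evaluate.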

** Questions for Human Partner

1. Is $50 firm or flexible in emergencies?
2. What happens if we hit the limit in the middle of a critical task?
3. Which premium model is preferred? (Claude, GPT-4o, or both)
4. Should I track and report costs per project?
5. Are any tasks critical enough to justify an unlimited budget?