112 lines
3.0 KiB
Org Mode
112 lines
3.0 KiB
Org Mode
#+TITLE: Token Optimization - $50 Monthly Budget
|
|
#+author: Amero Garcia
|
|
#+created: [2026-03-16 Mon 14:28]
|
|
#+DATE: 2026-03-04
|
|
#+FILETAGS: :budget:constraints:optimization
|
|
|
|
* Budget: $50/Month
|
|
|
|
** Budget Breakdown
|
|
|
|
| Tier | Provider | Allocation | Tokens Est. | Use Case |
|
|
|------|----------|-----------|-------------|----------|
|
|
| FREE | Google Gemini | $0 | ~9M/month | 90% of work |
|
|
| CHEAP | OpenRouter | $20 | ~6M tokens | Fallback, complex tasks |
|
|
| PREMIUM | Claude/GPT-4o | $25 | ~500K tokens | Critical decisions |
|
|
| BUFFER | Various | $5 | Emergency | Overruns, testing |
|
|
|
|
** Daily Free Allowance
|
|
|
|
- *Google Gemini:* 300K tokens/day = 9M/month = *$0*
|
|
- This covers 90-95% of expected workload
|
|
|
|
** Paid Tier Allocation ($45)
|
|
|
|
- *$20 → OpenRouter* (Qwen, Mistral, Llama)
|
|
- ~6M tokens at $0.003/1K
|
|
- Use when: Gemini rate limited, need different model
|
|
|
|
- *$25 → Premium models* (Claude, GPT-4o)
|
|
- ~500K tokens at $0.05/1K average
|
|
- Use when: Architecture decisions, critical code review, final validation
|
|
|
|
- *$5 → Buffer*
|
|
- Handle overruns
|
|
- Emergency access
|
|
- Testing new models
|
|
|
|
** Hard Limits
|
|
|
|
| Provider | Monthly Cap | Alert At |
|
|
|----------|-------------|----------|
|
|
| OpenRouter | $20 | $16 (80%) |
|
|
| Premium | $25 | $20 (80%) |
|
|
| Total | $50 | $45 (90%) |
|
|
|
|
** Daily Tracking
|
|
|
|
Target: *Monitor consumption every session*
|
|
|
|
```
|
|
IF daily_cost > $1.50:
|
|
→ Switch to Gemini only
|
|
→ Defer premium tasks
|
|
|
|
IF weekly_cost > $12:
|
|
→ Review usage patterns
|
|
→ Find optimization opportunities
|
|
```
|
|
|
|
** Emergency Protocol
|
|
|
|
If approaching $50 limit before month end:
|
|
1. Halt all paid API calls
|
|
2. Switch to Gemini-only mode
|
|
3. Queue premium tasks for next month
|
|
4. Consider local inference setup
|
|
|
|
** Cost-Per-Task Guidelines
|
|
|
|
| Task Type | Max Cost | Preferred Model |
|
|
|-----------|----------|-----------------|
|
|
| Quick lookup | $0.00 | Gemini |
|
|
| Code review | $0.01 | Gemini/OpenRouter |
|
|
| Feature design | $0.05 | OpenRouter |
|
|
| Architecture review | $0.10 | Claude/GPT-4o |
|
|
| Emergency debug | $0.20 | Best available |
|
|
|
|
** Optimization Imperative
|
|
|
|
With $50/month, waste is not affordable:
|
|
- ❌ No speculative queries
|
|
- ❌ No "just curious" premium calls
|
|
- ❌ No repeated similar prompts
|
|
- ✅ Always use Gemini first
|
|
- ✅ Batch similar requests
|
|
- ✅ Cache embeddings locally
|
|
- ✅ Summarize long contexts
|
|
|
|
** Monthly Review
|
|
|
|
1. Compare actual vs. projected usage
|
|
2. Adjust model routing rules
|
|
3. Identify expensive query patterns
|
|
4. Plan next month's allocation
|
|
|
|
** Break-Even Analysis
|
|
|
|
At $50/month = $600/year:
|
|
- *Option A:* Continue APIs (flexible, managed)
|
|
- *Option B:* Local inference (~$800 hardware, $0 ongoing)
|
|
- Break-even: 16 months
|
|
- Risk: Hardware failure, maintenance
|
|
|
|
*Recommendation:* Stick with APIs until $100+/month, then evaluate hardware.
|
|
|
|
** Questions for Human Partner
|
|
|
|
1. Is $50 firm or flexible in emergencies?
|
|
2. What happens if we hit limit mid-critical-task?
|
|
3. Preference for which premium model? (Claude vs GPT-4 vs both)
|
|
4. Should I track and report costs per project?
|
|
5. Any tasks that are "unlimited budget" critical? |