3.0 KiB
3.0 KiB
Token Optimization - $50 Monthly Budget
Budget: $50/Month
Budget Breakdown
| Tier | Provider | Allocation | Tokens Est. | Use Case |
|---|---|---|---|---|
| FREE | Google Gemini | $0 | ~9M/month | 90% of work |
| CHEAP | OpenRouter | $20 | ~6M tokens | Fallback, complex tasks |
| PREMIUM | Claude/GPT-4o | $25 | ~500K tokens | Critical decisions |
| BUFFER | Various | $5 | Emergency | Overruns, testing |
Daily Free Allowance
- Google Gemini: 300K tokens/day = 9M/month = $0
- This covers 90-95% of expected workload
Paid Tier Allocation ($45)
-
$20 → OpenRouter (Qwen, Mistral, Llama)
- ~6M tokens at $0.003/1K
- Use when: Gemini rate limited, need different model
-
$25 → Premium models (Claude, GPT-4o)
- ~500K tokens at $0.05/1K average
- Use when: Architecture decisions, critical code review, final validation
-
$5 → Buffer
- Handle overruns
- Emergency access
- Testing new models
Hard Limits
| Provider | Monthly Cap | Alert At |
|---|---|---|
| OpenRouter | $20 | $16 (80%) |
| Premium | $25 | $20 (80%) |
| Total | $50 | $45 (90%) |
Daily Tracking
Target: Monitor consumption every session
``` IF daily_cost > $1.50: → Switch to Gemini only → Defer premium tasks
IF weekly_cost > $12: → Review usage patterns → Find optimization opportunities ```
Emergency Protocol
If approaching $50 limit before month end:
- Halt all paid API calls
- Switch to Gemini-only mode
- Queue premium tasks for next month
- Consider local inference setup
Cost-Per-Task Guidelines
| Task Type | Max Cost | Preferred Model |
|---|---|---|
| Quick lookup | $0.00 | Gemini |
| Code review | $0.01 | Gemini/OpenRouter |
| Feature design | $0.05 | OpenRouter |
| Architecture review | $0.10 | Claude/GPT-4o |
| Emergency debug | $0.20 | Best available |
Optimization Imperative
With $50/month, waste is not affordable:
- ❌ No speculative queries
- ❌ No "just curious" premium calls
- ❌ No repeated similar prompts
- ✅ Always use Gemini first
- ✅ Batch similar requests
- ✅ Cache embeddings locally
- ✅ Summarize long contexts
Monthly Review
- Compare actual vs. projected usage
- Adjust model routing rules
- Identify expensive query patterns
- Plan next month's allocation
Break-Even Analysis
At $50/month = $600/year:
- Option A: Continue APIs (flexible, managed)
-
Option B: Local inference (~$800 hardware, $0 ongoing)
- Break-even: 16 months
- Risk: Hardware failure, maintenance
Recommendation: Stick with APIs until $100+/month, then evaluate hardware.
Questions for Human Partner
- Is $50 firm or flexible in emergencies?
- What happens if we hit limit mid-critical-task?
- Preference for which premium model? (Claude vs GPT-4 vs both)
- Should I track and report costs per project?
- Any tasks that are "unlimited budget" critical?