Token Optimization - $50 Monthly Budget

Budget: $50/Month

Budget Breakdown

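| Tier    | Provider      | Allocation | Tokens Est.  | Use Case                |
|---------+---------------+------------+--------------+-------------------------|
| FREE    | Google Gemini | $0         | ~9M/month    | 90% of work             |
| CHEAP   | OpenRouter    | $20        | ~6M tokens   | Fallback, complex tasks |
| PREMIUM | Claude/GPT-4o | $25        | ~500K tokens | Critical decisions      |
| BUFFER  | Various       | $5         | Emergency    | Overruns, testing       |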

Daily Free Allowance

  • Google Gemini: 300K tokens/day × 30 days = 9M tokens/month at $0
  • This covers 90-95% of the expected workload
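
A minimal sketch of how the daily free allowance could be tracked, so that a request that would exceed it is routed elsewhere instead of failing. The class name `FreeAllowance` and its methods are hypothetical; only the 300K tokens/day figure comes from this document, and the real provider quota may be measured differently.

```python
DAILY_FREE_TOKENS = 300_000  # daily free allowance stated in this document


class FreeAllowance:
    """Tracks tokens used against a fixed daily free allowance (hypothetical helper)."""

    def __init__(self, daily_limit: int = DAILY_FREE_TOKENS) -> None:
        self.daily_limit = daily_limit
        self.used_today = 0

    def remaining(self) -> int:
        """Tokens still available today."""
        return self.daily_limit - self.used_today

    def try_consume(self, tokens: int) -> bool:
        """Reserve tokens if they fit in today's allowance; return False otherwise."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if tokens > self.remaining():
            return False
        self.used_today += tokens
        return True

    def reset(self) -> None:
        """Call once per day to start a fresh allowance."""
        self.used_today = 0
```

When `try_consume` returns `False`, the caller would fall back to the paid tier or defer the request.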

Paid Tier Allocation ($45)

  • $20 → OpenRouter (Qwen, Mistral, Llama)

    • ~6M tokens at $0.003/1K
    • Use when: Gemini is rate-limited or a different model is needed
  • $25 → Premium models (Claude, GPT-4o)

    • ~500K tokens at $0.05/1K average
    • Use when: Architecture decisions, critical code review, final validation
  • $5 → Buffer

    • Handle overruns
    • Emergency access
    • Testing new models
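
The token estimates above follow from dividing each allocation by its per-1K-token price. The sketch below reproduces that arithmetic; the function name is hypothetical, and a single flat price per 1K tokens is a simplifying assumption (real pricing usually differs for input and output tokens and by model).

```python
def tokens_for_budget(budget_usd: float, price_per_1k_usd: float) -> int:
    """Return roughly how many tokens a budget buys at a flat per-1K-token price."""
    if price_per_1k_usd <= 0:
        raise ValueError("price_per_1k_usd must be positive")
    return round(budget_usd / price_per_1k_usd * 1000)


openrouter_tokens = tokens_for_budget(20, 0.003)  # about 6.67M tokens
premium_tokens = tokens_for_budget(25, 0.05)      # 500K tokens
```

At exactly $0.003/1K, $20 buys about 6.67M tokens, so the ~6M figure is a conservative rounding.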

Hard Limits

| Provider   | Monthly Cap | Alert At  |
|------------+-------------+-----------|
| OpenRouter | $20         | $16 (80%) |
| Premium    | $25         | $20 (80%) |
| Total      | $50         | $45 (90%) |
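
One way the caps and alert thresholds might be checked in code is sketched below. The dollar values come from the table above; the names `CAPS` and `cap_status`, and the choice to trigger at "greater than or equal to" the threshold, are assumptions.

```python
# (monthly cap in USD, alert threshold in USD), taken from the Hard Limits table
CAPS = {
    "openrouter": (20.00, 16.00),
    "premium": (25.00, 20.00),
    "total": (50.00, 45.00),
}


def cap_status(name: str, spent_usd: float) -> str:
    """Return 'ok', 'alert', or 'halt' for month-to-date spend against a cap."""
    cap, alert_at = CAPS[name]
    if spent_usd >= cap:
        return "halt"
    if spent_usd >= alert_at:
        return "alert"
    return "ok"
```

For example, `cap_status("openrouter", 16.00)` returns `"alert"`, and `cap_status("premium", 25.00)` returns `"halt"`.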

Daily Tracking

Target: Monitor consumption every session

```
IF daily_cost > $1.50:
  → Switch to Gemini only
  → Defer premium tasks

IF weekly_cost > $12:
  → Review usage patterns
  → Find optimization opportunities
```
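
The two rules above can be expressed as a small function that returns the actions to take. This is a sketch only: the function name and the action strings are illustrative, and how `daily_cost` and `weekly_cost` are measured is left to the caller.

```python
from typing import List

DAILY_LIMIT_USD = 1.50
WEEKLY_LIMIT_USD = 12.00


def tracking_actions(daily_cost: float, weekly_cost: float) -> List[str]:
    """Return the actions triggered by the daily and weekly spend rules."""
    actions: List[str] = []
    if daily_cost > DAILY_LIMIT_USD:
        actions += ["switch to Gemini only", "defer premium tasks"]
    if weekly_cost > WEEKLY_LIMIT_USD:
        actions += ["review usage patterns", "find optimization opportunities"]
    return actions
```

Both comparisons are strict, matching the `>` in the rules: spending exactly $1.50 in a day triggers nothing.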

Emergency Protocol

If spending approaches the $50 limit before month end:

  1. Halt all paid API calls
  2. Switch to Gemini-only mode
  3. Queue premium tasks for next month
  4. Consider local inference setup

Cost-Per-Task Guidelines

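| Task Type           | Max Cost | Preferred Model   |
|---------------------+----------+-------------------|
| Quick lookup        | $0.00    | Gemini            |
| Code review         | $0.01    | Gemini/OpenRouter |
| Feature design      | $0.05    | OpenRouter        |
| Architecture review | $0.10    | Claude/GPT-4o     |
| Emergency debug     | $0.20    | Best available    |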
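
The guidelines above could be encoded as a lookup table and checked before a request is sent. In this sketch the task-type keys, the dictionary name, and the helper function are hypothetical; the cost ceilings and model preferences are copied from the table.

```python
from typing import Dict, List, Tuple

# task type -> (max cost in USD, preferred models in order)
TASK_GUIDELINES: Dict[str, Tuple[float, List[str]]] = {
    "quick lookup": (0.00, ["Gemini"]),
    "code review": (0.01, ["Gemini", "OpenRouter"]),
    "feature design": (0.05, ["OpenRouter"]),
    "architecture review": (0.10, ["Claude", "GPT-4o"]),
    "emergency debug": (0.20, ["Best available"]),
}


def within_guideline(task_type: str, estimated_cost_usd: float) -> bool:
    """Return True if the estimated cost does not exceed the task's ceiling."""
    max_cost, _preferred = TASK_GUIDELINES[task_type]
    return estimated_cost_usd <= max_cost


def preferred_models(task_type: str) -> List[str]:
    """Return the preferred models for a task type, in table order."""
    return list(TASK_GUIDELINES[task_type][1])
```

A quick lookup has a $0.00 ceiling, so any paid estimate fails the check and the request stays on the free tier.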

Optimization Imperative

With $50/month, waste is not affordable:

  • No speculative queries
  • No "just curious" premium calls
  • No repeated similar prompts
  • Always use Gemini first
  • Batch similar requests
  • Cache embeddings locally
  • Summarize long contexts
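
As an illustration of avoiding repeated prompts, the sketch below caches responses in memory, keyed by a hash of the normalized prompt. It matches prompts only after lowercasing and whitespace collapsing, so it does not detect semantically similar prompts. `PromptCache` and the `call_model` callable are hypothetical stand-ins for whatever client is actually used.

```python
import hashlib
from typing import Callable, Dict


class PromptCache:
    """In-memory response cache keyed by a normalized prompt hash (illustrative)."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model  # hypothetical function that calls a model API
        self._store: Dict[str, str] = {}
        self.misses = 0  # number of calls that actually reached the model

    @staticmethod
    def _key(prompt: str) -> str:
        # Lowercase and collapse whitespace so trivial variations share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        """Return a cached response if present; otherwise call the model once."""
        key = self._key(prompt)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._call_model(prompt)
        return self._store[key]
```

Persisting `_store` to disk would let the cache survive across sessions; that step is omitted here.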

Monthly Review

  1. Compare actual vs. projected usage
  2. Adjust model routing rules
  3. Identify expensive query patterns
  4. Plan next month's allocation

Break-Even Analysis

At $50/month ($600/year):

  • Option A: Continue APIs (flexible, managed)
  • Option B: Local inference (~$800 hardware, $0 ongoing)

    • Break-even: 16 months ($800 ÷ $50/month)
    • Risk: Hardware failure, maintenance
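
The break-even figure is the hardware cost divided by the monthly API spend it replaces. The sketch below makes that explicit; the function name and the optional `monthly_local_cost_usd` parameter (for running costs such as electricity, assumed to be $0 as in Option B) are illustrative additions.

```python
import math


def break_even_months(
    hardware_cost_usd: float,
    monthly_api_cost_usd: float,
    monthly_local_cost_usd: float = 0.0,
) -> int:
    """Return the number of whole months until hardware pays for itself."""
    monthly_savings = monthly_api_cost_usd - monthly_local_cost_usd
    if monthly_savings <= 0:
        raise ValueError("local inference never breaks even at these costs")
    return math.ceil(hardware_cost_usd / monthly_savings)


months = break_even_months(800, 50)  # 16
```

At $100/month of API spend the same hardware would break even in 8 months, which is why that level is a natural point to re-evaluate.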

Recommendation: Stick with APIs until spending reaches $100+/month, then evaluate hardware.

Questions for Human Partner

  1. Is the $50 limit firm, or flexible in emergencies?
  2. What happens if we hit the limit in the middle of a critical task?
  3. Which premium model do you prefer? (Claude, GPT-4o, or both)
  4. Should I track and report costs per project?
  5. Are any tasks critical enough to warrant an unlimited budget?