amr/memex

Files

Amr Gharbeia 70be8ab93e feat(arch): finalize Universal Literate Note transition for all projects and skills

2026-03-31 16:14:37 -04:00

3.0 KiB

Raw Blame History

Token Optimization - $50 Monthly Budget

Budget: $50/Month

Budget: $50/Month

Budget Breakdown

Tier	Provider	Allocation	Tokens Est.	Use Case
FREE	Google Gemini	$0	~9M/month	90% of work
CHEAP	OpenRouter	$20	~6M tokens	Fallback, complex tasks
PREMIUM	Claude/GPT-4o	$25	~500K tokens	Critical decisions
BUFFER	Various	$5	Emergency	Overruns, testing

Daily Free Allowance

Google Gemini: 300K tokens/day = 9M/month = $0
This covers 90-95% of expected workload

Paid Tier Allocation ($45)

$20 → OpenRouter (Qwen, Mistral, Llama)
- ~6M tokens at $0.003/1K
- Use when: Gemini rate limited, need different model
$25 → Premium models (Claude, GPT-4o)
- ~500K tokens at $0.05/1K average
- Use when: Architecture decisions, critical code review, final validation
$5 → Buffer
- Handle overruns
- Emergency access
- Testing new models

Hard Limits

Provider	Monthly Cap	Alert At
OpenRouter	$20	$16 (80%)
Premium	$25	$20 (80%)
Total	$50	$45 (90%)

Daily Tracking

Target: Monitor consumption every session

``` IF daily_cost > $1.50: → Switch to Gemini only → Defer premium tasks

IF weekly_cost > $12: → Review usage patterns → Find optimization opportunities ```

Emergency Protocol

If approaching $50 limit before month end:

Halt all paid API calls
Switch to Gemini-only mode
Queue premium tasks for next month
Consider local inference setup

Cost-Per-Task Guidelines

Task Type	Max Cost	Preferred Model
Quick lookup	$0.00	Gemini
Code review	$0.01	Gemini/OpenRouter
Feature design	$0.05	OpenRouter
Architecture review	$0.10	Claude/GPT-4o
Emergency debug	$0.20	Best available

Optimization Imperative

With $50/month, waste is not affordable:

❌ No speculative queries
❌ No "just curious" premium calls
❌ No repeated similar prompts
✅ Always use Gemini first
✅ Batch similar requests
✅ Cache embeddings locally
✅ Summarize long contexts

Monthly Review

Compare actual vs. projected usage
Adjust model routing rules
Identify expensive query patterns
Plan next month's allocation

Break-Even Analysis

At $50/month = $600/year:

Option A: Continue APIs (flexible, managed)
Option B: Local inference (~$800 hardware, $0 ongoing)
- Break-even: 16 months
- Risk: Hardware failure, maintenance

Recommendation: Stick with APIs until $100+/month, then evaluate hardware.

Questions for Human Partner

Is $50 firm or flexible in emergencies?
What happens if we hit limit mid-critical-task?
Preference for which premium model? (Claude vs GPT-4 vs both)
Should I track and report costs per project?
Any tasks that are "unlimited budget" critical?