feat(arch): finalize Universal Literate Note transition for all projects and skills
67
projects/token-optimization/docs/research.org
Normal file
@@ -0,0 +1,67 @@
#+TITLE: Token Management & Model Optimization Research
#+author: Amero Garcia
#+created: [2026-03-16 Mon 14:28]
#+DATE: 2026-03-04
#+FILETAGS: :research:token:optimization:models:
* Token Management Strategy Research
** Initial Findings
*** OpenRouter Free Tier
- URL: https://openrouter.ai/collections/free-models
- Trend: providers are moving models from free to paid-only
- Belief: "Free models play crucial role in democratizing access"
*** Google AI Studio (Gemini)
- Free tier available
- Limits: 60 requests/minute, 300K tokens/day
- No credit card required
- Every API key gets these limits
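The limits noted above (60 requests/minute, 300K tokens/day) could be enforced on the client side before a request is ever sent. The following is a minimal sketch under stated assumptions: the class name =QuotaTracker=, its method names, and the rolling 24-hour reset are all hypothetical, and the provider's actual reset schedule is not recorded in these notes.

```python
import time
from collections import deque


class QuotaTracker:
    """Client-side guard for a per-minute request cap and a per-day token cap.

    Hypothetical helper: the defaults mirror the figures in these notes,
    not a verified provider specification.
    """

    def __init__(self, max_requests_per_minute=60, max_tokens_per_day=300_000,
                 clock=time.monotonic):
        self.max_rpm = max_requests_per_minute
        self.max_tpd = max_tokens_per_day
        self.clock = clock              # injectable so the logic can be tested
        self.request_times = deque()    # timestamps of requests in the last 60 s
        self.day_start = clock()
        self.tokens_today = 0

    def _roll_windows(self, now):
        # Drop request timestamps that are 60 seconds old or older.
        while self.request_times and now - self.request_times[0] >= 60:
            self.request_times.popleft()
        # Reset the daily counter after 24 hours (a simplification).
        if now - self.day_start >= 86_400:
            self.day_start = now
            self.tokens_today = 0

    def try_acquire(self, estimated_tokens):
        """Return True and record the request if it fits both limits."""
        now = self.clock()
        self._roll_windows(now)
        if len(self.request_times) >= self.max_rpm:
            return False
        if self.tokens_today + estimated_tokens > self.max_tpd:
            return False
        self.request_times.append(now)
        self.tokens_today += estimated_tokens
        return True
```

A caller would check =try_acquire(estimated_tokens)= before each API call and either wait or switch providers when it returns =False=.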
** Research Questions
1. Which providers offer free or low-cost tiers?
2. What are the rate limits and quotas?
3. Which models are best for which use cases?
4. How can context-window usage be optimized?
5. What is the cost-per-token breakdown?
** To Research Further
| Provider      | Free Tier       | Paid Tier    | Best For            |
|---------------+-----------------+--------------+---------------------|
| Google Gemini | 300K tokens/day | Pay per use? | General, coding     |
| OpenRouter    | Varies by model | Per-request  | Routing, variety    |
| OpenAI        | ?               | ?            | GPT-4 quality       |
| Anthropic     | ?               | ?            | Claude capabilities |
| Mistral       | ?               | ?            | Open weights        |
| Local         | Hardware cost   | Free         | Privacy, control    |
** Token Optimization Strategies to Explore
1. *Tiered Model Usage*
   - Simple tasks: fast/cheap models
   - Complex tasks: stronger models
   - Fallback: drop to a lower tier if the higher tier fails

2. *Context Compression*
   - Summarize long contexts
   - Use RAG instead of full context
   - Prune old conversation turns

3. *Caching*
   - Cache common responses
   - Reuse embeddings
   - Batch requests

4. *Hybrid Approach*
   - Local models for simple queries
   - Cloud APIs for complex tasks
   - Manual review for critical outputs
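The Tiered Model Usage strategy above can be sketched as a small router. This is an illustration only: the tier names, the length-based complexity heuristic, and the =route= and =pick_tier= functions are hypothetical, and each backend is assumed to be a plain callable that takes a prompt and returns text.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Hypothetical tier names, ordered from cheapest to strongest.
TIERS: Sequence[str] = ("cheap", "strong")


def pick_tier(prompt: str, complex_threshold_chars: int = 2000) -> str:
    """Crude heuristic (an assumption): long prompts go to the strong tier."""
    return "strong" if len(prompt) > complex_threshold_chars else "cheap"


def route(prompt: str, backends: dict[str, Callable[[str], str]]) -> tuple[str, str]:
    """Try the chosen tier first, then fall back to each lower tier in turn.

    Returns (tier_used, response). Raises RuntimeError if every tier fails.
    """
    start = TIERS.index(pick_tier(prompt))
    errors = []
    for tier in reversed(TIERS[: start + 1]):
        try:
            return tier, backends[tier](prompt)
        except Exception as exc:  # any backend failure triggers the fallback
            errors.append(f"{tier}: {exc}")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))
```

In practice the heuristic would likely need to look at more than prompt length (task type, required tools, past failure rates), which remains an open question in these notes.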
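For the Context Compression strategy above, pruning old conversation turns to fit a token budget might look like the sketch below. The 4-characters-per-token estimate is a rough assumption rather than a real tokenizer, and the message format (dicts with =role= and =content= keys) and the function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def prune_history(messages, budget_tokens):
    """Keep the first system message plus the newest turns that fit the budget.

    Older turns are dropped first; the order of the kept turns is preserved.
    """
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```

A summarization step could replace the dropped turns with a short recap instead of discarding them outright, at the cost of one extra model call.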
** X Account Access
- *Pending:* X account access via Google login
- *Blocker:* Requires an OTP from the user, per the security rule in SOUL.md
- *Action needed:* User provides the OTP; I then complete OAuth and access the bookmarks