refactor: moved org-agent to its own repository as a submodule
64
notes/llm-alternative-providers.org
Normal file
@@ -0,0 +1,64 @@
#+TITLE: Alternative LLM Providers - Subscription & Token Efficient
#+author: User
#+created: [2026-03-16 Mon 14:28]
#+DATE: 2026-03-07
#+FILETAGS: :research:llm:pricing:alternatives:

* GLM-5 (Zhipu AI) - Research

** Pricing Found
- Input: $1.00 per 1M tokens
- Output: $3.20 per 1M tokens
- Model size: ~744B parameters, MoE architecture
- Training: 28.5T tokens

** Comparison to Current
| Model      | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Context (tokens) | Free Tier |
|------------+----------------------------+-----------------------------+------------------+-----------|
| Gemini 2.0 | $0                         | $0                          | 1M               | ✅ Yes    |
| GLM-5      | $1.00                      | $3.20                       | ?                | ?         |
| Claude     | $3.00                      | $15.00                      | 200K             | ❌ No     |
| GPT-4      | varies                     | varies                      | 128K             | ❌ No     |

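A minimal sketch of how the per-token prices in the table translate into a monthly bill. The prices are the ones listed above; the workload (20M input and 5M output tokens per month) and the names =Pricing=, =PRICES= and =monthly_cost= are hypothetical. Gemini is treated as $0 only while usage stays inside its free tier, and GPT-4 is left out because the table lists its price as "varies".

#+begin_src python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices copied from the comparison table in these notes.
PRICES = {
    "Gemini 2.0 (free tier)": Pricing(0.00, 0.00),
    "GLM-5": Pricing(1.00, 3.20),
    "Claude": Pricing(3.00, 15.00),
}


def monthly_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one month's input and output tokens."""
    input_cost = input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost


if __name__ == "__main__":
    # HYPOTHETICAL workload: 20M input + 5M output tokens per month.
    for name, pricing in PRICES.items():
        cost = monthly_cost(pricing, 20_000_000, 5_000_000)
        print(f"{name}: ${cost:.2f}/month")
#+end_src

Output tokens cost several times more than input tokens for both paid models, so the input/output mix matters as much as the total volume.
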
** Status: Still researching subscription/unlimited plans
* Alternative Providers to Research

** Tier 1: Subscription/Unlimited
1. *Fireworks AI* - Flat-rate inference
2. *Together AI* - Pay-per-token but high limits
3. *Replicate* - Metered but competitive
4. *Groq* - Ultra-fast, low cost

** Tier 2: Self-Hosted (GPU rental or one-time hardware cost)
1. *RunPod* - GPU rental for self-hosted models
2. *Lambda Labs* - GPU cloud
3. *Local inference* - RTX 4090, etc.

** Tier 3: Open-Source Inference Stacks
1. *Ollama* on RunPod/Lambda
2. *llama.cpp* with quantized models
3. *vLLM* serving framework

* Research Questions

1. Does GLM-5 offer an unlimited subscription tier?
2. Do Fireworks or Together offer flat-rate plans?
3. Is there a flat-rate option on AWS Bedrock (e.g. Amazon Q)?
4. How does a self-hosted Llama 3 70B compare with GLM-5 on quality?

* Next Steps Needed

- Do the research manually (web browsing is limited)
- Check the Zhipu pricing page directly
- Compare subscription tiers
- Evaluate the self-hosting break-even point

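A rough sketch of the break-even evaluation in the last step: the monthly token volume at which a fixed-cost option (a rented GPU, for example) becomes cheaper than paying per token. The GLM-5 prices come from these notes. The $0.50/hour GPU rate, the 20% output share and both function names are hypothetical placeholders, and the sketch ignores whether a single GPU could actually serve that volume or match the model's quality.

#+begin_src python
def blended_price_per_million(input_price: float, output_price: float,
                              output_share: float) -> float:
    """Average USD per 1M tokens when `output_share` of all tokens are output."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return (1.0 - output_share) * input_price + output_share * output_price


def break_even_tokens_per_month(fixed_monthly_cost: float,
                                api_price_per_million: float) -> float:
    """Monthly token volume above which the fixed-cost option is cheaper."""
    if api_price_per_million <= 0:
        raise ValueError("a free API never breaks even against a fixed cost")
    return fixed_monthly_cost / api_price_per_million * 1_000_000


if __name__ == "__main__":
    # HYPOTHETICAL: one rented GPU at $0.50/hour, running 24/7 for 30 days.
    gpu_monthly_cost = 0.50 * 24 * 30
    # GLM-5 prices from the notes; the 20% output share is an assumption.
    price = blended_price_per_million(1.00, 3.20, output_share=0.20)
    tokens = break_even_tokens_per_month(gpu_monthly_cost, price)
    print(f"Break-even at about {tokens / 1_000_000:.0f}M tokens/month")
#+end_src

Below the break-even volume the per-token API is cheaper; above it, the fixed-cost option wins only if the hardware can sustain the required throughput.
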
* Current Recommendation

*Until research is complete:*
- Stay on Gemini (free tier) ✅
- Use it sparingly to stay under the 60/minute rate limit
- Free allowance: 300K tokens/day, or ~9M tokens/month

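One way to stay under the rate limit is a client-side throttle. The sketch below is a sliding-window limiter, assuming the limit means 60 requests per minute (the notes give only "60/minute", so the unit is an assumption). =SlidingWindowLimiter= is a hypothetical helper, not part of any provider SDK; the clock and sleep functions are injectable so the logic can be tested without waiting.

#+begin_src python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any window of `period` seconds."""

    def __init__(self, max_calls=60, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls still inside the window

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._calls.append(self._clock())
#+end_src

Calling =acquire()= before each request keeps the client at or below the configured rate; the server-side limit may be counted differently, so some headroom (for example =max_calls=50=) is a reasonable precaution.
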
*If more than 9M tokens/month are needed:* Evaluate paid tiers
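The ~9M figure is the daily allowance multiplied by a 30-day month. A small sketch of that arithmetic and of the decision rule above; the constant and function names are hypothetical, and the 30-day month is an assumption.

#+begin_src python
FREE_TOKENS_PER_DAY = 300_000  # daily free allowance from the notes
DAYS_PER_MONTH = 30            # assumption behind the ~9M/month figure


def free_tokens_per_month(days: int = DAYS_PER_MONTH) -> int:
    """Total free tokens available over `days` days."""
    return FREE_TOKENS_PER_DAY * days


def exceeds_free_tier(expected_monthly_tokens: int,
                      days: int = DAYS_PER_MONTH) -> bool:
    """True when expected usage is above the monthly free allowance."""
    return expected_monthly_tokens > free_tokens_per_month(days)


if __name__ == "__main__":
    print(f"Free allowance: {free_tokens_per_month():,} tokens/month")
    print("Evaluate paid tiers:", exceeds_free_tier(12_000_000))
#+end_src

Because the allowance is daily, a bursty workload can hit the limit on a single day even when the monthly total stays under 9M.
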