Alternative LLM Providers - Subscription & Token Efficient
- GLM-5 (Zhipu AI) - Research
- Alternative Providers to Research
- Research Questions
- Next Steps Needed
- Current Recommendation
GLM-5 (Zhipu AI) - Research
Pricing Found
- Input: $1.00 per 1M tokens
- Output: $3.20 per 1M tokens
- Model size: ~744B parameters, MoE architecture (context window not yet confirmed)
- Training: 28.5T tokens
Comparison to Current
| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Context (tokens) | Free Tier |
|---|---|---|---|---|
| Gemini 2.0 | $0 | $0 | 1M | ✅ Yes |
| GLM-5 | $1.00 | $3.20 | ? | ? |
| Claude | $3.00 | $15.00 | 200K | ❌ No |
| GPT-4 | varies | varies | 128K | ❌ No |
Status: Still researching subscription/unlimited plans
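To make the table easier to compare, here is a minimal sketch that turns the per-1M-token prices above into a monthly estimate. The `PRICES` dict and `monthly_cost` helper are illustrative names, and the 6M input / 3M output workload is a hypothetical split, not a measured one.

```python
# Per-1M-token prices in USD, copied from the comparison table above.
PRICES = {
    "GLM-5": {"input": 1.00, "output": 3.20},
    "Claude": {"input": 3.00, "output": 15.00},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated monthly spend in USD for one model."""
    price = PRICES[model]
    input_cost = (input_tokens / 1_000_000) * price["input"]
    output_cost = (output_tokens / 1_000_000) * price["output"]
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 6M input + 3M output tokens per month.
    for name in PRICES:
        print(f"{name}: ${monthly_cost(name, 6_000_000, 3_000_000):.2f}")
```

For that assumed workload the sketch prints $15.60 for GLM-5 and $63.00 for Claude. Gemini and GPT-4 are left out because the table lists them as free or "varies".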
Alternative Providers to Research
Tier 1: Subscription/Unlimited
- Fireworks AI - Flat-rate inference
- Together AI - Pay-per-token but high limits
- Replicate - Metered but competitive
- Groq - Ultra-fast, low cost
Tier 2: Self-Hosted (GPU rental or one-time hardware cost)
- RunPod - GPU rental for local models
- Lambda Labs - GPU cloud
- Local inference - RTX 4090, etc.
Tier 3: Open Source Providers
- Ollama + RunPod/Lambda
- llama.cpp quantized models
- vLLM serving framework
Research Questions
- Does GLM-5 offer an unlimited subscription tier?
- Do Fireworks AI or Together AI offer flat-rate plans?
- Does AWS Bedrock offer flat-rate pricing (Amazon Q)?
- How does self-hosted Llama 3 70B compare with GLM-5 on quality?
Next Steps Needed
- Manual research required (web browsing limited)
- Check Zhipu pricing page directly
- Compare subscription tiers
- Evaluate self-hosting break-even
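The break-even step could be sketched roughly as below. The helper name `break_even_months` and every dollar figure are placeholders for illustration, not quotes from any provider; real inputs would come from the pricing research above.

```python
from typing import Optional


def break_even_months(
    hardware_cost: float,
    monthly_api_cost: float,
    monthly_running_cost: float,
) -> Optional[float]:
    """Months until a one-time hardware purchase pays for itself.

    Returns None when self-hosting costs at least as much per month
    as the API, so the purchase never breaks even.
    """
    monthly_saving = monthly_api_cost - monthly_running_cost
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


if __name__ == "__main__":
    # All figures are hypothetical placeholders.
    months = break_even_months(
        hardware_cost=2000.0,        # e.g. a local GPU purchase
        monthly_api_cost=150.0,      # what the paid API would cost
        monthly_running_cost=50.0,   # power, hosting, maintenance
    )
    print(months)
```

This ignores quality differences between models and assumes steady monthly usage, so it is only a first filter.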
Current Recommendation
Until research is complete:
- Stay on Gemini (free tier) ✅
- Use sparingly to stay under the 60/minute rate limit
- 300K tokens/day = ~9M tokens/month free
If usage exceeds 9M tokens/month: Evaluate paid tiers
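The free-tier arithmetic can be checked with a short sketch. The 300K tokens/day figure comes from these notes; the 30-day month and the function names are assumptions made for illustration.

```python
DAILY_FREE_TOKENS = 300_000  # daily free allowance noted above
DAYS_PER_MONTH = 30          # assumption: a 30-day month


def monthly_free_budget(
    daily_tokens: int = DAILY_FREE_TOKENS,
    days: int = DAYS_PER_MONTH,
) -> int:
    """Total free tokens available over one month."""
    return daily_tokens * days


def needs_paid_tier(expected_monthly_tokens: int) -> bool:
    """True when expected usage exceeds the monthly free budget."""
    return expected_monthly_tokens > monthly_free_budget()


if __name__ == "__main__":
    print(monthly_free_budget())        # 300,000 * 30 = 9,000,000
    print(needs_paid_tier(12_000_000))  # hypothetical usage above the budget
```

Because the allowance is daily, a month that averages under 9M tokens can still hit the cap on a heavy day.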