#+TITLE: Alternative LLM Providers - Subscription & Token Efficient
#+author: User
#+created: [2026-03-16 Mon 14:28]
#+DATE: 2026-03-07
#+FILETAGS: :research:llm:pricing:alternatives

* GLM-5 (Zhipu AI) - Research

** Pricing Found

- Input: $1.00 per 1M tokens
- Output: $3.20 per 1M tokens
- Model size: ~744B parameters, MoE architecture
- Training: 28.5T tokens

** Comparison to Current

Costs are in USD per 1M tokens; context is in tokens.

| Model      | Input Cost | Output Cost | Context | Free Tier |
|------------+------------+-------------+---------+-----------|
| Gemini 2.0 | $0         | $0          | 1M      | ✅ Yes    |
| GLM-5      | $1.00      | $3.20       | ?       | ?         |
| Claude     | $3.00      | $15.00      | 200K    | ❌ No     |
| GPT-4      | varies     | varies      | 128K    | ❌ No     |

** Status: Still researching subscription/unlimited plans

* Alternative Providers to Research

** Tier 1: Subscription/Unlimited

1. *Fireworks AI* - Flat-rate inference
2. *Together AI* - Pay-per-token but high limits
3. *Replicate* - Metered but competitive
4. *Groq* - Ultra-fast, low cost

** Tier 2: Self-Hosted (GPU rental or one-time hardware cost)

1. *RunPod* - GPU rental for running models yourself
2. *Lambda Labs* - GPU cloud
3. *Local inference* - RTX 4090, etc.

** Tier 3: Open-Source Serving Tools

1. *Ollama* + RunPod/Lambda
2. *llama.cpp* quantized models
3. *vLLM* serving framework

* Research Questions

1. Does GLM-5 offer an unlimited subscription tier?
2. Do Fireworks or Together offer flat-rate plans?
3. Does AWS Bedrock offer flat-rate pricing (Amazon Q)?
4. How does self-hosted Llama 3 70B compare to GLM-5 in quality?

* Next Steps Needed

- Manual research required (web browsing limited)
- Check the Zhipu pricing page directly
- Compare subscription tiers
- Evaluate the self-hosting break-even point

* Current Recommendation

*Until research is complete:*

- Stay on Gemini (free tier) ✅
- Use sparingly to avoid the 60/minute rate limit
- 300K tokens/day × 30 days = ~9M tokens/month free

*If more than 9M tokens/month are needed:* Evaluate paid tiers
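
The free-tier arithmetic and the paid-tier comparison above can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: the per-token prices are the ones listed in the comparison table, a month is taken as 30 days, and the 25% output share of traffic is a hypothetical placeholder, not a measured value. The names =Pricing=, =free_monthly_tokens=, =monthly_cost=, and =overflow_cost= are illustrative helpers, not part of any provider SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1M tokens, taken from the comparison table."""
    input_per_million: float
    output_per_million: float


PRICES = {
    "GLM-5": Pricing(1.00, 3.20),
    "Claude": Pricing(3.00, 15.00),
}

FREE_TOKENS_PER_DAY = 300_000  # free-tier allowance quoted in the notes
DAYS_PER_MONTH = 30            # assumption: a 30-day month


def free_monthly_tokens(per_day: int = FREE_TOKENS_PER_DAY,
                        days: int = DAYS_PER_MONTH) -> int:
    """Tokens available per month on the free tier."""
    return per_day * days


def monthly_cost(pricing: Pricing, input_tokens: float,
                 output_tokens: float) -> float:
    """USD cost for the given input and output token volumes."""
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


def overflow_cost(pricing: Pricing, total_tokens: int,
                  output_share: float = 0.25) -> float:
    """USD cost of the tokens that exceed the free monthly allowance.

    output_share is a hypothetical fraction of traffic that is output tokens.
    """
    excess = max(0, total_tokens - free_monthly_tokens())
    output_tokens = excess * output_share
    input_tokens = excess - output_tokens
    return monthly_cost(pricing, input_tokens, output_tokens)


if __name__ == "__main__":
    print(f"Free allowance: {free_monthly_tokens():,} tokens/month")
    for name, pricing in PRICES.items():
        cost = overflow_cost(pricing, total_tokens=13_000_000)
        print(f"{name}: ${cost:.2f} for the overflow at 13M tokens/month")
```

Changing =total_tokens= and =output_share= to match real usage gives a first estimate of when a paid tier starts to matter. Subscription or flat-rate plans would need their own fixed-cost term once their prices are known.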