Alternative LLM Providers - Subscription & Token Efficient

GLM-5 (Zhipu AI) - Research

Pricing Found

  • Input: $1.00 per 1M tokens
  • Output: $3.20 per 1M tokens
  • Model size: ~744B parameters, MoE architecture
  • Training: 28.5T tokens

Comparison to Current

Model        Input ($/1M tokens)   Output ($/1M tokens)   Context   Free Tier
Gemini 2.0   $0                    $0                     1M        Yes
GLM-5        $1.00                 $3.20                  ?         ?
Claude       $3.00                 $15.00                 200K      No
GPT-4        varies                varies                 128K      No
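
To make the comparison concrete, here is a minimal sketch of the monthly cost at the per-token prices listed above. The workload split (6M input and 3M output tokens per month) is an illustrative assumption, not a measured figure, and `monthly_cost` is a hypothetical helper name.

```python
# Prices in USD per 1M tokens, copied from the comparison table.
PRICES = {
    "GLM-5": {"input": 1.00, "output": 3.20},
    "Claude": {"input": 3.00, "output": 15.00},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated monthly cost in USD for a given token volume."""
    price = PRICES[model]
    return (
        input_tokens / 1_000_000 * price["input"]
        + output_tokens / 1_000_000 * price["output"]
    )


if __name__ == "__main__":
    # Assumed workload: 6M input + 3M output tokens per month (hypothetical).
    for model in PRICES:
        print(f"{model}: ${monthly_cost(model, 6_000_000, 3_000_000):.2f}")
```

Under that assumed split, GLM-5 comes to about $15.60 per month and Claude to about $63.00.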

Status: Still researching subscription/unlimited plans

Alternative Providers to Research

Tier 1: Subscription/Unlimited

  1. Fireworks AI - Flat-rate inference (unconfirmed; see Research Questions)
  2. Together AI - Pay-per-token but high limits
  3. Replicate - Metered but competitive
  4. Groq - Ultra-fast, low cost

Tier 2: Self-Hosted (GPU rental or one-time hardware cost)

  1. RunPod - GPU rental for self-hosted models
  2. Lambda Labs - GPU cloud
  3. Local inference - RTX 4090, etc.

Tier 3: Open-Source Serving Tools

  1. Ollama + RunPod/Lambda
  2. llama.cpp quantized models
  3. vLLM serving framework

Research Questions

  1. Does GLM-5 offer an unlimited subscription tier?
  2. Do Fireworks or Together offer flat-rate plans?
  3. Does AWS Bedrock (or Amazon Q) offer flat-rate pricing?
  4. How does self-hosted Llama 3 70B compare to GLM-5 in quality?

Next Steps Needed

  • Manual research required (web browsing limited)
  • Check Zhipu pricing page directly
  • Compare subscription tiers
  • Evaluate self-hosting break-even
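
The self-hosting break-even in the last step can be sketched as a simple division: fixed monthly cost over the blended API price per token. Only the GLM-5 prices come from these notes. The 2:1 input/output mix and the $0.50/hour GPU rental rate are placeholder assumptions, and both function names are hypothetical.

```python
def blended_price_per_million(input_price: float, output_price: float,
                              input_share: float) -> float:
    """Average USD price per 1M tokens for a given input/output mix."""
    return input_share * input_price + (1.0 - input_share) * output_price


def break_even_tokens(fixed_monthly_cost: float, price_per_million: float) -> float:
    """Monthly token volume at which a fixed-cost setup matches API spend."""
    return fixed_monthly_cost / price_per_million * 1_000_000


if __name__ == "__main__":
    # GLM-5 prices from the notes above; the 2:1 input/output mix is assumed.
    api_price = blended_price_per_million(1.00, 3.20, input_share=2 / 3)
    # Hypothetical GPU rental: $0.50/hour, running 24 hours x 30 days.
    gpu_monthly = 0.50 * 24 * 30
    tokens = break_even_tokens(gpu_monthly, api_price)
    print(f"Break-even at ~{tokens / 1_000_000:.0f}M tokens/month")
```

This ignores throughput limits, setup time, and any quality gap between the self-hosted model and GLM-5, so it gives only a lower bound on the volume needed to justify self-hosting.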

Current Recommendation

Until research is complete:

  • Stay on Gemini (free tier)
  • Use sparingly to avoid 60/minute rate limit
  • 300K tokens/day × 30 days ≈ 9M tokens/month free

If more than 9M tokens/month are needed: evaluate paid tiers
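
One way to stay inside the free-tier limits above is a small client-side guard. The sketch below assumes the 60/minute figure counts requests and that the 300K limit resets daily. `FreeTierBudget` is a hypothetical helper, not part of any provider SDK.

```python
import time
from collections import deque

DAILY_TOKEN_LIMIT = 300_000   # free-tier figure from the notes
REQUESTS_PER_MINUTE = 60      # rate limit from the notes (assumed to mean requests)


class FreeTierBudget:
    """Track request rate and daily token use against free-tier limits."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._recent = deque()   # timestamps of requests in the last 60 s
        self.tokens_today = 0

    def allow(self, tokens: int) -> bool:
        """Record the request and return True if it fits both limits."""
        now = self._clock()
        # Drop timestamps that have left the 60-second window.
        while self._recent and now - self._recent[0] >= 60:
            self._recent.popleft()
        if len(self._recent) >= REQUESTS_PER_MINUTE:
            return False
        if self.tokens_today + tokens > DAILY_TOKEN_LIMIT:
            return False
        self._recent.append(now)
        self.tokens_today += tokens
        return True

    def reset_day(self) -> None:
        """Call at the start of each day to clear the token counter."""
        self.tokens_today = 0
```

Calling `allow(estimated_tokens)` before each request and skipping or delaying the call when it returns `False` keeps usage under both limits. The clock is injectable so the window logic can be tested without waiting.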