amr/memex

Files

Amr Gharbeia b7e082c403 refactor: moved org-agent to its own repository as a submodule

2026-03-27 15:46:53 -04:00

1.8 KiB

Raw Blame History

Alternative LLM Providers - Subscription & Token Efficient

GLM-5 (Zhipu AI) - Research
Alternative Providers to Research
Research Questions
Next Steps Needed
Current Recommendation

GLM-5 (Zhipu AI) - Research

Pricing Found

Input: $1.00 per 1M tokens
Output: $3.20 per 1M tokens
Context: ~744B parameters, MoE architecture
Training: 28.5T tokens

Comparison to Current

Model	Input Cost	Output Cost	Context	Free Tier
Gemini 2.0	$0	$0	1M	✅ Yes
GLM-5	$1.00	$3.20	?	?
Claude	$3.00	$15.00	200K	❌ No
GPT-4	varies	varies	128K	❌ No

Status: Still researching subscription/unlimited plans

Alternative Providers to Research

Tier 1: Subscription/Unlimited

Fireworks AI - Flat-rate inference
Together AI - Pay-per-token but high limits
Replicate - Metered but competitive
Groq - Ultra-fast, low cost

Tier 2: Self-Hosted (One-time cost)

RunPod - GPU rental for local models
Lambdalabs - GPU cloud
Local inference - RTX 4090, etc.

Tier 3: Open Source Providers

Ollama + RunPod/Lambda
llama.cpp quantized models
vLLM serving framework

Research Questions

Does GLM-5 offer unlimited subscription tier?
What about Fireworks/Together flat-rate plans?
AWS Bedrock with flat-rate (Amazon Q)?
Self-hosted llama3 70B vs GLM-5 quality?

Next Steps Needed

Manual research required (web browsing limited)
Check Zhipu pricing page directly
Compare subscription tiers
Evaluate self-hosting break-even

Current Recommendation

Until research complete:

Stay on Gemini (free tier) ✅
Use sparingly to avoid 60/minute rate limit
300K tokens/day = ~9M tokens/month free

If need more than 9M/month: Evaluate paid tiers