refactor: moved org-agent to its own repository as a submodule

2026-03-27 15:46:53 -04:00
parent 01f76a4570
commit b7e082c403
176 changed files with 19686 additions and 9665 deletions
--- a/notes/llm-alternative-providers.org
+++ b/notes/llm-alternative-providers.org
@@ -0,0 +1,64 @@
+#+TITLE: Alternative LLM Providers - Subscription & Token Efficient
+#+author: User
+#+created: [2026-03-16 Mon 14:28]
+#+DATE: 2026-03-07
+#+FILETAGS: :research:llm:pricing:alternatives:
+
+* GLM-5 (Zhipu AI) - Research
+
+** Pricing Found
+- Input: $1.00 per 1M tokens
+- Output: $3.20 per 1M tokens
+- Model size: ~744B parameters, MoE architecture
+- Training: 28.5T tokens
+
+** Comparison to Current
+| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context (tokens) | Free Tier |
+|-------|---------------------|----------------------|------------------|-----------|
+| Gemini 2.0 | $0 | $0 | 1M | ✅ Yes |
+| GLM-5 | $1.00 | $3.20 | ? | ? |
+| Claude | $3.00 | $15.00 | 200K | ❌ No |
+| GPT-4 | varies | varies | 128K | ❌ No |
+
+** Status: Still researching subscription/unlimited plans
+
+* Alternative Providers to Research
+
+** Tier 1: Subscription/Unlimited
+1. *Fireworks AI* - Flat-rate inference (to be confirmed)
+2. *Together AI* - Pay-per-token but high limits
+3. *Replicate* - Metered but competitive
+4. *Groq* - Ultra-fast, low cost
+
+** Tier 2: Self-Hosted (GPU rental or one-time hardware cost)
+1. *RunPod* - GPU rental for self-hosted models
+2. *Lambda Labs* - GPU cloud
+3. *Local inference* - RTX 4090, etc.
+
+** Tier 3: Open-Source Serving Tools
+1. *Ollama* + RunPod/Lambda
+2. *llama.cpp* quantized models
+3. *vLLM* serving framework
+
+* Research Questions
+
+1. Does GLM-5 offer an unlimited subscription tier?
+2. Do Fireworks or Together offer flat-rate plans?
+3. Does AWS Bedrock offer a flat-rate option (e.g. via Amazon Q)?
+4. How does self-hosted Llama 3 70B compare to GLM-5 in quality?
+
+* Next Steps Needed
+
+- Manual research required (web browsing limited)
+- Check Zhipu pricing page directly
+- Compare subscription tiers
+- Evaluate self-hosting break-even
+
+* Current Recommendation
+
+*Until research is complete:*
+- Stay on Gemini (free tier) ✅
+- Use sparingly to stay under the 60/minute rate limit
+- 300K tokens/day x 30 days = ~9M tokens/month free
+
+*If more than 9M tokens/month is needed:* Evaluate paid tiers
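
The quota and pricing arithmetic in the notes above can be checked with a small sketch. It uses only the figures the notes list (300K free tokens/day, GLM-5 at $1.00 input and $3.20 output per 1M tokens). The function names and the `output_share` parameter are hypothetical: the notes do not say how usage splits between input and output, so the 25% default is purely an illustrative assumption.

```python
# Figures copied from the notes (USD per 1M tokens, tokens per day).
GLM5_INPUT_PER_M = 1.00
GLM5_OUTPUT_PER_M = 3.20
FREE_TOKENS_PER_DAY = 300_000


def monthly_free_tokens(days: int = 30) -> int:
    """Free-tier tokens available over a month of `days` days."""
    return FREE_TOKENS_PER_DAY * days


def glm5_monthly_cost(total_tokens: int, output_share: float = 0.25) -> float:
    """Estimated USD cost of `total_tokens` on GLM-5.

    `output_share` is an assumed fraction of tokens billed as output;
    it is not taken from the notes.
    """
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    cost = input_tokens * GLM5_INPUT_PER_M + output_tokens * GLM5_OUTPUT_PER_M
    return cost / 1_000_000


if __name__ == "__main__":
    quota = monthly_free_tokens()
    print(f"Free quota: {quota:,} tokens/month")
    print(f"Same volume on GLM-5: ${glm5_monthly_cost(quota):.2f}")
```

Under the assumed 25% output share, moving the full 9M tokens/month to GLM-5 would cost about $13.95, which gives a rough baseline when weighing paid tiers against the free quota.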