Pricing
Three plans. GPU auto-detected. Unlimited queries on every plan.
Free
Get everyday AI with 2M token context on your own GPU.
No credit card required
AlphaChat app
Your personal, private AI assistant running locally
2M token context
Wikipedia knowledge base
Unlimited queries
Runs on any consumer GPU (8-32 GB)
Data never leaves your device
Pro
Get 25x more context and access to all knowledge bases and daily news.
25x more context than Free
Everything in Free and:
50M token context
Feed your entire codebase, all docs, every email
All 10 knowledge bases
Daily news updates
Custom document upload
Knowledge expert mode
Business
Unlock the highest level of access with unlimited context and enterprise features.
$60/mo at 200M tokens · $750/mo at 2.5B tokens
$6K/mo at 20B tokens · $30K/mo at 100B tokens
Cached prefill free on reuse — index once, query forever
Everything in Pro and:
Unlimited context
No caps. Index your entire knowledge base.
API access (unlimited QPS)
Multi-seat + admin panel
SSO + SLA (99.9%)
Custom private knowledge base
Cached prefill free — pay once to index, reuse at $0
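The four Business price points listed above all work out to the same rate per million tokens of context. The short sketch below checks that arithmetic. The helper names are illustrative, and the `estimate_monthly_usd` function assumes the price scales linearly between the listed points, which the plan description does not state explicitly.

```python
# Business price points as listed above: (context tokens, USD per month).
PRICE_POINTS = [
    (200_000_000, 60),
    (2_500_000_000, 750),
    (20_000_000_000, 6_000),
    (100_000_000_000, 30_000),
]


def usd_per_million_tokens(tokens: int, usd_per_month: float) -> float:
    """Monthly price divided by context size in millions of tokens."""
    return usd_per_month / (tokens / 1_000_000)


def estimate_monthly_usd(tokens: int, rate_per_million: float = 0.30) -> float:
    """ASSUMPTION: price is linear in context size at the listed rate.

    The page only lists four points; values in between are an estimate.
    """
    return tokens / 1_000_000 * rate_per_million


# Rate implied by each listed price point.
rates = [usd_per_million_tokens(t, p) for t, p in PRICE_POINTS]

if __name__ == "__main__":
    for (tokens, price), rate in zip(PRICE_POINTS, rates):
        print(f"{tokens:>15,} tokens  ${price:>6,}/mo  ${rate:.2f} per 1M tokens")
```

Under that linearity assumption, a 1B-token index would land at about $300/mo; treat this as an estimate rather than a quoted price.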
What Your GPU Can Run
AlphaLlama runs models that normally need $45K+ of datacenter GPUs on a single consumer GPU.
| Your GPU | VRAM | What runs | Examples |
|---|---|---|---|
| RTX 3090 / 4090 | 24 GB | ~40B dense / 397B MoE | Qwen3.5 35B MoE (116 tok/s), Qwen3.5 27B, Gemma 27B |
| RTX 5090 | 32 GB | ~50B dense / 397B MoE | Qwen3.5 35B MoE, Mixtral 8x7B, Llama 4 Scout |
| A100 / H100 80GB | 80 GB | ~130B dense | Qwen3.5 122B MoE (54 tok/s), Llama 3.3 70B, Qwen3.5 72B |
| H200 141GB | 141 GB | ~230B dense | Llama 4 Maverick, DeepSeek V4 Flash (236B) |
| B200 / MI300X 192GB | 192 GB | ~300B+ dense | Most open-weight models fit natively |
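The dense-model column in the table above follows a fairly steady ratio of VRAM to parameter count. As a rough sketch, the code below derives the bits per parameter each row implies and turns the tightest ratio into a quick fit check. The 0.6 GB per billion parameters figure is inferred from the table (the 24 GB row), not stated by the vendor, and the calculation ignores KV cache and activation memory; the function names are illustrative.

```python
# (VRAM in GB, approx. largest dense model in billions of parameters),
# copied from the "What runs" column of the table above.
DENSE_FIT = [(24, 40), (32, 50), (80, 130), (141, 230), (192, 300)]


def implied_bits_per_param(vram_gb: float, params_billion: float) -> float:
    """Bits per parameter if the weights alone filled all of VRAM."""
    return vram_gb * 8 / params_billion


def fits_dense(params_billion: float, vram_gb: float,
               gb_per_billion: float = 0.6) -> bool:
    """ASSUMPTION: ~0.6 GB of VRAM per billion dense parameters.

    This is the tightest ratio in the table, not a published spec.
    """
    return params_billion * gb_per_billion <= vram_gb


# Bits per parameter implied by each table row.
bits = [implied_bits_per_param(v, p) for v, p in DENSE_FIT]

if __name__ == "__main__":
    for (vram, params), b in zip(DENSE_FIT, bits):
        print(f"{vram:>4} GB -> ~{params}B dense  (~{b:.1f} bits/param)")
    print("70B dense on 24 GB:", fits_dense(70, 24))
```

Every row comes out at roughly 5 bits per parameter, which would be consistent with heavily quantized weights, though the page does not say which quantization is used.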