Pricing

Three plans. GPU auto-detected. Unlimited queries on every plan.

Free

Get everyday AI with 2M token context on your own GPU.

$0/month

No credit card required

AlphaChat app

Your personal, private AI assistant running locally

2M token context

Wikipedia knowledge base

Unlimited queries

Runs on any consumer GPU (8-32 GB)

Data never leaves your device

Most Popular

Pro

Get 25x more context and access to all knowledge bases and daily news.

$19/month

25x more context than Free

Everything in Free and:

50M token context

Feed your entire codebase, all docs, every email

All 10 knowledge bases

Daily news updates

Custom document upload

Knowledge expert mode

Professional GPU

Business

Unlock the highest level of access with unlimited context and enterprise features.

Starting at:
$0.30 per million tokens (MTok) indexed per month

$60/mo at 200M tokens · $750/mo at 2.5B tokens

$6K/mo at 20B tokens · $30K/mo at 100B tokens

Cached prefill free on reuse — index once, query forever


Everything in Pro and:

Unlimited context

No caps. Index your entire knowledge base.

API access (unlimited QPS)

Multi-seat + admin panel

SSO + SLA (99.9%)

Custom private knowledge base

Cached prefill free — pay once to index, reuse at $0
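
The Business price points listed above all work out to the same flat rate of $0.30 per million indexed tokens per month. The short sketch below reproduces that arithmetic. It assumes simple linear pricing with no minimum spend and no volume discount, which the page does not confirm, and the function name is hypothetical.

```python
# Listed Business rate: USD per million indexed tokens (MTok) per month.
RATE_USD_PER_MTOK = 0.30


def business_monthly_cost(indexed_tokens: int) -> float:
    """Estimate the monthly Business cost in USD.

    Assumption: flat linear pricing at the listed rate, with no minimum
    charge and no volume discount.
    """
    if indexed_tokens < 0:
        raise ValueError("indexed_tokens must be non-negative")
    return round(indexed_tokens / 1_000_000 * RATE_USD_PER_MTOK, 2)


# Reproduce the four price points quoted on the page.
for tokens in (200_000_000, 2_500_000_000, 20_000_000_000, 100_000_000_000):
    print(f"{tokens:>15,} tokens -> ${business_monthly_cost(tokens):,.2f}/mo")
```

Because the four quoted price points fall on the same line, any index size in between can be estimated the same way, subject to the assumptions above.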

What Your GPU Can Run

AlphaLlama runs models that normally need $45K+ of datacenter GPUs on a single consumer GPU.

| Your GPU | VRAM | What runs | Examples |
| --- | --- | --- | --- |
| RTX 3090 / 4090 | 24 GB | ~40B dense / 397B MoE | Qwen3.5 35B MoE (116 tok/s), Qwen3.5 27B, Gemma 27B |
| RTX 5090 | 32 GB | ~50B dense / 397B MoE | Qwen3.5 35B MoE, Mixtral 8x7B, Llama 4 Scout |
| A100 / H100 80GB | 80 GB | ~130B dense | Qwen3.5 122B MoE (54 tok/s), Llama 3.3 70B, Qwen3.5 72B |
| H200 141GB | 141 GB | ~230B dense | Llama 4 Maverick, DeepSeek V4 Flash (236B) |
| B200 / MI300X 192GB | 192 GB | ~300B+ dense | Most open-weight models fit natively |
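
As a rough way to read the table above, the sketch below looks up the approximate largest dense model for a given amount of VRAM and computes how many bits per parameter each row implies if the whole VRAM budget went to weights. The tier values are copied from the table. The helper names are hypothetical, and the bits-per-parameter figure is only an upper bound derived from the listed numbers, not a statement about the quantization actually used, which the page does not specify.

```python
# (GPU tier, VRAM in GB, approximate largest dense model in billions of parameters),
# copied from the table above. The "~300B+" row is recorded as 300.
GPU_TIERS = [
    ("RTX 3090 / 4090", 24, 40),
    ("RTX 5090", 32, 50),
    ("A100 / H100 80GB", 80, 130),
    ("H200 141GB", 141, 230),
    ("B200 / MI300X 192GB", 192, 300),
]


def max_dense_params_b(vram_gb: float):
    """Return the listed dense-model limit (billions of parameters) for the
    largest tier whose VRAM does not exceed vram_gb, or None if no tier fits."""
    best = None
    for _name, tier_vram, dense_b in GPU_TIERS:
        if tier_vram <= vram_gb:
            best = dense_b
    return best


def implied_bits_per_param(vram_gb: float, dense_b: float) -> float:
    """Upper bound on bits per parameter if all VRAM held only weights
    (1 GB treated as 10**9 bytes)."""
    return vram_gb * 8 / dense_b


for name, vram, dense in GPU_TIERS:
    bits = implied_bits_per_param(vram, dense)
    print(f"{name:<22} {vram:>4} GB  ~{dense}B dense  <= {bits:.2f} bits/param")
```

Every row lands at roughly 5 bits per parameter or less, so the listed dense sizes could not be held in 16-bit weights within the stated VRAM.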