How much does it cost to use AI APIs?

AI API list prices range from $0.10 to $180 per million tokens depending on the model. For a typical application sending 100,000 prompts per month with 500 input tokens and 300 output tokens each, costs range from about $11/month (DeepSeek V4 Flash) to about $6,900/month (GPT-5.5 Pro).

How do I calculate AI API costs?

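Most providers charge separately for input and output tokens, and quote prices per million tokens. Multiply your monthly requests by the input tokens per request and the input price, do the same with output tokens and the output price, add the two, and divide by 1,000,000: monthly cost = requests × (input tokens × input price + output tokens × output price) / 1,000,000. Use the calculator on this page for instant estimates across all major providers.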
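The same formula as a small Python function. This is a sketch: `monthly_cost` is a hypothetical helper, not part of any provider SDK, and like the calculator it ignores caching, batch discounts, and retries.

```python
def monthly_cost(requests_per_month: int,
                 input_tokens: int,
                 output_tokens: int,
                 input_price_per_m: float,
                 output_price_per_m: float) -> float:
    """Estimate a monthly list-price bill in dollars.

    Prices are quoted per 1M tokens, so token totals are divided by 1,000,000.
    """
    input_cost = requests_per_month * input_tokens * input_price_per_m / 1_000_000
    output_cost = requests_per_month * output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Default calculator workload: 100K requests, 500 input + 300 output tokens each.
print(round(monthly_cost(100_000, 500, 300, 0.10, 0.40), 2))   # Gemini 2.0 Flash -> 17.0
print(round(monthly_cost(100_000, 500, 300, 15.00, 75.00), 2))  # Claude Opus 4 -> 3000.0
```

Both results match the corresponding rows of the breakdown table.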

What is the cheapest AI API?

In the comparison on this page, DeepSeek V4 Flash is the cheapest model at $0.10/1M input tokens and $0.20/1M output tokens. Google Gemini 2.0 Flash ($0.10/$0.40), Meta Llama 4 Scout ($0.15/$0.40), and DeepSeek V3 ($0.27/$1.10) are also extremely affordable.

How can I reduce AI API costs?

Use model routing to send simple queries to cheap models and complex queries to premium models. A gateway like Swfte Connect automates this, typically saving 30-60%. Also consider cached input pricing, prompt optimization, and batching requests.

AI Cost Calculator

Enter your usage and instantly compare monthly costs across 38 AI models. Find the cheapest option for your workload.

Monthly cost estimate

Enter your typical request shape. Costs below are projected over one month, based on current public list-price API rates.

Requests per month: 100,000

Input tokens per request: 500

Output tokens per request: 300

Per month: 100K requests · 50.0M input tokens · 30.0M output tokens. Excludes prompt caching, batch discounts, retries, and fees.

Cheapest: DeepSeek V4 Flash, $11.00 per month at this volume

Best value (quality ≥ 80): DeepSeek V4 Flash (quality score 80), $11.00 per month at this volume

Most expensive: GPT-5.5 Pro, $6,900.00 per month at this volume
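The three summary figures can be reproduced with a filter-and-minimum over per-model monthly totals. This is a sketch, not the calculator's actual code: the `ModelCost` type and `summarize` function are hypothetical, and it assumes "best value" means the cheapest model whose quality score meets the threshold. Only DeepSeek V4 Flash's score (80) is stated on this page, so the other models carry `None` rather than an invented score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelCost:
    name: str
    monthly_total: float    # dollars per month at the chosen volume
    quality: Optional[int]  # quality score; None when not listed on this page


def summarize(models: list[ModelCost], min_quality: int = 80) -> dict:
    """Pick the cheapest, best-value (quality >= min_quality), and priciest model."""
    cheapest = min(models, key=lambda m: m.monthly_total)
    priciest = max(models, key=lambda m: m.monthly_total)
    # Only models with a known score that meets the bar qualify for "best value".
    eligible = [m for m in models if m.quality is not None and m.quality >= min_quality]
    best_value = min(eligible, key=lambda m: m.monthly_total) if eligible else None
    return {"cheapest": cheapest, "best_value": best_value, "most_expensive": priciest}


# Totals from the breakdown table (100K requests, 500 in / 300 out tokens).
models = [
    ModelCost("DeepSeek V4 Flash", 11.00, 80),
    ModelCost("Gemini 2.0 Flash", 17.00, None),
    ModelCost("Claude Opus 4", 3000.00, None),
    ModelCost("GPT-5.5 Pro", 6900.00, None),
]
result = summarize(models)
print(result["cheapest"].name, "|", result["best_value"].name, "|",
      result["most_expensive"].name)
```

Raising `min_quality` above every known score makes `best_value` come back as `None` instead of silently falling back to the cheapest model.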

Save 30-60% with Mixture-of-Routers

Most production traffic is mixed-difficulty. Send the easy 60% of requests to a cheap model and the hard 10% to a frontier model, and you keep the same quality at a fraction of the cost.
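To see how a routed mix changes the bill, weight each tier's per-request cost by its share of traffic. The sketch below uses per-request costs from the breakdown table (500 input and 300 output tokens per request). The 60%/10% split comes from the paragraph above; sending the remaining 30% to a mid-tier model is an assumption, and the model picked for each tier is a hypothetical choice, not a recommendation.

```python
def blended_monthly_cost(requests_per_month: int,
                         tiers: dict[str, tuple[float, float]]) -> float:
    """Monthly cost when traffic is split across model tiers.

    tiers maps a tier label to (share_of_traffic, cost_per_request_in_dollars).
    Shares must sum to 1.0.
    """
    total_share = sum(share for share, _ in tiers.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"traffic shares sum to {total_share}, expected 1.0")
    per_request = sum(share * cost for share, cost in tiers.values())
    return requests_per_month * per_request


# Per-request costs from the breakdown table; tier assignments are hypothetical.
tiers = {
    "easy -> DeepSeek V4 Flash": (0.60, 0.000110),
    "medium -> Claude Sonnet 4.6": (0.30, 0.006000),
    "hard -> Claude Opus 4.7": (0.10, 0.010000),
}
routed = blended_monthly_cost(100_000, tiers)
all_frontier = 100_000 * 0.010000  # every request sent to Claude Opus 4.7
print(f"routed: ${routed:,.2f}  all-frontier: ${all_frontier:,.2f}")
```

The saving you actually see depends on which models fill each tier, what you compare against, and how accurately requests are classified as easy or hard.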


Full breakdown by model

Sorted cheapest to most expensive

Model	Input / 1M tokens	Output / 1M tokens	Cost / request	Input cost / mo	Output cost / mo	Total / mo
DeepSeek V4 Flash	$0.10	$0.20	$0.000110	$5.00	$6.00	$11.00
Gemini 2.0 Flash	$0.10	$0.40	$0.000170	$5.00	$12.00	$17.00
Llama 4 Scout	$0.15	$0.40	$0.000195	$7.50	$12.00	$19.50
Qwen 2.5 Coder 32B	$0.15	$0.45	$0.000210	$7.50	$13.50	$21.00
GPT-4o Mini	$0.15	$0.60	$0.000255	$7.50	$18.00	$25.50
Llama 4 Maverick	$0.20	$0.60	$0.000280	$10.00	$18.00	$28.00
Grok 3 Mini	$0.30	$0.50	$0.000300	$15.00	$15.00	$30.00
Codestral	$0.30	$0.90	$0.000420	$15.00	$27.00	$42.00
Qwen 2.5 72B	$0.30	$0.90	$0.000420	$15.00	$27.00	$42.00
DeepSeek V3	$0.27	$1.10	$0.000465	$13.50	$33.00	$46.50
DeepSeek R1	$0.55	$2.19	$0.000932	$27.50	$65.70	$93.20
Amazon Nova Pro	$0.80	$3.20	$0.001360	$40.00	$96.00	$136.00
Grok 4.3	$1.25	$2.50	$0.001375	$62.50	$75.00	$137.50
Kimi K2.6	$0.73	$3.49	$0.001412	$36.50	$104.70	$141.20
GLM-5.1	$0.98	$3.08	$0.001414	$49.00	$92.40	$141.40
Claude 3.5 Haiku	$0.80	$4.00	$0.001600	$40.00	$120.00	$160.00
o3 Mini	$1.10	$4.40	$0.001870	$55.00	$132.00	$187.00
DeepSeek V4 Pro	$1.74	$3.48	$0.001914	$87.00	$104.40	$191.40
Qwen 3.6 Plus	$1.40	$5.60	$0.002380	$70.00	$168.00	$238.00
Mistral Large 2	$2.00	$6.00	$0.002800	$100.00	$180.00	$280.00
GPT-4.1	$2.00	$8.00	$0.003400	$100.00	$240.00	$340.00
Qwen 3.7 Max	$2.50	$7.50	$0.003500	$125.00	$225.00	$350.00
Gemini 2.5 Pro	$1.25	$10.00	$0.003625	$62.50	$300.00	$362.50
GPT-4o	$2.50	$10.00	$0.004250	$125.00	$300.00	$425.00
Command R+	$2.50	$10.00	$0.004250	$125.00	$300.00	$425.00
Gemini 3.1 Pro	$2.00	$12.00	$0.004600	$100.00	$360.00	$460.00
Claude Sonnet 4	$3.00	$15.00	$0.006000	$150.00	$450.00	$600.00
Grok 3	$3.00	$15.00	$0.006000	$150.00	$450.00	$600.00
Sonar Pro	$3.00	$15.00	$0.006000	$150.00	$450.00	$600.00
Claude Sonnet 4.6	$3.00	$15.00	$0.006000	$150.00	$450.00	$600.00
Claude Opus 4.7	$5.00	$25.00	$0.010000	$250.00	$750.00	$1,000.00
Claude Opus 4.8	$5.00	$25.00	$0.010000	$250.00	$750.00	$1,000.00
GPT-5.5	$5.00	$30.00	$0.011500	$250.00	$900.00	$1,150.00
o3	$10.00	$40.00	$0.017000	$500.00	$1,200.00	$1,700.00
Claude Opus 4	$15.00	$75.00	$0.030000	$750.00	$2,250.00	$3,000.00
GPT-5.5 Pro	$30.00	$180.00	$0.069000	$1,500.00	$5,400.00	$6,900.00
Gemma 4 27B (self-host; open weights, Apache 2.0)	$0	$0	n/a	n/a	n/a	Infra cost depends on hardware
Nemotron 3 Nano Omni (self-host; open weights, NVIDIA Open Model License)	$0	$0	n/a	n/a	n/a	Infra cost depends on hardware

List-price estimate. Real bills typically run 1.3-1.7x higher once retries, re-sent system prompts, and tool-call round-trips are counted. See the per-million-tokens true-cost page for a breakdown of these adders.
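Applying that adjustment is a single multiplication. A sketch, assuming the 1.3-1.7x range quoted above applies uniformly to the list-price total; in practice the overhead varies by workload, and `realistic_bill_range` is a hypothetical helper.

```python
def realistic_bill_range(list_price_total: float,
                         low_multiplier: float = 1.3,
                         high_multiplier: float = 1.7) -> tuple[float, float]:
    """Scale a list-price estimate by a typical overhead range (assumed uniform)."""
    if low_multiplier > high_multiplier:
        raise ValueError("low_multiplier must not exceed high_multiplier")
    return list_price_total * low_multiplier, list_price_total * high_multiplier


low, high = realistic_bill_range(600.00)  # Claude Sonnet 4.6 at the default volume
print(f"${low:,.2f} to ${high:,.2f} per month")
```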

How AI API Pricing Works

AI model providers charge based on tokens, the basic unit of text processing. One token is roughly 4 characters or ¾ of a word of English text. Most providers charge separately for input tokens (your prompt) and output tokens (the model's response), with output tokens typically priced at 2-5x the input rate, and up to 8x for some models (Gemini 2.5 Pro in the table above).
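The two rules of thumb above translate directly into rough estimators. This is only a heuristic sketch for English text: actual counts come from each provider's own tokenizer (for example OpenAI's `tiktoken` library), and different models tokenize the same text differently. The function names are hypothetical.

```python
import math


def estimate_tokens_from_chars(text: str) -> int:
    """~4 characters per token (English rule of thumb)."""
    return math.ceil(len(text) / 4)


def estimate_tokens_from_words(text: str) -> int:
    """~0.75 words per token, i.e. about 4/3 tokens per whitespace-separated word."""
    return math.ceil(len(text.split()) * 4 / 3)


prompt = "Summarize the attached support ticket in two sentences."
print(estimate_tokens_from_chars(prompt), estimate_tokens_from_words(prompt))
```

The two estimates rarely agree exactly; treat them as a range when sizing a workload for the calculator.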

Typical Usage Patterns

Chatbot (customer support): ~500 input tokens, ~300 output tokens per message, 50K-500K messages/month
Code generation: ~1,000 input tokens, ~500 output tokens per request, 10K-100K requests/month
Document analysis: ~2,000 input tokens, ~200 output tokens per document, 5K-50K documents/month
Content generation: ~300 input tokens, ~1,000 output tokens per piece, 1K-20K pieces/month
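Because input and output are priced separately, the cheapest model can change with the request shape. A sketch comparing two models from the breakdown table on the chatbot and content-generation shapes above; it compares per-request cost only, since monthly volume scales both models equally. The dictionaries and helper names are illustrative.

```python
# Per-1M-token list prices (input, output) from the breakdown table.
PRICES = {
    "GPT-4o Mini": (0.15, 0.60),
    "Grok 3 Mini": (0.30, 0.50),
}

# Request shapes (input tokens, output tokens) from the usage patterns above.
SHAPES = {
    "chatbot": (500, 300),
    "content generation": (300, 1_000),
}


def cost_per_request(prices: tuple[float, float], shape: tuple[int, int]) -> float:
    in_price, out_price = prices
    in_tokens, out_tokens = shape
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000


def cheapest_for(shape: tuple[int, int]) -> str:
    """Name of the model in PRICES with the lowest cost for this request shape."""
    return min(PRICES, key=lambda name: cost_per_request(PRICES[name], shape))


for label, shape in SHAPES.items():
    print(f"{label}: {cheapest_for(shape)}")
```

GPT-4o Mini wins the input-heavy chatbot shape, while Grok 3 Mini's lower output price wins the output-heavy content-generation shape.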

Cost Optimization Strategies

The most impactful strategy is intelligent model routing. Rather than sending every request to a premium model, analyze the complexity of each request and route simple ones to cheaper, faster models. Swfte Connect does this automatically, typically reducing API costs by 30-60%.
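A toy illustration of complexity-based routing. It does not describe how Swfte Connect or any other gateway works internally: the keyword-and-length heuristic, the threshold, the hint list, and the `route` function are all hypothetical, and a real router would need a far better difficulty signal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    model: str
    input_price_per_m: float
    output_price_per_m: float


# Models and prices from the breakdown table; the tier mapping is hypothetical.
CHEAP = Tier("DeepSeek V4 Flash", 0.10, 0.20)
PREMIUM = Tier("Claude Opus 4.7", 5.00, 25.00)

# Hypothetical hints that a request needs a stronger model (crude substring check).
HARD_HINTS = ("refactor", "step by step", "contract", "diagnose", "derive")


def route(prompt: str, max_easy_chars: int = 400) -> Tier:
    """Send short prompts without 'hard' hints to the cheap tier, the rest to premium."""
    text = prompt.lower()
    if len(prompt) > max_easy_chars or any(hint in text for hint in HARD_HINTS):
        return PREMIUM
    return CHEAP


print(route("What are your opening hours?").model)
print(route("Refactor this module to remove the global state.").model)
```

Every misrouted hard request costs quality and every misrouted easy one costs money, so the savings depend heavily on how good the classifier is.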

Other strategies include using cached input pricing (offered by several providers, including Google, DeepSeek, OpenAI, and Anthropic), optimizing prompts to reduce token usage, batching API calls, and self-hosting open-source models for predictable, high-volume workloads.
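Cached input pricing changes only the input side of the cost formula: tokens served from the cache are billed at a discounted rate. A sketch, assuming a fixed cache-hit fraction and a cached rate expressed as a fraction of the normal input price. Both parameters are hypothetical, and real cached rates, minimum cacheable prompt sizes, and cache-write charges differ by provider.

```python
def input_cost_with_cache(requests_per_month: int,
                          input_tokens: int,
                          input_price_per_m: float,
                          cached_fraction: float,
                          cached_price_ratio: float) -> float:
    """Monthly input cost when part of each prompt is served from a cache.

    cached_fraction: share of input tokens billed at the cached rate (0-1).
    cached_price_ratio: cached price as a fraction of the normal input price (0-1).
    Ignores any cache-write or storage charges.
    """
    if not (0.0 <= cached_fraction <= 1.0 and 0.0 <= cached_price_ratio <= 1.0):
        raise ValueError("fractions must be between 0 and 1")
    total_tokens = requests_per_month * input_tokens
    cached = total_tokens * cached_fraction
    uncached = total_tokens - cached
    return (uncached + cached * cached_price_ratio) * input_price_per_m / 1_000_000


# Claude Sonnet 4.6 input price from the table; 80% cache hits and a 10% cached
# rate are illustrative assumptions, not quoted figures.
print(round(input_cost_with_cache(100_000, 500, 3.00, 0.8, 0.1), 2))
```

With no cache hits the function returns the uncached input cost from the table ($150.00 for Claude Sonnet 4.6), which makes it easy to compare the two cases.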

Comparing Providers

See our full pricing index for a comprehensive comparison of all providers, including historical pricing trends. Or check the model leaderboard to understand the quality vs. cost tradeoffs.