AI Token Cost Calculator

Three common prompt scenarios — short chat, long document summary, agentic tool-use loop — costed across every major AI model using official May 2026 pricing. The bars are dollars per month at the stated request volume.

Short customer-support chat

Median support ticket: 800 input tokens (system + ticket body), 250 output tokens (reply). 100K tickets/month.

The spread

Same workload. Gemini 2.0 Flash: $18.00/month. GPT-5.5 Pro: $6,900/month. That is a 383x cost ratio for the same prompt.

GPT-5.5 Pro
$6,900
Claude Opus 4
$3,075
o3
$1,800
GPT-5.5
$1,150
Claude Opus 4.7
$1,025
Claude Sonnet 4
$615
Grok 3
$615
Sonar Pro
$615
Gemini 3.1 Pro
$543
GPT-4o
$450
Command R+
$450
GPT-4.1
$360
Gemini 2.5 Pro
$350
Mistral Large 2
$310
Qwen 3.6 Plus
$252
DeepSeek V4 Pro
$226
o3 Mini
$198
Claude 3.5 Haiku
$164
Amazon Nova Pro
$144
DeepSeek R1
$98.75
DeepSeek V3
$49.10
Codestral
$46.50
Qwen 2.5 72B
$46.50
Grok 3 Mini
$36.50
Llama 4 Maverick
$31.00
GPT-4o Mini
$27.00
Qwen 2.5 Coder 32B
$23.25
Llama 4 Scout
$22.00
DeepSeek V4 Flash
$18.20
Gemini 2.0 Flash
$18.00

Based on official provider pricing as of 2026-05-06. Excludes prompt-caching, batch, and volume discounts. Self-hosted open-weight models (Gemma 4, Nemotron 3 Nano Omni) excluded from this view because their cost is infrastructure-driven, not token-priced.

Long document summary

40K input tokens (a 30-page PDF), 600 output tokens (executive summary). 5K runs/month.

The spread

Same workload. Gemini 2.0 Flash: $21.20/month. GPT-5.5 Pro: $6,540/month. That is a 308x cost ratio for the same prompt.

GPT-5.5 Pro
$6,540
Claude Opus 4
$3,225
o3
$2,120
GPT-5.5
$1,090
Claude Opus 4.7
$1,075
Gemini 3.1 Pro
$732
Claude Sonnet 4
$645
Grok 3
$645
Sonar Pro
$645
GPT-4o
$530
Command R+
$530
GPT-4.1
$424
Mistral Large 2
$418
DeepSeek V4 Pro
$358
Qwen 3.6 Plus
$297
Gemini 2.5 Pro
$280
o3 Mini
$233
Claude 3.5 Haiku
$172
Amazon Nova Pro
$170
DeepSeek R1
$117
Codestral
$62.70
Qwen 2.5 72B
$62.70
Grok 3 Mini
$61.50
DeepSeek V3
$57.30
Llama 4 Maverick
$41.80
GPT-4o Mini
$31.80
Qwen 2.5 Coder 32B
$31.35
Llama 4 Scout
$31.20
DeepSeek V4 Flash
$28.84
Gemini 2.0 Flash
$21.20

Based on official provider pricing as of 2026-05-06. Excludes prompt-caching, batch, and volume discounts. Self-hosted open-weight models (Gemma 4, Nemotron 3 Nano Omni) excluded from this view because their cost is infrastructure-driven, not token-priced.

Agentic tool-use loop

6K input tokens per turn (history + tool defs), 400 output tokens, 4 turns per task. 20K tasks/month.

The spread

Same workload. Gemini 2.0 Flash: $60.80/month. GPT-5.5 Pro: $20,160/month. That is a 332x cost ratio for the same prompt.

GPT-5.5 Pro
$20,160
Claude Opus 4
$9,600
o3
$6,080
GPT-5.5
$3,360
Claude Opus 4.7
$3,200
Gemini 3.1 Pro
$2,016
Claude Sonnet 4
$1,920
Grok 3
$1,920
Sonar Pro
$1,920
GPT-4o
$1,520
Command R+
$1,520
GPT-4.1
$1,216
Mistral Large 2
$1,152
DeepSeek V4 Pro
$947
Gemini 2.5 Pro
$920
Qwen 3.6 Plus
$851
o3 Mini
$669
Claude 3.5 Haiku
$512
Amazon Nova Pro
$486
DeepSeek R1
$334
Codestral
$173
Qwen 2.5 72B
$173
DeepSeek V3
$165
Grok 3 Mini
$160
Llama 4 Maverick
$115
GPT-4o Mini
$91.20
Qwen 2.5 Coder 32B
$86.40
Llama 4 Scout
$84.80
DeepSeek V4 Flash
$76.16
Gemini 2.0 Flash
$60.80

Based on official provider pricing as of 2026-05-06. Excludes prompt-caching, batch, and volume discounts. Self-hosted open-weight models (Gemma 4, Nemotron 3 Nano Omni) excluded from this view because their cost is infrastructure-driven, not token-priced.

How to read these visualizations

The same exact prompt sent to GPT-5.5 Pro versus DeepSeek V4 Flash can differ in cost by more than 200x. That spread is not quality — it is brand, infrastructure, and pricing strategy. Two of the three scenarios above show that even at the "expensive-end" of frontier models, the per-call cost is small; the magic happens at scale, where switching from GPT-5.5 Pro to DeepSeek V4 Pro on a 100K-call/month workload changes the annual line item from millions to tens of thousands.

The right tactic is not "always pick cheap"

For ~80% of production traffic, a mid-tier model (DeepSeek V4 Pro, Gemini 3.1 Pro, Claude Sonnet 4) is the right call. The remaining ~20% — high-stakes reasoning, ambiguous customer queries, agent planning — earns its keep at frontier prices. This is exactly the cascade and Mixture-of-Routers patterns described in our LLM routing deep-dive.

Related calculators

Pricing data sourced from official provider pages and OpenRouter, May 2026. All numbers exclude prompt caching (90% saving on cached input for Anthropic), batch (50% saving on most providers), and committed-use discounts.