AI Token Cost Calculator
Three common prompt scenarios — short chat, long document summary, agentic tool-use loop — costed across every major AI model using official May 2026 pricing. The bars show dollars per month at the stated request volume.
Short customer-support chat
Median support ticket: 800 input tokens (system + ticket body), 250 output tokens (reply). 100K tickets/month.
The spread
Same workload. Gemini 2.0 Flash: $18.00/month. GPT-5.5 Pro: $6,900/month. That is a 383x cost ratio for the same prompt.
Based on official provider pricing as of 2026-05-06. Excludes prompt-caching, batch, and volume discounts. Self-hosted open-weight models (Gemma 4, Nemotron 3 Nano Omni) excluded from this view because their cost is infrastructure-driven, not token-priced.
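The scenario math is plain token arithmetic: tokens times volume, divided by a million, times the per-million rate. A minimal sketch; the $0.10/M-input and $0.40/M-output rates below are back-solved from the $18.00 figure above, not quoted from any price sheet:

```python
def monthly_cost(input_tokens, output_tokens, requests_per_month,
                 input_price_per_m, output_price_per_m):
    """Monthly spend in dollars for a fixed-shape prompt at a given volume."""
    input_cost = input_tokens * requests_per_month / 1e6 * input_price_per_m
    output_cost = output_tokens * requests_per_month / 1e6 * output_price_per_m
    return input_cost + output_cost

# Short support chat: 800 in / 250 out, 100K tickets/month
cost = monthly_cost(800, 250, 100_000, 0.10, 0.40)
print(f"${cost:,.2f}/month")  # → $18.00/month
```

Swapping in another model's published per-million rates is the only change needed to reproduce any bar in the chart.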
Long document summary
40K input tokens (a 30-page PDF), 600 output tokens (executive summary). 5K runs/month.
The spread
Same workload. Gemini 2.0 Flash: $21.20/month. GPT-5.5 Pro: $6,540/month. That is a 308x cost ratio for the same prompt.
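This scenario is the same token arithmetic with an input-heavy shape. A sketch assuming $0.10 per million input tokens and $0.40 per million output tokens, the rates implied by the $21.20 figure (input dominates: 200M of the 203M monthly tokens):

```python
# 40K-token input, 600-token output, 5K runs/month
# Assumed rates: $0.10/M input, $0.40/M output (back-solved from the figure above)
runs = 5_000
cost = 40_000 * runs / 1e6 * 0.10 + 600 * runs / 1e6 * 0.40
print(f"${cost:,.2f}/month")  # → $21.20/month
```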
Agentic tool-use loop
6K input tokens per turn (history + tool defs), 400 output tokens per turn, 4 turns per task. 20K tasks/month.
The spread
Same workload. Gemini 2.0 Flash: $60.80/month. GPT-5.5 Pro: $20,160/month. That is a 332x cost ratio for the same prompt.
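The agentic workload adds only a turns multiplier to the token arithmetic. A sketch assuming $0.10 per million input tokens and $0.40 per million output tokens, which reproduce the $60.80 figure above (both rates are assumptions, not quotes from a price sheet):

```python
def agent_task_monthly_cost(input_per_turn, output_per_turn, turns,
                            tasks_per_month, in_price_per_m, out_price_per_m):
    """Monthly spend for a multi-turn agent loop with fixed-size turns."""
    total_in = input_per_turn * turns * tasks_per_month
    total_out = output_per_turn * turns * tasks_per_month
    return total_in / 1e6 * in_price_per_m + total_out / 1e6 * out_price_per_m

# 6K in / 400 out per turn, 4 turns per task, 20K tasks/month
cost = agent_task_monthly_cost(6_000, 400, 4, 20_000, 0.10, 0.40)
print(f"${cost:,.2f}/month")  # → $60.80/month
```

Note that real agent loops grow the input with each turn as history accumulates; the fixed 6K-per-turn figure here is the scenario's simplifying assumption.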
How to read these visualizations
The exact same prompt sent to GPT-5.5 Pro versus DeepSeek V4 Flash can differ in cost by more than 200x. That spread is not quality — it is brand, infrastructure, and pricing strategy. Two of the three scenarios above show that even at the expensive end of frontier models, the per-call cost is small; the leverage is at scale, where switching a 100K-call/month workload from GPT-5.5 Pro to DeepSeek V4 Pro changes the annual line item from millions to tens of thousands of dollars.
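To make the scale point concrete, a sketch with hypothetical per-call prices (chosen to illustrate the order-of-magnitude gap, not taken from any price sheet):

```python
def annual_cost(cost_per_call, calls_per_month):
    """Annualized spend for a per-call price at a monthly volume."""
    return cost_per_call * calls_per_month * 12

# Hypothetical: $1.50/call frontier vs $0.02/call mid-tier at 100K calls/month
frontier = annual_cost(1.50, 100_000)   # 1_800_000.0
mid_tier = annual_cost(0.02, 100_000)   # 24_000.0
print(f"frontier ${frontier:,.0f}/yr vs mid-tier ${mid_tier:,.0f}/yr")
```

A sub-dollar per-call difference that is invisible on any single invoice becomes a seven-figure annual line item at volume.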
The right tactic is not "always pick cheap"
For roughly 80% of production traffic, a mid-tier model (DeepSeek V4 Pro, Gemini 3.1 Pro, Claude Sonnet 4) is the right call. The remaining ~20% — high-stakes reasoning, ambiguous customer queries, agent planning — earns its keep at frontier prices. These are exactly the cascade and Mixture-of-Routers patterns described in our LLM routing deep-dive.
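The routing arithmetic is a weighted average over the traffic split. A sketch with illustrative per-call costs (the $0.001 mid-tier and $0.07 frontier figures are hypothetical, not from any price sheet), using the ~80/20 split described above:

```python
def blended_cost_per_call(cheap_cost, frontier_cost, frontier_share):
    """Expected per-call cost when a router escalates a fraction of traffic."""
    return (1 - frontier_share) * cheap_cost + frontier_share * frontier_cost

# Illustrative: 80% of calls at $0.001 mid-tier, 20% escalated at $0.07 frontier
blended = blended_cost_per_call(0.001, 0.07, 0.20)
savings = 1 - blended / 0.07
print(f"blended ${blended:.4f}/call, {savings:.0%} below all-frontier")
```

Even with a fifth of traffic paying frontier prices, the blended rate lands closer to the cheap model than the expensive one, which is the economic case for cascades.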
Related calculators
- Cheap vs Expensive Model Comparison — same prompt, two extremes
- Model-Mixing Cost Savings — what cascade and MoR actually save
- AI Model Leaderboard
- Vendor Lock-in Leaderboard
Pricing data sourced from official provider pages and OpenRouter, May 2026. All numbers exclude prompt caching (90% saving on cached input for Anthropic), batch (50% saving on most providers), and committed-use discounts.