AI Token Cost Calculator

Three common prompt scenarios — short chat, long document summary, agentic tool-use loop — costed across every major AI model using official June 2026 pricing. The bars are dollars per month at the stated request volume.

Short customer-support chat

Median support ticket: 800 input tokens (system + ticket body), 250 output tokens (reply). 100K tickets/month.
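The monthly figure for a scenario like this is the product of token counts, request volume, and per-million-token prices. A minimal sketch of that arithmetic follows. The function name `monthly_cost` is illustrative, and the example prices ($30 input / $180 output per million tokens for GPT-5.5 Pro, $0.10 / $0.20 for DeepSeek V4 Flash) are back-solved from the monthly totals on this page rather than quoted from a provider price list, so treat them as assumptions.

```python
def monthly_cost(input_tokens, output_tokens, requests, price_in, price_out):
    """Return monthly spend in dollars.

    input_tokens / output_tokens are per request.
    price_in / price_out are dollars per million tokens.
    """
    total_in = input_tokens * requests
    total_out = output_tokens * requests
    return (total_in * price_in + total_out * price_out) / 1_000_000


# Support-chat scenario: 800 input, 250 output, 100K tickets/month.
# Prices are ASSUMED (back-solved from this page's totals).
pro = monthly_cost(800, 250, 100_000, price_in=30.0, price_out=180.0)
flash = monthly_cost(800, 250, 100_000, price_in=0.10, price_out=0.20)

print(f"GPT-5.5 Pro:       ${pro:,.2f}/month")
print(f"DeepSeek V4 Flash: ${flash:,.2f}/month")
```

With those assumed prices the function reproduces the two endpoints of the chart: $6,900 and $13.00 per month.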

The spread

Same workload. DeepSeek V4 Flash: $13.00/month. GPT-5.5 Pro: $6,900/month. That is a 531x cost ratio for the same prompt.

Monthly cost by model, highest to lowest:

GPT-5.5 Pro: $6,900
Claude Opus 4: $3,075
(unlabeled in source): $1,800
GPT-5.5: $1,150
Claude Opus 4.7: $1,025
Claude Opus 4.8: $1,025
Claude Sonnet 4: $615
Grok 3: $615
Sonar Pro: $615
Claude Sonnet 4.6: $615
Gemini 3.1 Pro: $460
GPT-4o: $450
Command R+: $450
Qwen 3.7 Max: $388
GPT-4.1: $360
Gemini 2.5 Pro: $350
Mistral Large 2: $310
Qwen 3.6 Plus: $252
DeepSeek V4 Pro: $226
o3 Mini: $198
Claude 3.5 Haiku: $164
Grok 4.3: $163
GLM-5.1: $155
Kimi K2.6: $146
Amazon Nova Pro: $144
DeepSeek R1: $98.75
DeepSeek V3: $49.10
Codestral: $46.50
Qwen 2.5 72B: $46.50
Grok 3 Mini: $36.50
Llama 4 Maverick: $31.00
GPT-4o Mini: $27.00
Qwen 2.5 Coder 32B: $23.25
Llama 4 Scout: $22.00
Gemini 2.0 Flash: $18.00
DeepSeek V4 Flash: $13.00

Based on official provider pricing as of 2026-05-06. Excludes prompt-caching, batch, and volume discounts. Self-hosted open-weight models (Gemma 4, Nemotron 3 Nano Omni) excluded from this view because their cost is infrastructure-driven, not token-priced.

Long document summary

40K input tokens (a 30-page PDF), 600 output tokens (executive summary). 5K runs/month.
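With 40K tokens in and only 600 out, this workload is dominated by input pricing. The sketch below splits a monthly bill into its input and output parts to make that visible. The helper name `cost_breakdown` is hypothetical, and the $30 / $180 per-million prices used for GPT-5.5 Pro are back-solved from this page's totals, not taken from a price list.

```python
def cost_breakdown(input_tokens, output_tokens, requests, price_in, price_out):
    """Split monthly spend into input and output dollars.

    Prices are dollars per million tokens.
    Returns (input_cost, output_cost, input_share_of_total).
    """
    input_cost = input_tokens * requests * price_in / 1_000_000
    output_cost = output_tokens * requests * price_out / 1_000_000
    total = input_cost + output_cost
    return input_cost, output_cost, input_cost / total


# Long-document scenario: 40K input, 600 output, 5K runs/month.
# ASSUMED prices for GPT-5.5 Pro: $30 in / $180 out per million tokens.
inp, out, share = cost_breakdown(40_000, 600, 5_000, 30.0, 180.0)

print(f"input ${inp:,.0f} + output ${out:,.0f}; input share {share:.0%}")
```

Under these assumptions roughly nine dollars in ten go to input tokens, which is why models with a low input price rank better in this scenario than in the support-chat one.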

The spread

Same workload. DeepSeek V4 Flash: $20.60/month. GPT-5.5 Pro: $6,540/month. That is a 317x cost ratio for the same prompt.

Monthly cost by model, highest to lowest:

GPT-5.5 Pro: $6,540
Claude Opus 4: $3,225
(unlabeled in source): $2,120
GPT-5.5: $1,090
Claude Opus 4.7: $1,075
Claude Opus 4.8: $1,075
Claude Sonnet 4: $645
Grok 3: $645
Sonar Pro: $645
Claude Sonnet 4.6: $645
GPT-4o: $530
Command R+: $530
Qwen 3.7 Max: $523
Gemini 3.1 Pro: $436
GPT-4.1: $424
Mistral Large 2: $418
DeepSeek V4 Pro: $358
Qwen 3.6 Plus: $297
Gemini 2.5 Pro: $280
Grok 4.3: $258
o3 Mini: $233
GLM-5.1: $205
Claude 3.5 Haiku: $172
Amazon Nova Pro: $170
Kimi K2.6: $156
DeepSeek R1: $117
Codestral: $62.70
Qwen 2.5 72B: $62.70
Grok 3 Mini: $61.50
DeepSeek V3: $57.30
Llama 4 Maverick: $41.80
GPT-4o Mini: $31.80
Qwen 2.5 Coder 32B: $31.35
Llama 4 Scout: $31.20
Gemini 2.0 Flash: $21.20
DeepSeek V4 Flash: $20.60

Agentic tool-use loop

6K input tokens per turn (history + tool defs), 400 output tokens, 4 turns per task. 20K tasks/month.
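Because every turn of the loop is a separate model call, the monthly token volume is the per-turn figure multiplied by both turns and tasks. A short sketch of that multiplication follows. The helper name `agent_monthly_tokens` is illustrative, the flat 6K-per-turn input follows the scenario definition (a real loop's history may grow turn by turn), and the $30 / $180 prices are back-solved from this page's GPT-5.5 Pro total rather than quoted.

```python
def agent_monthly_tokens(input_per_turn, output_per_turn, turns, tasks):
    """Return (total_input_tokens, total_output_tokens) per month.

    Assumes every turn uses the same flat token counts.
    """
    calls = turns * tasks  # one model call per turn
    return input_per_turn * calls, output_per_turn * calls


# Agentic scenario: 6K in / 400 out per turn, 4 turns, 20K tasks/month.
total_in, total_out = agent_monthly_tokens(6_000, 400, 4, 20_000)

# ASSUMED GPT-5.5 Pro prices: $30 in / $180 out per million tokens.
cost = (total_in * 30.0 + total_out * 180.0) / 1_000_000

print(f"{total_in:,} input tokens, {total_out:,} output tokens")
print(f"${cost:,.2f}/month")
```

The 80,000 calls per month add up to 480M input and 32M output tokens, which at the assumed prices matches the $20,160 shown for GPT-5.5 Pro.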

The spread

Same workload. DeepSeek V4 Flash: $54.40/month. GPT-5.5 Pro: $20,160/month. That is a 371x cost ratio for the same prompt.

Monthly cost by model, highest to lowest:

GPT-5.5 Pro: $20,160
Claude Opus 4: $9,600
(unlabeled in source): $6,080
GPT-5.5: $3,360
Claude Opus 4.7: $3,200
Claude Opus 4.8: $3,200
Claude Sonnet 4: $1,920
Grok 3: $1,920
Sonar Pro: $1,920
Claude Sonnet 4.6: $1,920
GPT-4o: $1,520
Command R+: $1,520
Qwen 3.7 Max: $1,440
Gemini 3.1 Pro: $1,344
GPT-4.1: $1,216
Mistral Large 2: $1,152
DeepSeek V4 Pro: $947
Gemini 2.5 Pro: $920
Qwen 3.6 Plus: $851
Grok 4.3: $680
o3 Mini: $669
GLM-5.1: $569
Claude 3.5 Haiku: $512
Amazon Nova Pro: $486
Kimi K2.6: $462
DeepSeek R1: $334
Codestral: $173
Qwen 2.5 72B: $173
DeepSeek V3: $165
Grok 3 Mini: $160
Llama 4 Maverick: $115
GPT-4o Mini: $91.20
Qwen 2.5 Coder 32B: $86.40
Llama 4 Scout: $84.80
Gemini 2.0 Flash: $60.80
DeepSeek V4 Flash: $54.40

How to read these visualizations

The same prompt sent to GPT-5.5 Pro versus DeepSeek V4 Flash differs in cost by more than 300x in all three scenarios above (317x to 531x). That spread is not quality; it is brand, infrastructure, and pricing strategy. Even at the expensive end of frontier models the per-call cost is small: about $0.07 per support ticket on GPT-5.5 Pro. The difference shows up at scale, where switching from GPT-5.5 Pro to DeepSeek V4 Pro on the 100K-ticket/month support workload changes the annual line item from about $82,800 to about $2,700.
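The spread and the annual line items can be recomputed directly from the monthly totals listed in the three scenarios. The sketch below does that; it uses only figures that appear on this page, and the dictionary layout and the `spread` helper are illustrative choices.

```python
# Monthly totals copied from the three scenarios on this page (USD).
monthly = {
    "support chat": {
        "GPT-5.5 Pro": 6900.00, "DeepSeek V4 Pro": 226.00, "DeepSeek V4 Flash": 13.00,
    },
    "document summary": {
        "GPT-5.5 Pro": 6540.00, "DeepSeek V4 Pro": 358.00, "DeepSeek V4 Flash": 20.60,
    },
    "agentic loop": {
        "GPT-5.5 Pro": 20160.00, "DeepSeek V4 Pro": 947.00, "DeepSeek V4 Flash": 54.40,
    },
}


def spread(costs):
    """Ratio of the most expensive to the cheapest model in one scenario."""
    return max(costs.values()) / min(costs.values())


ratios = {name: spread(costs) for name, costs in monthly.items()}
annual = {
    name: {model: cost * 12 for model, cost in costs.items()}
    for name, costs in monthly.items()
}

for name in monthly:
    pro = annual[name]["GPT-5.5 Pro"]
    v4 = annual[name]["DeepSeek V4 Pro"]
    print(f"{name}: {ratios[name]:.0f}x spread, ${pro:,.0f}/yr vs ${v4:,.0f}/yr")
```

For the support-chat scenario this gives a 531x spread and annual totals of $82,800 for GPT-5.5 Pro against $2,712 for DeepSeek V4 Pro.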

The right tactic is not "always pick cheap"

For ~80% of production traffic, a mid-tier model (DeepSeek V4 Pro, Gemini 3.1 Pro, Claude Sonnet 4) is the right call. The remaining ~20% (high-stakes reasoning, ambiguous customer queries, agent planning) earns its keep at frontier prices. These are exactly the cascade and Mixture-of-Routers patterns described in our LLM routing deep-dive.
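A rough way to size the 80/20 split is to blend the single-model monthly totals from the scenarios above. The sketch below is a simplification under stated assumptions: routed and escalated requests share the same token profile, the router itself costs nothing, and in the cascade variant every request hits the mid-tier model first and escalated ones pay for both calls. Both function names are hypothetical.

```python
def routed_cost(mid_cost, frontier_cost, frontier_share=0.20):
    """Blend two single-model monthly totals for an up-front router.

    mid_cost / frontier_cost: monthly cost if ALL traffic used that model.
    frontier_share: fraction of requests sent to the frontier model.
    """
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    return (1.0 - frontier_share) * mid_cost + frontier_share * frontier_cost


def cascade_cost(mid_cost, frontier_cost, frontier_share=0.20):
    """Cascade variant: all requests try the mid-tier model first."""
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    return mid_cost + frontier_share * frontier_cost


# Support-chat totals from this page: DeepSeek V4 Pro $226, GPT-5.5 Pro $6,900.
routed = routed_cost(226.00, 6900.00)
cascaded = cascade_cost(226.00, 6900.00)

print(f"routed:  ${routed:,.2f}/month")
print(f"cascade: ${cascaded:,.2f}/month")
```

In both variants the frontier slice dominates the blended bill, so the escalation rate matters more than the choice of mid-tier model.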


Pricing data sourced from official provider pages and OpenRouter, June 2026. All numbers exclude prompt caching (90% saving on cached input for Anthropic), batch (50% saving on most providers), and committed-use discounts.
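To see how much the excluded discounts could move a figure, the sketch below applies the two rates quoted above (90% off cached input, 50% off batch) to one row. Several things here are assumptions rather than provider rules: the 75% cache-hit fraction is hypothetical, cache-write surcharges are ignored, the two discounts are shown separately rather than stacked, and the $3 / $15 per-million prices for Claude Sonnet 4 are back-solved from this page's totals.

```python
def with_prompt_cache(input_cost, output_cost, cached_fraction, cache_saving=0.90):
    """Monthly cost when a fraction of input tokens is served from cache.

    Only the cached part of the input bill is discounted.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    discounted_input = input_cost * (1.0 - cached_fraction * cache_saving)
    return discounted_input + output_cost


def with_batch(total_cost, batch_saving=0.50):
    """Monthly cost when the whole workload runs through a batch tier."""
    return total_cost * (1.0 - batch_saving)


# Agentic scenario on Claude Sonnet 4: 480M input, 32M output tokens/month.
# ASSUMED prices: $3 in / $15 out per million tokens.
input_cost = 480 * 3.0    # $1,440
output_cost = 32 * 15.0   # $480

cached_total = with_prompt_cache(input_cost, output_cost, cached_fraction=0.75)
batch_total = with_batch(input_cost + output_cost)

print(f"list price:   ${input_cost + output_cost:,.2f}")
print(f"75% cached:   ${cached_total:,.2f}")
print(f"batch tier:   ${batch_total:,.2f}")
```

Under these assumptions either discount roughly halves the $1,920 list figure, so rankings between closely priced models can change once discounts are applied.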