Updated May 6, 2026

Claude API Pricing — May 2026

Definitive Claude API pricing reference. Per-1M-token rates for Opus 4.7, Sonnet 4, and Haiku 3.5, plus cached-input pricing, batch tier discounts, the Opus 4.7 tokenizer drift, and a clean comparison vs OpenAI, Gemini, and DeepSeek.

Claude model line, list pricing

  Model               Input /1M   Cached /1M   Output /1M   Context
  Claude Opus 4.7     $5.00       $0.50        $25.00       500K
  Claude Sonnet 4     $3.00       $0.30        $15.00       1M
  Claude Haiku 3.5    $0.80       $0.08        $4.00        200K

Claude Opus 4.7: Coding Arena #1 at 1567 Elo. The new tokenizer produces ~35% more tokens per input than 4.6, so the list price is unchanged but the effective cost rose.

Claude Sonnet 4: The default workhorse tier. Strong tool use, agentic coding, and long-context reasoning at a fraction of Opus pricing.

Claude Haiku 3.5: The cheapest Anthropic tier, suited to classification, extraction, and routing. Quality is well below Sonnet, but throughput is high.

All prices in USD per 1M tokens. Cached column is the per-1M rate for cache hits — 90% off list. Cache writes are 25% more expensive than list. Default cache TTL is 5 minutes.

Prompt caching: the largest single lever

Cached input savings, per 1M input tokens (Claude May 2026)

  Opus 4.7    list $5.00   ████████████████████   cached $0.50  ██   -90%
  Sonnet 4    list $3.00   ████████████           cached $0.30  ██   -90%
  Haiku 3.5   list $0.80   ███                    cached $0.08  ▏    -90%

Production teams that cache the system prompt + tool defs typically
cut Claude bills 40-60% with one config change.

Anthropic's prompt caching is the single largest lever on the effective Claude bill in 2026. The mechanism: mark a prefix (system prompt, tool definitions, retrieved context) as cacheable, and any subsequent call within the 5-minute TTL pays 10% of list for those tokens. Break-even on cache writes vs list is around 2 reuses, so any prompt sent more than twice within 5 minutes saves money.
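The break-even arithmetic above can be sketched directly. A minimal model, assuming the article's multipliers (writes at 125% of list, hits at 10% of list) and a prefix that stays within the 5-minute TTL between calls:

```python
def cost_per_1m_prefix(n_calls: int, list_rate: float, cached: bool) -> float:
    """Input cost ($) of sending the same 1M-token prefix n_calls times.

    Assumes: cache write = 1.25x list (first call), cache hit = 0.10x list
    (every later call), and all calls land inside the cache TTL.
    """
    if not cached:
        return n_calls * list_rate
    write = 1.25 * list_rate  # first call pays the 25% write surcharge
    hit = 0.10 * list_rate    # subsequent calls pay 90% off list
    return write + (n_calls - 1) * hit

OPUS_IN = 5.00  # Opus 4.7 list input rate, $/1M tokens
for n in (1, 2, 3, 10):
    uncached = cost_per_1m_prefix(n, OPUS_IN, cached=False)
    cached = cost_per_1m_prefix(n, OPUS_IN, cached=True)
    print(f"{n:>2} calls: uncached ${uncached:.2f}  cached ${cached:.2f}")
```

At one call the cache loses ($6.25 vs $5.00); by the second call it already wins ($6.75 vs $10.00), which is where the ~2-reuse break-even comes from.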

Batch tier — 50% off list

  Model               Batch In /1M   Batch Out /1M   SLA
  Claude Opus 4.7     $2.50          $12.50          24h
  Claude Sonnet 4     $1.50          $7.50           24h
  Claude Haiku 3.5    $0.40          $2.00           24h

The Claude Message Batches API gives 50% off list for asynchronous work with a 24-hour turnaround SLA. Stack it with prompt caching for cumulative discounts: a cached batch call on Opus 4.7 effectively prices at ~$0.25 per 1M cached input, far below most synchronous frontier alternatives.
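The stacking works out as a simple product, assuming the two discounts compound multiplicatively, which is what the article's ~$0.25 figure implies:

```python
def effective_input_rate(list_rate: float, cache_hit: bool = False,
                         batch: bool = False) -> float:
    """Per-1M input rate after stacking discounts.

    Assumes discounts compound multiplicatively: 90% off for a
    prompt-cache hit, then 50% off on the batch tier.
    """
    rate = list_rate
    if cache_hit:
        rate *= 0.10  # cache hit: 10% of list
    if batch:
        rate *= 0.50  # Message Batches: half of the (possibly cached) rate
    return rate

# Opus 4.7 cached input inside a batch job:
print(effective_input_rate(5.00, cache_hit=True, batch=True))  # 0.25
```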

The Opus 4.7 tokenizer drift

Anthropic shipped a new tokenizer with Opus 4.7. The same English text now produces ~35% more tokens than under the Opus 4.6 tokenizer. The list price is unchanged at $5/$25 per 1M, but the effective cost rose by roughly one-third on identical inputs.

Effective cost change after Opus 4.6 to 4.7 migration

  Workload          List price           Effective change
  System prompt     $5/$25 per 1M        +35% (more tokens)
  Long-context      $5/$25 per 1M        +35% (more tokens)
  Coding task       $5/$25 per 1M        +28% (code is denser)
  Code-only         $5/$25 per 1M        +18% (mostly stable)
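The drift shows up only when you price by token count, not by list rate. A quick sketch of the re-baselining math, using a hypothetical 1M-token prompt and the article's ~35% English-text inflation figure:

```python
def effective_cost(tokens: int, list_rate_per_1m: float) -> float:
    """Dollar cost of an input at a given per-1M list rate."""
    return tokens / 1_000_000 * list_rate_per_1m

old_tokens = 1_000_000                # prompt as counted by the 4.6 tokenizer
new_tokens = int(old_tokens * 1.35)   # same text, ~35% more tokens under 4.7

before = effective_cost(old_tokens, 5.00)  # list rate did not change...
after = effective_cost(new_tokens, 5.00)   # ...but the token count did
print(f"{after / before - 1:.0%}")         # 35% effective-cost increase
```

This is why the recommendation below treats tokenizer churn as a price-change event: the invoice moves even when the rate card does not.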

See our per-million-tokens true cost writeup for the full hidden-adders breakdown.

Claude vs OpenAI vs Gemini vs DeepSeek

  Vendor      Model               In /1M   Out /1M   Cache   Batch
  Anthropic   Claude Opus 4.7     $5.00    $25.00    Yes     Yes
  OpenAI      GPT-5.5 Pro         $30.00   $180.00   Yes     Yes
  OpenAI      GPT-5.5             $5.00    $30.00    Yes     Yes
  Google      Gemini 3.1 Pro      $3.50    $10.50    Yes     Yes
  DeepSeek    DeepSeek V4 Pro     $1.74    $3.48     Yes     No
  DeepSeek    DeepSeek V4 Flash   $0.14    $0.28     No      No

Output cost per 1M tokens (frontier tier, May 2026)

  GPT-5.5 Pro       $180.00  ████████████████████████████  most expensive
  GPT-5.5            $30.00  █████                         standard tier
  Claude Opus 4.7    $25.00  ████                          coding leader
  Sonnet 4           $15.00  ██                            workhorse
  Gemini 3.1 Pro     $10.50  █▌                            text Arena #1
  DeepSeek V4 Pro     $3.48  ▌                             open weights
  DeepSeek V4 Flash   $0.28  ▏                             cheapest tier

Spread between cheapest and most expensive: ~643x
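The spread figure is just the ratio of the two extremes in the chart above:

```python
# Output rates, $/1M tokens, from the May 2026 comparison table.
most_expensive = 180.00  # GPT-5.5 Pro
cheapest = 0.28          # DeepSeek V4 Flash

print(round(most_expensive / cheapest))  # 643
```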

What to do this quarter

  1. Turn on prompt caching everywhere. System prompts and tool definitions are the same on every call — they are pure cache wins. Expect 40-60% bill reduction with one config change.
  2. Move asynchronous workloads to batch. Anything that can wait 24 hours should be on the batch tier at 50% off. Stacks with caching.
  3. Re-baseline tokens on every Anthropic launch. The 4.6 to 4.7 tokenizer change added 35% to effective cost with no list-price change. Treat tokenizer churn as a price-change event.
  4. Use Sonnet 4 as the default, Opus 4.7 by exception. Sonnet 4 covers 70% of typical agentic and chat workloads at one-fifth the price. Reserve Opus 4.7 for hardest-task coding and reasoning where it earns its premium.
  5. Pair with a cheap-tier fallback. Cascade trivial traffic to DeepSeek V4 Flash or Gemini 3.1 Pro. Quality delta on simple work is rounding error; cost delta is 10-200x.
  6. Negotiate volume committed-spend discounts. Anthropic offers committed-use pricing above ~$50K/mo. The public list is the floor.
  7. Track the bill, not the list. Effective rate = bill divided by external prompts handled. That is the only number that matters at quarter end.
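Item 7's metric is worth making concrete. A minimal sketch, with hypothetical monthly numbers (the $18,400 bill and 2.3M prompt count are illustrative, not from the article):

```python
def effective_rate(monthly_bill_usd: float, prompts_handled: int) -> float:
    """Blended cost per external prompt, across all models, tiers,
    caching, and discounts. The quarter-end number that matters."""
    return monthly_bill_usd / prompts_handled

# Hypothetical month: $18,400 total spend serving 2.3M external prompts.
print(round(effective_rate(18_400, 2_300_000), 4))  # 0.008
```

Tracked month over month, this one number absorbs list-price changes, tokenizer drift, cache hit rates, and routing mix into a single trend line.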

Related reading

Teams running Claude alongside other providers typically front the API with Swfte Connect to keep the OpenAI-compatible surface, prompt caching, and per-route fallback in one place rather than re-implementing it per vendor.

Sources: official Anthropic pricing page, 2026-05-06. Tokenizer drift figures from internal Swfte Connect telemetry on representative SaaS workload mixes.