Claude API Pricing — May 2026
Definitive Claude API pricing reference. Per-1M-token rates for Opus 4.7, Sonnet 4, and Haiku 3.5, plus cached-input pricing, batch tier discounts, the Opus 4.7 tokenizer drift, and a clean comparison vs OpenAI, Gemini, and DeepSeek.
Claude model line, list pricing
| Model | Input /1M | Cached /1M | Output /1M | Context |
|---|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $0.50 | $25.00 | 500K |
| Claude Sonnet 4 | $3.00 | $0.30 | $15.00 | 1M |
| Claude Haiku 3.5 | $0.80 | $0.08 | $4.00 | 200K |

- Claude Opus 4.7: Coding Arena #1 at 1567 Elo. The new tokenizer produces ~35% more tokens per input vs 4.6; list price is unchanged but effective cost rose.
- Claude Sonnet 4: The default workhorse tier. Strong tool use, agentic coding, and long-context reasoning at a fraction of Opus pricing.
- Claude Haiku 3.5: The cheapest Anthropic tier, suited to classification, extraction, and routing. Quality is well below Sonnet but throughput is high.
All prices in USD per 1M tokens. Cached column is the per-1M rate for cache hits — 90% off list. Cache writes are 25% more expensive than list. Default cache TTL is 5 minutes.
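The per-call arithmetic from the table above can be sketched as follows. A minimal illustration, not an SDK API: the model keys, the `PRICES` dict, and the `request_cost` helper are all hypothetical names, with rates copied from the table.

```python
# Per-call Claude cost from the list prices above (USD per 1M tokens).
PRICES = {
    # model: (input, cached_input, output) per 1M tokens
    "opus-4.7":  (5.00, 0.50, 25.00),
    "sonnet-4":  (3.00, 0.30, 15.00),
    "haiku-3.5": (0.80, 0.08, 4.00),
}

def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Cost of one call in USD. cached_tokens is the cache-hit share of input."""
    inp, cached, out = PRICES[model]
    fresh = input_tokens - cached_tokens
    return (fresh * inp + cached_tokens * cached + output_tokens * out) / 1_000_000

# Example: 10K-token prompt on Sonnet 4, 8K of it cached, 1K output.
cost = request_cost("sonnet-4", 10_000, 1_000, cached_tokens=8_000)
```

With 80% of the prompt cached, the example call lands at roughly $0.023 instead of the ~$0.045 it would cost fully uncached, which is where the 40-60% bill reductions cited below come from.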
Prompt caching: the largest single lever
Cached input savings, per 1M input tokens (Claude, May 2026):

| Model | List /1M | Cached /1M | Savings |
|---|---|---|---|
| Opus 4.7 | $5.00 | $0.50 | -90% |
| Sonnet 4 | $3.00 | $0.30 | -90% |
| Haiku 3.5 | $0.80 | $0.08 | -90% |

Production teams that cache the system prompt + tool defs typically cut Claude bills 40-60% with one config change.
Anthropic's prompt caching is the single largest lever on the effective Claude bill in 2026. The mechanism: mark a prefix (system prompt, tool definitions, retrieved context) as cacheable, and any subsequent call within the 5-minute TTL pays 10% of list for those tokens. The arithmetic favors caching almost immediately: a cache write costs 1.25x list and each hit 0.10x, so a prefix reused even once within the window already costs less than sending it uncached twice.
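The break-even can be checked directly. A quick sketch in units of list price, assuming the rates stated above (write = 1.25x list, hit = 0.10x list, all sends inside one TTL window):

```python
# Cost of sending the same prefix n times, in multiples of its list price.
def uncached(n):
    return n * 1.00                 # every send pays full list

def cached(n):
    return 1.25 + (n - 1) * 0.10    # one cache write, then n-1 cache hits

# First n at which caching is strictly cheaper:
n = 1
while cached(n) >= uncached(n):
    n += 1
```

Under these rates, caching wins from the second send onward (1.35x vs 2.00x list), so any prefix reused at all within the TTL is a net save.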
Batch tier — 50% off list
| Model | Batch In /1M | Batch Out /1M | SLA |
|---|---|---|---|
| Claude Opus 4.7 | $2.50 | $12.50 | 24h |
| Claude Sonnet 4 | $1.50 | $7.50 | 24h |
| Claude Haiku 3.5 | $0.40 | $2.00 | 24h |
The Claude Message Batches API gives 50% off list for asynchronous work with a 24-hour turnaround SLA. Stack it with prompt caching for cumulative discounts: a cached batch call on Opus 4.7 effectively prices at ~$0.25 per 1M cached input, far below most synchronous frontier alternatives.
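The ~$0.25 figure is the product of the two discounts. A sketch of the stacking, assuming (per the article's figure) that cache-hit pricing applies at the full 90% discount inside the batch tier:

```python
# Stacked discounts on Opus 4.7 cached, batched input.
OPUS_INPUT_LIST = 5.00   # USD per 1M input tokens, list
BATCH = 0.50             # batch tier: 50% of list
CACHE_HIT = 0.10         # cached input: 10% of list

effective = OPUS_INPUT_LIST * BATCH * CACHE_HIT  # USD per 1M cached, batched input
```

Multiplying out gives $0.25 per 1M tokens, matching the figure in the text.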
The Opus 4.7 tokenizer drift
Anthropic shipped a new tokenizer with Opus 4.7. Same English text now produces ~35% more tokens than under the Opus 4.6 tokenizer. List price is unchanged at $5/$25 per 1M, but the effective cost rose by roughly one-third on identical inputs.
Effective cost change after Opus 4.6 to 4.7 migration:

| Workload | List price | Effective change |
|---|---|---|
| System prompt | $5/$25 per 1M | +35% (more tokens) |
| Long-context | $5/$25 per 1M | +35% (more tokens) |
| Coding task | $5/$25 per 1M | +28% (code is denser) |
| Code-only | $5/$25 per 1M | +18% (mostly stable) |
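Re-baselining a budget for the drift is a single multiplication per workload. A sketch using the inflation factors from the table above (the workload keys and factors are illustrative, drawn from this article, not from any Anthropic API):

```python
# Projected monthly cost after the 4.6 -> 4.7 tokenizer change:
# same traffic, same list price, only the token count changed.
INFLATION = {
    "system_prompt": 1.35,
    "long_context":  1.35,
    "coding_task":   1.28,
    "code_only":     1.18,
}

def new_monthly_cost(old_cost_usd, workload):
    return old_cost_usd * INFLATION[workload]

# A $10,000/mo long-context workload re-baselines to roughly $13,500/mo.
projected = new_monthly_cost(10_000, "long_context")
```

The point of the exercise is that the invoice moves even though no price on the rate card did.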
See our per-million-tokens true cost writeup for the full hidden-adders breakdown.
Claude vs OpenAI vs Gemini vs DeepSeek
| Vendor | Model | In /1M | Out /1M | Cache | Batch |
|---|---|---|---|---|---|
| Anthropic | Claude Opus 4.7 | $5.00 | $25.00 | Yes | Yes |
| OpenAI | GPT-5.5 Pro | $30.00 | $180.00 | Yes | Yes |
| OpenAI | GPT-5.5 | $5.00 | $30.00 | Yes | Yes |
| Google | Gemini 3.1 Pro | $3.50 | $10.50 | Yes | Yes |
| DeepSeek | DeepSeek V4 Pro | $1.74 | $3.48 | Yes | — |
| DeepSeek | DeepSeek V4 Flash | $0.14 | $0.28 | — | — |
Output cost per 1M tokens (frontier tier, May 2026):

| Model | Output /1M | Note |
|---|---|---|
| GPT-5.5 Pro | $180.00 | most expensive |
| GPT-5.5 | $30.00 | standard tier |
| Claude Opus 4.7 | $25.00 | coding leader |
| Sonnet 4 | $15.00 | workhorse |
| Gemini 3.1 Pro | $10.50 | text Arena #1 |
| DeepSeek V4 Pro | $3.48 | open weights |
| DeepSeek V4 Flash | $0.28 | cheapest tier |

Spread between cheapest and most expensive: ~643x.
What to do this quarter
- Turn on prompt caching everywhere. System prompts and tool definitions are the same on every call — they are pure cache wins. Expect 40-60% bill reduction with one config change.
- Move asynchronous workloads to batch. Anything that can wait 24 hours should be on the batch tier at 50% off. Stacks with caching.
- Re-baseline tokens on every Anthropic launch. The 4.6 to 4.7 tokenizer change added 35% to effective cost with no list-price change. Treat tokenizer churn as a price-change event.
- Use Sonnet 4 as the default, Opus 4.7 by exception. Sonnet 4 covers 70% of typical agentic and chat workloads at 60% of Opus list pricing, and even less per task once the 4.7 tokenizer inflation is counted. Reserve Opus 4.7 for hardest-task coding and reasoning where it earns its premium.
- Pair with a cheap-tier fallback. Cascade trivial traffic to DeepSeek V4 Flash or Gemini 3.1 Pro. Quality delta on simple work is rounding error; cost delta is 10-200x.
- Negotiate volume committed-spend discounts. Anthropic offers committed-use pricing above ~$50K/mo. The public list is the floor.
- Track the bill, not the list. Effective rate = bill divided by external prompts handled. That is the only number that matters at quarter end.
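The last point reduces to one division, but it is worth making the units explicit. A sketch (the function name and figures are illustrative):

```python
# Effective rate: what each end-user prompt actually costs, including retries,
# cache writes, tool-call loops, and every other hidden adder in the bill.
def effective_rate(monthly_bill_usd, prompts_handled):
    return monthly_bill_usd / prompts_handled

# Example: a $42,000 monthly bill over 1.2M external prompts.
rate = effective_rate(42_000, 1_200_000)  # USD per prompt
```

Tracked month over month, this number surfaces tokenizer drift, retry storms, and cache regressions that a list-price view hides.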
Related reading
- AI Model Leaderboard — quality vs price across all providers
- Per Million Tokens True Cost — the hidden adders pushing your bill 1.5-3x above list
- Token Cost Calculator — interactive cost estimator
- OpenRouter vs Swfte Pricing — enterprise procurement comparison
Teams running Claude alongside other providers typically front the API with Swfte Connect to keep the OpenAI-compatible surface, prompt caching, and per-route fallback in one place rather than re-implementing it per vendor.
Sources: official Anthropic pricing page, retrieved 2026-05-06. Tokenizer drift figures from internal Swfte Connect telemetry on representative SaaS workload mixes.