Updated May 6, 2026

Claude API Pricing — May 2026

Definitive Claude API pricing reference. Per-1M-token rates for Opus 4.7, Sonnet 4, and Haiku 3.5, plus cached-input pricing, batch tier discounts, the Opus 4.7 tokenizer drift, and a clean comparison vs OpenAI, Gemini, and DeepSeek.

Claude model line, list pricing

  Model               Input /1M   Cached /1M   Output /1M   Context
  Claude Opus 4.7     $5.00       $0.50        $25.00       500K
  Claude Sonnet 4     $3.00       $0.30        $15.00       1M
  Claude Haiku 3.5    $0.80       $0.08        $4.00        200K

Claude Opus 4.7: Coding Arena #1 at 1567 Elo. The new tokenizer produces ~35% more tokens per input than 4.6, so the list price is unchanged but the effective cost rose.

Claude Sonnet 4: The default workhorse tier. Strong tool use, agentic coding, and long-context reasoning at a fraction of Opus pricing.

Claude Haiku 3.5: The cheapest Anthropic tier, suited to classification, extraction, and routing. Quality is well below Sonnet, but throughput is high.

All prices in USD per 1M tokens. Cached column is the per-1M rate for cache hits — 90% off list. Cache writes are 25% more expensive than list. Default cache TTL is 5 minutes.

Prompt caching: the largest single lever

Cached input savings, per 1M input tokens (Claude May 2026)

  Opus 4.7    list $5.00   ████████████████████   cached $0.50  ██   -90%
  Sonnet 4    list $3.00   ████████████           cached $0.30  ██   -90%
  Haiku 3.5   list $0.80   ███                    cached $0.08  ▏    -90%

Production teams that cache the system prompt + tool defs typically
cut Claude bills 40-60% with one config change.

Anthropic's prompt caching is the single largest lever on the effective Claude bill in 2026. The mechanism: mark a prefix (system prompt, tool definitions, retrieved context) as cacheable, and any subsequent call within the 5-minute TTL pays 10% of list for those tokens. Break-even on cache writes vs list is around 2 reuses, so any prompt sent more than twice within 5 minutes saves money.
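The break-even arithmetic above can be sketched directly. A minimal model, assuming the article's multipliers (writes at 125% of list, hits at 10% of list) and a prefix that stays within the 5-minute TTL between calls:

```python
def cost_per_1m_prefix(n_calls: int, list_rate: float, cached: bool) -> float:
    """Input cost ($) of sending the same 1M-token prefix n_calls times.

    Assumes: cache write = 1.25x list (first call), cache hit = 0.10x list
    (every later call), and all calls land inside the cache TTL.
    """
    if not cached:
        return n_calls * list_rate
    write = 1.25 * list_rate  # first call pays the 25% write surcharge
    hit = 0.10 * list_rate    # subsequent calls pay 90% off list
    return write + (n_calls - 1) * hit

OPUS_IN = 5.00  # Opus 4.7 list input rate, $/1M tokens
for n in (1, 2, 3, 10):
    uncached = cost_per_1m_prefix(n, OPUS_IN, cached=False)
    cached = cost_per_1m_prefix(n, OPUS_IN, cached=True)
    print(f"{n:>2} calls: uncached ${uncached:.2f}  cached ${cached:.2f}")
```

At one call the cache loses ($6.25 vs $5.00); by the second call it already wins ($6.75 vs $10.00), which is where the ~2-reuse break-even comes from.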

Batch tier — 50% off list

  Model               Batch In /1M   Batch Out /1M   SLA
  Claude Opus 4.7     $2.50          $12.50          24h
  Claude Sonnet 4     $1.50          $7.50           24h
  Claude Haiku 3.5    $0.40          $2.00           24h

The Claude Message Batches API gives 50% off list for asynchronous work with a 24-hour turnaround SLA. Stack it with prompt caching for cumulative discounts: a cached batch call on Opus 4.7 effectively prices at ~$0.25 per 1M cached input, far below most synchronous frontier alternatives.
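The stacking works out as a simple product, assuming the two discounts compound multiplicatively, which is what the article's ~$0.25 figure implies:

```python
def effective_input_rate(list_rate: float, cache_hit: bool = False,
                         batch: bool = False) -> float:
    """Per-1M input rate after stacking discounts.

    Assumes discounts compound multiplicatively: 90% off for a
    prompt-cache hit, then 50% off on the batch tier.
    """
    rate = list_rate
    if cache_hit:
        rate *= 0.10  # cache hit: 10% of list
    if batch:
        rate *= 0.50  # Message Batches: half of the (possibly cached) rate
    return rate

# Opus 4.7 cached input inside a batch job:
print(effective_input_rate(5.00, cache_hit=True, batch=True))  # 0.25
```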

The Opus 4.7 tokenizer drift

Anthropic shipped a new tokenizer with Opus 4.7. The same English text now produces ~35% more tokens than under the Opus 4.6 tokenizer. The list price is unchanged at $5/$25 per 1M, but the effective cost rose by roughly one-third on identical inputs.

Effective cost change after Opus 4.6 to 4.7 migration

  Workload          List price           Effective change
  System prompt     $5/$25 per 1M        +35% (more tokens)
  Long-context      $5/$25 per 1M        +35% (more tokens)
  Coding task       $5/$25 per 1M        +28% (code is denser)
  Code-only         $5/$25 per 1M        +18% (mostly stable)
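The drift shows up only when you price by token count, not by list rate. A quick sketch of the re-baselining math, using a hypothetical 1M-token prompt and the article's ~35% English-text inflation figure:

```python
def effective_cost(tokens: int, list_rate_per_1m: float) -> float:
    """Dollar cost of an input at a given per-1M list rate."""
    return tokens / 1_000_000 * list_rate_per_1m

old_tokens = 1_000_000                # prompt as counted by the 4.6 tokenizer
new_tokens = int(old_tokens * 1.35)   # same text, ~35% more tokens under 4.7

before = effective_cost(old_tokens, 5.00)  # list rate did not change...
after = effective_cost(new_tokens, 5.00)   # ...but the token count did
print(f"{after / before - 1:.0%}")         # 35% effective-cost increase
```

This is why the recommendation below treats tokenizer churn as a price-change event: the invoice moves even when the rate card does not.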

See our per-million-tokens true cost writeup for the full hidden-adders breakdown.

Claude vs OpenAI vs Gemini vs DeepSeek

  Vendor      Model               In /1M   Out /1M   Cache   Batch
  Anthropic   Claude Opus 4.7     $5.00    $25.00    Yes     Yes
  OpenAI      GPT-5.5 Pro         $30.00   $180.00   Yes     Yes
  OpenAI      GPT-5.5             $5.00    $30.00    Yes     Yes
  Google      Gemini 3.1 Pro      $3.50    $10.50    Yes     Yes
  DeepSeek    DeepSeek V4 Pro     $1.74    $3.48     Yes     No
  DeepSeek    DeepSeek V4 Flash   $0.14    $0.28     No      No

Output cost per 1M tokens (frontier tier, May 2026)

  GPT-5.5 Pro       $180.00  ████████████████████████████  most expensive
  GPT-5.5            $30.00  █████                         standard tier
  Claude Opus 4.7    $25.00  ████                          coding leader
  Sonnet 4           $15.00  ██                            workhorse
  Gemini 3.1 Pro     $10.50  █▌                            text Arena #1
  DeepSeek V4 Pro     $3.48  ▌                             open weights
  DeepSeek V4 Flash   $0.28  ▏                             cheapest tier

Spread between cheapest and most expensive: ~643x
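The spread figure is just the ratio of the two extremes in the chart above:

```python
# Output rates, $/1M tokens, from the May 2026 comparison table.
most_expensive = 180.00  # GPT-5.5 Pro
cheapest = 0.28          # DeepSeek V4 Flash

print(round(most_expensive / cheapest))  # 643
```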

What to do this quarter

  1. Turn on prompt caching everywhere. System prompts and tool definitions are the same on every call — they are pure cache wins. Expect 40-60% bill reduction with one config change.
  2. Move asynchronous workloads to batch. Anything that can wait 24 hours should be on the batch tier at 50% off. Stacks with caching.
  3. Re-baseline tokens on every Anthropic launch. The 4.6 to 4.7 tokenizer change added 35% to effective cost with no list-price change. Treat tokenizer churn as a price-change event.
  4. Use Sonnet 4 as the default, Opus 4.7 by exception. Sonnet 4 covers 70% of typical agentic and chat workloads at one-fifth the price. Reserve Opus 4.7 for hardest-task coding and reasoning where it earns its premium.
  5. Pair with a cheap-tier fallback. Cascade trivial traffic to DeepSeek V4 Flash or Gemini 3.1 Pro. Quality delta on simple work is rounding error; cost delta is 10-200x.
  6. Negotiate volume committed-spend discounts. Anthropic offers committed-use pricing above ~$50K/mo. The public list is the floor.
  7. Track the bill, not the list. Effective rate = bill divided by external prompts handled. That is the only number that matters at quarter end.
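Item 7's metric is worth making concrete. A minimal sketch, with hypothetical monthly numbers (the $18,400 bill and 2.3M prompt count are illustrative, not from the article):

```python
def effective_rate(monthly_bill_usd: float, prompts_handled: int) -> float:
    """Blended cost per external prompt, across all models, tiers,
    caching, and discounts. The quarter-end number that matters."""
    return monthly_bill_usd / prompts_handled

# Hypothetical month: $18,400 total spend serving 2.3M external prompts.
print(round(effective_rate(18_400, 2_300_000), 4))  # 0.008
```

Tracked month over month, this one number absorbs list-price changes, tokenizer drift, cache hit rates, and routing mix into a single trend line.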

Related reading

Teams running Claude alongside other providers typically front the API with Swfte Connect to keep the OpenAI-compatible surface, prompt caching, and per-route fallback in one place rather than re-implementing it per vendor.

Sources: official Anthropic pricing page, 2026-05-06. Tokenizer drift figures from internal Swfte Connect telemetry on representative SaaS workload mixes.