Updated May 15, 2026 · 8 min read

Claude API pricing (July 2026)

TL;DR: Claude runs a three-tier line in 2026, and Opus 4.7 (flagship, Arena coding #1), Sonnet 4 (workhorse, 1M context), Haiku 3.5 (cheap). Anthropic’s 90% prompt cache discount is the deepest in the market. Effective cost on cacheable workloads is often 10% of headline.

Cut your Claude bill

Stop paying headline rates

Three ways to turn this pricing page into production savings. All start with a free Swfte account — no card.

Run Claude free

Sign in and prompt any Claude model in 30 seconds. No card. Caching and routing on day one — most teams settle 60–80% below the headline rates on this page.

Start free

Track Claude price moves

Email the moment Claude cuts a tier, ships a cheaper model, or a competitor undercuts them. One-click subscribe — the AI budget owner's passive radar.

Get price alerts

50% OFF · 6 MO

The Cost-Cutter Challenge

Send a redacted snippet of last month's Claude bill. We model the routing setup that would have cut it 60–80%. Biggest verified saving each month: 50% off for 6 months.

Submit your bill

One winner picked monthly · discount applies to your first paid plan · see challenge rules

Every Claude model and its per-token price

Model	Input / 1M	Cached input / 1M	Output / 1M	Context	Notes
Claude Opus 4.7	$5.00	$0.500	$25.00	500K	Coding Arena #1 at 1567 Elo. New tokenizer ~35% more tokens vs 4.6.
Claude Sonnet 4	$3.00	$0.300	$15.00	1M	The production workhorse. Best long-context retrieval. 2x faster than Opus.
Claude Haiku 3.5	$0.80	$0.080	$4.00	200K	Cheapest Anthropic tier. Classification, routing, extraction.

All prices in USD per 1 million tokens. Last reviewed 2026-05-15. Provider pricing pages are authoritative, confirm before contracting.

How Claude pricing actually works

Claude pricing in 2026 prices the flagship aggressively against OpenAI. Opus 4.7 at $5 / $25 per 1M is roughly 1/6 the cost of GPT-5.5 Pro while leading the Arena coding leaderboard at 1567 Elo. Sonnet 4 at $3 / $15 with a 1M-token context window is the production default for most teams. Haiku 3.5 at $0.80 / $4 is the cheap tier for classification, routing, and high-volume extraction.

Claude is the default LLM for serious coding work. Claude Code authors approximately 4% of all public GitHub commits, by a wide margin the largest share of any AI coding tool. Cursor, the most popular AI IDE, uses Claude Opus 4.7 as its default in agent mode. Most Fortune 500 enterprises run Claude via AWS Bedrock or Google Vertex AI for VPC residency and procurement integration.

Anthropic offers 90% prompt cache discount, 50% batch discount, and ZDR by default for paid API. Bedrock and Vertex expose Claude with VPC residency.

Prompt caching: the 90% discount most teams ignore

Anthropic’s prompt cache is the deepest discount in the market: 90% off the cached input rate. You opt in by attaching cache_control on the message blocks you want cached (typically the system prompt, tool definitions, and any large stable prefix like a codebase or RAG corpus). The 5-minute TTL covers most agent loops; the 1-hour TTL (at 25% of input rate) covers longer-running back-office workloads. Cache hit rate above 80% is achievable on most production workloads with deliberate prompt structure.

On a typical coding agent run that re-sends the same 200K-token codebase across 10 turns, prompt caching reduces effective input cost by 80-90%. The cached-input column in the table above is the right number to plug into a production budget; the headline input rate is the "new conversation" rate, not the steady-state rate.

Batch inference, and half-price overnight

Anthropic’s Message Batches API offers 50% off both input and output for batches of up to 100,000 requests, with a 24-hour completion SLA. Combined with prompt caching, a cache-warm batched call can land under 10% of the headline price. The right workloads are nightly evaluation suites, document enrichment, content classification, and synthetic training data generation.

Batch + cache stack. The combined effective rate for a cache-warm, batched call is often 5-10% of the headline price. For workloads like nightly eval suites, large-scale classification, document enrichment, and synthetic data generation, batching is free money.

Four real production cost scenarios

Workload	Detail	Headline cost	With cache	With batch
Coding agent (Sonnet 4)	200K codebase × 10 turns × 2K out	$6.00 + $0.30 = $6.30	$0.60 + $0.30 = $0.90	N/A; interactive
RAG chat (Sonnet 4)	10M context in (90% repeat), 1M out	$30.00 + $15.00 = $45.00	$3.00 + $15.00 = $18.00	$15.00 + $7.50 = $22.50
Hard reasoning (Opus 4.7)	500K in, 50K out	$2.50 + $1.25 = $3.75	$0.25 + $1.25 = $1.50	$1.25 + $0.625 = $1.88
Classification (Haiku 3.5)	100M in, 5M out	$80 + $20 = $100	$8 + $20 = $28	$40 + $10 = $50

The routing pattern that cuts Claude spend 60-80%

The dominant production pattern is: default to Sonnet 4; promote to Opus 4.7 on a small set of triggers (complexity score, multi-step planning intent, prompt size above 100K, more than 5 tool calls expected); fall back to Haiku for any pure-classification turn. Most fleets run 70% Sonnet / 20% Haiku / 10% Opus by request count, which lands at a similar split by spend because the cheaper tiers dominate by volume.

A typical production fleet settles into a 70/25/5 split. 70% of requests handled by the smallest competent tier, 25% by the mid-tier workhorse, 5% promoted to the flagship. Done well, this cuts model spend 60-80% versus naive single-model use without any measurable quality drop on the bulk of requests.

With an AI gateway in front, the routing rule is one config block: declare a default model, declare promotion triggers, declare a fallback to a second provider for availability. Applications keep using a single OpenAI-compatible endpoint. See Swfte for a managed runtime that bundles the gateway, observability, eval, and per-team cost ceilings.

Enterprise considerations

Claude reaches enterprise via three channels. Anthropic direct API has zero data retention by default for paid contracts. AWS Bedrock exposes Claude with regional residency, IAM-based access, and integration with AWS data services. Google Vertex AI does the same on GCP with Vertex Pipelines / Workbench integration. Most regulated industries (finance, healthcare, government) standardise on Bedrock or Vertex for procurement reasons; pricing matches direct API at scale.

Prompt caching: Available — use it from day one; the headline rate is misleading without it.
Batch inference: Available, 50% discount, up to 24h SLA.
Fine-tuning: Not currently offered.
On-prem / VPC: Available via Bedrock / Vertex / Azure or direct VPC contract.
Zero data retention: Available; default on enterprise contracts.

How Claude compares to the rest of the market

Against OpenAI, Claude wins on coding, agent loops, and per-token price at the frontier; Opus 4.7 is roughly 1/6 the price of GPT-5.5 Pro with stronger coding benchmarks. Against Gemini, Claude wins on coding and long-form writing; Gemini wins on raw price and multimodal. Against DeepSeek, Claude has the quality edge on the hardest benchmarks but DeepSeek V4 Pro is ~3x cheaper at similar Arena Elo on bulk workloads. The standard combination is Claude for code/agents, Gemini or DeepSeek for high-volume background work.

For a full side-by-side, see the API pricing index and the AI model leaderboard for quality / speed / value rankings.

Frequently asked questions about Claude API pricing

What is Claude API pricing in 2026?

Claude Opus 4.7 is $5 per 1M input tokens and $25 per 1M output tokens. Claude Sonnet 4 is $3 / $15. Claude Haiku 3.5 is $0.80 / $4. All tiers support 90% prompt cache discount and 50% batch discount.

How does Claude Opus 4.7 pricing compare to GPT-5.5 Pro?

Claude Opus 4.7 is $5 / $25 per 1M tokens. about 1/6 the price of GPT-5.5 Pro ($30 / $180). Opus 4.7 also leads the Arena coding leaderboard at 1567 Elo. For coding and agent workloads, Opus 4.7 is the clear price-performance leader at the frontier.

Is Claude Sonnet 4 cheaper than Opus 4.7?

Yes, Sonnet 4 is $3 / $15 vs Opus 4.7 at $5 / $25. That's 40% cheaper on input and 40% cheaper on output. Sonnet 4 also has a 1M context window (vs Opus 500K) and runs roughly 2× faster. Most production teams default to Sonnet and only route to Opus for the hardest 10-20% of requests.

How does Claude prompt caching work?

Anthropic's prompt cache is opt-in via the cache_control parameter. You mark a stable prefix (system prompt, tools, codebase, RAG corpus) with cache_control: {"type": "ephemeral"} and pay the headline rate the first time. Subsequent requests within the 5-min TTL pay 10% of input for the cached prefix: a 90% discount, the deepest in the industry. The 1-hr extended cache costs 25% of input.

What is the new Opus 4.7 tokenizer issue?

Opus 4.7 ships a new tokenizer that produces roughly 35% more tokens than Opus 4.6 for the same English input. List price is unchanged but effective cost rose proportionally. For ASCII-heavy code workloads the inflation is closer to 20-25%; for natural language closer to 35-40%. Budget accordingly.

Does Claude support batch inference?

Yes; 50% off both input and output, with a 24-hour SLA. Submit up to 100,000 requests per batch via the Message Batches API. Batch + cache stack, and the combined effective rate for a cache-warm, batched call can be under 10% of headline.

How do I get Claude on AWS or Google Cloud?

AWS Bedrock and Google Vertex AI both expose Claude with VPC residency, region pinning, and procurement through the cloud's standard contract. Pricing matches Anthropic's direct API (with cloud markup that is usually rebated at volume).

Does Anthropic offer fine-tuning?

Not as of 2026. Anthropic's position is that prompt engineering, retrieval, and tool use cover most of what fine-tuning was historically used for. For sovereignty or weight-control reasons, DeepSeek V4 (open weights, Apache 2.0) is the most common fine-tuning alternative.

What about Claude's 1M context window?

Available on Claude Sonnet 4. Pricing is unchanged. $3 / $15 per 1M. Tokens beyond 200K do incur a small surcharge on most workloads in the form of slower throughput, not a higher per-token rate. For long-context RAG over entire codebases or document collections, Sonnet 4 at 1M context is the strongest price/performance option.

How can I reduce Claude API costs?

Six levers, in order of impact: (1) Default to Sonnet 4, not Opus 4.7, 40% cheaper, 2× faster, within striking distance on quality. (2) Always use prompt caching on stable prefixes: it's 90% off, the biggest discount in the market. (3) Batch any latency-tolerant workload. (4) Route classification and extraction to Haiku 3.5 at $0.80 input. (5) Set per-team budget ceilings via a gateway. (6) For the hardest 5% promote to Opus 4.7.