Updated May 15, 2026 · 8 min read

Claude API pricing (May 2026)

TL;DR: Claude runs a three-tier line in 2026, and Opus 4.7 (flagship, Arena coding #1), Sonnet 4 (workhorse, 1M context), Haiku 3.5 (cheap). Anthropic’s 90% prompt cache discount is the deepest in the market. Effective cost on cacheable workloads is often 10% of headline.

Every Claude model and its per-token price

ModelInput / 1MCached input / 1MOutput / 1MContextNotes
Claude Opus 4.7$5.00$0.500$25.00500KCoding Arena #1 at 1567 Elo. New tokenizer ~35% more tokens vs 4.6.
Claude Sonnet 4$3.00$0.300$15.001MThe production workhorse. Best long-context retrieval. 2x faster than Opus.
Claude Haiku 3.5$0.80$0.080$4.00200KCheapest Anthropic tier. Classification, routing, extraction.

All prices in USD per 1 million tokens. Last reviewed 2026-05-15. Provider pricing pages are authoritative, confirm before contracting.

How Claude pricing actually works

Claude pricing in 2026 prices the flagship aggressively against OpenAI. Opus 4.7 at $5 / $25 per 1M is roughly 1/6 the cost of GPT-5.5 Pro while leading the Arena coding leaderboard at 1567 Elo. Sonnet 4 at $3 / $15 with a 1M-token context window is the production default for most teams. Haiku 3.5 at $0.80 / $4 is the cheap tier for classification, routing, and high-volume extraction.

Claude is the default LLM for serious coding work. Claude Code authors approximately 4% of all public GitHub commits, by a wide margin the largest share of any AI coding tool. Cursor, the most popular AI IDE, uses Claude Opus 4.7 as its default in agent mode. Most Fortune 500 enterprises run Claude via AWS Bedrock or Google Vertex AI for VPC residency and procurement integration.

Anthropic offers 90% prompt cache discount, 50% batch discount, and ZDR by default for paid API. Bedrock and Vertex expose Claude with VPC residency.

Prompt caching: the 90% discount most teams ignore

Anthropic’s prompt cache is the deepest discount in the market: 90% off the cached input rate. You opt in by attaching cache_control on the message blocks you want cached (typically the system prompt, tool definitions, and any large stable prefix like a codebase or RAG corpus). The 5-minute TTL covers most agent loops; the 1-hour TTL (at 25% of input rate) covers longer-running back-office workloads. Cache hit rate above 80% is achievable on most production workloads with deliberate prompt structure.

On a typical coding agent run that re-sends the same 200K-token codebase across 10 turns, prompt caching reduces effective input cost by 80-90%. The cached-input column in the table above is the right number to plug into a production budget; the headline input rate is the "new conversation" rate, not the steady-state rate.

Batch inference, and half-price overnight

Anthropic’s Message Batches API offers 50% off both input and output for batches of up to 100,000 requests, with a 24-hour completion SLA. Combined with prompt caching, a cache-warm batched call can land under 10% of the headline price. The right workloads are nightly evaluation suites, document enrichment, content classification, and synthetic training data generation.

Batch + cache stack. The combined effective rate for a cache-warm, batched call is often 5-10% of the headline price. For workloads like nightly eval suites, large-scale classification, document enrichment, and synthetic data generation, batching is free money.

Four real production cost scenarios

WorkloadDetailHeadline costWith cacheWith batch
Coding agent (Sonnet 4)200K codebase × 10 turns × 2K out$6.00 + $0.30 = $6.30$0.60 + $0.30 = $0.90N/A; interactive
RAG chat (Sonnet 4)10M context in (90% repeat), 1M out$30.00 + $15.00 = $45.00$3.00 + $15.00 = $18.00$15.00 + $7.50 = $22.50
Hard reasoning (Opus 4.7)500K in, 50K out$2.50 + $1.25 = $3.75$0.25 + $1.25 = $1.50$1.25 + $0.625 = $1.88
Classification (Haiku 3.5)100M in, 5M out$80 + $20 = $100$8 + $20 = $28$40 + $10 = $50

The routing pattern that cuts Claude spend 60-80%

The dominant production pattern is: default to Sonnet 4; promote to Opus 4.7 on a small set of triggers (complexity score, multi-step planning intent, prompt size above 100K, more than 5 tool calls expected); fall back to Haiku for any pure-classification turn. Most fleets run 70% Sonnet / 20% Haiku / 10% Opus by request count, which lands at a similar split by spend because the cheaper tiers dominate by volume.

A typical production fleet settles into a 70/25/5 split. 70% of requests handled by the smallest competent tier, 25% by the mid-tier workhorse, 5% promoted to the flagship. Done well, this cuts model spend 60-80% versus naive single-model use without any measurable quality drop on the bulk of requests.

With an AI gateway in front, the routing rule is one config block: declare a default model, declare promotion triggers, declare a fallback to a second provider for availability. Applications keep using a single OpenAI-compatible endpoint. See Swfte for a managed runtime that bundles the gateway, observability, eval, and per-team cost ceilings.

Enterprise considerations

Claude reaches enterprise via three channels. Anthropic direct API has zero data retention by default for paid contracts. AWS Bedrock exposes Claude with regional residency, IAM-based access, and integration with AWS data services. Google Vertex AI does the same on GCP with Vertex Pipelines / Workbench integration. Most regulated industries (finance, healthcare, government) standardise on Bedrock or Vertex for procurement reasons; pricing matches direct API at scale.

  • Prompt caching: Available use it from day one; the headline rate is misleading without it.
  • Batch inference: Available, 50% discount, up to 24h SLA.
  • Fine-tuning: Not currently offered.
  • On-prem / VPC: Available via Bedrock / Vertex / Azure or direct VPC contract.
  • Zero data retention: Available; default on enterprise contracts.

How Claude compares to the rest of the market

Against OpenAI, Claude wins on coding, agent loops, and per-token price at the frontier; Opus 4.7 is roughly 1/6 the price of GPT-5.5 Pro with stronger coding benchmarks. Against Gemini, Claude wins on coding and long-form writing; Gemini wins on raw price and multimodal. Against DeepSeek, Claude has the quality edge on the hardest benchmarks but DeepSeek V4 Pro is ~3x cheaper at similar Arena Elo on bulk workloads. The standard combination is Claude for code/agents, Gemini or DeepSeek for high-volume background work.

For a full side-by-side, see the API pricing index and the AI model leaderboard for quality / speed / value rankings.

Frequently asked questions about Claude API pricing

What is Claude API pricing in 2026?

Claude Opus 4.7 is $5 per 1M input tokens and $25 per 1M output tokens. Claude Sonnet 4 is $3 / $15. Claude Haiku 3.5 is $0.80 / $4. All tiers support 90% prompt cache discount and 50% batch discount.

How does Claude Opus 4.7 pricing compare to GPT-5.5 Pro?

Claude Opus 4.7 is $5 / $25 per 1M tokens. about 1/6 the price of GPT-5.5 Pro ($30 / $180). Opus 4.7 also leads the Arena coding leaderboard at 1567 Elo. For coding and agent workloads, Opus 4.7 is the clear price-performance leader at the frontier.

Is Claude Sonnet 4 cheaper than Opus 4.7?

Yes, Sonnet 4 is $3 / $15 vs Opus 4.7 at $5 / $25. That's 40% cheaper on input and 40% cheaper on output. Sonnet 4 also has a 1M context window (vs Opus 500K) and runs roughly 2× faster. Most production teams default to Sonnet and only route to Opus for the hardest 10-20% of requests.

How does Claude prompt caching work?

Anthropic's prompt cache is opt-in via the cache_control parameter. You mark a stable prefix (system prompt, tools, codebase, RAG corpus) with cache_control: {"type": "ephemeral"} and pay the headline rate the first time. Subsequent requests within the 5-min TTL pay 10% of input for the cached prefix: a 90% discount, the deepest in the industry. The 1-hr extended cache costs 25% of input.

What is the new Opus 4.7 tokenizer issue?

Opus 4.7 ships a new tokenizer that produces roughly 35% more tokens than Opus 4.6 for the same English input. List price is unchanged but effective cost rose proportionally. For ASCII-heavy code workloads the inflation is closer to 20-25%; for natural language closer to 35-40%. Budget accordingly.

Does Claude support batch inference?

Yes; 50% off both input and output, with a 24-hour SLA. Submit up to 100,000 requests per batch via the Message Batches API. Batch + cache stack, and the combined effective rate for a cache-warm, batched call can be under 10% of headline.

How do I get Claude on AWS or Google Cloud?

AWS Bedrock and Google Vertex AI both expose Claude with VPC residency, region pinning, and procurement through the cloud's standard contract. Pricing matches Anthropic's direct API (with cloud markup that is usually rebated at volume).

Does Anthropic offer fine-tuning?

Not as of 2026. Anthropic's position is that prompt engineering, retrieval, and tool use cover most of what fine-tuning was historically used for. For sovereignty or weight-control reasons, DeepSeek V4 (open weights, Apache 2.0) is the most common fine-tuning alternative.

What about Claude's 1M context window?

Available on Claude Sonnet 4. Pricing is unchanged. $3 / $15 per 1M. Tokens beyond 200K do incur a small surcharge on most workloads in the form of slower throughput, not a higher per-token rate. For long-context RAG over entire codebases or document collections, Sonnet 4 at 1M context is the strongest price/performance option.

How can I reduce Claude API costs?

Six levers, in order of impact: (1) Default to Sonnet 4, not Opus 4.7, 40% cheaper, 2× faster, within striking distance on quality. (2) Always use prompt caching on stable prefixes: it's 90% off, the biggest discount in the market. (3) Batch any latency-tolerant workload. (4) Route classification and extraction to Haiku 3.5 at $0.80 input. (5) Set per-team budget ceilings via a gateway. (6) For the hardest 5% promote to Opus 4.7.

Run Claude on a gateway you control

Swfte routes traffic across every major provider, enforces prompt caching, applies per-team budgets, and logs every request for audit. OpenAI-compatible API. Free tier.

Free tier · SOC2 Type II · On-prem / VPC available