Updated May 15, 2026 · 8 min read

OpenAI API pricing (July 2026)

TL;DR: OpenAI runs a four-tier model line in 2026. GPT-5.5 Pro (flagship), GPT-5.5 (workhorse), GPT-5.5 mini (cheap), GPT-5 nano (cheapest). Per-1M input/output rates span 600× across the line. Automatic prompt caching at 75% off and batch inference at 50% off make the realised cost much lower than the headline rates.

Cut your OpenAI bill

Stop paying headline rates

Three ways to turn this pricing page into production savings. All start with a free Swfte account — no card.

Run OpenAI free

Sign in and prompt any OpenAI model in 30 seconds. No card. Caching and routing on day one — most teams settle 60–80% below the headline rates on this page.

Start free

Track OpenAI price moves

Email the moment OpenAI cuts a tier, ships a cheaper model, or a competitor undercuts them. One-click subscribe — the AI budget owner's passive radar.

Get price alerts

50% OFF · 6 MO

The Cost-Cutter Challenge

Send a redacted snippet of last month's OpenAI bill. We model the routing setup that would have cut it 60–80%. Biggest verified saving each month: 50% off for 6 months.

Submit your bill

One winner picked monthly · discount applies to your first paid plan · see challenge rules

Every OpenAI model and its per-token price

Model	Input / 1M	Cached input / 1M	Output / 1M	Context	Notes
GPT-5.5 Pro	$30.00	$7.500	$180.00	1M	Reasoning + multimodal flagship. AAII 59 — #1 reasoning leaderboard.
GPT-5.5	$5.00	$1.250	$30.00	256K	Default tier. Strong general-purpose with tool use + voice.
GPT-5.5 mini	$0.50	$0.125	$2.00	128K	Cheapest GPT-5 tier — good for classification, simple chat.
GPT-5 nano	$0.05	$0.013	$0.40	128K	Smallest. Routing, classification, extraction at high throughput.
GPT-4o	$2.50	$1.250	$10.00	128K	Legacy multimodal flagship. Still supported.
GPT-4o mini	$0.15	$0.075	$0.60	128K	Legacy small tier. Cheap, fast, multimodal.
o3 (reasoning)	$10.00	$2.500	$40.00	200K	Deep reasoning model — math, code, research.
gpt-oss-120b	$1.00	—	$3.00	128K	Open-weights. Self-hostable; benchmark pricing on hosted endpoints varies.

All prices in USD per 1 million tokens. Last reviewed 2026-05-15. Provider pricing pages are authoritative, confirm before contracting.

How OpenAI pricing actually works

OpenAI API pricing in 2026 follows the same shape across every model family, a per-1M-token input rate, a per-1M-token output rate (typically 4-6× input), an automatic cached-input rate at 25% of the headline, and an optional batch mode at 50% off. The flagship-to-cheapest spread is roughly 600× ($30 input on GPT-5.5 Pro down to $0.05 on GPT-5 nano), which means model routing is far more cost-relevant than negotiating volume discounts.

The OpenAI API is the most widely deployed LLM API in production. Roughly 65% of developers using AI tools touch it daily according to the 2025 Stack Overflow survey, and the API powers most of the Fortune 500's customer-facing AI products. Microsoft's Azure OpenAI variant serves the same models inside an enterprise procurement and compliance surface, which is the default in regulated industries.

OpenAI offers 50% batch discount, automatic prompt caching, and ZDR on enterprise contracts. Azure OpenAI exposes the same models with VPC residency.

Prompt caching: the 90% discount most teams ignore

OpenAI's prompt caching is automatic: no opt-in required. Any prefix of 1,024 or more tokens that has been seen in the last 5-10 minutes is served from cache at 25% of the normal input rate. The cache hit rate depends on how prompts are structured: stable content (system prompt, tools, RAG context) at the start, dynamic content at the end. Reverse that and you bust the cache on every turn.

On a typical coding agent run that re-sends the same 200K-token codebase across 10 turns, prompt caching reduces effective input cost by 80-90%. The cached-input column in the table above is the right number to plug into a production budget; the headline input rate is the "new conversation" rate, not the steady-state rate.

Batch inference, and half-price overnight

The batch API accepts a JSONL file of up to 50,000 requests and returns completions within 24 hours at 50% of normal pricing for both input and output. There is no minimum size. The right workloads are anything where a few hours of latency is fine; nightly evaluation runs, document enrichment pipelines, synthetic training data generation, large-scale classification.

Batch + cache stack. The combined effective rate for a cache-warm, batched call is often 5-10% of the headline price. For workloads like nightly eval suites, large-scale classification, document enrichment, and synthetic data generation, batching is free money.

Four real production cost scenarios

Workload	Detail	Headline cost	With cache	With batch
Chat (GPT-5.5)	1M tokens in, 100K tokens out	$5.00 + $3.00 = $8.00	$1.25 + $3.00 = $4.25	$2.50 + $1.50 = $4.00
RAG (GPT-5.5)	10M context in (80% repeat), 1M out	$50.00 + $30.00 = $80.00	$12.50 + $30.00 = $42.50	$25.00 + $15.00 = $40.00
Agent loop (GPT-5.5)	50K context × 8 turns × 1K out	~$13	~$5	N/A, and interactive
High-volume classification (GPT-5.5 mini)	100M in, 5M out (extraction)	$50 + $10 = $60	$12.50 + $10 = $22.50	$25 + $5 = $30

The routing pattern that cuts OpenAI spend 60-80%

The dominant pattern in 2026 is intent-based routing. Simple queries, and chat, classification, extraction. route to GPT-5 nano or GPT-5.5 mini at sub-cent cost. Workhorse queries route to GPT-5.5 at $5/$30. Hard-reasoning queries (math, multi-step planning, code refactoring) promote to GPT-5.5 Pro. The routing decision is usually made by a tiny classifier model running in front, or by tool-call signals from the previous turn.

A typical production fleet settles into a 70/25/5 split. 70% of requests handled by the smallest competent tier, 25% by the mid-tier workhorse, 5% promoted to the flagship. Done well, this cuts model spend 60-80% versus naive single-model use without any measurable quality drop on the bulk of requests.

With an AI gateway in front, the routing rule is one config block: declare a default model, declare promotion triggers, declare a fallback to a second provider for availability. Applications keep using a single OpenAI-compatible endpoint. See Swfte for a managed runtime that bundles the gateway, observability, eval, and per-team cost ceilings.

Enterprise considerations

Enterprise procurement for OpenAI typically goes through one of three channels. Direct OpenAI Enterprise contracts include zero data retention by default, dedicated capacity options, and a customer success motion. Azure OpenAI exposes the same models with Microsoft enterprise terms, regional residency, and integration with Entra ID / Defender. AWS Bedrock does not offer GPT-5.5, Bedrock is for Anthropic, Meta, Mistral, and AI21 models. For multi-cloud or VPC residency, route through a gateway that can address all three.

Prompt caching: Available — use it from day one; the headline rate is misleading without it.
Batch inference: Available, 50% discount, up to 24h SLA.
Fine-tuning: Supported on most tiers.
On-prem / VPC: Available via Bedrock / Vertex / Azure or direct VPC contract.
Zero data retention: Available; default on enterprise contracts.

How OpenAI compares to the rest of the market

Against Claude (Anthropic), OpenAI is broader: multimodal, voice, image generation, fine-tuning, open weights; but Claude is preferred for coding and agent loops. Against Gemini (Google), GPT-5.5 leads reasoning but Gemini wins on price (Gemini 3.0 is 4× cheaper than GPT-5.5 at comparable quality on most general workloads). Against DeepSeek, GPT-5.5 leads quality on the hardest benchmarks but DeepSeek V4 Pro is ~1/8 the cost for similar Arena Elo on bulk workloads.

For a full side-by-side, see the API pricing index and the AI model leaderboard for quality / speed / value rankings.

Frequently asked questions about OpenAI API pricing

What is the OpenAI API pricing in 2026?

GPT-5.5 Pro is $30 per 1M input tokens and $180 per 1M output tokens. The non-Pro GPT-5.5 is $5 / $30. GPT-5.5 mini is $0.50 / $2. GPT-5 nano is $0.05 / $0.40. All tiers support automatic prompt caching at 75% off and batch inference at 50% off.

How does GPT-5.5 pricing compare to GPT-4o?

GPT-5.5 ($5 / $30) is double the input price of GPT-4o ($2.50 / $10) but with substantially better reasoning, tool use, and a much larger 256K context. Most production teams have moved off GPT-4o by mid-2026 unless cost is paramount and quality is sufficient.

Is GPT-5.5 Pro worth 6× the price of GPT-5.5?

For most workloads, no. GPT-5.5 Pro shines on the hardest reasoning tasks (AAII 59, the leaderboard #1) and 1M context. The standard pattern: use GPT-5.5 as default and route to Pro only when a reasoning-difficulty signal trips. usually 5-10% of traffic.

How does OpenAI prompt caching work?

OpenAI prompt caching is automatic, any prefix of 1,024+ tokens that has been sent in the last 5-10 minutes is served from cache at 25% of the normal input rate. No code changes required. To maximise hit rate, structure prompts with stable prefixes (system prompt, tools, RAG context) before any dynamic content.

What is the batch API discount?

50% off both input and output, in exchange for a 24-hour completion SLA. Submit a JSONL file of up to 50,000 requests via the batch endpoint. Best for evaluation runs, content generation pipelines, and back-office processing where seconds-of-latency don't matter.

Does OpenAI offer fine-tuning?

Yes: on GPT-5.5 mini and most legacy models. Fine-tuning pricing is per-1M tokens of training data plus a per-1M-tokens premium on inference (typically 2-3× the base model rate). For most cases, prompt engineering + retrieval is cheaper than fine-tuning.

Can I get on-prem GPT-5.5?

Through Azure OpenAI for VPC residency on Microsoft cloud, with the same models and pricing. True on-prem (air-gapped) is not currently offered for GPT-5.5; only gpt-oss-120b and 20b have open weights for full self-hosting.

What is gpt-oss?

OpenAI's open-weights line; gpt-oss-120b (the big one) and gpt-oss-20b. Apache 2.0 licensed for commercial use. Quality lands between GPT-4o and GPT-5.5 mini at fraction of the cost on hosted endpoints, and fully self-hostable on 8× H100 or smaller hardware for the 20B variant.

What about ChatGPT Enterprise vs the API?

They're different products. ChatGPT Enterprise is a managed chat app priced per seat with included usage. The API is per-token raw access. For developer use cases, the API plus a gateway is always cheaper than per-seat ChatGPT. For end-user productivity, ChatGPT Enterprise is the right unit.

How can I reduce OpenAI API costs?

Five levers, in order of impact: (1) Route most traffic to GPT-5.5 mini or nano. (2) Turn on prompt caching and structure prompts for cache hits. (3) Batch any latency-tolerant workload for 50% off. (4) Use a multi-provider gateway to fall back to a cheaper Claude / DeepSeek / Gemini tier when quality permits. (5) Set per-team budget ceilings so runaway agents don't burn the month's budget.