Updated May 15, 2026 · 7 min read

Grok API pricing (July 2026)

TL;DR: Grok is xAI’s LLM family and the only frontier model with native, real-time X (Twitter) data access. Grok 4 costs $5 input / $15 output per 1M tokens: the same input price as GPT-5.5 and half its output price. No prompt caching or batch discounts yet, so plan against headline rates.

Every Grok model and its per-token price

Model	Input / 1M	Cached input / 1M	Output / 1M	Context	Notes
Grok 4	$5.00	—	$15.00	256K	Flagship Grok. Tool use, real-time X data, vision.
Grok 3	$3.00	—	$15.00	128K	Previous-generation flagship. Lower cost.
Grok 4 mini	$0.30	—	$0.50	128K	Cheapest Grok tier. Strong cost-per-token.

All prices in USD per 1 million tokens. Last reviewed 2026-05-15. Provider pricing pages are authoritative; confirm before contracting.

How Grok pricing actually works

Grok API pricing in 2026 is simple by frontier-provider standards: a flat per-1M-token rate for input and another for output, with no caching tier and no batch tier. Grok 4 at $5 / $15 sits between Claude Sonnet 4 and GPT-5.5 on price. The cheaper Grok 4 mini at $0.30 / $0.50 competes with Gemini 2.5 Flash on cost but is meaningfully behind on quality.

Grok adoption is concentrated in three areas. Financial analysis and trading desks use it to pull real-time market sentiment from X. Brand and marketing intelligence teams use the X integration for live monitoring. Developer teams that already pay for X Premium / Enterprise often add Grok to their stack at no marginal cost since usage is bundled at certain X tiers.

The xAI API is the newest among the major providers. Real-time X (Twitter) data is the differentiator. There is no prompt caching or batch tier today.

Prompt caching: the 90% discount Grok does not offer yet

xAI does not currently offer prompt caching. This is the largest gap versus Claude (90% off), OpenAI (75% off), and Gemini (75% off). For workloads with stable prefixes (system prompts, tool definitions, codebases, RAG corpora), Grok is meaningfully more expensive in practice than a headline-rate comparison suggests, because competitors bill the repeated prefix at the cached rate. The roadmap includes caching, but there is no public timeline as of mid-2026.

On a typical coding agent run that re-sends the same 200K-token codebase across 10 turns, prompt caching on a provider that offers it reduces effective input cost by 80-90%. Grok has no such tier: the cached-input column in the table above is empty, so the headline input rate is both the "new conversation" rate and the steady-state rate to plug into a production budget.
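To put numbers on the coding-agent example, here is a minimal sketch of the arithmetic. It assumes a hypothetical 90% cached-input discount applied to Grok 4's $5 input rate (xAI offers no such discount today), that the first turn pays full price, and that there are no cache-write surcharges. The function name and parameters are illustrative only.

```python
from decimal import Decimal

def prefix_input_cost(prefix_tokens, turns, rate_per_million, cache_discount=Decimal("0")):
    """USD spent re-sending one stable prefix on every turn.

    The first turn pays the full rate; each later turn pays the rate
    reduced by cache_discount. cache_discount=0 models a provider with
    no prompt caching, which is Grok's situation today.
    """
    per_turn = Decimal(prefix_tokens) * rate_per_million / Decimal(1_000_000)
    cached_turn = per_turn * (1 - cache_discount)
    return per_turn + cached_turn * (turns - 1)

GROK4_INPUT = Decimal("5.00")  # USD per 1M input tokens, from the pricing table

# Actual Grok 4 bill: every turn pays the headline input rate.
no_cache = prefix_input_cost(200_000, 10, GROK4_INPUT)

# Hypothetical: the same run if a 90% cache discount existed (assumption).
with_cache = prefix_input_cost(200_000, 10, GROK4_INPUT, Decimal("0.90"))

saving = 1 - with_cache / no_cache
print(f"no cache: ${no_cache:.2f}, hypothetical cache: ${with_cache:.2f}, saving: {saving:.0%}")
```

Under these assumptions the run costs $10.00 in input tokens without caching and $1.90 with it, an 81% reduction, which is the gap a Grok budget has to absorb on prefix-heavy workloads.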

Batch inference: no half-price overnight tier yet

No batch tier today. xAI’s API targets interactive workloads. For latency-tolerant bulk work, the standard pattern is to route batch jobs to OpenAI, Anthropic, or Gemini through a gateway, and reserve Grok for the live-data workloads where it is uniquely valuable.

Batch + cache stack. On providers that offer both, the combined effective rate for a cache-warm, batched call is often 5-10% of the headline price. For workloads like nightly eval suites, large-scale classification, document enrichment, and synthetic data generation, that makes batching close to free money, and it is one more reason to route that work away from Grok for now.
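The stacking arithmetic can be sketched as follows. This assumes the two discounts multiply, which is a simplification: how a given provider combines cache and batch pricing is provider-specific and should be checked against its pricing page. The 50% batch figure reflects the "half-price" framing above, and the 90% and 75% cache figures are the competitor discounts quoted earlier.

```python
from decimal import Decimal

def stacked_fraction(cache_discount: Decimal, batch_discount: Decimal) -> Decimal:
    """Fraction of the headline input rate still paid after both discounts.

    Assumes the discounts compound multiplicatively (an assumption,
    not a documented rule for any specific provider).
    """
    return (1 - cache_discount) * (1 - batch_discount)

half_price_batch = Decimal("0.50")

deep_cache = stacked_fraction(Decimal("0.90"), half_price_batch)     # 5% of headline
shallow_cache = stacked_fraction(Decimal("0.75"), half_price_batch)  # 12.5% of headline
grok_today = stacked_fraction(Decimal("0"), Decimal("0"))            # 100% of headline

print(f"{deep_cache:.1%} {shallow_cache:.1%} {grok_today:.1%}")
```

With neither discount available, Grok stays at 100% of the headline rate, which is why the budget lines for it do not shrink with volume.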

Four real production cost scenarios

Workload	Detail	Headline cost	With cache	With batch
Real-time X analysis (Grok 4)	500K context in, 20K out	$2.50 + $0.30 = $2.80	—	—
Chat (Grok 4)	1M in, 100K out	$5.00 + $1.50 = $6.50	—	—
High-volume classification (Grok 4 mini)	100M in, 5M out	$30 + $2.50 = $32.50	—	—
Reasoning (Grok 3)	500K in, 50K out	$1.50 + $0.75 = $2.25	—	—
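The headline costs in the table follow directly from the flat per-token rates. A short sketch that reproduces them, using the rates from the pricing table; the model keys are illustrative labels, not confirmed xAI API identifiers:

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table.
# Keys are illustrative labels, not confirmed API model identifiers.
RATES = {
    "grok-4": (Decimal("5.00"), Decimal("15.00")),
    "grok-3": (Decimal("3.00"), Decimal("15.00")),
    "grok-4-mini": (Decimal("0.30"), Decimal("0.50")),
}

# (workload, model, input tokens, output tokens), from the scenario table.
SCENARIOS = [
    ("Real-time X analysis", "grok-4", 500_000, 20_000),
    ("Chat", "grok-4", 1_000_000, 100_000),
    ("High-volume classification", "grok-4-mini", 100_000_000, 5_000_000),
    ("Reasoning", "grok-3", 500_000, 50_000),
]

def headline_cost(model, input_tokens, output_tokens):
    """Return (input cost, output cost) in USD at flat headline rates."""
    input_rate, output_rate = RATES[model]
    million = Decimal(1_000_000)
    return input_rate * input_tokens / million, output_rate * output_tokens / million

totals = {}
for name, model, tokens_in, tokens_out in SCENARIOS:
    cost_in, cost_out = headline_cost(model, tokens_in, tokens_out)
    totals[name] = cost_in + cost_out
    print(f"{name} ({model}): ${cost_in:.2f} + ${cost_out:.2f} = ${totals[name]:.2f}")
```

Because there is no cache or batch tier, the same function covers every column that applies to Grok; swapping in your own token counts gives a first-pass budget.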

The routing pattern that cuts Grok spend 60-80%

Production fleets using Grok typically do not default to it. The pattern is: default to Claude Sonnet 4 or GPT-5.5; route turns that include a real-time X data signal (sentiment, breaking news, account analysis) to Grok 4; fall back to Claude or GPT if Grok degrades. This treats Grok as a specialist tool rather than a general workhorse, which matches its cost / feature profile.

A typical production fleet settles into a 70/25/5 split: 70% of requests handled by the smallest competent tier, 25% by the mid-tier workhorse, and 5% promoted to the flagship. Done well, this cuts model spend 60-80% versus naive single-model use without any measurable quality drop on the bulk of requests.
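As a rough check on the 60-80% figure, the blended rate of a 70/25/5 split can be computed from the prices on this page. The sketch below assumes the three tiers map to Grok 4 mini, Grok 3, and Grok 4 (an illustrative mapping; a real fleet would likely mix providers) and that every tier sees the same tokens per request, so traffic share equals token share.

```python
from decimal import Decimal

# (share of traffic, input USD/1M, output USD/1M). Tier mapping is an assumption.
SPLIT = [
    (Decimal("0.70"), Decimal("0.30"), Decimal("0.50")),   # smallest tier: Grok 4 mini
    (Decimal("0.25"), Decimal("3.00"), Decimal("15.00")),  # mid tier: Grok 3
    (Decimal("0.05"), Decimal("5.00"), Decimal("15.00")),  # flagship: Grok 4
]
FLAGSHIP_IN, FLAGSHIP_OUT = Decimal("5.00"), Decimal("15.00")

def blended_rates(split):
    """Traffic-weighted average input and output rates in USD per 1M tokens."""
    blended_in = sum(share * rate_in for share, rate_in, _ in split)
    blended_out = sum(share * rate_out for share, _, rate_out in split)
    return blended_in, blended_out

blended_in, blended_out = blended_rates(SPLIT)
input_saving = 1 - blended_in / FLAGSHIP_IN
output_saving = 1 - blended_out / FLAGSHIP_OUT

print(f"blended: ${blended_in:.2f} in / ${blended_out:.2f} out per 1M")
print(f"saving vs all-flagship: {input_saving:.0%} input, {output_saving:.0%} output")
```

Under these assumptions the blended rate is $1.21 input / $4.85 output per 1M tokens, roughly 76% and 68% below sending everything to Grok 4, which sits inside the quoted range.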

With an AI gateway in front, the routing rule is one config block: declare a default model, declare promotion triggers, declare a fallback to a second provider for availability. Applications keep using a single OpenAI-compatible endpoint. See Swfte for a managed runtime that bundles the gateway, observability, eval, and per-team cost ceilings.
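The three declarations (default model, promotion triggers, fallback) can be sketched in a few lines. This is not any particular gateway's config format: the class, the function, the signal names, and the model labels are all hypothetical and chosen to mirror the routing pattern described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RoutingRule:
    # Model labels are hypothetical placeholders, not confirmed API identifiers.
    default_model: str = "claude-sonnet-4"
    promoted_model: str = "grok-4"
    fallback_model: str = "gpt-5.5"
    # Signals that mark a turn as needing real-time X data.
    promotion_triggers: frozenset = frozenset(
        {"sentiment", "breaking_news", "account_analysis"}
    )

def choose_model(rule: RoutingRule, signals, promoted_healthy: bool = True) -> str:
    """Pick a model for one turn.

    Turns without an X-data signal go to the default model. Turns with one
    go to the promoted specialist, or to the fallback when the specialist
    is reported as degraded.
    """
    needs_x_data = bool(rule.promotion_triggers & set(signals))
    if not needs_x_data:
        return rule.default_model
    return rule.promoted_model if promoted_healthy else rule.fallback_model

rule = RoutingRule()
print(choose_model(rule, {"summarize"}))                          # default path
print(choose_model(rule, {"sentiment"}))                          # promoted to the specialist
print(choose_model(rule, {"sentiment"}, promoted_healthy=False))  # availability fallback
```

How signals are detected (keyword match, classifier, explicit request flag) and how health is measured are left out on purpose; those are the parts a gateway product typically supplies.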

Enterprise considerations

xAI enterprise contracts include data retention controls and dedicated support. VPC residency is not currently offered. For regulated industries, the gating factor is usually compliance posture; Grok’s certifications lag Anthropic, OpenAI, and Google. Many regulated buyers wait for those to land before approving Grok in production.

Prompt caching: Not available — budget against the headline input rate.
Batch inference: Not available today.
Fine-tuning: Not currently offered.
On-prem / VPC: Not offered; cloud API only today.
Zero data retention: Available; default on enterprise contracts.

How Grok compares to the rest of the market

Against GPT-5.5, Grok 4 is similar on input price and cheaper on output, but trails on caching, batching, and coding quality. Against Claude, Grok 4 is more expensive than Sonnet 4 and weaker on coding / agents. Against Gemini, Grok is more expensive on average but offers unique X data access. The standard placement: Grok as a specialist for real-time social-signal workloads, behind a gateway alongside a mainstream provider.

For a full side-by-side, see the API pricing index and the AI model leaderboard for quality / speed / value rankings.

Frequently asked questions about Grok API pricing

What is Grok API pricing in 2026?

Grok 4 is $5 per 1M input tokens and $15 per 1M output tokens with a 256K context window. Grok 3 (legacy) is $3 / $15. Grok 4 mini is $0.30 / $0.50. xAI does not currently offer prompt caching or batch discounts.

How does Grok compare to GPT-5.5?

Grok 4 has the same input price as GPT-5.5 ($5 vs $5) but half the output price ($15 vs $30). Quality is competitive on chat and reasoning but trails on coding benchmarks. The real differentiator is real-time access to X (Twitter) data, which no other major LLM offers natively.

Is Grok 4 worth using for production?

Conditionally yes, for use cases where real-time social-media signal is part of the input (financial analysis, news monitoring, brand intelligence, regulatory tracking). For general-purpose chat or coding, Claude or GPT-5.5 typically offer better quality at similar or lower cost.

Does Grok have prompt caching?

Not as of mid-2026. xAI is the newest of the major API providers and has not yet shipped prompt caching or batch inference. Budget against the headline input rate; there is no cache discount to factor in.

Can I use Grok 4 for coding?

Yes, but it is not the top pick. Grok 4 is competent on standard coding benchmarks but does not match Claude Opus 4.7 or GPT-5.5 on SWE-bench or Aider leaderboards. For pure coding, Claude or Cursor + Claude is the standard.

What is the X data access feature?

Grok models can pull live X (Twitter) data as part of their context: searches, timelines, post detail, account profiles. This is unique among frontier LLMs and is the primary reason teams add Grok to their gateway alongside Claude / GPT.

Is xAI API stable enough for production?

Yes for most workloads, but expect feature parity to lag the older providers. Caching, batching, fine-tuning, and many enterprise features are on the roadmap but not generally available. Production deployments typically use Grok as a secondary model behind a gateway with fallback to a more mature provider.