Grok API pricing (May 2026)
TL;DR: Grok is xAI’s LLM family and the only frontier model with native, real-time X (Twitter) data access. Grok 4 costs $5 / $15 per 1M input / output tokens: competitive with GPT-5.5 on input and cheaper on output. There are no prompt caching or batch discounts yet, so plan against headline rates.
Every Grok model and its per-token price
| Model | Input / 1M | Cached input / 1M | Output / 1M | Context | Notes |
|---|---|---|---|---|---|
| Grok 4 | $5.00 | — | $15.00 | 256K | Flagship Grok. Tool use, real-time X data, vision. |
| Grok 3 | $3.00 | — | $15.00 | 128K | Previous-generation flagship. Lower cost. |
| Grok 4 mini | $0.30 | — | $0.50 | 128K | Cheapest Grok tier. Strong cost-per-token. |
All prices in USD per 1 million tokens. Last reviewed 2026-05-15. Provider pricing pages are authoritative; confirm before contracting.
How Grok pricing actually works
Grok API pricing in 2026 is simple by frontier-provider standards: a flat per-1M-token input and output rate, with no caching tier and no batch tier. Grok 4 at $5 / $15 sits between Claude Sonnet 4 and GPT-5.5 on price. The cheaper Grok 4 mini at $0.30 / $0.50 competes with Gemini 2.5 Flash on cost but is meaningfully behind on quality.
Grok adoption is concentrated in three areas. Financial analysis and trading desks use it to pull real-time market sentiment from X. Brand and marketing intelligence teams use the X integration for live monitoring. Developer teams that already pay for X Premium / Enterprise often add Grok to their stack at no marginal cost since usage is bundled at certain X tiers.
The xAI API is the newest of the major providers. Real-time X (Twitter) data is its differentiator; prompt caching and batch inference are not available today.
Prompt caching: the 90% discount most teams ignore
xAI does not currently offer prompt caching. This is the largest gap versus Claude (90% off), OpenAI (75% off), and Gemini (75% off). For workloads with stable prefixes (system prompts, tool definitions, codebases, RAG corpora), Grok is meaningfully more expensive in practice than its headline rate suggests. Caching is on the roadmap, but there is no public timeline as of mid-2026.
On providers that offer it, prompt caching cuts effective input cost by 80-90% on a typical coding agent run that re-sends the same 200K-token codebase across 10 turns. Grok gets none of that saving: the cached-input column in the table above is empty for every Grok model, so the headline input rate is both the "new conversation" rate and the steady-state rate to plug into a production budget.
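As a rough illustration of the arithmetic behind that 80-90% figure, the sketch below prices the 10-turn, 200K-token run twice at Grok 4's $5 headline input rate: once with no caching (Grok today) and once with a hypothetical 90% cached-read discount. The function name and the billing model (first turn at full price, every later turn a cache hit, no cache-write surcharge) are assumptions made for the example, not any provider's documented billing rules.

```python
def input_cost(prefix_tokens, turns, rate_per_million, cache_discount=0.0):
    """USD input cost of re-sending the same prefix on every turn.

    cache_discount is the fraction taken off cached reads (0.0 = no caching).
    Assumption: the first turn is billed in full and each later turn is a
    cache hit; cache-write surcharges are ignored.
    """
    if turns <= 0:
        return 0.0
    per_turn = prefix_tokens / 1_000_000 * rate_per_million
    cached_turns = turns - 1
    return per_turn + cached_turns * per_turn * (1.0 - cache_discount)


# Grok 4 headline input rate from the pricing table: $5.00 per 1M tokens.
no_cache = input_cost(200_000, 10, 5.00)
# Hypothetical: the same rate with a 90% discount on cached reads.
with_cache = input_cost(200_000, 10, 5.00, cache_discount=0.90)
saving = 1 - with_cache / no_cache

print(f"no cache: ${no_cache:.2f}")
print(f"90% cached reads: ${with_cache:.2f}")
print(f"input saving: {saving:.0%}")
```

Under these assumptions the run costs about $10.00 in input tokens without caching and about $1.90 with it, an 81% reduction, which is the saving Grok users currently cannot claim.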
Batch inference: half-price overnight
No batch tier today. xAI’s API targets interactive workloads. For latency-tolerant bulk work, the standard pattern is to route batch jobs to OpenAI, Anthropic, or Gemini through a gateway, and reserve Grok for the live-data workloads where it is uniquely valuable.
Batch + cache stack. On providers that offer both, the combined effective rate for a cache-warm, batched call is often 5-10% of the headline price, which makes batching close to free money for nightly eval suites, large-scale classification, document enrichment, and synthetic data generation. Grok offers neither tier today.
Four real production cost scenarios
| Workload | Detail | Headline cost | With cache | With batch |
|---|---|---|---|---|
| Real-time X analysis (Grok 4) | 500K context in, 20K out | $2.50 + $0.30 = $2.80 | — | — |
| Chat (Grok 4) | 1M in, 100K out | $5.00 + $1.50 = $6.50 | — | — |
| High-volume classification (Grok 4 mini) | 100M in, 5M out | $30 + $2.50 = $32.50 | — | — |
| Reasoning (Grok 3) | 500K in, 50K out | $1.50 + $0.75 = $2.25 | — | — |
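The headline costs in the table can be reproduced with a few lines of arithmetic. The sketch below is a minimal calculator using the per-1M-token rates from the pricing table above; the dictionary keys are labels chosen for this example and are not confirmed xAI API model identifiers.

```python
# (input, output) USD per 1M tokens, copied from the pricing table above.
# Keys are illustrative labels, not confirmed API model IDs.
PRICES = {
    "grok-4": (5.00, 15.00),
    "grok-3": (3.00, 15.00),
    "grok-4-mini": (0.30, 0.50),
}


def headline_cost(model, input_tokens, output_tokens):
    """Headline USD cost: flat per-token rates, no cache or batch discount."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1_000_000 * in_rate
            + output_tokens / 1_000_000 * out_rate)


SCENARIOS = [
    ("Real-time X analysis", "grok-4", 500_000, 20_000),
    ("Chat", "grok-4", 1_000_000, 100_000),
    ("High-volume classification", "grok-4-mini", 100_000_000, 5_000_000),
    ("Reasoning", "grok-3", 500_000, 50_000),
]

for name, model, tokens_in, tokens_out in SCENARIOS:
    cost = headline_cost(model, tokens_in, tokens_out)
    print(f"{name} ({model}): ${cost:.2f}")
```

Because Grok has no cache or batch tier, this single function is the whole budget model; swapping in your own token volumes gives the number to plan against.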
The routing pattern that cuts Grok spend 60-80%
Production fleets using Grok typically do not default to it. The pattern is: default to Claude Sonnet 4 or GPT-5.5; route turns that include a real-time X data signal (sentiment, breaking news, account analysis) to Grok 4; fall back to Claude or GPT if Grok degrades. This treats Grok as a specialist tool rather than a general workhorse, which matches its cost / feature profile.
A typical production fleet settles into a 70/25/5 split: 70% of requests handled by the smallest competent tier, 25% by the mid-tier workhorse, and 5% promoted to the flagship. Done well, this cuts model spend 60-80% versus naive single-model use without any measurable quality drop on the bulk of requests.
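To see where a 60-80% figure can come from, here is a back-of-the-envelope sketch of a blended rate under a 70/25/5 split. It rests on two assumptions that the text does not state: that the split applies to token volume rather than request count, and that the three tiers map to Grok 4 mini, Grok 3, and Grok 4. A real fleet would mix providers and see different numbers.

```python
def blended_rate(split, rates):
    """Volume-weighted average price per 1M tokens across tiers."""
    return sum(share * rate for share, rate in zip(split, rates))


# Assumed share of token volume per tier: smallest, mid-tier, flagship.
SPLIT = (0.70, 0.25, 0.05)
# Rates from the pricing table, in the assumed tier order:
# Grok 4 mini, Grok 3, Grok 4.
INPUT_RATES = (0.30, 3.00, 5.00)
OUTPUT_RATES = (0.50, 15.00, 15.00)

blended_in = blended_rate(SPLIT, INPUT_RATES)
blended_out = blended_rate(SPLIT, OUTPUT_RATES)

# Baseline: every token sent to the flagship at $5 / $15.
saving_in = 1 - blended_in / 5.00
saving_out = 1 - blended_out / 15.00

print(f"blended input:  ${blended_in:.2f} per 1M ({saving_in:.0%} below flagship)")
print(f"blended output: ${blended_out:.2f} per 1M ({saving_out:.0%} below flagship)")
```

With these inputs the blended rates come out at roughly $1.21 input and $4.85 output per 1M tokens, about 76% and 68% below an all-flagship baseline, which is consistent with the range quoted above.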
With an AI gateway in front, the routing rule is one config block: declare a default model, declare promotion triggers, declare a fallback to a second provider for availability. Applications keep using a single OpenAI-compatible endpoint. See Swfte for a managed runtime that bundles the gateway, observability, eval, and per-team cost ceilings.
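The shape of such a rule can be sketched in a few lines. This is an illustration of the default / promotion / fallback pattern only, not the configuration format of any particular gateway; the class, the function, the signal names, and the model strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical routing rule: default, promotion triggers, fallback."""
    default_model: str
    promoted_model: str
    fallback_model: str
    promotion_triggers: tuple


# Model strings are illustrative labels, not confirmed API identifiers.
RULE = RoutingRule(
    default_model="claude-sonnet-4",
    promoted_model="grok-4",
    fallback_model="gpt-5.5",
    promotion_triggers=("x_sentiment", "breaking_news", "account_analysis"),
)


def choose_model(signals, promoted_healthy=True, rule=RULE):
    """Pick a model for one turn.

    signals: set of signal names detected on the turn (hypothetical names).
    promoted_healthy: False when the specialist model is degraded.
    """
    wants_promotion = any(t in signals for t in rule.promotion_triggers)
    if not wants_promotion:
        return rule.default_model
    return rule.promoted_model if promoted_healthy else rule.fallback_model


print(choose_model({"x_sentiment"}))                          # routed to the specialist
print(choose_model({"summarize"}))                            # stays on the default
print(choose_model({"breaking_news"}, promoted_healthy=False))  # falls back
```

Keeping the rule as data rather than scattering conditionals through application code is what lets the application keep calling a single endpoint while the routing policy changes underneath it.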
Enterprise considerations
xAI enterprise contracts include data retention controls and dedicated support. VPC residency is not currently offered. For regulated industries, the gating factor is usually compliance posture; Grok’s certifications lag Anthropic, OpenAI, and Google. Many regulated buyers wait for those to land before approving Grok in production.
- Prompt caching: Not available — budget against the headline input rate.
- Batch inference: Not available today.
- Fine-tuning: Not currently offered.
- On-prem / VPC: Limited; cloud API only today.
- Zero data retention: Available; default on enterprise contracts.
How Grok compares to the rest of the market
Against GPT-5.5, Grok 4 is similar on input price and cheaper on output, but trails on caching, batching, and coding quality. Against Claude, Grok 4 is more expensive than Sonnet 4 and weaker on coding / agents. Against Gemini, Grok is more expensive on average but offers unique X data access. The standard placement: Grok as a specialist for real-time social-signal workloads, behind a gateway alongside a mainstream provider.
For a full side-by-side, see the API pricing index and the AI model leaderboard for quality / speed / value rankings.
Frequently asked questions about Grok API pricing
What is Grok API pricing in 2026?
Grok 4 is $5 per 1M input tokens and $15 per 1M output tokens with a 256K context window. Grok 3 (legacy) is $3 / $15. Grok 4 mini is $0.30 / $0.50. xAI does not currently offer prompt caching or batch discounts.
How does Grok compare to GPT-5.5?
Grok 4 is roughly the same input price as GPT-5.5 ($5 vs $5) but half the output price ($15 vs $30). Quality is competitive on chat and reasoning but trails on coding benchmarks. The real differentiator is real-time access to X (Twitter) data, which no other major LLM offers natively.
Is Grok 4 worth using for production?
Conditional yes, for use cases where real-time social-media signal is part of the input (financial analysis, news monitoring, brand intelligence, regulatory tracking). For general-purpose chat or coding, Claude or GPT-5.5 typically offer better quality at similar or lower cost.
Does Grok have prompt caching?
Not as of mid-2026. xAI is the newest of the major API providers and has not yet shipped prompt caching or batch inference. Budget against the headline input rate; there is no cache discount to factor in.
Can I use Grok 4 for coding?
Yes, but it is not the top pick. Grok 4 is competent on standard coding benchmarks but does not match Claude Opus 4.7 or GPT-5.5 on SWE-bench or Aider leaderboards. For pure coding, Claude or Cursor + Claude is the standard.
What is the X data access feature?
Grok models can pull live X (Twitter) data as part of their context: searches, timelines, post detail, account profiles. This is unique among frontier LLMs and is the primary reason teams add Grok to their gateway alongside Claude / GPT.
Is xAI API stable enough for production?
Yes for most workloads, but expect feature parity to lag the older providers. Caching, batching, fine-tuning, and many enterprise features are on the roadmap but not GA. Production deployments typically use Grok as a secondary model behind a gateway, with fallback to a more mature provider.
Run Grok on a gateway you control
Swfte routes traffic across every major provider, enforces prompt caching, applies per-team budgets, and logs every request for audit. OpenAI-compatible API. Free tier.