Updated May 6, 2026

Gemini 3.1 Pro Cost & Pricing (May 2026)

Per-1M-token rates for every Gemini 3.1 Pro tier — standard, cached, and batch. Cost-per-task estimates for typical workloads (including a 500-page book) and a like-for-like comparison vs Claude Opus 4.7, GPT-5.5, and DeepSeek V4 Pro.

$3.50 input / 1M · $10.50 output / 1M · 2M context (largest frontier context window)

Pricing tiers — every way to buy Gemini 3.1 Pro

| Tier | Input /1M | Output /1M | Notes |
|------|-----------|------------|-------|
| Standard (sync) | $3.5000 | $10.50 | List price for the synchronous API. Same rate up to the full 2M-token context window — no long-context surcharge. |
| Cached input | $0.8750 | $10.50 | 75% off list on cached input via Gemini context caching. Storage fee applies separately ($1 per 1M tokens per hour). |
| Batch (24h SLA) | $1.7500 | $5.25 | 50% off list for asynchronous workloads. Stackable with caching. |
| Cached + Batch | $0.4375 | $5.25 | Stacked discount. The cheapest way to run Gemini 3.1 Pro for repeatable async work — useful for nightly enrichment and document pipelines. |

All prices in USD per 1M tokens. The standout fact: pricing is flat across the full 2M context window — no long-context surcharge as of May 2026.
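The stacked rate in the last row follows from applying the two discounts multiplicatively to list price (caching discounts input only; batch discounts both sides). A minimal sketch, using the rates from the table above — the helper function is illustrative, not an official SDK call:

```python
# Rates in USD per 1M tokens, from the pricing table above.
LIST_IN, LIST_OUT = 3.50, 10.50
CACHE_DISCOUNT = 0.75   # 75% off cached input (input only)
BATCH_DISCOUNT = 0.50   # 50% off for async batch (input and output)

def rate(cached: bool = False, batch: bool = False) -> tuple[float, float]:
    """Effective (input, output) rate per 1M tokens; discounts stack multiplicatively."""
    in_rate = LIST_IN * ((1 - CACHE_DISCOUNT) if cached else 1.0)
    out_rate = LIST_OUT
    if batch:
        in_rate *= 1 - BATCH_DISCOUNT
        out_rate *= 1 - BATCH_DISCOUNT
    return in_rate, out_rate

print(rate())                         # (3.5, 10.5)    standard
print(rate(cached=True))              # (0.875, 10.5)  cached
print(rate(batch=True))               # (1.75, 5.25)   batch
print(rate(cached=True, batch=True))  # (0.4375, 5.25) cached + batch
```

Note that caching never touches the output rate; only batch does, which is why Cached and Standard share the same $10.50 output column.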

Cost per task — what you actually pay

| Task | In tok | Out tok | Standard | Cached | Batch |
|------|--------|---------|----------|--------|-------|
| Short chat reply | 800 | 200 | $0.0049 | $0.0028 | $0.0024 |
| Long-doc summary (500-page book) | 800,000 | 3,000 | $2.8315 | $0.7315 | $1.4158 |
| Agentic loop (12 tool turns) | 45,000 | 6,000 | $0.2205 | $0.1024 | $0.1103 |
| RAG query (10-doc context) | 12,000 | 600 | $0.0483 | $0.0168 | $0.0242 |

Cost per single invocation. The 500-page book row shows the long-context advantage clearly — under $3 for a full book on standard tier, under $0.75 cached.
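Each row is straight per-1M arithmetic. A quick check of the book row, using figures from the tables above (the helper is illustrative):

```python
def task_cost(tokens_in: int, tokens_out: int, in_rate: float, out_rate: float) -> float:
    """Cost in USD for one invocation, at per-1M-token rates."""
    return tokens_in / 1e6 * in_rate + tokens_out / 1e6 * out_rate

# 500-page book summary: 800k input tokens, 3k output tokens.
print(round(task_cost(800_000, 3_000, 3.50, 10.50), 4))   # 2.8315  (standard)
print(round(task_cost(800_000, 3_000, 0.875, 10.50), 4))  # 0.7315  (cached input)
```

The standard figure decomposes as $2.80 of input plus only $0.0315 of output, which is why caching the input (a 75% cut on the dominant term) drops the total by nearly three quarters.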

Gemini 3.1 Pro vs nearest alternatives

| Model | In /1M | Out /1M | Context | Note |
|-------|--------|---------|---------|------|
| Gemini 3.1 Pro | $3.50 | $10.50 | 2M | This page. Text Arena leader at ~1500 Elo. GPQA Diamond 94.3%. |
| Claude Opus 4.7 | $5.00 | $25.00 | 1M | 43% more on input, 138% more on output. Coding Arena #1 — better for engineering work. |
| GPT-5.5 | $5.00 | $30.00 | 1M | 43% more on input, 186% more on output. Stronger ecosystem (voice, vision). |
| DeepSeek V4 Pro | $1.74 | $3.48 | 1M | 50% cheaper input, 67% cheaper output. Apache 2.0 — also self-hostable. |

When Gemini 3.1 Pro is worth the price

  • Long-context workloads. 2M tokens with flat pricing is unmatched. Book-length summaries, full repository analysis, multi-document synthesis.
  • Scientific reasoning. GPQA Diamond 94.3% leads the field — useful for research, technical literature review, and graduate-level reasoning tasks.
  • Multimodal pipelines. Native handling of image, audio, video, and text in one model with consistent pricing.
  • Input-heavy workloads. When input dominates the bill, $3.50 per 1M is materially cheaper than competitors.
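The input-heavy point can be made concrete by blending each model's per-1M rates from the comparison table at a given input share. The 95% figure below is an assumed workload mix, not a benchmark:

```python
# (input rate, output rate) in USD per 1M tokens, from the comparison table above.
MODELS = {
    "Gemini 3.1 Pro":  (3.50, 10.50),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5":         (5.00, 30.00),
    "DeepSeek V4 Pro": (1.74, 3.48),
}

def blended(in_rate: float, out_rate: float, input_share: float = 0.95) -> float:
    """USD per 1M total tokens when input_share of the traffic is input tokens."""
    return in_rate * input_share + out_rate * (1 - input_share)

for name, (i, o) in MODELS.items():
    print(f"{name}: ${blended(i, o):.2f} per 1M tokens at 95% input")
```

At a 95% input mix, Gemini 3.1 Pro lands at $3.85 blended versus $6.00 for Claude Opus 4.7 and $6.25 for GPT-5.5; only DeepSeek V4 Pro comes in lower.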

When to switch to a cheaper alternative

  • DeepSeek V4 Pro ($1.74 / $3.48) — 50% cheaper input, 67% cheaper output. Apache 2.0 also enables self-host. Quality gap is small for most agentic and chat work.
  • Gemini 2.5 Flash or DeepSeek V4 Flash — for classification, routing, and high-throughput simple tasks where Pro-tier quality is wasted.
  • Claude Opus 4.7 — when coding and agentic quality matter more than per-token cost. Coding Arena #1.
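The three bullets above can be condensed into a toy routing rule. The thresholds and model identifiers here are illustrative assumptions, not official model IDs:

```python
def pick_model(context_tokens: int, task: str) -> str:
    """Illustrative routing rule distilled from the guidance above.

    Model names and the 1M-token threshold are assumptions for the sketch.
    """
    if context_tokens > 1_000_000:
        return "gemini-3.1-pro"    # only listed model with a 2M window
    if task == "coding":
        return "claude-opus-4.7"   # Coding Arena #1 per the comparison table
    if task in ("classification", "routing"):
        return "gemini-2.5-flash"  # Pro-tier quality is wasted on simple tasks
    return "deepseek-v4-pro"       # cheapest per token for general work
```

A real router would also weigh latency, rate limits, and self-hosting constraints, but the cost logic reduces to roughly this.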

Related

Teams running Gemini 3.1 Pro alongside other providers typically front the API with Swfte Connect to route across these models behind one OpenAI-compatible surface with prompt caching and per-route fallback.

Sources: official Google AI / Vertex AI pricing pages, retrieved May 6, 2026.