Cost of Summarization: AI Model Pricing Compared (May 2026)

Summarization is the canonical long-context task: a knowledge platform takes a 30-page document and condenses it into an executive brief. We price that exact workload across 10 leading LLMs using May 2026 list pricing.

The reference scenario

  • Task: Summarize a 30K token document into a 600 token executive summary
  • Input tokens per call: 30,000
  • Output tokens per call: 600
  • Monthly volume: 5,000 summaries (mid-market knowledge platform)
  • Total tokens / month: 153M

Pricing excludes prompt caching, batch discounts, and committed-use deals; self-hosted open-weight models are out of scope.
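The per-call arithmetic behind every row in the table below is the same: tokens times the per-1M-token rate. A minimal sketch, using the Gemini 3.1 Pro list prices quoted later in this article ($3.50 input / $10.50 output per 1M tokens) — swap in any model's rates:

```python
def call_cost(input_tokens: int, output_tokens: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in USD for a single API call, from per-1M-token list prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# The reference scenario: 30K tokens in, 600 tokens out, 5,000 calls/month.
per_call = call_cost(30_000, 600, 3.50, 10.50)
monthly = per_call * 5_000

print(f"per call:  ${per_call:.4f}")   # $0.1113
print(f"per month: ${monthly:,.2f}")   # $556.50
```

Note that output tokens barely matter here: 600 output tokens cost $0.0063 against $0.1050 for the 30K input tokens, so summarization cost is dominated by the input rate.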

Cost across 10 models, sorted cheapest first

Rank  Model               Per call   Per month   vs cheapest
1     Gemini 2.0 Flash    $0.0032    $16.20      1.0x
2     DeepSeek V4 Flash   $0.0044    $21.84      1.3x
3     Claude 3.5 Haiku    $0.0264    $132        8.1x
4     Qwen 3.6 Plus       $0.0454    $227        14.0x
5     DeepSeek V4 Pro     $0.0543    $271        16.8x
6     Claude Sonnet 4     $0.0990    $495        30.6x
7     Gemini 3.1 Pro      $0.1113    $557        34.4x
8     Claude Opus 4.7     $0.1650    $825        50.9x
9     GPT-5.5             $0.1680    $840        51.9x
10    GPT-5.5 Pro         $1.01      $5,040      311.1x

Monthly spend at 5K summaries

Gemini 2.0 Flash       #................................... $16.20
DeepSeek V4 Flash      #................................... $21.84
Claude 3.5 Haiku       #................................... $132
Qwen 3.6 Plus          ##.................................. $227
DeepSeek V4 Pro        ##.................................. $271
Claude Sonnet 4        ####................................ $495
Gemini 3.1 Pro         ####................................ $557
Claude Opus 4.7        ######.............................. $825
GPT-5.5                ######.............................. $840
GPT-5.5 Pro            #################################### $5,040

Per-call cost (USD)

Gemini 2.0 Flash       #............................. $0.0032
DeepSeek V4 Flash      #............................. $0.0044
Claude 3.5 Haiku       #............................. $0.0264
Qwen 3.6 Plus          #............................. $0.0454
DeepSeek V4 Pro        ##............................ $0.0543
Claude Sonnet 4        ###........................... $0.0990
Gemini 3.1 Pro         ###........................... $0.1113
Claude Opus 4.7        #####......................... $0.1650
GPT-5.5                #####......................... $0.1680
GPT-5.5 Pro            ############################## $1.01

Which model wins for summarization?

For accurate summaries: Gemini 3.1 Pro is our recommended pick. The 2M-token context window handles 30-page documents without chunking, the multilingual fidelity is best-in-class, and at $3.50 / $10.50 per 1M tokens it sits in the middle of the pricing band — not the cheapest, but the best quality-per-dollar at long context. Runner-up: Claude Opus 4.7, which produces tighter, more stylistically controlled prose at roughly 50% higher cost per call.

For high-volume summarization: DeepSeek V4 Pro is the right default. It is roughly 3x cheaper than GPT-5.5 and 2x cheaper than Gemini 3.1 Pro on this workload, with summary quality close enough to frontier on standard documents that the gap rarely shows up in human evaluation. Use DeepSeek V4 Flash for the trivial 60-70% of traffic where the document is short and the summary need is shallow — sub-cent per call at any reasonable volume.
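The Flash-for-trivial-traffic split above can be a one-line router. A minimal sketch — the 5K-token threshold follows the "short or templated documents" guidance below, and the model identifiers are placeholders for whatever client strings your provider uses:

```python
def pick_summarizer(doc_tokens: int, threshold: int = 5_000) -> str:
    """Route by document length alone: cheap tier for short docs,
    pro tier for everything else. Model names are illustrative."""
    return "deepseek-v4-flash" if doc_tokens < threshold else "deepseek-v4-pro"

print(pick_summarizer(2_000))   # deepseek-v4-flash
print(pick_summarizer(30_000))  # deepseek-v4-pro
```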

When to use a cheap model

  • Short or templated documents (under 5K tokens)
  • Internal-only summaries that humans will read once
  • Bulk pre-processing for downstream tasks (classification, retrieval)
  • Languages well-represented in the model's training (English, Mandarin, Spanish)
  • Workloads where you can A/B-test and accept a small quality drop

When to use a frontier model

  • Customer-facing summaries (legal, financial, medical)
  • Long documents (50K+ tokens) where coherence matters
  • Multilingual or low-resource languages
  • Summaries that drive a downstream decision (board memos, M&A briefs)
  • When the cost of a wrong summary >> cost of inference

The cascade pattern saves 70-80%

Most teams running production summarization at scale do not pick one model. They route: cheap model first, then a quality-check classifier decides whether to escalate to a frontier model. On a 5K/month workload the cascade typically delivers 90% of frontier-quality summaries at 20-25% of the frontier cost. We built a calculator for that on the Model-Mixing Cost Savings page.
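The cascade's blended cost is simple expected-value arithmetic: every call pays the cheap model, and the escalated fraction also pays the frontier model. A sketch using this article's DeepSeek V4 Flash and Gemini 3.1 Pro per-call figures; the 20% escalation rate is an assumption for illustration:

```python
def cascade_cost(cheap: float, frontier: float, escalation_rate: float) -> float:
    """Expected per-call cost: cheap model on every call, frontier
    model on the escalated fraction."""
    return cheap + escalation_rate * frontier

blended = cascade_cost(0.0044, 0.1113, 0.20)
print(f"blended: ${blended:.4f} per call "
      f"({blended / 0.1113:.0%} of frontier cost)")
```

At a 20% escalation rate the blended cost lands around $0.027 per call, roughly a quarter of the all-frontier price — which is where the 20-25% figure above comes from. The classifier's own cost is ignored here; in practice it adds a small overhead.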

Pricing data sourced from official provider pages and OpenRouter as of 2026-05-06. Effective production cost will be 1.5-3x higher once you account for system prompts, retries, and priority-tier surcharges — see our per-million-tokens true cost breakdown.