DeepSeek V4 Pro Cost & Pricing (May 2026)
Per-1M-token rates for every DeepSeek V4 Pro tier — DeepSeek API, cached input, partner hosting, and Apache 2.0 self-host. Cost-per-task estimates and a like-for-like comparison vs Claude Opus 4.7, GPT-5.5, and Gemini 3.1 Pro.
Pricing tiers — every way to buy or run V4 Pro
| Tier | Input /1M | Output /1M | Notes |
|---|---|---|---|
| Standard (DeepSeek API) | $1.740 | $3.48 | List price on the official DeepSeek API. Lowest frontier-tier rate by a wide margin. |
| Cached input (DeepSeek) | $0.174 | $3.48 | Context caching: 90% off list on input cache hits. Aggressive — matches Anthropic on the discount rate but at a far lower base. |
| Self-hosted (Apache 2.0) | $0.000 | $0.00 | Per-token cost is $0 — you pay GPU infra instead. 1.6T MoE / 49B active, runs on 8x H100 or 4x H200. Operating cost dominated by GPU-hours, not tokens. |
| Hosted via partners | $1.400 | $2.80 | Together AI, Fireworks, and others typically host at 15-25% below DeepSeek list. Quality and tool-use parity vary; benchmark before adopting. |
Self-host break-even on a reserved 8x H100 node ($20-25/hr) is roughly 5-6B tokens/month at DeepSeek API list. Above that threshold, self-host wins on cost; below, the API wins.
Cost per task — what you actually pay
| Task | In tok | Out tok | DeepSeek API | Cached | Partner |
|---|---|---|---|---|---|
| Short chat reply | 800 | 200 | $0.0021 | $0.0008 | $0.0017 |
| Long-doc summary (50-page PDF) | 80,000 | 1,500 | $0.1444 | $0.0191 | $0.1162 |
| Agentic loop (12 tool turns) | 45,000 | 6,000 | $0.0992 | $0.0287 | $0.0798 |
| RAG query (10-doc context) | 12,000 | 600 | $0.0230 | $0.0042 | $0.0185 |
Cost per single invocation. Self-hosted column is omitted because the per-task cost is essentially $0 — total cost is a function of GPU-hours, not tokens.
V4 Pro vs nearest alternatives
| Model | In /1M | Out /1M | Context | Note |
|---|---|---|---|---|
| DeepSeek V4 Pro | $1.74 | $3.48 | 1M | This page. Apache 2.0, 1.6T MoE / 49B active. Best price-per-quality of any frontier model. |
| Claude Opus 4.7 | $5.00 | $25.00 | 1M | ~3x input, ~7x output. Coding Arena #1 — pays back on engineering work, not on chat. |
| GPT-5.5 | $5.00 | $30.00 | 1M | ~3x input, ~9x output. Stronger ecosystem and voice tooling. |
| Gemini 3.1 Pro | $3.50 | $10.50 | 2M | 2x input, 3x output. Better long-context and multimodal. |
| DeepSeek V4 Flash | $0.14 | $0.28 | 1M | Same family, ~12x cheaper. Apache 2.0. Use as cascade tier for trivial work. |
When DeepSeek V4 Pro is the right choice
- Cost-sensitive scale. Best price-per-quality of any frontier-tier model. At high volume, the bill delta vs closed frontier models is several multiples.
- Sovereignty and self-host. Apache 2.0 weights make on-prem and air-gapped deployments viable. No vendor can deprecate or reprice the model out from under you.
- Agentic loops at scale. Once tool-use reliability is acceptable for your domain, the per-loop cost advantage compounds across millions of invocations.
- RAG, classification, extraction. Quality is indistinguishable from frontier closed models on these tasks in practice.
When to switch to a more expensive alternative
- Claude Opus 4.7 — when coding quality and tool-use reliability on long agentic chains drive the bottom line. The Coding Arena gap is real.
- Gemini 3.1 Pro — when you need 2M context or multimodal pipelines.
- GPT-5.5 — when ecosystem (voice, Assistants API, vision tooling) outweighs the price delta.
- Data-residency concerns. The official DeepSeek API processes data in PRC infrastructure. Use partner-hosted or self-hosted for regulated workloads.
Related
- AI Model Leaderboard — quality vs price across all providers
- Per Million Tokens True Cost — hidden adders pushing bills 1.5-3x above list
- Token Cost Calculator — interactive estimator
- DeepSeek V4 Pro deep-dive — full benchmarks and architecture
Teams running V4 Pro alongside other providers typically front the API with Swfte Connect to route across these models behind one OpenAI-compatible surface with prompt caching and per-route fallback.
Sources: official DeepSeek pricing page and partner pricing, May 2026-05-06. Self-host break-even modeled on reserved 8x H100 cloud rates.