Updated May 6, 2026

DeepSeek V4 Pro Cost & Pricing (May 2026)

Per-1M-token rates for every DeepSeek V4 Pro tier — DeepSeek API, cached input, partner hosting, and Apache 2.0 self-host. Cost-per-task estimates and a like-for-like comparison vs Claude Opus 4.7, GPT-5.5, and Gemini 3.1 Pro.

$1.74 input / 1M · $3.48 output / 1M · 1M context · Apache 2.0 — self-host viable

Pricing tiers — every way to buy or run V4 Pro

| Tier | Input /1M | Output /1M | Notes |
|---|---|---|---|
| Standard (DeepSeek API) | $1.74 | $3.48 | List price on the official DeepSeek API. Lowest frontier-tier rate by a wide margin. |
| Cached input (DeepSeek) | $0.174 | $3.48 | Context caching: 90% off list on input cache hits. Aggressive — matches Anthropic on the discount rate but at a far lower base. |
| Self-hosted (Apache 2.0) | $0.00 | $0.00 | Per-token cost is $0 — you pay GPU infra instead. 1.6T MoE / 49B active; runs on 8x H100 or 4x H200. Operating cost dominated by GPU-hours, not tokens. |
| Hosted via partners | $1.40 | $2.80 | Together AI, Fireworks, and others typically host at 15-25% below DeepSeek list. Quality and tool-use parity vary; benchmark before adopting. |

Self-host break-even on a reserved 8x H100 node ($20-25/hr) is roughly 5-6B tokens/month at DeepSeek API list. Above that threshold, self-host wins on cost; below, the API wins.
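The break-even arithmetic can be sketched as below. The monthly hours, utilization, and input/output token mix are this sketch's own assumptions (a 50/50 mix is used for illustration); shifting the mix toward cheaper input tokens pushes the break-even point higher.

```python
# Break-even sketch: reserved 8x H100 node vs DeepSeek API list price.
# Assumptions (illustrative, not vendor-quoted): ~730 node-hours/month,
# full utilization, and a 50/50 input/output token mix.

HOURS_PER_MONTH = 730

def blended_rate_per_m(input_rate, output_rate, output_share):
    """Blended $/1M tokens when `output_share` of tokens are output."""
    return input_rate * (1 - output_share) + output_rate * output_share

def breakeven_tokens_per_month(node_hourly_cost, blended_rate):
    """Tokens/month at which reserved-node cost equals API cost."""
    monthly_infra = node_hourly_cost * HOURS_PER_MONTH
    return monthly_infra / blended_rate * 1_000_000

# DeepSeek API list: $1.74 in / $3.48 out.
rate = blended_rate_per_m(1.74, 3.48, 0.50)   # $2.61 per 1M tokens
low  = breakeven_tokens_per_month(20, rate)   # ~5.6B tokens/month at $20/hr
high = breakeven_tokens_per_month(25, rate)   # ~7.0B tokens/month at $25/hr
```

An output-heavier mix lowers the break-even, which is how the 5-6B figure above is reached; the takeaway is the order of magnitude, not the exact token count.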

Cost per task — what you actually pay

| Task | In tok | Out tok | DeepSeek API | Cached | Partner |
|---|---|---|---|---|---|
| Short chat reply | 800 | 200 | $0.0021 | $0.0008 | $0.0017 |
| Long-doc summary (50-page PDF) | 80,000 | 1,500 | $0.1444 | $0.0191 | $0.1162 |
| Agentic loop (12 tool turns) | 45,000 | 6,000 | $0.0992 | $0.0287 | $0.0798 |
| RAG query (10-doc context) | 12,000 | 600 | $0.0230 | $0.0042 | $0.0185 |

Cost per single invocation. Self-hosted column is omitted because the per-task cost is essentially $0 — total cost is a function of GPU-hours, not tokens.
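The table figures follow directly from the per-1M rates in the tier table. A minimal helper, with tier names of this sketch's own choosing:

```python
# Per-task cost helper: reproduces the per-invocation figures above.
# Rates are $/1M tokens from the pricing-tier table.

RATES = {
    "api":     (1.74,  3.48),
    "cached":  (0.174, 3.48),   # 90% off list on input cache hits
    "partner": (1.40,  2.80),
}

def task_cost(in_tok, out_tok, tier="api"):
    """Dollar cost of a single invocation at the given tier."""
    in_rate, out_rate = RATES[tier]
    return (in_tok * in_rate + out_tok * out_rate) / 1_000_000

# Long-doc summary row: 80,000 in / 1,500 out
round(task_cost(80_000, 1_500, "api"), 4)      # 0.1444
round(task_cost(80_000, 1_500, "cached"), 4)   # 0.0191
round(task_cost(80_000, 1_500, "partner"), 4)  # 0.1162
```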

V4 Pro vs nearest alternatives

| Model | In /1M | Out /1M | Context | Note |
|---|---|---|---|---|
| DeepSeek V4 Pro | $1.74 | $3.48 | 1M | This page. Apache 2.0, 1.6T MoE / 49B active. Best price-per-quality of any frontier model. |
| Claude Opus 4.7 | $5.00 | $25.00 | 1M | ~3x input, ~7x output. Coding Arena #1 — pays back on engineering work, not on chat. |
| GPT-5.5 | $5.00 | $30.00 | 1M | ~3x input, ~9x output. Stronger ecosystem and voice tooling. |
| Gemini 3.1 Pro | $3.50 | $10.50 | 2M | 2x input, 3x output. Better long-context and multimodal. |
| DeepSeek V4 Flash | $0.14 | $0.28 | 1M | Same family, ~12x cheaper. Apache 2.0. Use as cascade tier for trivial work. |
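The Flash-as-cascade-tier idea is simple arithmetic: route easy requests to Flash and escalate the rest to Pro. The traffic split and request shape below are illustrative assumptions, not measured numbers:

```python
# Cascade sketch: send trivial work to V4 Flash, escalate the rest to V4 Pro.
# The 70% Flash share and 800/200-token request shape are assumptions.

FLASH = (0.14, 0.28)   # $/1M input, output
PRO   = (1.74, 3.48)

def cost(tokens_in, tokens_out, rates):
    return (tokens_in * rates[0] + tokens_out * rates[1]) / 1_000_000

def blended_cost(tokens_in, tokens_out, flash_share):
    """Average per-request cost if `flash_share` of traffic stays on Flash."""
    return (flash_share * cost(tokens_in, tokens_out, FLASH)
            + (1 - flash_share) * cost(tokens_in, tokens_out, PRO))

full  = cost(800, 200, PRO)          # all-Pro baseline: $0.002088/request
mixed = blended_cost(800, 200, 0.70) # 70% Flash: $0.000744/request, ~64% less
```

The real routing decision (a classifier, a heuristic, or a confidence threshold) is the hard part; the pricing side is just this weighted average.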

When DeepSeek V4 Pro is the right choice

  • Cost-sensitive scale. Best price-per-quality of any frontier-tier model. At high volume, the bill delta vs closed frontier models is several multiples.
  • Sovereignty and self-host. Apache 2.0 weights make on-prem and air-gapped deployments viable. No vendor can deprecate or reprice the model out from under you.
  • Agentic loops at scale. Once tool-use reliability is acceptable for your domain, the per-loop cost advantage compounds across millions of invocations.
  • RAG, classification, extraction. Quality is indistinguishable from frontier closed models on these tasks in practice.

When to switch to a more expensive alternative

  • Claude Opus 4.7 — when coding quality and tool-use reliability on long agentic chains drive the bottom line. The Coding Arena gap is real.
  • Gemini 3.1 Pro — when you need 2M context or multimodal pipelines.
  • GPT-5.5 — when ecosystem (voice, Assistants API, vision tooling) outweighs the price delta.
  • Data-residency concerns. The official DeepSeek API processes data in PRC infrastructure. Use partner-hosted or self-hosted for regulated workloads.

Related

Teams running V4 Pro alongside other providers typically front the API with Swfte Connect to route across these models behind one OpenAI-compatible surface with prompt caching and per-route fallback.

Sources: official DeepSeek pricing page and partner pricing pages, as of May 6, 2026. Self-host break-even modeled on reserved 8x H100 cloud rates.