SMQTS v1.3 · Pinned 2026-04-24

DeepSeek V4 Pro — Deep Dive Research Report (May 2026)

The quality-per-dollar leader. Apache 2.0. Frontier-adjacent quality at one-tenth the cost.


Model Snapshot

Released        2026-04-24
License         Apache 2.0
Context         1M tokens
Parameters      1.6T MoE / 49B active
Input price     $1.74 / 1M tokens
Output price    $3.48 / 1M tokens
Arena Elo       1462
Self-host       Yes

Executive Summary

DeepSeek V4 Pro is the most consequential release of April 2026. At $1.74/$3.48 per 1M tokens with Apache 2.0 weights, it collapses the price floor for frontier-adjacent quality work. On 7 of 20 SMQTS categories it is roughly substitutable for closed-frontier models in blind pairwise comparison; on the other 13 it loses, but most of those losses do not matter for most production workloads. For 70% of typical traffic, V4 Pro is the right answer. For the remaining 30%, the gap to closed-frontier is real — and we list exactly which 30%.

Three strengths

  1. Quality-per-dollar leader. No other model comes close. Roughly 7-10x cheaper than GPT-5.5 for similar Arena Elo bands.
  2. Apache 2.0 weights. Self-hostable, no vendor lock-in beyond operational cost.
  3. Strong on N4 (extraction) and N2 (summarization). Roughly substitutable for frontier on most extraction and summarization workloads in pairwise blind tests.

Three weaknesses

  1. Multi-file code refactor. Single-file edits are good; cross-file consistency degrades meaningfully.
  2. Hard reasoning. 12-point gap to Gemini 3.1 Pro on N3.
  3. Tool-call schema compliance. 18-point gap to GPT-5.5 on P10. Agent loops require more recovery round-trips.

Architecture and Training

  • Mixture-of-experts: 1.6T total parameters, 49B active per token (~256 experts, 8 active per token); a rough compute sketch follows this list.
  • Native 1M context, with the expected decay past ~700K. Long-context retrieval accuracy is competitive with Gemini up to ~500K but falls behind beyond that.
  • Trained on ~14T tokens (community estimate based on the technical report), with heavy code and math weighting plus the new V4 emphasis on multilingual non-English data.
  • Apache 2.0 for both model weights and inference code. Commercial use is unrestricted.
  • Tokenizer: SentencePiece-based, ~152K vocabulary. Compresses code roughly 8% better than cl100k_base.
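Why the MoE split matters for serving cost: a minimal back-of-envelope sketch, assuming the common approximation of ~2 forward-pass FLOPs per active parameter per token. The constants come from the spec list above; nothing below is from the technical report itself.

```python
# Rough per-token compute sketch for a 1.6T-total / 49B-active MoE,
# using the ~2 FLOPs per active parameter per token rule of thumb.
# Back-of-envelope only; not figures from the technical report.

TOTAL_PARAMS = 1.6e12      # total parameters across all experts
ACTIVE_PARAMS = 49e9       # parameters touched per token (8 of ~256 experts)

flops_per_token = 2 * ACTIVE_PARAMS    # ~9.8e10 FLOPs/token forward
dense_equivalent = 2 * TOTAL_PARAMS    # what a dense 1.6T model would cost

print(f"MoE forward:        {flops_per_token:.1e} FLOPs/token")
print(f"Dense 1.6T forward: {dense_equivalent:.1e} FLOPs/token")
print(f"Per-token compute ratio: ~{dense_equivalent / flops_per_token:.0f}x cheaper")
```

Under these assumptions the routing gives roughly 33x less compute per token than a dense model of the same total size, which is the mechanism behind the pricing in the next section.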

Pricing Reality

Path                     Input ($/1M)   Output ($/1M)   Notes
DeepSeek API direct      $1.74          $3.48           Lowest list price
DeepSeek cached input    $0.17          $3.48           10x off cached input
Together AI hosted       $2.00          $4.00           US-based; small premium
Fireworks AI hosted      $1.95          $3.85           US-based
Self-host (rough est.)   $0.50-1.20     $1.20-2.50      8x H200 / 4x B200; depends on utilization

The honest comparison vs Opus 4.7. Opus 4.7 standard output is $25 per 1M tokens; V4 Pro is $3.48. On a workload where they are quality-equivalent (per the cost-quality section), the spend ratio is about 7.2x. Even when V4 Pro loses some output quality and you re-route 30% of traffic to Opus, the blended bill is still roughly 2.5x cheaper than running 100% on Opus; note that the 30% slice routed to Opus dominates the blended spend, so no 70/30 split can save more than about 3.3x. The arithmetic is sketched below.
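A minimal sketch of that blended-bill arithmetic, using the output list prices from the table above and the 70/30 cascade split used later in this report. Input-token costs are ignored for simplicity.

```python
# Output-token spend comparison: 100% Opus 4.7 vs a 70/30 cascade.
V4_OUT, OPUS_OUT = 3.48, 25.00   # $ per 1M output tokens (list prices above)

full_substitution = OPUS_OUT / V4_OUT            # ~7.2x cheaper at 100% V4 Pro
blended = 0.7 * V4_OUT + 0.3 * OPUS_OUT          # $/1M under 70/30 routing
cascade_saving = OPUS_OUT / blended              # ~2.5x cheaper blended

print(f"100% V4 Pro:   {full_substitution:.1f}x cheaper")
print(f"70/30 cascade: {cascade_saving:.1f}x cheaper "
      f"(the 30% routed to Opus caps the saving at {1 / 0.3:.1f}x)")
```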

SMQTS Results — Programming Series

Category                          V4 Pro   Opus 4.7   GPT-5.5   Gemini 3.1 Pro
P1  Multi-file refactor           74       94         86        83
P2  Bug-finding from stack trace  78       92         87        84
P3  Code review                   76       91         88        85
P4  Test generation               77       89         90        83
P5  SQL from natural language     82       87         89        91
P6  Algorithm from spec           79       93         89        88
P7  Migration scripts             71       92         83        80
P8  Documentation                 78       90         88        85
P9  Diff comprehension            76       91         86        83
P10 Tool-using agent loops        74       89         92        85
Average                           76.5     90.8       87.8      84.7

V4 Pro never wins a programming category outright. Its best relative showing is P5 (SQL), where it lands within 5-9 points of all three rivals: a defensible substitution. Its worst is P7 (migration scripts), 21 points behind Opus 4.7: do not substitute here.

SMQTS Results — Non-Programming Series

Category                    V4 Pro   Opus 4.7   Gemini 3.1 Pro
N1  Long-form drafting      83       87         89
N2  Summarization           86       91         90
N3  Multi-step reasoning    82       83         94
N4  Information extraction  85       89         87
N5  Translation             78       76         92
N6  Style transfer          82       90         87
N7  Adversarial resistance  78       92         88
N8  Structured output       83       87         88
N9  Domain QA               83       90         89
N10 Multi-turn coherence    80       91         89
Average                     82.0     87.6       89.3

Quality-per-dollar headline (output)

Output list price per 1M tokens to deliver at least V4 Pro's 79.2 quality-blend score
=====================================================================================
DeepSeek V4 Pro     $3.48   ###
Gemini 3.1 Pro     $10.50   ##########
Claude Opus 4.7    $25.00   #########################
GPT-5.5            $30.00   ##############################
GPT-5.5 Pro       $180.00   ################################################## (bar capped at 50)

SMQTS Results — Cost-Quality Validation

The most important section of this report. Pairwise blind grading of V4 Pro against frontier models across the 50-prompt cost-quality sample:

Workload                                 V4 Pro wins   Frontier wins   Tie   Verdict
Information extraction (N4) vs GPT-5.5   34%           22%             44%   Substitute
Summarization (N2) vs Opus 4.7           22%           38%             40%   Substitute
SQL from NL (P5) vs Gemini 3.1 Pro       32%           34%             34%   Substitute
Multi-file refactor (P1) vs Opus 4.7     11%           71%             18%   Do not substitute
Hard reasoning (N3) vs Gemini 3.1 Pro    14%           61%             25%   Do not substitute
Tool loops (P10) vs GPT-5.5              13%           67%             20%   Do not substitute

Reading: a workload tagged Substitute is one where V4 Pro wins or ties in more than 60% of pairwise blind comparisons. Do not substitute means the frontier model wins more than 60% of the time. The cost gap is large enough that a Substitute verdict usually saves 70-90% of spend on that workload. The decision rule is sketched below.
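The verdict rule as code, a minimal sketch of the threshold logic just described. The "Review" fallback for the band between the two thresholds is our own assumption; every workload in this sample clears one threshold or the other.

```python
def verdict(v4_wins: float, frontier_wins: float, ties: float) -> str:
    """Classify a pairwise blind comparison per the SMQTS reading above.

    Substitute: V4 Pro wins or ties in more than 60% of comparisons.
    Do not substitute: the frontier model wins more than 60% of the time.
    "Review" is a hypothetical middle band; no sampled workload lands there.
    """
    assert abs(v4_wins + frontier_wins + ties - 1.0) < 1e-9
    if v4_wins + ties > 0.60:
        return "Substitute"
    if frontier_wins > 0.60:
        return "Do not substitute"
    return "Review"

print(verdict(0.34, 0.22, 0.44))  # N4 vs GPT-5.5  -> Substitute
print(verdict(0.11, 0.71, 0.18))  # P1 vs Opus 4.7 -> Do not substitute
```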

Strengths in Detail

Quality per dollar

No model comes close. Even after the quality discount on workloads where V4 Pro loses outright, the cost saving dominates the procurement math for any organisation spending more than ~$5K/month on model APIs.

Self-hostability

For regulated industries (healthcare, finance, government) where data cannot leave the organisational perimeter, V4 Pro is the only frontier-adjacent model that is fully self-hostable. The hardware bar is meaningful (8x H200 or 4x B200 for production deployment) but well within enterprise capability.

Extraction and summarization

On N2 and N4, V4 Pro ties or wins more than 60% of pairwise comparisons against the frontier four. For high-volume extraction pipelines (invoice parsing, contract review, support ticket classification), V4 Pro is the right default.

Weaknesses and Failure Modes

Multi-file refactor

Single-file edits are competent. Cross-file consistency is not. On a 6-file refactor where all files share an implementation contract, V4 Pro tends to update 4 of 6 files correctly and leave the other 2 partially updated, breaking the build.

Hard reasoning

On N3 (multi-step reasoning, hardest sub-set), V4 Pro hits 82 weighted points to Gemini 3.1 Pro's 94. The specific failure: V4 Pro starts a reasoning chain correctly but loses the thread by step 3 of 4, defaulting to a plausible-sounding wrong answer rather than restarting.

Tool-call compliance

On P10, V4 Pro's tool-call success rate on first attempt is 79.3%. GPT-5.5 is at 97.4%. The recovery loop usually succeeds on retry, but the extra round-trip costs latency and compounds in long agent traces.
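What that recovery loop looks like in practice, as a hedged sketch: `call_model` and `validate_tool_call` are hypothetical stand-ins for your client and schema validator, and the retry policy below is ours, not DeepSeek's.

```python
def call_with_schema_retry(call_model, validate_tool_call, messages, max_retries=2):
    """Retry malformed tool calls, feeding the validator error back to the model.

    At V4 Pro's ~79% first-attempt compliance, roughly 1 call in 5 pays an
    extra round-trip; in a long agent trace those retries compound.
    """
    for attempt in range(max_retries + 1):
        response = call_model(messages)        # hypothetical client call
        error = validate_tool_call(response)   # returns None if schema-valid
        if error is None:
            return response
        # Append the validation error so the retry can self-correct.
        messages = messages + [{
            "role": "user",
            "content": f"Tool call failed schema validation: {error}. "
                       "Re-emit the call as valid JSON only.",
        }]
    raise RuntimeError(f"Tool call still invalid after {max_retries} retries")
```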

When to Use V4 Pro

  • High-volume extraction, classification, summarization. The cost saving dominates the small quality discount.
  • Self-hosted regulated workloads. Apache 2.0 makes this viable.
  • Cascade lower tier. Route 70% of traffic here; route the remaining 30% to a frontier model (a routing sketch follows this list).
  • Cost-sensitive bulk experimentation. Iterate ten times faster on prompt design at one-tenth the spend.
  • Pre-production prototyping. Build at V4 Pro cost; promote categories that need it to frontier later.
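A minimal routing sketch for the cascade tier referenced above. The category tags mirror this report's substitute / do-not-substitute verdicts; the model identifiers and `route` interface are illustrative, not a real API.

```python
# Do-not-substitute categories go to a frontier model, per the
# cost-quality verdicts above; everything else defaults to V4 Pro.
FRONTIER_CATEGORIES = {
    "P1_multi_file_refactor": "claude-opus-4.7",   # illustrative model ids
    "N3_hard_reasoning":      "gemini-3.1-pro",
    "P10_tool_loops":         "gpt-5.5",
}

def route(category: str) -> str:
    """Return the model id to serve this workload category."""
    return FRONTIER_CATEGORIES.get(category, "deepseek-v4-pro")

assert route("N4_extraction") == "deepseek-v4-pro"        # substitute tier
assert route("P1_multi_file_refactor") == "claude-opus-4.7"
```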

When NOT to Use V4 Pro

  • Multi-file code refactor. Use Claude Opus 4.7.
  • Hard reasoning workloads. Use Gemini 3.1 Pro.
  • Tool-using production agents with strict schema requirements. Use GPT-5.5.
  • Workloads in jurisdictions restricting Chinese-origin AI. Use a US-hosted alternative or self-host the weights.
  • Workloads where one wrong answer is catastrophic. Frontier models have lower fabrication rates.

Comparison to Direct Rivals

vs Claude Opus 4.7 (cost-quality)

Dimension                   V4 Pro       Opus 4.7
Output price ($/1M)         $3.48        $25.00
License                     Apache 2.0   Closed
SMQTS programming avg       76.5         90.8
SMQTS non-programming avg   82.0         87.6
Cost ratio (output)         1x           7.2x

vs Gemma 4 27B (open-weight)

Dimension               V4 Pro              Gemma 4 27B
License                 Apache 2.0          Apache 2.0
Active params           49B                 27B (single-GPU friendly)
SMQTS programming avg   76.5                65.9
Self-host hardware      8x H200 / 4x B200   1x H100 / 1x H200

Procurement Notes

Enterprise readiness

Direct DeepSeek API: SOC 2 Type II in progress (community report; not yet GA). For US enterprises, the practical paths are Together AI, Fireworks AI, and Groq, which host the open weights with full US-data-residency posture and standard enterprise compliance. Self-hosting is also a real option for organisations with sufficient internal infra capability.

Lock-in score

1.0 / 5 — among the lowest possible scores. Open weights mean the "leave" cost is operational rebuild on a new provider, not vendor lock. Prompt format is OpenAI-compatible chat. Swfte Connect abstracts even the operational rebuild.
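What the low lock-in looks like concretely: a sketch using the OpenAI Python SDK's `base_url` override, which works against any OpenAI-compatible endpoint. The URLs and model name below are illustrative placeholders, not verified endpoints.

```python
from openai import OpenAI

# Swapping providers is a one-line config change when the chat format is
# OpenAI-compatible. Both values below are placeholders, not verified endpoints.
client = OpenAI(
    base_url="https://api.deepseek.com/v1",   # or a Together/Fireworks/self-hosted URL
    api_key="YOUR_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4-pro",                  # hypothetical model id
    messages=[{"role": "user", "content": "Summarize this contract clause: ..."}],
)
print(resp.choices[0].message.content)
```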

Contract leverage

Direct DeepSeek pricing is at-list. Together / Fireworks / Groq offer volume discounts at $50K+/month and have been willing to match competitor list prices on multi-year contracts. Self-hosting puts you on hardware committed-use economics, which can be 30-50% cheaper than hosted API at steady utilization.