Executive Summary
DeepSeek V4 Pro is the most consequential release of April 2026. At $1.74/$3.48 per 1M tokens with Apache 2.0 weights, it collapses the price floor for frontier-adjacent quality work. On 7 of 20 SMQTS categories it is roughly substitutable for closed-frontier models in blind pairwise comparison; on the other 13 it loses, but most of those losses do not matter for most production workloads. For 70% of typical traffic, V4 Pro is the right answer. For the remaining 30%, the gap to closed-frontier is real — and we list exactly which 30%.
Three strengths
- Quality-per-dollar leader. No other model comes close. Roughly 7-10x cheaper than GPT-5.5 for similar Arena Elo bands.
- Apache 2.0 weights. Self-hostable, no vendor lock-in beyond operational cost.
- Strong on N4 (extraction) and N2 (summarization). Roughly substitutable for frontier on most extraction and summarization workloads in pairwise blind tests.
Three weaknesses
- Multi-file code refactor. Single-file edits are good; cross-file consistency degrades meaningfully.
- Hard reasoning. 12-point gap to Gemini 3.1 Pro on N3.
- Tool-call schema compliance. 18-point gap to GPT-5.5 on P10. Agent loops require more recovery round-trips.
Architecture and Training
- Mixture-of-experts: 1.6T total parameters, 49B active per token. ~256 experts, 8 active per token (see the gating sketch after this list).
- Native 1M context, with the expected decay past ~700K. Long-context retrieval accuracy is competitive with Gemini up to ~500K but falls behind past that.
- Trained on ~14T tokens (community estimate derived from the technical report) with heavy code and math weighting, plus the new V4 emphasis on multilingual non-English data.
- Apache 2.0 for both model weights and inference code. Commercial use unrestricted.
- Tokenizer: SentencePiece-based, ~152K vocabulary. Compresses code roughly 8% better than cl100k_base.
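The expert-routing bullet above corresponds to standard top-k gating. The sketch below is a generic PyTorch illustration, not DeepSeek's published routing code; the 256-expert / 8-active split matches the bullet, and the hidden size is an arbitrary placeholder.

```python
import torch

n_experts, top_k, d_model = 256, 8, 4096          # 8-of-256 per the bullet; d_model is a placeholder
router = torch.nn.Linear(d_model, n_experts, bias=False)

def route_tokens(hidden: torch.Tensor):
    """hidden: (tokens, d_model) -> (expert ids, weights), both shaped (tokens, top_k)."""
    probs = router(hidden).softmax(dim=-1)                      # score all 256 experts
    weights, expert_ids = torch.topk(probs, top_k, dim=-1)      # keep the 8 highest-scoring
    weights = weights / weights.sum(dim=-1, keepdim=True)       # renormalize over the 8 kept
    return expert_ids, weights                                  # only these experts' FFNs run

ids, w = route_tokens(torch.randn(4, d_model))                  # 4 example tokens
```

Only the selected experts' feed-forward weights are exercised per token, which is how a 1.6T-parameter model keeps roughly 49B parameters active at a time.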
Pricing Reality
| Path | Input ($/1M) | Output ($/1M) | Notes |
|---|---|---|---|
| DeepSeek API direct | $1.74 | $3.48 | Lowest list price |
| DeepSeek cached input | $0.17 | $3.48 | 10x off cached input |
| Together AI hosted | $2.00 | $4.00 | US-based; small premium |
| Fireworks AI hosted | $1.95 | $3.85 | US-based |
| Self-host (rough est) | $0.50-1.20 | $1.20-2.50 | 8x H200 / 4x B200; depends on utilization |
The honest comparison vs Opus 4.7. Opus 4.7 standard output is $25 per 1M; V4 Pro is $3.48. On a workload where they are quality-equivalent (per the cost-quality section), the spend ratio is about 7.2x. Even when V4 Pro loses some output quality and you re-route 30% of traffic to Opus, the blended output bill is still roughly 2.5x lower than running 100% on Opus; keeping 30% of traffic at Opus prices caps the achievable saving at about 3.3x, so the single-workload 7.2x never survives blending.
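The arithmetic behind those ratios, as a minimal sketch (output tokens only; the 70/30 split is the routing assumption above, not a measured traffic mix):

```python
# Blended-cost arithmetic for the 70/30 cascade described above.
# Output prices come from the pricing table; the traffic split is an assumption.
V4_OUT, OPUS_OUT = 3.48, 25.00            # $ per 1M output tokens
v4_share, opus_share = 0.70, 0.30         # fraction of output tokens per route

blended = v4_share * V4_OUT + opus_share * OPUS_OUT

print(f"single-workload ratio: {OPUS_OUT / V4_OUT:.1f}x")         # ~7.2x
print(f"blended cost per 1M output tokens: ${blended:.2f}")       # ~$9.94
print(f"blended saving vs 100% Opus: {OPUS_OUT / blended:.1f}x")  # ~2.5x
```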
SMQTS Results — Programming Series
| Category | DeepSeek V4 Pro | Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| P1 Multi-file refactor | 74 | 94 | 86 | 83 |
| P2 Bug-finding from stack trace | 78 | 92 | 87 | 84 |
| P3 Code review | 76 | 91 | 88 | 85 |
| P4 Test generation | 77 | 89 | 90 | 83 |
| P5 SQL from natural language | 82 | 87 | 89 | 91 |
| P6 Algorithm from spec | 79 | 93 | 89 | 88 |
| P7 Migration scripts | 71 | 92 | 83 | 80 |
| P8 Documentation | 78 | 90 | 88 | 85 |
| P9 Diff comprehension | 76 | 91 | 86 | 83 |
| P10 Tool-using agent loops | 74 | 89 | 92 | 85 |
| Average | 76.5 | 90.8 | 87.8 | 84.7 |
V4 Pro never wins a programming category outright. Its strongest relative showing is P5 (SQL), where it lands within 5-9 points of the other three models, a defensible substitution. Its weakest is P7 (migration scripts), 21 points behind Opus 4.7; do not substitute there.
SMQTS Results — Non-Programming Series
| Category | DeepSeek V4 Pro | Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|
| N1 Long-form drafting | 83 | 87 | 89 |
| N2 Summarization | 86 | 91 | 90 |
| N3 Multi-step reasoning | 82 | 83 | 94 |
| N4 Information extraction | 85 | 89 | 87 |
| N5 Translation | 78 | 76 | 92 |
| N6 Style transfer | 82 | 90 | 87 |
| N7 Adversarial resistance | 78 | 92 | 88 |
| N8 Structured output | 83 | 87 | 88 |
| N9 Domain QA | 83 | 90 | 89 |
| N10 Multi-turn coherence | 80 | 91 | 89 |
| Average | 82.0 | 87.6 | 89.3 |
Quality-per-dollar headline (output)
Cost to deliver 79.2 quality-blend score per 1M output tokens
================================================================
DeepSeek V4 Pro    $3.48    ###
Gemini 3.1 Pro     $10.50   ##########
Claude Opus 4.7    $25.00   #########################
GPT-5.5            $30.00   ##############################
GPT-5.5 Pro        $180.00  ##################################################
SMQTS Results — Cost-Quality Validation
This is the most important section of the report. The table below shows pairwise blind grading of V4 Pro against frontier models across the 50-prompt cost-quality sample:
| Workload | V4 Pro wins | Frontier wins | Tie | Verdict |
|---|---|---|---|---|
| Information extraction (N4) vs GPT-5.5 | 34% | 22% | 44% | Substitute |
| Summarization (N2) vs Opus 4.7 | 22% | 38% | 40% | Substitute |
| SQL from NL (P5) vs Gemini 3.1 Pro | 32% | 34% | 34% | Substitute |
| Multi-file refactor (P1) vs Opus 4.7 | 11% | 71% | 18% | Do not substitute |
| Hard reasoning (N3) vs Gemini 3.1 Pro | 14% | 61% | 25% | Do not substitute |
| Tool loops (P10) vs GPT-5.5 | 13% | 67% | 20% | Do not substitute |
Reading: a workload tagged Substitute is one where V4 Pro ties or wins more than 60% of pairwise blind comparisons. Do not substitute means the frontier model wins more than 60% of the time. The cost gap is large enough that a substitute decision usually saves 70-90% of spend on that workload.
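The same decision rule, stated as code (the final branch is a label added here for the middle ground, which does not occur in this sample):

```python
def substitution_verdict(v4_wins: float, ties: float, frontier_wins: float) -> str:
    """Percentages from a pairwise blind comparison; the three should sum to 100."""
    if v4_wins + ties > 60:
        return "Substitute"
    if frontier_wins > 60:
        return "Do not substitute"
    return "Judgment call"  # middle ground; not present in the table above

print(substitution_verdict(34, 44, 22))  # N4 vs GPT-5.5 -> Substitute
```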
Strengths in Detail
Quality per dollar
No model comes close. Even after the quality discount on workloads where V4 Pro loses outright, the cost saving dominates the procurement math for any organisation processing more than ~$5K/month of API spend.
Self-hostability
For regulated industries (healthcare, finance, government) where data cannot leave the organisational perimeter, V4 Pro is the only frontier-adjacent model that is fully self-hostable. The hardware bar is meaningful (8x H200 or 4x B200 for production deployment) but well within enterprise capability.
Extraction and summarization
On N2 and N4, V4 Pro ties or wins more than 60% of pairwise comparisons against the frontier four. For high-volume extraction pipelines (invoice parsing, contract review, support ticket classification), V4 Pro is the right default.
Weaknesses and Failure Modes
Multi-file refactor
Single-file edits are competent. Cross-file consistency is not. On a 6-file refactor where all files share an implementation contract, V4 Pro tends to update 4 of 6 files correctly and leave the other 2 partially updated, breaking the build.
Hard reasoning
On N3 (multi-step reasoning, hardest sub-set), V4 Pro hits 82 weighted points to Gemini 3.1 Pro's 94. The specific failure: V4 Pro starts a reasoning chain correctly but loses the thread by step 3 of 4, defaulting to a plausible-sounding wrong answer rather than restarting.
Tool-call compliance
On P10, V4 Pro's tool-call success rate on first attempt is 79.3%. GPT-5.5 is at 97.4%. The recovery loop usually succeeds on retry, but the extra round-trip costs latency and compounds in long agent traces.
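One way to implement that recovery loop, sketched below. `call_model` is a hypothetical wrapper around your chat-completions client that returns the raw tool-call payload, and the schema check uses the jsonschema package; adapt both to your stack.

```python
import json
import jsonschema

def tool_call_with_retry(messages, tool_schema, call_model, max_attempts=2):
    """Validate the model's tool-call arguments; on failure, feed the error back and retry."""
    for _ in range(max_attempts):
        reply = call_model(messages)                    # assumed to return {"name": ..., "arguments": "<json string>"}
        try:
            args = json.loads(reply["arguments"])
            jsonschema.validate(args, tool_schema)      # schema-compliance gate
            return args                                 # first-attempt success: ~79% for V4 Pro, per P10
        except (json.JSONDecodeError, jsonschema.ValidationError) as err:
            # The retry usually lands, but it costs an extra round-trip of latency.
            messages = messages + [{
                "role": "user",
                "content": f"Tool call rejected ({err}). Re-emit arguments that satisfy the schema.",
            }]
    raise RuntimeError("tool call failed schema validation after retries")
```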
When to Use V4 Pro
- High-volume extraction, classification, summarization. The cost saving dominates the small quality discount.
- Self-hosted regulated workloads. Apache 2.0 makes this viable.
- Cascade lower tier. Route 70% of traffic here; route the remaining 30% to a frontier model (see the routing sketch after this list).
- Cost-sensitive bulk experimentation. Iterate ten times faster on prompt design at one-tenth the spend.
- Pre-production prototyping. Build at V4 Pro cost; promote categories that need it to frontier later.
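A skeletal version of that cascade, keyed off the do-not-substitute verdicts in the cost-quality table; the category tags and model identifiers are illustrative, not production values.

```python
DO_NOT_SUBSTITUTE = {"P1", "N3", "P10"}   # frontier wins >60% of pairwise comparisons (table above)

def route(category: str) -> str:
    """Pick a model for a request tagged with its SMQTS-style workload category."""
    if category in DO_NOT_SUBSTITUTE:
        return "frontier"                 # Opus 4.7 / GPT-5.5 / Gemini 3.1 Pro, per workload
    return "deepseek-v4-pro"              # cheap default tier (hypothetical model id)

assert route("N4") == "deepseek-v4-pro"   # extraction stays on the cheap tier
assert route("P1") == "frontier"          # multi-file refactor escalates
```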
When NOT to Use V4 Pro
- Multi-file code refactor. Use Claude Opus 4.7.
- Hard reasoning workloads. Use Gemini 3.1 Pro.
- Tool-using production agents with strict schema requirements. Use GPT-5.5.
- Workloads in jurisdictions restricting Chinese-origin AI. Use a US-hosted alternative or self-host the weights.
- Workloads where one wrong answer is catastrophic. Frontier models have lower fabrication rates.
Comparison to Direct Rivals
vs Claude Opus 4.7 (cost-quality)
| Dimension | V4 Pro | Opus 4.7 |
|---|---|---|
| Output price ($/1M) | $3.48 | $25.00 |
| License | Apache 2.0 | Closed |
| SMQTS programming avg | 76.5 | 90.8 |
| SMQTS non-programming avg | 82.0 | 87.6 |
| Cost ratio (output) | 1x | 7.2x |
vs Gemma 4 27B (open-weight)
| Dimension | V4 Pro | Gemma 4 27B |
|---|---|---|
| License | Apache 2.0 | Apache 2.0 |
| Active params | 49B | 27B (single-GPU friendly) |
| SMQTS programming avg | 76.5 | 65.9 |
| Self-host hardware | 8x H200 / 4x B200 | 1x H100 / 1x H200 |
Procurement Notes
Enterprise readiness
Direct DeepSeek API: SOC 2 Type II in progress (community report; not yet GA). For US enterprises, the practical paths are Together AI, Fireworks AI, and Groq, which host the open weights with full US-data-residency posture and standard enterprise compliance. Self-hosting is also a real option for organisations with sufficient internal infra capability.
Lock-in score
1.0 / 5 — among the lowest possible scores. Open weights mean the "leave" cost is operational rebuild on a new provider, not vendor lock. Prompt format is OpenAI-compatible chat. Swfte Connect abstracts even the operational rebuild.
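Low lock-in in practice: because the prompt format is OpenAI-compatible chat, moving between the direct API, a US host, and a self-hosted endpoint is mostly a base-URL and model-name change. The sketch below uses the openai Python client; the endpoint URLs and model names are illustrative, not verified values.

```python
from openai import OpenAI

# Illustrative endpoints only -- confirm each provider's actual base URL and model id.
PROVIDERS = {
    "deepseek-direct": ("https://api.deepseek.com", "deepseek-v4-pro"),
    "together":        ("https://api.together.xyz/v1", "deepseek-ai/DeepSeek-V4-Pro"),
    "self-host":       ("http://vllm.internal:8000/v1", "deepseek-v4-pro"),
}

def client_for(provider: str, api_key: str):
    base_url, model = PROVIDERS[provider]
    return OpenAI(base_url=base_url, api_key=api_key), model

client, model = client_for("together", api_key="...")
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Extract the renewal date from this clause: ..."}],
)
print(resp.choices[0].message.content)
```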
Contract leverage
Direct DeepSeek pricing is at-list. Together / Fireworks / Groq offer volume discounts at $50K+/month and have been willing to match competitor list prices on multi-year contracts. Self-hosting puts you on hardware committed-use economics, which can be 30-50% cheaper than hosted API at steady utilization.