Executive Summary
Gemma 4 27B is the open-weight model you pick when DeepSeek V4 Pro's 8-GPU footprint is not practical. Apache 2.0, single-GPU friendly, Quality Index ~72. It sits below the frontier and below the larger open-weight models on most quality dimensions, but the operational economics — one GPU, no external API calls, predictable latency — make it the right answer for a specific set of deployments. Do not pick Gemma 4 for general-purpose production; pick it for on-prem, edge, or tightly controlled regulated workloads.
Three strengths
- Single-GPU deploy. Runs on one H100/H200/B200 with comfortable context. The lowest hardware bar in the open-weight frontier-adjacent tier.
- Apache 2.0. No usage restrictions, no vendor dependency, fully redistributable.
- Strong on simple structured tasks. Competitive on N4 (extraction) and N8 (structured output) relative to its size class.
Three weaknesses
- Below frontier on quality. Quality Index ~72 vs frontier ~85-87.
- Hard reasoning. 22-point gap to Gemini 3.1 Pro on N3.
- Tool-using agents. 24-point gap to GPT-5.5 on P10. Not a fit for production agent loops without significant scaffolding.
Architecture and Training
- 27B dense transformer. Not MoE — every parameter activates per token, which keeps inference deterministic but caps the parameter ceiling at what fits in single-GPU memory.
- 128K native context. Smaller than the frontier 1M+ but workable for most production RAG.
- Trained on ~9T tokens (per Google's model card), distilled from the Gemini family training corpus. Knowledge cutoff: January 2026.
- Apache 2.0 for weights, tokenizer, and example inference code. Same license as Gemma 1, 2, 3 lineage.
- Designed for quantization. Google publishes INT8 and FP8 quantized variants alongside the FP16 reference. The INT8 variant runs at full quality on a single H100 80GB.
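The single-GPU claim is easy to sanity-check with back-of-envelope memory arithmetic. A minimal sketch (weights only; KV cache and activation overhead are extra and depend on batch size and context length, so treat the results as a floor):

```python
def weight_footprint_gib(params: float, bytes_per_param: float) -> float:
    """Approximate GPU memory needed to hold the model weights alone."""
    return params * bytes_per_param / 2**30

PARAMS = 27e9  # 27B dense parameters

int8_gib = weight_footprint_gib(PARAMS, 1.0)  # ~25 GiB: fits an 80 GB H100 with room for KV cache
fp16_gib = weight_footprint_gib(PARAMS, 2.0)  # ~50 GiB: tighter, hence the H200 recommendation at FP16
```

This is why the INT8 variant is the natural H100 80GB target: roughly 25 GiB of weights leaves most of the card for long-context KV cache.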
Pricing Reality
Gemma 4 has no API list price. Cost is operational. The practical numbers:
| Deployment | Approx $/1M output | Notes |
|---|---|---|
| Self-host on H100 (on-demand) | $1.20-2.50 | Depends on utilization |
| Self-host on H100 (committed-use) | $0.50-1.00 | 1-3yr commit, >60% utilization |
| Self-host on B200 | $0.30-0.80 | FP8, batched, high utilization |
| Together AI hosted | $0.30 / 1M (combined) | Hosted API |
| Fireworks AI hosted | $0.30 / 1M (combined) | Hosted API |
| Groq hosted | $0.20 / 1M (combined) | LPU; lowest hosted price |
The procurement comparison. Hosted Gemma 4 at $0.20-0.30 per 1M is meaningfully cheaper than DeepSeek V4 Pro hosted ($1.74/$3.48), but the discount buys less quality. If your workload tolerates ~72 Quality Index, hosted Gemma 4 is the cheapest credible-quality option in market. If it does not, the cheaper price does not save you anything.
SMQTS Results — Programming Series
| Category | Gemma 4 27B | DeepSeek V4 Pro | Opus 4.7 |
|---|---|---|---|
| P1 Multi-file refactor | 66 | 74 | 94 |
| P2 Bug-finding from stack trace | 69 | 78 | 92 |
| P3 Code review | 68 | 76 | 91 |
| P4 Test generation | 67 | 77 | 89 |
| P5 SQL from natural language | 72 | 82 | 87 |
| P6 Algorithm from spec | 68 | 79 | 93 |
| P7 Migration scripts | 61 | 71 | 92 |
| P8 Documentation | 67 | 78 | 90 |
| P9 Diff comprehension | 65 | 76 | 91 |
| P10 Tool-using agent loops | 56 | 74 | 89 |
| Average | 65.9 | 76.5 | 90.8 |
SMQTS Results — Non-Programming Series
| Category | Gemma 4 27B | DeepSeek V4 Pro | Gemini 3.1 Pro |
|---|---|---|---|
| N1 Long-form drafting | 76 | 83 | 89 |
| N2 Summarization | 81 | 86 | 90 |
| N3 Multi-step reasoning | 72 | 82 | 94 |
| N4 Information extraction | 80 | 85 | 87 |
| N5 Translation | 76 | 78 | 92 |
| N6 Style transfer | 76 | 82 | 87 |
| N7 Adversarial resistance | 72 | 78 | 88 |
| N8 Structured output | 81 | 83 | 88 |
| N9 Domain QA | 79 | 83 | 89 |
| N10 Multi-turn coherence | 74 | 80 | 89 |
| Average | 76.7 | 82.0 | 89.3 |
Open-weight class headline
```
Open-weight Quality Index, May 2026 (composite)
=================================================
DeepSeek V4 Pro    79.2  #####################################
Qwen 3.6 Plus      78.4  ####################################
Llama 4 Maverick   76.1  ###################################
Gemma 4 27B        71.8  ##################################
Mistral Large 3    70.5  #################################
```
SMQTS Results — Cost-Quality Validation
Gemma 4 vs DeepSeek V4 Pro (the natural open-weight rival), pairwise blind grading on the 50-prompt sample:
| Workload | Gemma 4 wins | V4 Pro wins | Tie |
|---|---|---|---|
| Information extraction (N4) | 23% | 34% | 43% |
| Summarization (N2) | 19% | 38% | 43% |
| Multi-file refactor (P1) | 10% | 61% | 29% |
| Hard reasoning (N3) | 11% | 56% | 33% |
| Tool loops (P10) | 7% | 61% | 32% |
Procurement reading. V4 Pro wins outright on every category in pairwise grading. Gemma 4 substitutes acceptably (40%+ tie rate) only on N4 and N2 — workloads where the quality bar is forgiving. The reason to pick Gemma 4 over V4 Pro is therefore not quality; it is operational fit: single-GPU deploy, on-prem constraint, edge environment, or a hardware budget that does not include 8x H200.
Strengths in Detail
Hardware fit
Single H100 80GB at INT8 runs Gemma 4 27B at 128K context with batched throughput of ~80-120 tokens/sec per replica. Single H200 at FP16 doubles that. Single B200 doubles again. No other open-weight model in the "quality > 70" band fits this hardware profile.
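The self-host rows in the pricing table reduce to one conversion: GPU hourly cost divided by sustained token throughput. A minimal sketch — the $3.60/hr rate and 1,000 tok/s aggregate batched throughput below are illustrative assumptions, not measured figures:

```python
def usd_per_million_output(gpu_usd_per_hour: float, tokens_per_sec: float) -> float:
    """Convert GPU rental cost and sustained aggregate throughput
    into an effective $ per 1M output tokens."""
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000

# Assumed: committed-use H100 at $3.60/hr, 1,000 tok/s aggregate across batched streams
cost = usd_per_million_output(3.60, 1000)  # $1.00 per 1M output tokens
```

Note the sensitivity to utilization: the same card at 30% utilization triples the effective rate, which is why the table's on-demand and committed-use rows differ so sharply.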
License clarity
Apache 2.0 is the cleanest open-source license a model can have. No usage restrictions, no commercial-use carveouts, no data-redistribution clauses. Compare to Llama's acceptable-use restrictions or Qwen's region-specific terms — Gemma 4 is the simplest legal procurement story in the open-weight tier.
Structured tasks
On N4 and N8, Gemma 4 is competitive within its size class. For workloads dominated by extraction, classification, or schema-strict generation, the quality-per-watt is favourable.
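Schema-strict workloads still warrant a validation gate on the model's output. A minimal stdlib-only sketch — the invoice schema is a hypothetical example, not a benchmark fixture:

```python
import json

# Hypothetical extraction schema: required keys and their expected types
REQUIRED = {"invoice_id": str, "total": (int, float), "currency": str}

def validate_extraction(raw: str) -> dict:
    """Parse model output and enforce the minimal schema, raising on violations."""
    obj = json.loads(raw)
    for key, typ in REQUIRED.items():
        if key not in obj:
            raise ValueError(f"missing key: {key}")
        if not isinstance(obj[key], typ):
            raise ValueError(f"wrong type for {key}: {type(obj[key]).__name__}")
    return obj
```

Failed validations can be routed to a retry with the error message appended to the prompt, which is usually enough for a model that is already competitive on N8.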
Weaknesses and Failure Modes
Reasoning depth
The 27B dense parameter ceiling shows on N3. Multi-step reasoning chains break at step 2 or 3 of a four-step chain in roughly 38% of hard-subset prompts. Gemma 4 is not the model for graduate physics or planning tasks.
Tool-call compliance
On P10, first-attempt tool-call success is 71%, with frequent format errors (markdown-wrapped JSON instead of a tool envelope). Building production agent loops on Gemma 4 requires meaningful prompt scaffolding and a forgiving parser; even then expect 2-3x retry overhead vs GPT-5.5.
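The "forgiving parser" mentioned above can be sketched in a few lines: recover a JSON tool call even when the model wraps it in a markdown code fence instead of emitting a bare envelope. The `"name"` key check is an assumed minimal envelope shape; adapt it to your actual tool-call schema:

```python
import json
import re

# Matches a markdown code fence, optionally tagged "json", capturing the body
FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)

def parse_tool_call(raw: str):
    """Best-effort recovery of a JSON tool call from model output.

    Handles the failure mode noted above: valid JSON wrapped in a
    markdown fence. Returns the parsed dict, or None if nothing
    parseable is found (caller should then retry the request).
    """
    candidates = [raw.strip()]
    m = FENCE.search(raw)
    if m:
        candidates.insert(0, m.group(1))  # prefer the fenced body
    for text in candidates:
        try:
            obj = json.loads(text)
            if isinstance(obj, dict) and "name" in obj:  # minimal envelope check
                return obj
        except json.JSONDecodeError:
            continue
    return None
```

This recovers the most common format error cheaply; genuinely malformed calls still need the 2-3x retry budget noted above.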
Long-context limits
128K context is real but tight. Past 80K tokens, citation accuracy in N9 starts to degrade. Workloads with regular 100K+ prompts should look at DeepSeek V4 Pro (1M context) or Gemini 3.1 Pro (2M context).
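If you must stay on Gemma 4, keeping individual prompts under the ~80K-token degradation point means chunking retrieval inputs. A minimal sketch, using the rough ~4 characters-per-token heuristic (an assumption; measure with the real tokenizer before relying on it):

```python
def chunk_for_context(text: str, max_tokens: int = 80_000, chars_per_token: float = 4.0):
    """Split a long document into pieces that stay under a token budget.

    Splits on paragraph boundaries; a single paragraph larger than the
    budget is kept whole rather than cut mid-sentence.
    """
    budget = int(max_tokens * chars_per_token)
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > budget:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then gets its own request, with answers merged downstream — more orchestration than a 1M-context model needs, which is the trade-off the paragraph above describes.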
When to Use Gemma 4 27B
- On-prem regulated workloads with strict data-locality requirements.
- Edge deployments with single-GPU hardware budget.
- Cost-floor batch processing at $0.20-0.30 per 1M (combined) on hosted APIs.
- Predictable-latency workloads where you cannot tolerate external API variance.
- Educational / research deployment where model accessibility matters more than peak quality.
When NOT to Use Gemma 4 27B
- General-purpose production traffic. Use V4 Pro or a frontier model.
- Multi-file code refactor or migration. Use Claude Opus 4.7.
- Tool-using production agents. Use GPT-5.5.
- Hard reasoning workloads. Use Gemini 3.1 Pro.
- Long-context RAG. 128K is not enough for the largest retrieval workloads; use V4 Pro or Gemini 3.1 Pro.
Comparison to Direct Rivals
vs DeepSeek V4 Pro
| Dimension | Gemma 4 27B | V4 Pro |
|---|---|---|
| License | Apache 2.0 | Apache 2.0 |
| Active params | 27B | 49B (1.6T total) |
| Min hardware | 1x H100 | 8x H200 / 4x B200 |
| Context | 128K | 1M |
| Quality Index | 71.8 | 79.2 |
| Hosted API low | $0.20 (combined) | $1.74 / $3.48 |
vs Llama 4 Maverick
| Dimension | Gemma 4 27B | Llama 4 Maverick |
|---|---|---|
| License | Apache 2.0 | Llama Community License |
| Active params | 27B | 17B (400B MoE total) |
| Min hardware (FP16) | 1x H200 | 2x H100 |
| Quality Index | 71.8 | 76.1 |
| Tool-call P10 | 56 | 68 |
Procurement Notes
Enterprise readiness
Self-hosted: enterprise-readiness depends on the operator, not the model. Hosted on Together / Fireworks / Groq: standard enterprise compliance posture from those providers (SOC 2, ISO 27001, US data residency). Vertex AI also offers Gemma 4 with full Google Cloud governance. The model itself has no managed service from Google — it ships, you operate.
Lock-in score
1.0 / 5 — among the lowest possible. Self-hosted means zero vendor exposure beyond hardware economics. Hosted on aggregators is a one-line config change away from another aggregator. Swfte Connect handles the routing layer.
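The "one-line config change" claim holds because the major aggregators all expose OpenAI-compatible chat endpoints, so switching providers is a base-URL and model-slug swap. A minimal sketch — the model slugs below are hypothetical placeholders; check each provider's catalog for the real identifiers:

```python
import json
import urllib.request

# Base URLs are the providers' OpenAI-compatible endpoints;
# model slugs are hypothetical and must be verified per provider.
PROVIDERS = {
    "together":  {"base_url": "https://api.together.xyz/v1",
                  "model": "google/gemma-4-27b"},
    "fireworks": {"base_url": "https://api.fireworks.ai/inference/v1",
                  "model": "accounts/fireworks/models/gemma-4-27b"},
    "groq":      {"base_url": "https://api.groq.com/openai/v1",
                  "model": "gemma-4-27b"},
}

def chat_request(provider: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat request; switching providers is one key lookup."""
    cfg = PROVIDERS[provider]
    body = json.dumps({
        "model": cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        cfg["base_url"] + "/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
```

Because the weights are identical everywhere, output quality does not change with the provider — only price and latency do, which is what keeps the lock-in score at the floor.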
Contract leverage
Self-hosted: hardware committed-use discounts are the lever. Hosted aggregators offer volume pricing at $25K+/month. Because the weights are open and identical across providers, the price competition is genuine and the negotiation dynamic favours the buyer.