SMQTS v1.3 · Pinned 2026-04-12

Gemma 4 27B — Deep Dive Research Report (May 2026)

The pragmatic open-weight pick. Single-GPU friendly. Apache 2.0. The right answer for on-prem and edge.


Model Snapshot

Released:        2026-04-12
License:         Apache 2.0
Context:         128K tokens
Parameters:      27B dense
Min hardware:    1x H100 (80GB)
Comfortable:     1x H200 / B200
Quality Index:   ~75
Self-host:       Designed for it

Executive Summary

Gemma 4 27B is the open-weight model you pick when DeepSeek V4 Pro's 8-GPU footprint is not practical. Apache 2.0, single-GPU friendly, Quality Index ~75. It is below frontier and below the bigger open-weight models on most quality dimensions, but the operational economics — one GPU, no external API calls, predictable latency — make it the right answer for a specific set of deployments. Do not pick Gemma 4 for general-purpose production; pick it for on-prem, edge, or tightly-controlled regulated workloads.

Three strengths

  1. Single-GPU deploy. Runs on one H100/H200/B200 with comfortable context. The lowest hardware bar in the open-weight frontier-adjacent tier.
  2. Apache 2.0. No usage restrictions, no vendor dependency, fully redistributable.
  3. Strong on simple structured tasks. Competitive on N4 (extraction) and N8 (structured output) relative to its size class.

Three weaknesses

  1. Below frontier on quality. Quality Index ~75 vs frontier ~85-87.
  2. Hard reasoning. 22-point gap to Gemini 3.1 Pro on N3.
  3. Tool-using agents. 24-point gap to GPT-5.5 on P10. Not a fit for production agent loops without significant scaffolding.

Architecture and Training

  • 27B dense transformer. Not MoE: every parameter activates on every token, which keeps per-token compute and latency predictable but caps the parameter count at what fits in single-GPU memory.
  • 128K native context. Smaller than the frontier 1M+ but workable for most production RAG.
  • Trained on ~9T tokens (per Google's model card), distilled from the Gemini family training corpus. Knowledge cutoff January 2026.
  • Apache 2.0 for weights, tokenizer, and example inference code; a simpler grant than the custom Gemma Terms of Use that governed earlier Gemma releases.
  • Designed for quantization. Google publishes INT8 and FP8 quantized variants alongside the FP16 reference. The INT8 variant runs at full quality on a single H100 80GB; a loading sketch follows.
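A minimal single-GPU serving sketch, assuming vLLM and the published FP8 variant. The Hugging Face model id is hypothetical; this report does not name the checkpoint, so treat the identifiers as placeholders:

    # Single-GPU Gemma 4 27B with vLLM. The model id is hypothetical;
    # quantization="fp8" assumes the published FP8 variant noted above.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="google/gemma-4-27b-it",  # hypothetical HF id
        quantization="fp8",             # or point at the INT8 checkpoint
        max_model_len=131072,           # full 128K native window
        tensor_parallel_size=1,         # one H100/H200/B200 is the point
    )

    out = llm.generate(
        ["Extract the invoice total from: ..."],
        SamplingParams(max_tokens=256, temperature=0),
    )
    print(out[0].outputs[0].text)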

Pricing Reality

Gemma 4 has no API list price. Cost is operational. The practical numbers:

Deployment                          Approx $/1M output          Notes
Self-host on H100 (on-demand)       $1.20-2.50                  Depends on utilization
Self-host on H100 (committed-use)   $0.50-1.00                  1-3yr commit, >60% utilization
Self-host on B200                   $0.30-0.80                  FP8, batched, high utilization
Together AI hosted                  $0.30 / 1M (combined)       Hosted API
Fireworks AI hosted                 $0.30 / 1M (combined)       Hosted API
Groq hosted                         $0.20 / 1M (combined)       LPU; lowest hosted price

The procurement comparison. Hosted Gemma 4 at $0.20-0.30 per 1M is several times cheaper than hosted DeepSeek V4 Pro ($1.74 / $3.48), but the discount buys meaningfully less quality. If your workload tolerates a ~75 Quality Index, hosted Gemma 4 is the cheapest credible-quality option on the market. If it does not, the lower price saves you nothing. A worked example of the self-host rows follows.
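For the self-host rows, the arithmetic is one line: GPU dollars per hour divided by millions of output tokens produced per hour. The hourly rate and aggregate batched throughput below are illustrative assumptions, not SMQTS measurements:

    # Self-host cost per 1M output tokens. Inputs are illustrative
    # assumptions, not SMQTS measurements.
    def cost_per_1m_tokens(gpu_usd_per_hour: float, agg_tokens_per_sec: float) -> float:
        millions_per_hour = agg_tokens_per_sec * 3600 / 1_000_000
        return gpu_usd_per_hour / millions_per_hour

    # On-demand H100 at ~$2.50/hr, ~350 tok/s aggregate batched output:
    print(f"{cost_per_1m_tokens(2.50, 350):.2f}")  # 1.98 -> inside the $1.20-2.50 row
    # Committed-use H100 at ~$1.40/hr, same throughput:
    print(f"{cost_per_1m_tokens(1.40, 350):.2f}")  # 1.11 -> near the committed-use row

Utilization is the hidden variable: a replica that sits idle half the time doubles the effective cost.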

SMQTS Results — Programming Series

Category                          Gemma 4 27B   DeepSeek V4 Pro   Opus 4.7
P1  Multi-file refactor                    66                74         94
P2  Bug-finding from stack trace           69                78         92
P3  Code review                            68                76         91
P4  Test generation                        67                77         89
P5  SQL from natural language              72                82         87
P6  Algorithm from spec                    68                79         93
P7  Migration scripts                      61                71         92
P8  Documentation                          67                78         90
P9  Diff comprehension                     65                76         91
P10 Tool-using agent loops                 56                74         89
Average                                  65.9              76.5       90.8

SMQTS Results — Non-Programming Series

Category                          Gemma 4 27B   DeepSeek V4 Pro   Gemini 3.1 Pro
N1  Long-form drafting                     76                83               89
N2  Summarization                          81                86               90
N3  Multi-step reasoning                   72                82               94
N4  Information extraction                 80                85               87
N5  Translation                            76                78               92
N6  Style transfer                         76                82               87
N7  Adversarial resistance                 72                78               88
N8  Structured output                      81                83               88
N9  Domain QA                              79                83               89
N10 Multi-turn coherence                   74                80               89
Average                                  76.7              82.0             89.3

Open-weight class headline

Open-weight Quality Index, May 2026 (composite)
=================================================
DeepSeek V4 Pro    79.2   ########################################
Qwen 3.6 Plus      78.4   #######################################
Llama 4 Maverick   76.1   ######################################
Gemma 4 27B        71.8   ####################################
Mistral Large 3    70.5   ###################################
(one # = ~2 index points)

SMQTS Results — Cost-Quality Validation

Gemma 4 vs DeepSeek V4 Pro (the natural open-weight rival), pairwise blind grading on the 50-prompt sample:

Workload                        Gemma 4 wins   V4 Pro wins   Tie
Information extraction (N4)              23%           34%   43%
Summarization (N2)                       19%           38%   43%
Multi-file refactor (P1)                 10%           61%   29%
Hard reasoning (N3)                      11%           56%   33%
Tool loops (P10)                          7%           61%   32%

Procurement reading. V4 Pro wins outright on every category in pairwise grading. Gemma 4 substitutes acceptably (40%+ tie rate) only on N4 and N2 — workloads where the quality bar is forgiving. The reason to pick Gemma 4 over V4 Pro is therefore not quality; it is operational fit: single-GPU deploy, on-prem constraint, edge environment, or a hardware budget that does not include 8x H200.

Strengths in Detail

Hardware fit

A single H100 80GB at INT8 runs Gemma 4 27B at 128K context with batched throughput of ~80-120 tokens/sec per replica. A single H200 at FP16 roughly doubles that; a single B200 doubles it again. No other open-weight model in the quality-above-70 band fits this hardware profile. The memory arithmetic behind the single-card claim is sketched below.
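A back-of-envelope memory budget. The report does not publish Gemma 4's layer or head geometry, so the KV-cache figure rests on an assumed configuration and should be read as illustrative:

    # Memory budget sketch: 27B dense at INT8 on one 80 GB card.
    # Layer/head geometry is an assumption; the report gives no internals.
    params = 27e9
    weights_gb = params * 1 / 1e9                 # INT8: 1 byte/param -> 27 GB

    layers, kv_heads, head_dim = 46, 8, 128       # assumed GQA geometry
    kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2  # K+V in fp16
    kv_gb = kv_bytes_per_token * 131_072 / 1e9    # full 128K window

    print(f"weights ~{weights_gb:.0f} GB, KV at 128K ~{kv_gb:.1f} GB")
    # ~27 GB + ~24.7 GB = ~52 GB, leaving ~28 GB of headroom on an H100
    # for activations, CUDA context, and concurrent batched sequences.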

License clarity

Apache 2.0 is the cleanest open-source license a model can have. No usage restrictions, no commercial-use carveouts, no data-redistribution clauses. Compare to Llama's acceptable-use restrictions or Qwen's region-specific terms — Gemma 4 is the simplest legal procurement story in the open-weight tier.

Structured tasks

On N4 and N8, Gemma 4 is competitive within its size class. For workloads dominated by extraction, classification, or schema-strict generation, the quality-per-watt is favourable.
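A sketch of the schema-strict pattern on a hosted endpoint. The base URL shown is Together's OpenAI-compatible endpoint, the model id is hypothetical, and the JSON-mode flag is the generic OpenAI-style response_format; check the provider's docs for schema-constrained variants:

    # Schema-strict extraction via an OpenAI-compatible hosted API.
    # Model id is hypothetical; validate the schema client-side regardless.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="https://api.together.xyz/v1", api_key="...")

    schema_hint = ('Reply with JSON only: '
                   '{"vendor": str, "amount_usd": float, "date": "YYYY-MM-DD"}')

    resp = client.chat.completions.create(
        model="google/gemma-4-27b-it",            # hypothetical hosted id
        messages=[
            {"role": "system", "content": schema_hint},
            {"role": "user", "content": "Invoice: Acme Corp, $1,204.50, 2026-03-03"},
        ],
        response_format={"type": "json_object"},  # constrain output to valid JSON
        temperature=0,
    )

    record = json.loads(resp.choices[0].message.content)
    assert {"vendor", "amount_usd", "date"} <= record.keys()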

Weaknesses and Failure Modes

Reasoning depth

The 27B dense parameter ceiling shows on N3. In roughly 38% of hard-subset prompts, multi-step reasoning chains break down by the second or third step of four. Gemma 4 is not the model for graduate physics or planning tasks.

Tool-call compliance

On P10, first-attempt tool-call success is 71%, with frequent format errors (markdown-wrapped JSON instead of a tool envelope). Building production agent loops on Gemma 4 requires meaningful prompt scaffolding and a forgiving parser, sketched below; even then, expect 2-3x retry overhead vs GPT-5.5.
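The markdown-wrapped-JSON failure is mechanical enough to absorb in the parser rather than retry blindly. A minimal sketch of the forgiving extraction the scaffolding needs; the {"name", "arguments"} envelope is illustrative, not a Gemma 4 specification:

    # Forgiving tool-call parser: accept a clean JSON envelope, or salvage
    # JSON the model wrapped in a ```json fence. Envelope shape is illustrative.
    import json
    import re

    FENCED_JSON = re.compile(r"```(?:json)?\s*(\{.*?\})\s*```", re.DOTALL)

    def parse_tool_call(raw: str) -> dict | None:
        """Return {"name": ..., "arguments": ...}, or None to trigger a retry."""
        candidates = [raw.strip()]
        fenced = FENCED_JSON.search(raw)
        if fenced:
            candidates.append(fenced.group(1))
        for text in candidates:
            try:
                call = json.loads(text)
            except json.JSONDecodeError:
                continue
            if isinstance(call, dict) and "name" in call and "arguments" in call:
                return call
        return None  # caller retries, ideally echoing the parse error back

Even with salvage in place, budget for the 2-3x retry overhead noted above.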

Long-context limits

128K context is real but tight. Past 80K tokens, citation accuracy in N9 starts to degrade. Workloads with regular 100K+ prompts should look at DeepSeek V4 Pro (1M context) or Gemini 3.1 Pro (2M context).

When to Use Gemma 4 27B

  • On-prem regulated workloads with strict data-locality requirements.
  • Edge deployments with single-GPU hardware budget.
  • Cost-floor batch processing at $0.20-0.30 per 1M (combined) on hosted APIs.
  • Predictable-latency workloads where you cannot tolerate external API variance.
  • Educational / research deployment where model accessibility matters more than peak quality.

When NOT to Use Gemma 4 27B

  • General-purpose production traffic. Use V4 Pro or a frontier model.
  • Multi-file code refactor or migration. Use Claude Opus 4.7.
  • Tool-using production agents. Use GPT-5.5.
  • Hard reasoning workloads. Use Gemini 3.1 Pro.
  • Long-context RAG. 128K is not enough for the largest retrieval workloads; use V4 Pro or Gemini 3.1 Pro.

Comparison to Direct Rivals

vs DeepSeek V4 Pro

Dimension         Gemma 4 27B        V4 Pro
License           Apache 2.0         Apache 2.0
Active params     27B                49B (1.6T total)
Min hardware      1x H100            8x H200 / 4x B200
Context           128K               1M
Quality Index     71.8               79.2
Hosted API low    $0.20 (combined)   $1.74 / $3.48

vs Llama 4 Maverick

Dimension             Gemma 4 27B   Llama 4 Maverick
License               Apache 2.0    Llama Community License
Active params         27B           17B (400B MoE total)
Min hardware (FP16)   1x H200       2x H100
Quality Index         71.8          76.1
Tool-call P10         56            68

Procurement Notes

Enterprise readiness

Self-hosted: enterprise readiness depends on the operator, not the model. Hosted on Together / Fireworks / Groq: standard enterprise compliance posture from those providers (SOC 2, ISO 27001, US data residency). Vertex AI also offers Gemma 4 with full Google Cloud governance. Beyond those routes there is no first-party managed service: the model ships, you operate.

Lock-in score

1.0 / 5 — among the lowest possible. Self-hosted means zero vendor exposure beyond hardware economics. Hosted on aggregators is a one-line config change away from another aggregator. Swfte Connect handles the routing layer.
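Because all three aggregators expose OpenAI-compatible endpoints, the one-line claim is close to literal. A sketch (the base URLs are the providers' published OpenAI-compatible endpoints; the Gemma 4 model ids are hypothetical):

    # Swapping hosted providers is a base_url + model-id change;
    # application code stays identical. Model ids are hypothetical.
    from openai import OpenAI

    PROVIDERS = {
        "together":  ("https://api.together.xyz/v1",           "google/gemma-4-27b-it"),
        "fireworks": ("https://api.fireworks.ai/inference/v1", "accounts/fireworks/models/gemma-4-27b-it"),
        "groq":      ("https://api.groq.com/openai/v1",        "gemma-4-27b-it"),
    }

    base_url, model = PROVIDERS["groq"]  # the one-line change
    client = OpenAI(base_url=base_url, api_key="...")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)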

Contract leverage

Self-hosted: hardware committed-use discounts are the lever. Hosted aggregators offer volume pricing at $25K+/month. Because the weights are open and identical across providers, the price competition is genuine and the negotiation dynamic favours the buyer.