
AWS Bedrock entered 2026 as the most model-diverse managed AI service of the three hyperscaler platforms — and as of April 28, that diversity includes the full GPT-5.5 family alongside Claude, Llama, Mistral, Cohere, AI21, Stability, and Amazon's own Titan and Nova lines. AWS-native enterprises that previously had to choose between staying in-VPC and reaching frontier OpenAI models no longer face that trade-off.

This is the practical Bedrock guide we wish we had bookmarked when we first started shipping AI workloads on AWS. It covers what Bedrock actually is (and what it is not), every supported model family in May 2026, on-demand vs provisioned-throughput pricing math, the comparison against Azure OpenAI Service and Google Vertex AI, the handful of features (Knowledge Bases, Agents, Guardrails) that make Bedrock genuinely differentiated, and a getting-started checklist that skips the marketing detours.

What AWS Bedrock Is

Bedrock is a fully managed service that exposes a single API surface for first- and third-party foundation models, billed through your existing AWS account, deployed inside your existing VPC topology, and governed by your existing IAM, KMS, and CloudTrail tooling.

The two-sentence version: Bedrock turns frontier-model APIs into AWS line items. You get model access without leaving your AWS perimeter, without negotiating a separate vendor contract, and without standing up your own inference infrastructure.

What Bedrock is not:

  • It is not a model. Bedrock is the gateway; the models are Claude, Llama, Titan, etc.
  • It is not "AWS's ChatGPT." That product is Amazon Q, which sits on top of Bedrock for some of its capabilities but is a separate user-facing product.
  • It is not free. Every invocation has a per-token charge, and provisioned-throughput commitments come with hourly minimums.

Bedrock's architectural advantage over consuming model APIs directly is that data never leaves AWS. For HIPAA, FedRAMP, ITAR, and other regulated workloads this single property is the difference between "we can use this" and "legal said no."
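In code, "a single API surface" looks like this — a minimal sketch of a Converse-style request in Python. The model ID is illustrative; substitute whichever IDs are enabled in your own account:

```python
def build_converse_request(model_id: str, prompt: str, max_tokens: int = 512) -> dict:
    """Build a Bedrock Converse API request body.

    The same request shape works for every provider on Bedrock; only
    the model ID changes. Model IDs used here are illustrative.
    """
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }


# With boto3 (the AWS SDK for Python), invocation is then two lines:
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = client.converse(**build_converse_request("anthropic.claude-sonnet-4-6", "Hello"))
#   text = response["output"]["message"]["content"][0]["text"]
```

Because the request builder is pure, the same function also works for logging, replay, and cost estimation before any tokens are billed.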

Supported Model Families in May 2026

Bedrock supports five major third-party model families plus Amazon's first-party models. As of the April 28 GPT-5.5 announcement (covered in our April 2026 AI model releases roundup), the OpenAI lineup is now part of this list.

Anthropic Claude

The most-deployed Bedrock model family, by every public metric AWS shares. Available models in May 2026:

  • Claude Opus 4.7 — the frontier flagship; 1M context; $15 input / $75 output per 1M tokens
  • Claude Sonnet 4.6 — the workhorse tier; 1M context; $3 input / $15 output per 1M tokens
  • Claude Haiku 4.5 — the speed/value tier; 200k context; $0.80 input / $4 output per 1M tokens

Claude was the original headline integration that made Bedrock relevant in 2024; Anthropic remains AWS's deepest partner, and Claude models tend to reach Bedrock at the same time as, or ahead of, Anthropic's own API.

OpenAI GPT (added April 2026)

The newest family on Bedrock:

  • GPT-5.5 "Spud" — frontier; 1M context; $5 input / $15 output per 1M tokens
  • GPT-Rosalind — multimodal vision/audio variant; $4 input / $12 output per 1M tokens
  • GPT-Codex — code specialist; $2.50 input / $7.50 output per 1M tokens

The April 28 launch landed GPT-5.5 on Bedrock the same week as on Azure — historically AWS lagged Azure by months on OpenAI access.

Meta Llama

Llama 4 family in 405B, 70B, and 8B parameter sizes, all available on Bedrock with on-demand pricing:

  • Llama 4 405B — frontier-tier open-weight; $1.95 input / $2.56 output per 1M tokens
  • Llama 4 70B — value tier; $0.65 input / $0.86 output per 1M tokens
  • Llama 4 8B — high-volume tier; $0.18 input / $0.24 output per 1M tokens

Provisioned throughput is available for all three, useful for predictable high-volume workloads.

Amazon Nova and Titan

Amazon's first-party models. The Nova line launched in late 2024 and was upgraded twice in 2025:

  • Nova Premier — frontier-tier first-party model; $2.50 input / $10 output per 1M tokens
  • Nova Pro — multimodal mid-tier; $0.80 input / $3.20 output per 1M tokens
  • Nova Lite — speed tier; $0.06 input / $0.24 output per 1M tokens
  • Nova Micro — text-only ultra-light; $0.035 input / $0.14 output per 1M tokens
  • Titan Text Express — legacy text model
  • Titan Text Embeddings v2 — 1024-dim embeddings; $0.02 per 1M tokens
  • Titan Image Generator v2 — text-to-image
  • Nova Reel — text-to-video, 6-second 720p; $0.08 per second

Nova Micro at $0.035 input is the cheapest text model on any hyperscaler — useful for high-volume classification and routing layers.

Mistral AI

  • Mistral Large 3 — $4 input / $12 output per 1M tokens
  • Mistral Medium 3 — $0.40 input / $2 output per 1M tokens
  • Mistral Small 3 — $0.20 input / $0.60 output per 1M tokens
  • Codestral — code specialist; $0.30 input / $0.90 output per 1M tokens

Mistral on Bedrock lands well in EU-data-residency conversations because Mistral is a French lab with EU data-handling defaults.

Cohere

  • Command R+ v3 — flagship; $2.50 input / $10 output per 1M tokens
  • Command R v3 — workhorse; $0.50 input / $1.50 output per 1M tokens
  • Embed v4 — embeddings; $0.10 per 1M tokens
  • Rerank v4 — relevance ranking; $1 per 1k searches

Cohere has the cleanest enterprise-RAG story among the third-party providers — Embed + Rerank + Command R+ as a stack delivers measurably better retrieval accuracy than mix-and-matched alternatives in our internal evaluations.

AI21, Stability, and DeepSeek

  • AI21 Jamba 2 — long-context specialist; $0.50 input / $0.70 output per 1M tokens
  • Stability SD 3.5 Large — image generation; $0.04 per image
  • Stability SVD 1.2 — video generation; $0.20 per 4-second clip
  • DeepSeek V3.5 — added late 2025; $0.28 input / $0.42 output per 1M tokens

DeepSeek V4 (the April 2026 release covered in our DeepSeek V4 deep dive) is "coming soon" on Bedrock per the April launch announcement; not yet GA as of May 6.

Bedrock Pricing: Master Table

| Model | Provider | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | $15.00 | $75.00 | 1M |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M |
| Claude Haiku 4.5 | Anthropic | $0.80 | $4.00 | 200k |
| GPT-5.5 | OpenAI | $5.00 | $15.00 | 1M |
| GPT-Rosalind | OpenAI | $4.00 | $12.00 | 1M |
| GPT-Codex | OpenAI | $2.50 | $7.50 | 256k |
| Llama 4 405B | Meta | $1.95 | $2.56 | 256k |
| Llama 4 70B | Meta | $0.65 | $0.86 | 128k |
| Llama 4 8B | Meta | $0.18 | $0.24 | 128k |
| Nova Premier | Amazon | $2.50 | $10.00 | 300k |
| Nova Pro | Amazon | $0.80 | $3.20 | 300k |
| Nova Lite | Amazon | $0.06 | $0.24 | 300k |
| Nova Micro | Amazon | $0.035 | $0.14 | 128k |
| Mistral Large 3 | Mistral | $4.00 | $12.00 | 128k |
| Mistral Medium 3 | Mistral | $0.40 | $2.00 | 128k |
| Mistral Small 3 | Mistral | $0.20 | $0.60 | 128k |
| Command R+ v3 | Cohere | $2.50 | $10.00 | 128k |
| Command R v3 | Cohere | $0.50 | $1.50 | 128k |
| AI21 Jamba 2 | AI21 | $0.50 | $0.70 | 256k |
| DeepSeek V3.5 | DeepSeek | $0.28 | $0.42 | 256k |

Pricing is for on-demand US East (N. Virginia) as of May 1, 2026. EU and AP regions price 5-15% higher. For continuously updated pricing across providers see the Swfte AI pricing trends page.

On-Demand vs Provisioned Throughput

Bedrock has two billing modes that exist for different workload shapes.

On-demand

Pay per token, no commitment, no capacity reservation. Good for:

  • Bursty workloads
  • Anything under ~10M tokens per day
  • Anything where latency variance is acceptable
  • Development and pilot work

Drawback: subject to model-level rate limits. During peak hours these limits bite.

Provisioned throughput

You commit to a minimum number of "model units" for a 1-month or 6-month term, where each model unit guarantees a fixed throughput in tokens per minute. Good for:

  • Predictable high-volume workloads (>50M tokens per day on a single model)
  • Latency-critical workloads (provisioned has lower TTFT variance)
  • Workloads requiring guaranteed capacity during business hours

The 6-month commitment is roughly 30% cheaper per token than on-demand at typical utilization. The 1-month commitment is roughly 15% cheaper.

The math: a workload running 100M tokens / day on Claude Sonnet 4.6 costs roughly $300/day on-demand. The same workload on a 6-month provisioned commitment runs roughly $210/day — savings of $90/day, or $16,200 over the term, but you are locked in regardless of usage drop. Provisioned makes sense when your token volume is reliable to within ~25%; on-demand wins below that threshold.
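That break-even arithmetic is easy to parameterize. A sketch using the figures above — 100M tokens/day at an effective $3/1M rate, the 30% six-month discount, and a ~180-day term:

```python
def daily_cost_on_demand(tokens_per_day_m: float, price_per_m: float) -> float:
    """On-demand daily cost for a volume given in millions of tokens/day."""
    return tokens_per_day_m * price_per_m


def provisioned_savings(tokens_per_day_m: float, price_per_m: float,
                        discount: float, term_days: int) -> tuple:
    """Return (provisioned daily cost, total savings over the term)."""
    on_demand = daily_cost_on_demand(tokens_per_day_m, price_per_m)
    provisioned = on_demand * (1 - discount)
    return provisioned, (on_demand - provisioned) * term_days


# The worked example: 100M tokens/day, $3/1M effective rate,
# 30% six-month discount, ~180-day term.
daily, saved = provisioned_savings(100, 3.00, 0.30, 180)
# daily ≈ 210, saved ≈ 16,200 -- matching the figures above
```

Plug in your own 30-day token history before committing; the function makes it cheap to test how sensitive the savings are to a volume drop.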

Bedrock vs Azure OpenAI vs Vertex AI

The honest comparison most procurement teams want:

| Dimension | AWS Bedrock | Azure OpenAI | Google Vertex AI |
|---|---|---|---|
| Frontier closed models | Claude, GPT-5.5, Nova Premier | GPT-5.5, GPT-Rosalind | Gemini 3.1 Pro, Claude (3rd) |
| Open-weight selection | Llama 4, Mistral, DeepSeek | Limited (Llama 4 only) | Llama 4, Gemma, Mistral |
| First-party model | Nova / Titan | None (rents from OpenAI) | Gemini family |
| Image generation | Titan v2, Stability SD 3.5 | DALL-E 3 | Imagen 3 |
| Video generation | Nova Reel, Stability SVD | Sora (limited preview) | Veo 3 |
| Embeddings | Titan, Cohere Embed | text-embedding-3-large | text-embedding-005, Gemini |
| Provisioned mode | Yes (PTUs) | Yes (PTUs) | Yes (Provisioned Throughput) |
| Knowledge Bases / RAG | Bedrock Knowledge Bases | Azure AI Search + integration | Vertex AI Search |
| Agent framework | Bedrock Agents | Azure AI Agent Service | Vertex AI Agent Builder |
| Guardrails | Bedrock Guardrails | Azure AI Content Safety | Vertex AI Safety Filters |
| Top regulated certs | FedRAMP High, IL5, HIPAA | FedRAMP High, HIPAA | FedRAMP High, HIPAA |

The honest summary:

  • Bedrock wins on model diversity. No other hyperscaler offers Claude, GPT, Llama, Mistral, Cohere, and a credible first-party line under one billing surface.
  • Azure wins on OpenAI integration depth. Tightest coupling with the OpenAI roadmap; lowest-latency GPT inference; OpenAI-specific features land on Azure first.
  • Vertex wins on multimodal frontier. Gemini 3.1 Pro's 2M context and Veo 3 video are the strongest natively-integrated frontier multimodal stack.

The "right" choice tends to be set by where the rest of your data lives. Choosing the AI platform that lives in the cloud where your data already is removes 80% of the deployment friction. The remaining 20% — model availability, pricing, feature gaps — is what multi-cloud routing solutions like Swfte Connect abstract.

Bedrock-Native Features Worth Knowing

Three Bedrock features are genuinely differentiated and worth understanding even if you only use Bedrock for raw model inference.

Bedrock Knowledge Bases

A managed RAG service. You point it at an S3 bucket; it ingests, chunks, embeds (using Titan or Cohere), stores in OpenSearch Serverless or Aurora pgvector, and exposes a single retrieve-and-generate endpoint. Skips the standard RAG-pipeline plumbing. Pricing is per-token for the underlying embedding and generation calls plus the OpenSearch/Aurora costs.
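The retrieve-and-generate call can be sketched as follows. The knowledge base ID and model ARN are placeholders, and in boto3 the method lives on the bedrock-agent-runtime client rather than bedrock-runtime:

```python
def build_rag_request(kb_id: str, model_arn: str, question: str) -> dict:
    """Request body for Knowledge Bases' retrieve-and-generate endpoint:
    retrieval and generation in a single managed call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,   # placeholder -- from the KB console
                "modelArn": model_arn,      # placeholder -- generation model ARN
            },
        },
    }


# With boto3, the call goes through the bedrock-agent-runtime client:
#   rt = boto3.client("bedrock-agent-runtime")
#   resp = rt.retrieve_and_generate(**build_rag_request(kb_id, model_arn, question))
#   answer = resp["output"]["text"]   # citations are in resp["citations"]
```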

Bedrock Agents

An agent runtime that handles tool definitions, multi-step planning, and orchestration. Bedrock Agents is closest in shape to the OpenAI Assistants API or LangGraph but runs as a managed AWS service with IAM-based tool authorization. The 2026 v2 release added parallel tool execution, persistent memory, and Knowledge-Base-aware retrieval inside the agent loop.

Bedrock Guardrails

Configurable content-policy filters applied as a separate layer between caller and model. You define denied topics, harmful-content categories, PII redaction rules, and custom word filters. Guardrails apply uniformly across model providers — useful for enforcing the same policy across Claude, GPT, and Llama in a single application.
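Because Guardrails sit at the request layer, applying a policy is a matter of adding a guardrailConfig block to any Converse request. A sketch, with the guardrail ID as a placeholder:

```python
def with_guardrail(request: dict, guardrail_id: str, version: str = "1") -> dict:
    """Attach a Guardrails policy to a Converse request.

    The same guardrail ID applies regardless of which provider's model
    the request targets, since the policy is enforced as a separate layer.
    """
    return {
        **request,
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,  # placeholder ID
            "guardrailVersion": version,
            "trace": "enabled",  # return policy-evaluation details with the response
        },
    }
```

Wrapping requests this way keeps the policy decision in one place, so swapping Claude for GPT or Llama does not change the enforcement path.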

Use Cases Where Bedrock Specifically Wins

Five workload patterns where Bedrock's specific shape gives it a clear edge over alternatives.

  1. Multi-model A/B testing. Running Claude Opus, GPT-5.5, and Llama 4 405B against the same prompt set under a single SDK and a single bill. Trivial on Bedrock; awkward elsewhere.

  2. Regulated workloads with frontier-model requirements. HIPAA + Claude Opus 4.7 inside a VPC with PrivateLink. The certifications and the model are both on the same platform.

  3. Hybrid first-party + third-party stacks. Using Nova Lite for high-volume classification and Claude Sonnet for the final generation, with both billed through the same account.

  4. AWS-native data lakes. Pointing Bedrock Knowledge Bases at S3 with KMS encryption, IAM access, and CloudTrail logging — every component is already in your AWS perimeter.

  5. Long-term provisioned commitments. The 6-month PTU pricing is competitive against any direct API at high sustained volume.
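Pattern 1 above reduces to a loop over model IDs under the Converse API. A sketch, with hypothetical model IDs standing in for whatever is enabled in your account:

```python
# Hypothetical model IDs -- substitute the IDs enabled in your account.
CANDIDATES = [
    "anthropic.claude-opus-4-7",
    "openai.gpt-5-5",
    "meta.llama4-405b-instruct",
]


def build_ab_requests(prompt: str, model_ids: list) -> list:
    """One Converse request per candidate, identical in every field
    except the model ID -- the point of the unified API."""
    return [
        {
            "modelId": m,
            "messages": [{"role": "user", "content": [{"text": prompt}]}],
            "inferenceConfig": {"maxTokens": 1024, "temperature": 0.0},
        }
        for m in model_ids
    ]


# With a boto3 "bedrock-runtime" client:
#   for req in build_ab_requests(prompt, CANDIDATES):
#       out = client.converse(**req)
#       record(req["modelId"], out["output"]["message"]["content"][0]["text"])
```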

Getting Started Checklist

For a team standing up Bedrock for the first time, a no-detours sequence:

  1. Enable model access. In the Bedrock console, navigate to "Model access" and request access to the models you actually need. Most are auto-granted; Claude Opus, GPT-5.5, and Nova Premier require a brief usage-statement form.

  2. Set up CloudWatch + CloudTrail. Bedrock invocations log to CloudTrail by default. Add CloudWatch metrics for token usage and invocation count per model.

  3. Pick a region intentionally. US East (N. Virginia) has the most models and lowest pricing; EU (Frankfurt) and AP (Tokyo) have model gaps. For data-residency reasons you may need EU or AP; for cost reasons US East wins.

  4. Use the Converse API, not the legacy InvokeModel API. Converse normalizes the request/response format across providers — the same code targets Claude, GPT, Llama, and Nova with only a model ID change.

  5. Set up a Guardrails policy on day one. Even a permissive default policy prevents the worst categories of unintended output, and tightening it later is trivial.

  6. Tag every invocation. The inferenceConfig.tags field in Converse propagates to billing. Tag by team, environment, and feature for cost attribution.

  7. Start on-demand; move to provisioned only after 30 days of token-volume data. The provisioned-throughput economics break if your usage estimate is wrong by more than 25%.
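Steps 4 and 6 together can be sketched as below. The inferenceConfig.tags field name follows the checklist description; treat it as an assumption and verify against the current SDK reference before relying on it for billing:

```python
def tagged_request(request: dict, team: str, env: str, feature: str) -> dict:
    """Attach cost-attribution tags to a Converse request without
    mutating the original. The inferenceConfig.tags field name is
    taken from the checklist above -- verify it against the SDK docs.
    """
    cfg = dict(request.get("inferenceConfig", {}))
    cfg["tags"] = {"team": team, "environment": env, "feature": feature}
    return {**request, "inferenceConfig": cfg}
```

Tag at the request-builder layer rather than in each call site, so attribution stays consistent as teams add models.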

For a multi-provider abstraction across Bedrock, Azure OpenAI, and Vertex AI without re-platforming, see Swfte Connect — a single API that routes between every Bedrock model and every model on the other hyperscalers based on cost and quality criteria you set.

FAQ

What is AWS Bedrock?

AWS Bedrock is a fully managed AWS service that exposes foundation models from Anthropic, OpenAI, Meta, Mistral, Cohere, AI21, Stability, DeepSeek, and Amazon's own Nova/Titan lines through a unified API. Models run inside the AWS perimeter, billed on the AWS account, governed by IAM and KMS.

How much does Bedrock cost?

Bedrock pricing is per-token for text models and per-image or per-second for image/video models. Costs range from $0.035 per 1M input tokens (Nova Micro) to $75 per 1M output tokens (Claude Opus 4.7). Provisioned throughput offers ~15-30% savings on predictable high-volume workloads in exchange for a 1-month or 6-month commitment.

What models are available on AWS Bedrock?

As of May 2026: Anthropic Claude (Opus 4.7, Sonnet 4.6, Haiku 4.5); OpenAI GPT-5.5, GPT-Rosalind, GPT-Codex; Meta Llama 4 (405B, 70B, 8B); Amazon Nova (Premier, Pro, Lite, Micro) and Titan; Mistral Large/Medium/Small/Codestral; Cohere Command R+/R, Embed, Rerank; AI21 Jamba 2; Stability SD 3.5 and SVD; DeepSeek V3.5.

What is the difference between AWS Bedrock and Amazon Q?

Bedrock is the model gateway — the API surface for foundation models. Amazon Q is a packaged user-facing product (chatbot for business, developer assistant) that uses Bedrock under the hood. You consume Bedrock when building your own applications; you consume Q when you want a finished product.

Is AWS Bedrock HIPAA compliant?

Yes — AWS Bedrock is in scope under the AWS HIPAA Business Associate Addendum, with the same coverage as other in-scope AWS services. Bedrock is also FedRAMP High, IL5, ISO 27001, SOC 1/2/3, and PCI DSS authorized.

How does Bedrock compare to Azure OpenAI?

Azure OpenAI is single-vendor (OpenAI models only) with the deepest GPT integration and lowest-latency GPT inference. Bedrock is multi-vendor with Claude, GPT, Llama, Mistral, Cohere, and Amazon's first-party models on a single platform. Choose Azure if your stack is OpenAI-centric and Microsoft-native; choose Bedrock if you need model diversity or AWS-native data residency.

Can I run my own fine-tuned models on Bedrock?

Yes — Bedrock Custom Models supports fine-tuning of select base models (Llama, Titan, Nova, Cohere) and continued pre-training on Llama and Titan. Custom models deploy to provisioned throughput only — you cannot run a custom-fine-tuned model on the on-demand tier.

For multi-cloud AI orchestration that routes across Bedrock, Azure OpenAI, and Vertex AI based on cost and quality, see Swfte Connect. For ongoing pricing and benchmark tracking see the Swfte AI leaderboard and pricing trends page.

