If you are evaluating Amazon Nova Pro for a serious enterprise workload in 2026, the answer to "what does it cost?" is more interesting than the per-token sticker price suggests. The headline number gets you 60% of the way to a budget; the other 40% is in batch discounts, prompt-caching credits, Bedrock provisioned-throughput surcharges, and the difference between what you pay AWS and what you actually spend on a typical agent workload that round-trips through tools eight or ten times. This post walks through all of that, in numbers, with the comparison points buyers actually care about.
The headline numbers
As of May 2026, Amazon Nova Pro on AWS Bedrock is priced at:
- Input tokens: $0.80 per 1M tokens (on-demand)
- Output tokens: $3.20 per 1M tokens (on-demand)
- Cached input tokens: $0.20 per 1M tokens (75% discount on cache hits, 5-minute TTL)
- Batch input tokens: $0.40 per 1M tokens (50% off, asynchronous, 24-hour SLA)
- Batch output tokens: $1.60 per 1M tokens (50% off, same conditions)
- Provisioned throughput: quoted per model unit per hour, varies by region; expect ~$22-$28/hour per unit at the time of writing
Those are the prices a procurement team will see on the AWS pricing page, and they are correct as far as they go. They are not the prices a finance team will see at the end of the month, because most real workloads do not run at the on-demand sticker price. They run somewhere between cached and on-demand, with a fraction of traffic eligible for batch, and a chunk of long-running agent traffic where caching is decisive.
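The blended-price point can be made concrete. This is a minimal sketch using the on-demand, cached, and batch input rates quoted above; the traffic mix in the example is hypothetical, not a measured workload.

```python
# Blended per-1M-token input price for Nova Pro under a mix of on-demand,
# cached, and batch traffic. Rates are the May 2026 Bedrock figures from
# the list above; the mix shares are an illustrative assumption.

ON_DEMAND_IN, CACHED_IN, BATCH_IN = 0.80, 0.20, 0.40  # $ per 1M input tokens

def blended_input_price(on_demand_share: float, cached_share: float,
                        batch_share: float) -> float:
    """Weighted-average $/1M input tokens for a given traffic mix."""
    assert abs(on_demand_share + cached_share + batch_share - 1.0) < 1e-9
    return (on_demand_share * ON_DEMAND_IN
            + cached_share * CACHED_IN
            + batch_share * BATCH_IN)

# Example mix: 40% on-demand, 50% cache hits, 10% batch.
print(round(blended_input_price(0.40, 0.50, 0.10), 2))  # → 0.46
```

At that mix, the effective input price is $0.46/1M rather than the $0.80 sticker rate, which is why the month-end bill rarely matches a naive on-demand forecast.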
Where Nova Pro sits in the broader market
To put those numbers in context, here is what the rest of the May 2026 frontier looks like, normalised to cost per 1M output tokens (output tokens dominate any agent workload because every reasoning step gets billed):
- Anthropic Claude Opus 4.7: $25 / 1M output
- OpenAI GPT-5.5: $30 / 1M output (Pro tier $180 / 1M output)
- Google Gemini 3.1 Pro: $10.50 / 1M output
- Anthropic Claude Sonnet 4.6: $15 / 1M output
- DeepSeek V4 Pro: $3.48 / 1M output
- DeepSeek V4 Flash: $0.28 / 1M output
- Amazon Nova Pro: $3.20 / 1M output
- Amazon Nova Lite: $0.24 / 1M output
- Amazon Nova Micro: $0.14 / 1M output
So Nova Pro sits in the upper-mid tier on raw output cost — roughly 8-9x cheaper than the Anthropic and OpenAI flagships, slightly cheaper than DeepSeek V4 Pro, and well within the band where it competes seriously for production agent traffic. Nova Lite and Nova Micro slot into the same band as DeepSeek V4 Flash for cheap-and-fast workloads.
The cache discount is the part most buyers under-model
Nova Pro's prompt-caching pricing — $0.20 per 1M cached input tokens — is the line that materially changes the unit economics on most agent workloads, and it is also the line that buyers most consistently fail to budget against.
Here is why it matters. A typical agent loop in 2026 looks like this: the system prompt is 8K tokens, the tool catalogue and schemas are another 6K, the rolling conversation history is 20K-50K tokens, and the new user input or tool result on each turn is usually under 2K. On a ten-turn agent loop, the model sees roughly 350K input tokens if you do not cache — and roughly 60K new input tokens plus 290K cached input tokens if you do. The bill at sticker price would be 350K × $0.80 / 1M = $0.28. With caching enabled correctly, it drops to (60K × $0.80 + 290K × $0.20) / 1M = $0.106. That is a 62% reduction on input billing, on the same workload, just by using the cache.
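The arithmetic above is easy to wire into a cost model. This sketch reproduces it exactly; the token counts (350K total input, of which 60K is new when the cache is warm) are the illustrative figures from the text, not measured values.

```python
# Ten-turn agent-loop input billing, with and without prompt caching,
# at Nova Pro's on-demand ($0.80/1M) and cached ($0.20/1M) input rates.

IN_RATE, CACHED_RATE = 0.80, 0.20  # $ per 1M input tokens

def input_cost(new_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of one loop's input tokens, split fresh vs cached."""
    return (new_tokens * IN_RATE + cached_tokens * CACHED_RATE) / 1_000_000

uncached = input_cost(350_000)         # every token billed on-demand
cached = input_cost(60_000, 290_000)   # 290K tokens served from cache

print(f"${uncached:.3f} vs ${cached:.3f}")     # → $0.280 vs $0.106
print(f"saving: {1 - cached / uncached:.0%}")  # → saving: 62%
```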
The two things that go wrong with this calculation in practice:
- Cache TTL is 5 minutes. A user who comes back after lunch starts a fresh cache. If your sessions are bursty, the cache hit rate drops. Build your cost model with a realistic hit rate (50-70% on warm traffic, 0% on cold).
- Cache writes have a small premium. The first call that populates the cache is billed at 125% of the on-demand input rate ($1.00 per 1M instead of $0.80). On short, one-shot prompts, the cache penalty exceeds the cache benefit; only enable caching where the prompt is large enough and the session is long enough.
A reasonable rule of thumb: enable caching for prompts above 4K tokens with at least three follow-up turns expected. Below that, the cache write penalty dominates.
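The break-even behind that rule of thumb can be sketched directly. This assumes the simplest case — the same prompt prefix re-sent every turn, one cache write at the 125% rate ($1.00/1M), then hits at $0.20/1M — and ignores the 5-minute TTL, which is why the three-turn threshold in the text is deliberately conservative.

```python
# Input cost of re-sending the same prompt prefix over n turns,
# with and without caching, at the rates quoted in the post.

WRITE, HIT, ON_DEMAND = 1.00, 0.20, 0.80  # $ per 1M input tokens

def prompt_cost(prompt_tokens: int, turns: int, cache: bool = True) -> float:
    """Cumulative input cost of the prompt prefix across a session."""
    m = prompt_tokens / 1_000_000
    if not cache:
        return turns * m * ON_DEMAND
    return m * WRITE + (turns - 1) * m * HIT  # one write, then hits

# 8K-token prompt over 5 turns: caching wins comfortably.
print(round(prompt_cost(8_000, 5, cache=False), 4))  # → 0.032
print(round(prompt_cost(8_000, 5, cache=True), 4))   # → 0.0144
```

Run the same function with `turns=1` and the cached path costs more than on-demand — the one-shot case where the write premium dominates.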
Batch processing: the 50% discount nobody uses
AWS Bedrock supports batch inference for Nova Pro at a flat 50% discount on both input and output, with a 24-hour SLA. For any workload that does not need real-time responses — overnight document extraction, weekly report generation, retroactive labelling, eval runs against held-out datasets — this is free money. Almost no enterprise we talk to has wired up batch routing for any of their async workloads. They are paying on-demand prices for jobs that explicitly do not need on-demand latency. The fix is a queue and a scheduler; the savings on a workload of 100M tokens per week are roughly $65-$160 per week depending on the input/output mix, which over a year approaches the cost of an engineer-month.
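The batch saving is a one-liner to estimate. The 80/20 input/output split in the example is an assumption for illustration; the rates and the 50% discount are the ones quoted in this post.

```python
# Weekly dollars saved by routing async volume to Bedrock batch
# instead of on-demand, at Nova Pro's on-demand rates.

IN_RATE, OUT_RATE = 0.80, 3.20  # $ per 1M tokens, on-demand
BATCH_DISCOUNT = 0.50

def weekly_saving(input_tokens_m: float, output_tokens_m: float) -> float:
    """Saving from batching this weekly volume (token counts in millions)."""
    on_demand = input_tokens_m * IN_RATE + output_tokens_m * OUT_RATE
    return on_demand * BATCH_DISCOUNT

# 100M tokens/week, split 80M input / 20M output:
print(round(weekly_saving(80, 20), 2))  # → 64.0
```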
Provisioned throughput: when it pays off
Provisioned throughput on Bedrock — paying ~$22-$28 per hour per model unit instead of paying per token — only makes sense above certain volume thresholds. The break-even rule is roughly: if your steady-state throughput sustains a model unit at 60%+ utilisation for the duration of the commitment, provisioned beats on-demand. Below that, you are paying for capacity you are not using.
For Nova Pro specifically, a single model unit handles roughly 200K input + 200K output tokens per minute under reasonable load. At sticker prices, that is $0.16/min input + $0.64/min output = $0.80/min, or $48/hour at full utilisation. Provisioned at $25/hour beats on-demand if you sustain ~52% utilisation over the commitment window. Most production workloads pulse rather than sustain; do the math against your actual traffic shape, not against the peak.
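That break-even calculation, written out, using the rough per-unit throughput and $25/hour unit price quoted above:

```python
# Provisioned-throughput break-even: on-demand cost of one fully loaded
# model unit vs the hourly unit price. Throughput figures are the rough
# numbers from the text, not guaranteed capacity.

IN_RATE, OUT_RATE = 0.80, 3.20        # $ per 1M tokens
UNIT_IN, UNIT_OUT = 200_000, 200_000  # tokens/min per model unit
UNIT_HOURLY = 25.0                    # $ per model unit per hour

def on_demand_hourly(utilisation: float) -> float:
    """On-demand spend per hour for one unit at the given utilisation."""
    per_min = (UNIT_IN * IN_RATE + UNIT_OUT * OUT_RATE) / 1_000_000
    return per_min * 60 * utilisation

break_even = UNIT_HOURLY / on_demand_hourly(1.0)
print(f"${on_demand_hourly(1.0):.0f}/hr at full load")  # → $48/hr at full load
print(f"break-even utilisation: {break_even:.0%}")      # → break-even utilisation: 52%
```

Plug your actual average utilisation into `on_demand_hourly` before committing; a workload that pulses to 100% for two hours a day averages far below the 52% threshold.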
Hidden Bedrock surcharges
Three line items show up in finance reviews that are easy to miss in the pricing model:
- Cross-region invocation fees. Calling a Nova Pro model in `us-west-2` from compute running in `eu-central-1` adds inter-region data transfer costs. For a chatty agent, this can add 5-8% to the bill.
- CloudWatch and AWS X-Ray. If you are logging full prompts and completions for governance, CloudWatch ingestion and storage become a non-trivial line — plan for an extra 10-15% on top of the model bill if you keep 90 days of full prompt logs.
- VPC endpoint hourly charges. For a security-mandated VPC endpoint setup, expect ~$8/month per endpoint per AZ. Small absolute number; it shows up in spreadsheets when finance wonders why "AI cost" doesn't match "Bedrock cost".
None of these are deal-breakers, but they collectively add 15-25% to the rough sticker model. Build them into the budget up front so the first AWS bill does not surprise the CFO.
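A simple loading function keeps these overheads in the forecast. The default percentages below are midpoints of the ranges given above and are illustrative, not a quote.

```python
# Raw Bedrock model bill loaded with the hidden surcharges discussed
# above: cross-region transfer (5-8%) and governance logging (10-15%).
# Defaults are range midpoints; tune them to your own architecture.

def loaded_bill(model_cost: float, cross_region: float = 0.065,
                logging: float = 0.125) -> float:
    """Monthly bill after percentage-based Bedrock overheads."""
    return model_cost * (1 + cross_region + logging)

print(round(loaded_bill(10_000)))  # → 11900
```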
Real-world cost-per-task: three workloads
Sticker prices are an abstraction. Here is what Nova Pro actually costs for three workloads we benchmarked against representative customer data:
1. Invoice extraction (per document)
- Average prompt: 4K input, 600 output
- Per-document cost at on-demand: $0.0032 input + $0.00192 output ≈ $0.005 per invoice
- With caching on the layout-fingerprint prompt: ≈ $0.003 per invoice
- Throughput: ~2 invoices/second per model unit
2. Customer-support agent (per ticket, full multi-turn)
- Average session: 60K input total (40K cached), 8K output across 6 turns
- Per-ticket cost: $0.024 + $0.0256 ≈ $0.05 per ticket
- Equivalent on Claude Opus 4.7: ~$0.50 per ticket. Equivalent on DeepSeek V4 Pro: ~$0.04. Nova Pro splits the middle.
3. Code review agent (per PR)
- Average session: 120K input (60K cached), 15K output
- Per-review cost: $0.060 + $0.048 ≈ $0.11 per PR
- Quality caveat: Nova Pro misses subtle architectural issues that Claude Opus catches; for PR review specifically, the cost-quality trade-off favours upgrading to Claude Sonnet 4.6 ($15/1M output) rather than staying on Nova Pro, because Sonnet's diff-reasoning quality at roughly 5x the price justifies the premium on this benchmark.
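The per-task numbers above fall out of one small function. This reproduces the support-ticket and PR-review figures from their token counts at the rates quoted in this post.

```python
# Per-task cost from token counts: fresh input at $0.80/1M, cached
# input at $0.20/1M, output at $3.20/1M (Nova Pro on-demand rates).

def task_cost(input_tokens: int, cached_tokens: int,
              output_tokens: int) -> float:
    """Dollar cost of one task, given total input and the cached share."""
    fresh = input_tokens - cached_tokens
    return (fresh * 0.80 + cached_tokens * 0.20
            + output_tokens * 3.20) / 1_000_000

print(round(task_cost(60_000, 40_000, 8_000), 3))    # support ticket → 0.05
print(round(task_cost(120_000, 60_000, 15_000), 2))  # PR review → 0.11
```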
The general pattern: Nova Pro is excellent value on document-style and structured-output workloads, competitive on customer-support agents, and a poor fit for tasks that require deep multi-step reasoning across long contexts.
Where Nova Pro is actually the right pick
After a year of running Nova Pro alongside Claude, GPT, Gemini, and DeepSeek in production traffic for enterprise customers, the workloads where Nova Pro is genuinely the best price-quality choice cluster around:
- High-throughput document workflows (invoice OCR + extraction, contract clause extraction, KYC document checks). The combination of cheap output, strong vision, and AWS-native integration with S3 + Textract + Bedrock makes Nova Pro the obvious pick if your data is already in AWS.
- Multilingual customer-support agents at scale. Nova Pro's multilingual quality is competitive with Gemini 3.1 Pro on European languages, slightly behind on East Asian languages, and is roughly 3x cheaper than Gemini for the same workload.
- AWS-native enterprise stacks where adding Anthropic or OpenAI as a separate vendor introduces procurement friction. Nova Pro inherits the AWS contract you already have, the AWS data-residency guarantees you already have, and the AWS audit trail you already have. For a regulated enterprise on AWS, that procurement-friction reduction is worth real money.
Where Nova Pro is not the right pick:
- Frontier-grade code generation. Use Claude Opus 4.7 or DeepSeek V4 Pro.
- Deep multi-step reasoning over long contexts. Use Gemini 3.1 Pro or Claude Opus.
- Voice-grade latency-sensitive synthesis. Use a dedicated TTS pipeline, not a general LLM.
How to get the best Nova Pro pricing in practice
The optimisations that actually move the bill, ranked by impact:
- Enable prompt caching on any prompt above 4K tokens with multi-turn use. 60% input savings is the largest single lever.
- Route async workloads to batch. 50% off, no quality difference, slight latency cost.
- Right-size the model. Nova Lite and Nova Micro are 13x and 22x cheaper respectively. Most production traffic does not need Pro.
- Use a workflow orchestrator that supports model routing. Sending the cheap traffic to Nova Lite and the hard traffic to Nova Pro (or to a frontier model) on the same workflow is the difference between a $40K month and a $10K month at meaningful scale.
- Negotiate enterprise pricing above $50K/month. AWS will discount Nova Pro 15-30% on committed-use contracts. The discount is rarely automatic; you have to ask.
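The model-routing lever is the one that needs a line of code rather than a console setting. This is a minimal sketch of the idea; the difficulty heuristic and the model identifiers are hypothetical placeholders, not real Bedrock model IDs or API calls.

```python
# Toy model router: cheap, short-context traffic goes to Nova Lite,
# everything else to Nova Pro. Model IDs below are illustrative only.

NOVA_LITE = "amazon.nova-lite-v1"  # hypothetical ID
NOVA_PRO = "amazon.nova-pro-v1"    # hypothetical ID

CHEAP_TASKS = {"classification", "summarisation"}

def route(task_type: str, context_tokens: int) -> str:
    """Pick a model based on task type and context size."""
    if task_type in CHEAP_TASKS and context_tokens < 20_000:
        return NOVA_LITE
    return NOVA_PRO

print(route("classification", 3_000))  # → amazon.nova-lite-v1
print(route("code-review", 120_000))   # → amazon.nova-pro-v1
```

In a real orchestrator the heuristic would be learned or rule-tuned per workflow, but even this two-branch version captures most of the $40K-to-$10K gap described above when the bulk of traffic is genuinely cheap.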
The bottom line
Amazon Nova Pro at $0.80 input / $3.20 output is genuinely competitive for a broad class of enterprise workloads in 2026. It is not the cheapest option (DeepSeek V4 Flash and Nova Lite both undercut it for high-volume work), and it is not the highest-quality option (Claude Opus 4.7 and Gemini 3.1 Pro both beat it on hard reasoning). It is the AWS-native option, with all the procurement, residency, and audit-trail conveniences that implies, and for a finance team that is already living inside an AWS bill, that integration alone is often worth the small price premium over DeepSeek.
Build your cost model with caching, batch, right-sizing, and the hidden surcharges all included. Skip any one of those four levers and your forecast will be 30-60% off the actual bill. Pull all four and Nova Pro is one of the best price-quality picks in the market.
Want a programmatic way to compare Amazon Nova Pro pricing against the rest of the frontier? Browse the full pricing comparison or run the cost calculator on your specific workload. Or read the related deep-dive: The AI Workflow Marketplace and Buy vs Build in the Age of AI Coding Assistants.