TokenMix Research Lab · 2026-04-10

AWS Bedrock Pricing Guide: Claude, Llama, and Nova Model Costs Explained (2026)

AWS Bedrock pricing is the most complex in the AI API market. Between on-demand pricing, provisioned throughput, cross-region inference surcharges (+10%), and model-specific billing quirks, the actual cost of running AI models on Bedrock can be 30-200% higher than direct API access depending on your configuration. TokenMix.ai cost tracking shows enterprises running Claude on Bedrock pay an average of 20-35% more than those using Anthropic's direct API once cross-region surcharges and other configuration overheads are counted -- and most do not realize it.

This guide breaks down AWS Bedrock pricing for every major model family -- Claude on Bedrock, Llama on Bedrock, Amazon Nova models -- with on-demand vs provisioned comparisons, regional pricing traps, and direct cost comparisons against native APIs.

Quick Comparison: Bedrock vs Direct API Pricing

Model AWS Bedrock (On-Demand) Direct API Bedrock Premium
Claude 3.5 Sonnet (input/1M) $3.00 $3.00 0% (same price)
Claude 3.5 Sonnet (output/1M) $15.00 $15.00 0% (same price)
Claude 3.5 Haiku (input/1M) $0.80 $0.80 0% (same price)
Llama 3.3 70B (input/1M) $2.65 $0.88 (Together) +201% premium
Llama 3.3 8B (input/1M) $0.22 $0.18 (Together) +22% premium
Amazon Nova Pro (input/1M) $0.80 N/A (Bedrock only) Bedrock exclusive
Amazon Nova Lite (input/1M) $0.06 N/A (Bedrock only) Bedrock exclusive

Key insight: Claude pricing on Bedrock matches Anthropic's direct API. Llama pricing on Bedrock is 2-3x more expensive than dedicated inference providers. Amazon Nova models are Bedrock exclusives with competitive pricing.
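As a sanity check, the premiums in the table can be recomputed from the listed prices. This is a toy helper using the guide's figures, not live pricing data:

```python
def bedrock_premium(bedrock_price: float, direct_price: float) -> float:
    """Percentage markup of a Bedrock per-1M-token rate over a direct API rate."""
    return (bedrock_price / direct_price - 1) * 100

# Llama 3.3 70B: $2.65 on Bedrock vs $0.88 on Together AI
print(f"{bedrock_premium(2.65, 0.88):.0f}%")  # 201%
# Llama 3.3 8B: $0.22 vs $0.18
print(f"{bedrock_premium(0.22, 0.18):.0f}%")  # 22%
```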

How AWS Bedrock Pricing Works

AWS Bedrock uses three pricing models, and the choice between them significantly impacts your total cost.

1. On-Demand Pricing

Pay per token processed. No commitments. This is how most teams start.

2. Provisioned Throughput

Reserve dedicated inference capacity for consistent performance.

3. Batch Inference

Process large datasets asynchronously at reduced pricing.
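Most teams start on the on-demand path. As a minimal sketch, the snippet below builds an Anthropic Messages-format request body for invoking Claude on Bedrock; the model ID and the commented-out `invoke_model` call are assumptions based on the Bedrock runtime API, not verified against a live account:

```python
import json

def build_claude_body(prompt: str, max_tokens: int = 512) -> str:
    """Anthropic Messages-format body for Bedrock's invoke_model call."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = build_claude_body("Summarize our Q1 cloud spend.")

# In production (requires AWS credentials and the boto3 package):
# import boto3
# client = boto3.client("bedrock-runtime")
# resp = client.invoke_model(
#     modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", body=body)
```

With on-demand billing, every request like this is metered purely on the input and output tokens it consumes.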

Billing Components

Beyond per-token charges, Bedrock bills for provisioned throughput (per model unit, per hour), batch inference jobs, the cross-region inference surcharge (+10% when enabled), and managed features such as Knowledge Bases and Guardrails.

Claude on Bedrock: Pricing Details

Claude models are available on Bedrock at the same per-token rates as Anthropic's direct API. This is notable -- Anthropic maintains pricing parity, meaning there is no premium for Bedrock access to Claude.

Claude Model Pricing on Bedrock (April 2026)

Model Input (per 1M tokens) Output (per 1M tokens) Context Prompt Caching
Claude 3.5 Sonnet v2 $3.00 $15.00 200K $0.30 input (90% off)
Claude 3.5 Haiku $0.80 $4.00 200K $0.08 input (90% off)
Claude 3 Opus $15.00 $75.00 200K $1.50 input (90% off)
Claude 4 Sonnet $3.00 $15.00 200K $0.30 input (90% off)
Claude 4 Opus $15.00 $75.00 200K $1.50 input (90% off)
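The caching column translates into a simple blended rate. A minimal sketch, assuming cached reads bill at 90% off the base input rate and ignoring any cache-write surcharge:

```python
def blended_input_rate(base_rate: float, cache_hit_rate: float) -> float:
    """Effective input cost per 1M tokens given a prompt-cache hit rate.

    Cached tokens bill at 10% of the base input rate (the 90% discount
    in the table above); cache-write costs are ignored for simplicity.
    """
    return cache_hit_rate * (base_rate * 0.10) + (1 - cache_hit_rate) * base_rate

# Claude 3.5 Sonnet ($3.00/1M input) with 80% of input tokens cached:
print(round(blended_input_rate(3.00, 0.80), 2))  # 0.84
```

For chat applications with long, stable system prompts, hit rates this high are realistic, which is why caching can dominate the effective input price.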

Why Teams Choose Claude on Bedrock vs Direct API

Same price. So why use Bedrock? Three reasons:

  1. AWS ecosystem integration: IAM roles, VPC endpoints, CloudWatch monitoring, CloudTrail logging. If your infrastructure is on AWS, Bedrock integrates natively.

  2. Compliance and data residency: Bedrock runs in specific AWS regions. For teams with data residency requirements (healthcare, government, financial services), Bedrock provides documented compliance frameworks.

  3. Consolidated billing: One AWS bill for all services. No separate Anthropic billing relationship.

The trade-off: Bedrock adds latency. TokenMix.ai benchmarks show Claude on Bedrock adds 50-150ms additional latency versus direct API access due to the proxy layer. For latency-sensitive applications, this matters.

Cross-Region Inference Surcharge

If you enable cross-region inference on Bedrock (routing requests to the nearest available region for capacity), AWS adds a 10% surcharge to all token pricing. Claude 3.5 Sonnet input goes from $3.00 to $3.30 per million tokens; output from $15.00 to $16.50.

This is a hidden cost that many teams overlook when enabling cross-region for reliability purposes.
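The surcharge math is simple but worth making explicit; the rates below are this guide's figures:

```python
CROSS_REGION_SURCHARGE = 0.10  # flat 10% on all token pricing

def cross_region_rate(base_rate_per_1m: float) -> float:
    """Per-1M-token rate once cross-region inference is enabled."""
    return round(base_rate_per_1m * (1 + CROSS_REGION_SURCHARGE), 2)

print(cross_region_rate(3.00))   # 3.3  (Claude 3.5 Sonnet input)
print(cross_region_rate(15.00))  # 16.5 (Claude 3.5 Sonnet output)
```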

Llama on Bedrock: Pricing Details

Llama on Bedrock is where the pricing premium becomes significant. AWS charges substantially more than dedicated inference providers.

Llama Model Pricing on Bedrock (April 2026)

Model Bedrock Input (per 1M) Bedrock Output (per 1M) Together AI Groq
Llama 3.3 8B $0.22 $0.22 $0.18 $0.05
Llama 3.3 70B $2.65 $2.65 $0.88 $0.59
Llama 4 Scout $0.35 $1.00 $0.18 / $0.59 $0.11 / $0.34
Llama 4 Maverick $0.50 $1.50 $0.27 / $0.85 $0.20 / $0.60

Llama 3.3 70B on Bedrock costs $2.65/1M tokens versus $0.88 on Together AI -- a 201% premium. For Llama 8B, the gap is smaller (22% premium) but still meaningful at scale.

Why Pay the Llama Premium on Bedrock?

The premium buys AWS infrastructure benefits: IAM-based access control, VPC endpoints, CloudWatch monitoring and CloudTrail logging, and AWS compliance certifications (FedRAMP, HIPAA, SOC 2).

For regulated enterprises that must keep data within AWS, the premium is an infrastructure cost, not a model cost. For everyone else, dedicated inference providers offer the same Llama models at a fraction of the price.

Amazon Nova Models: Pricing Details

Amazon Nova is AWS's first-party model family, available exclusively on Bedrock. Nova models are aggressively priced to drive Bedrock adoption.

Nova Model Pricing (April 2026)

Model Input (per 1M tokens) Output (per 1M tokens) Context Strength
Nova Micro $0.035 $0.14 128K Text-only, fastest
Nova Lite $0.06 $0.24 300K Multimodal, balanced
Nova Pro $0.80 $3.20 300K Best quality, multimodal
Nova Premier $2.00 $8.00 1M Largest, complex tasks

Nova vs Competing Models

Comparison Nova Model Competitor Price Advantage
Budget text Nova Micro ($0.035 in) GPT-4o mini ($0.15 in) Nova is 77% cheaper
Mid-tier Nova Pro ($0.80 in) Claude 3.5 Haiku ($0.80 in) Same price
High-end Nova Premier ($2.00 in) Claude 3.5 Sonnet ($3.00 in) Nova is 33% cheaper

Nova Micro is exceptionally cheap -- $0.035 per million input tokens is among the lowest pricing in the market. For high-volume, simple text tasks (classification, extraction, routing), Nova Micro is hard to beat on cost.

The trade-off: Nova models trail Claude and GPT-4o on quality benchmarks. TokenMix.ai testing shows Nova Pro scoring approximately 78.3 on MMLU versus 88.7 for GPT-4o and 88.3 for Claude 3.5 Sonnet. For applications where cost matters more than peak quality, Nova models are compelling.
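To see the price-performance trade concretely, here is a toy monthly-cost comparison. The 500M/100M token volumes and the GPT-4o mini output rate ($0.60/1M) are illustrative assumptions, not figures from this guide:

```python
def monthly_cost(input_millions: float, output_millions: float,
                 input_rate: float, output_rate: float) -> float:
    """Dollars per month for token volumes expressed in millions of tokens."""
    return input_millions * input_rate + output_millions * output_rate

# High-volume classification traffic: 500M input + 100M output tokens/month
nova_micro = monthly_cost(500, 100, 0.035, 0.14)  # workload shape is assumed
gpt4o_mini = monthly_cost(500, 100, 0.15, 0.60)   # $0.60 output is assumed
print(round(nova_micro, 2), round(gpt4o_mini, 2))  # 31.5 135.0
```

At this volume the quality gap may be irrelevant: if a classifier hits its accuracy target on Nova Micro, the cheaper model simply wins.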

On-Demand vs Provisioned Throughput

Provisioned Throughput Pricing

Provisioned throughput on Bedrock reserves dedicated model capacity. Pricing is per model unit per hour, with commitment discounts.

Commitment Discount vs On-Demand Minimum
No commitment (hourly) 0% (baseline) 1 hour
1-month commitment 15-25% savings 1 model unit
6-month commitment 30-40% savings 1 model unit

When Provisioned Throughput Pays Off

The break-even point depends on model and usage.

Claude 3.5 Sonnet example, at low usage (1M tokens/day): assuming an 80/20 input/output split, on-demand costs roughly (0.8 × $3.00) + (0.2 × $15.00) ≈ $5.40/day, well below the $30-40/day break-even threshold, so on-demand is clearly cheaper.

Rule of thumb tracked by TokenMix.ai: Provisioned throughput becomes cost-effective when your daily spend on a single model consistently exceeds $30-40/day on-demand. Below that, on-demand is cheaper.
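The rule of thumb can be sketched as a comparison against one model unit's daily cost. The $1.50/hour unit rate below is a placeholder, not a published Bedrock price; AWS lists unit rates per model:

```python
def provisioned_pays_off(daily_on_demand_spend: float,
                         unit_rate_per_hour: float,
                         commitment_discount: float = 0.0) -> bool:
    """True when one committed model unit costs less than a day of on-demand.

    unit_rate_per_hour is the per-model-unit hourly list price; the
    $1.50/hour used below is a hypothetical figure for illustration.
    """
    provisioned_daily = unit_rate_per_hour * 24 * (1 - commitment_discount)
    return provisioned_daily < daily_on_demand_spend

print(provisioned_pays_off(20.0, 1.50))  # False -> stay on-demand
print(provisioned_pays_off(50.0, 1.50))  # True  -> provisioned wins
```

With a hypothetical $1.50/hour unit ($36/day), the crossover lands right in the $30-40/day band the rule of thumb describes.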

Provisioned vs Direct API

Even with provisioned throughput discounts, Bedrock Llama pricing remains more expensive than direct inference providers:

Model Bedrock Provisioned (est.) Together AI Gap
Llama 3.3 70B ~$2.00/1M tokens $0.88/1M tokens +127%
Llama 3.3 8B ~$0.17/1M tokens $0.18/1M tokens -6% (parity)

Only for Llama 8B does Bedrock provisioned pricing approach parity with dedicated providers. For larger models, the premium remains substantial.

Regional Pricing and Cross-Region Inference

Standard Region Pricing

AWS Bedrock pricing is the same across most US regions (us-east-1, us-west-2). However, some regions carry premium pricing:

Region Price Modifier
US East / US West Base price
EU (Frankfurt, Ireland) +0-5% on select models
Asia Pacific (Tokyo, Singapore) +0-10% on select models
GovCloud +20-30%

Cross-Region Inference (+10%)

Cross-region inference routes requests to the nearest region with available capacity. This improves availability but adds a flat 10% surcharge on all token pricing.

Example impact on monthly costs (Claude 3.5 Sonnet, 50M tokens/month): with, say, 40M input + 10M output tokens, on-demand costs (40 × $3.00) + (10 × $15.00) = $270/month. The 10% surcharge raises that to $297/month -- $27/month of pure overhead.

TokenMix.ai recommendation: Enable cross-region only if you experience regular capacity throttling in your primary region. Most workloads do not need it, and the 10% surcharge is pure overhead otherwise.

Global Inference Endpoints

AWS introduced global inference endpoints in 2026, which automatically route to the optimal region. These carry the same 10% cross-region surcharge. Use them for global applications serving users across continents; avoid them for single-region deployments.

Cost Analysis: Bedrock vs Direct API

Small Team (5M tokens/month, Claude 3.5 Sonnet)

Setup Monthly Cost Notes
Bedrock on-demand $90 AWS billing integration
Anthropic direct API $90 Same price, lower latency
TokenMix.ai $72-81 Optimized routing

At small scale with Claude, Bedrock and direct API cost the same. TokenMix.ai can offer 10-20% savings through smart routing.

Mid-Size Team (100M tokens/month, mixed models)

Setup Monthly Cost Notes
Bedrock (Claude + Llama 70B, on-demand) $3,200 Llama premium significant
Direct APIs (Anthropic + Together AI) $1,900 40% cheaper
Bedrock (Claude) + TokenMix.ai (Llama) $2,100 Optimal split

Enterprise (1B tokens/month, multi-model)

Setup Monthly Cost Notes
Bedrock provisioned (all models) $22,000-28,000 Commitment discounts applied
Direct APIs (all providers) $14,000-18,000 35-40% cheaper
Hybrid (Bedrock for compliance + TokenMix.ai for cost) $16,000-20,000 Balance of cost and compliance

The pattern is consistent: Bedrock costs 30-60% more than direct API access for equivalent model usage, with the gap largest on Llama models and negligible on Claude.

When Bedrock Pricing Makes Sense

Despite the premium, Bedrock is the right choice for specific scenarios:

  1. AWS-native infrastructure: If your stack is 100% AWS, Bedrock's IAM, VPC, and CloudWatch integration saves engineering time worth more than the pricing premium.

  2. Compliance requirements: FedRAMP, HIPAA, SOC 2 -- Bedrock inherits AWS compliance certifications. Getting equivalent compliance from direct API providers requires additional work.

  3. Amazon Nova models: Nova Micro ($0.035/1M input) and Nova Lite ($0.06/1M input) are Bedrock exclusives with excellent price-performance for simple tasks.

  4. Managed RAG (Knowledge Bases): Bedrock Knowledge Bases provide turnkey RAG with vector storage, chunking, and retrieval -- no infrastructure to build.

  5. Guardrails for content safety: Bedrock Guardrails offers built-in content filtering, PII detection, and topic blocking that would require custom development on direct APIs.

How to Choose: Decision Guide

Your Situation Recommended Approach Why
AWS-native, compliance-required Bedrock for everything Infrastructure integration worth the premium
AWS-native, cost-sensitive Bedrock (Claude + Nova) + TokenMix.ai (Llama) Route Llama to cheaper providers
Multi-cloud or cloud-agnostic Direct APIs via TokenMix.ai 30-60% cheaper than Bedrock
Need Nova models specifically Bedrock (exclusive) Only available on Bedrock
Budget text processing at scale Bedrock (Nova Micro) $0.035/1M is among cheapest available
Llama inference at scale Together AI or Groq 2-3x cheaper than Bedrock
Need managed RAG Bedrock Knowledge Bases Turnkey solution, reasonable pricing
Global low-latency deployment Direct APIs Bedrock cross-region adds 10% surcharge
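The table above can be condensed into a small routing helper. This is a toy rule distilled from the guide's recommendations; the provider labels are illustrative and do not reflect any real TokenMix.ai API:

```python
def pick_provider(model_family: str, aws_native: bool, compliance: bool) -> str:
    """Toy routing rule mirroring the decision guide above."""
    if compliance or model_family == "nova":
        return "bedrock"            # compliance workloads and Nova exclusives
    if model_family == "llama":
        return "together-or-groq"   # 2-3x cheaper than Bedrock for Llama
    if model_family == "claude":
        # Price parity: choose by infrastructure fit, not cost
        return "bedrock" if aws_native else "anthropic-direct"
    return "direct-api"

print(pick_provider("llama", aws_native=True, compliance=False))   # together-or-groq
print(pick_provider("claude", aws_native=False, compliance=False)) # anthropic-direct
```

The same logic generalizes: route by compliance first, then by the size of the Bedrock premium for the model family in question.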

Related: Compare all model pricing in our complete LLM API pricing comparison

Conclusion

AWS Bedrock pricing carries a 30-60% premium over direct API access for most model families, with the notable exception of Claude (priced at parity with Anthropic's API) and Nova models (Bedrock exclusives with competitive pricing).

The premium is not irrational -- it buys AWS infrastructure integration, compliance certifications, and managed services like Knowledge Bases and Guardrails. For AWS-native enterprises with compliance requirements, these features justify the cost.

For cost-conscious teams, the optimal strategy is hybrid: run Claude and Nova on Bedrock for compliance and integration benefits, and route Llama and other open-source model inference through TokenMix.ai to dedicated providers at 2-3x lower cost. TokenMix.ai supports this split with a unified API that works alongside Bedrock, providing automatic failover and cost optimization across providers.

Check real-time AWS Bedrock pricing comparisons and cost calculators at TokenMix.ai.

FAQ

Is Claude cheaper on Bedrock or through Anthropic's direct API?

The per-token pricing is identical. Claude 3.5 Sonnet costs $3.00/$15.00 per million input/output tokens on both Bedrock and Anthropic's API. The difference is in added costs: Bedrock may add latency (50-150ms) and cross-region surcharges (+10% if enabled), while Anthropic's direct API is straightforward per-token billing.

How much does cross-region inference add to AWS Bedrock costs?

Cross-region inference adds a flat 10% surcharge to all token pricing. For a team spending $1,000/month on model inference, that is $100/month in additional costs. Only enable cross-region if you regularly experience capacity throttling in your primary region.

Are Amazon Nova models good enough for production use?

Nova models score 10-12% lower than Claude 3.5 Sonnet and GPT-4o on general benchmarks like MMLU. However, for specific tasks like text classification, data extraction, and simple Q&A, Nova Micro ($0.035/1M tokens) and Nova Lite ($0.06/1M tokens) offer excellent price-performance. TokenMix.ai testing shows Nova Pro is competitive with Claude 3.5 Haiku for mid-complexity tasks at the same price point.

When should I use Bedrock provisioned throughput vs on-demand?

Provisioned throughput saves 15-40% but requires commitment. It becomes cost-effective when your daily spend on a single model exceeds approximately $30-40/day on-demand. Below that volume, on-demand is cheaper because provisioned has a minimum hourly charge regardless of usage.

Can I use both Bedrock and direct APIs through TokenMix.ai?

Yes. TokenMix.ai provides a unified API layer that can route requests to Bedrock, Anthropic direct API, Together AI, Groq, and other providers based on cost, latency, or availability preferences. This enables hybrid strategies where compliance-sensitive workloads run on Bedrock while cost-sensitive workloads route to cheaper providers.

Why is Llama so much more expensive on Bedrock than Together AI?

AWS charges a managed service premium for running open-source models on Bedrock infrastructure. Llama 3.3 70B costs $2.65/1M tokens on Bedrock versus $0.88 on Together AI -- a 201% premium. The premium covers AWS infrastructure (VPC, IAM, compliance), managed serving, and Bedrock platform features. Dedicated inference providers compete primarily on price and optimize specifically for inference cost efficiency.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: AWS Bedrock Pricing, Anthropic Pricing, Together AI Pricing + TokenMix.ai