AWS Bedrock Pricing Guide: Claude, Llama, and Nova Model Costs Explained (2026)
AWS Bedrock pricing is the most complex in the AI API market. Between on-demand pricing, provisioned throughput, cross-region inference surcharges (+10%), and model-specific billing quirks, the actual cost of running AI models on Bedrock can be 30-200% higher than direct API access depending on your configuration. TokenMix.ai cost tracking shows enterprises running Claude on Bedrock pay an average of 20-35% more than those using Anthropic's direct API -- not because per-token rates differ (they are identical), but because of cross-region surcharges, underused provisioned commitments, and unexploited prompt caching -- and most do not realize it.
This guide breaks down AWS Bedrock pricing for every major model family -- Claude on Bedrock, Llama on Bedrock, Amazon Nova models -- with on-demand vs provisioned comparisons, regional pricing traps, and direct cost comparisons against native APIs.
Table of Contents
Quick Comparison: Bedrock vs Direct API Pricing
How AWS Bedrock Pricing Works
Claude on Bedrock: Pricing Details
Llama on Bedrock: Pricing Details
Amazon Nova Models: Pricing Details
On-Demand vs Provisioned Throughput
Regional Pricing and Cross-Region Inference
Cost Analysis: Bedrock vs Direct API
When Bedrock Pricing Makes Sense
Conclusion
FAQ
Quick Comparison: Bedrock vs Direct API Pricing
| Model | AWS Bedrock (On-Demand) | Direct API | Bedrock Premium |
| --- | --- | --- | --- |
| Claude 3.5 Sonnet (input/1M) | $3.00 | $3.00 | 0% (same price) |
| Claude 3.5 Sonnet (output/1M) | $15.00 | $15.00 | 0% (same price) |
| Claude 3.5 Haiku (input/1M) | $0.80 | $0.80 | 0% (same price) |
| Llama 3.3 70B (input/1M) | $2.65 | $0.88 (Together) | +201% premium |
| Llama 3.1 8B (input/1M) | $0.22 | $0.18 (Together) | +22% premium |
| Amazon Nova Pro (input/1M) | $0.80 | N/A (Bedrock only) | Bedrock exclusive |
| Amazon Nova Lite (input/1M) | $0.06 | N/A (Bedrock only) | Bedrock exclusive |
Key insight: Claude pricing on Bedrock matches Anthropic's direct API. Llama pricing on Bedrock is 2-3x more expensive than dedicated inference providers. Amazon Nova models are Bedrock exclusives with competitive pricing.
How AWS Bedrock Pricing Works
AWS Bedrock uses three pricing models, and the choice between them significantly impacts your total cost.
1. On-Demand Pricing
Pay per token processed. No commitments. This is how most teams start.
Billed per 1,000 input tokens and per 1,000 output tokens
No minimum spend
Immediate access to all available models
Subject to default throughput quotas
2. Provisioned Throughput
Reserve dedicated inference capacity for consistent performance.
Billed per model unit per hour
1-month or 6-month commitments
Guaranteed throughput (no throttling)
1-month commitment: ~15-25% cheaper than on-demand at sustained usage
6-month commitment: ~30-40% cheaper than on-demand
3. Batch Inference
Process large datasets asynchronously at reduced pricing.
50% discount compared to on-demand pricing
Results delivered within 24 hours
Available for select models (Claude, Llama, Nova)
Ideal for evaluation pipelines, data processing, and non-real-time workloads
Billing Components
Beyond per-token charges, Bedrock bills for:
Model evaluation jobs: Charged per model unit hour
Cross-region inference: +10% surcharge on all token pricing
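To see how these components combine, here is a minimal cost sketch in Python. The per-million rates default to Claude 3.5 Sonnet's on-demand Bedrock pricing from the tables below; the batch discount and cross-region surcharge follow the percentages above (combining the two flags is purely illustrative).

```python
def bedrock_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_1m: float = 3.00,    # Claude 3.5 Sonnet on-demand input rate
    output_per_1m: float = 15.00,  # Claude 3.5 Sonnet on-demand output rate
    batch: bool = False,           # batch inference: 50% off on-demand
    cross_region: bool = False,    # cross-region inference: +10% surcharge
) -> float:
    """Estimate Bedrock token cost for one workload (rates per 1M tokens)."""
    cost = (input_tokens / 1e6) * input_per_1m + (output_tokens / 1e6) * output_per_1m
    if batch:
        cost *= 0.50
    if cross_region:
        cost *= 1.10
    return cost

# 10M input + 2M output tokens, single region, real-time:
print(f"${bedrock_cost_usd(10_000_000, 2_000_000):.2f}")               # $60.00
# The same workload as an asynchronous batch job (50% discount):
print(f"${bedrock_cost_usd(10_000_000, 2_000_000, batch=True):.2f}")   # $30.00
```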
Claude on Bedrock: Pricing Details
Claude models are available on Bedrock at the same per-token rates as Anthropic's direct API. This is notable -- Anthropic maintains pricing parity, meaning there is no premium for Bedrock access to Claude.
Claude Model Pricing on Bedrock (April 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context | Prompt Caching |
| --- | --- | --- | --- | --- |
| Claude 3.5 Sonnet v2 | $3.00 | $15.00 | 200K | $0.30 input (90% off) |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K | $0.08 input (90% off) |
| Claude 3 Opus | $15.00 | $75.00 | 200K | $1.50 input (90% off) |
| Claude 4 Sonnet | $3.00 | $15.00 | 200K | $0.30 input (90% off) |
| Claude 4 Opus | $15.00 | $75.00 | 200K | $1.50 input (90% off) |
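The prompt-caching column compounds quickly for workloads that reuse a long system prompt. A rough sketch using only the table's cached-input rate; cache-write charges, which Bedrock bills separately, are ignored here for simplicity:

```python
# Effective input cost with prompt caching (Claude 3.5 Sonnet rates from the
# table above). Assumes the cached prefix is re-read on every request.
CACHED_PER_1M = 0.30   # 90% off the $3.00 input rate
FRESH_PER_1M = 3.00

def monthly_input_cost(requests: int, cached_prefix_tokens: int, fresh_tokens: int) -> float:
    cached = requests * cached_prefix_tokens / 1e6 * CACHED_PER_1M
    fresh = requests * fresh_tokens / 1e6 * FRESH_PER_1M
    return cached + fresh

# 100K requests/month, 4K-token shared system prompt, 500 fresh tokens each:
with_cache = monthly_input_cost(100_000, 4_000, 500)    # $270
without = 100_000 * 4_500 / 1e6 * FRESH_PER_1M          # $1,350
print(f"${with_cache:.0f} with caching vs ${without:.0f} without")
```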
Why Teams Choose Claude on Bedrock vs Direct API
Same price. So why use Bedrock? Three reasons:
AWS ecosystem integration: IAM roles, VPC endpoints, CloudWatch monitoring, CloudTrail logging. If your infrastructure is on AWS, Bedrock integrates natively.
Compliance and data residency: Bedrock runs in specific AWS regions. For teams with data residency requirements (healthcare, government, financial services), Bedrock provides documented compliance frameworks.
Consolidated billing: One AWS bill for all services. No separate Anthropic billing relationship.
The trade-off: Bedrock adds latency. TokenMix.ai benchmarks show Claude on Bedrock adds 50-150ms additional latency versus direct API access due to the proxy layer. For latency-sensitive applications, this matters.
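For teams already on AWS, calling Claude through Bedrock is a few lines of boto3. A minimal sketch using the Converse API; the model ID shown is one published Bedrock identifier for Claude 3.5 Sonnet v2, and the usage block in the response is what per-token cost tracking hooks into:

```python
import boto3

# The Bedrock runtime client is region-scoped; pricing is also per-region.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": [{"text": "Summarize our Q3 risks."}]}],
    inferenceConfig={"maxTokens": 512},
)

# Multiply the returned token counts by the per-1M rates from the table
# above to get dollar cost per call.
usage = response["usage"]
cost = usage["inputTokens"] / 1e6 * 3.00 + usage["outputTokens"] / 1e6 * 15.00
print(response["output"]["message"]["content"][0]["text"])
print(f"~${cost:.4f} for this call")
```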
Cross-Region Inference Surcharge
If you enable cross-region inference on Bedrock (routing requests to the nearest available region for capacity), AWS adds a 10% surcharge to all token pricing. Claude 3.5 Sonnet input goes from $3.00 to $3.30 per million tokens; output from $15.00 to $16.50.
This is a hidden cost that many teams overlook when enabling cross-region for reliability purposes.
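Cross-region inference is opt-in via inference profile IDs: prefixing the model ID with a geography (for example, us.) routes across regions in that geography and triggers the surcharge. A quick sketch of the cost difference; the profile ID format follows Bedrock's convention, but verify the exact string for your model and geography:

```python
# Single-region model ID: base pricing, capacity limited to one region.
SINGLE_REGION = "anthropic.claude-3-5-sonnet-20241022-v2:0"
# Cross-region inference profile (geo-prefixed): +10% on every token.
CROSS_REGION = "us.anthropic.claude-3-5-sonnet-20241022-v2:0"

SURCHARGE = 1.10
input_rate, output_rate = 3.00, 15.00   # per 1M tokens, single region

# Monthly impact at 50M input + 50M output tokens:
base = 50 * input_rate + 50 * output_rate   # $900
cross = base * SURCHARGE                    # $990
print(f"single region ${base:.0f}, cross-region ${cross:.0f} (+${cross - base:.0f}/month)")
```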
Llama on Bedrock: Pricing Details
Llama on Bedrock is where the pricing premium becomes significant. AWS charges substantially more than dedicated inference providers.
Llama Model Pricing on Bedrock (April 2026)
| Model | Bedrock Input (per 1M) | Bedrock Output (per 1M) | Together AI | Groq |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B | $0.22 | $0.22 | $0.18 | $0.05 |
| Llama 3.3 70B | $2.65 | $2.65 | $0.88 | $0.59 |
| Llama 4 Scout | $0.35 | $1.00 | $0.18 / $0.59 | $0.11 / $0.34 |
| Llama 4 Maverick | $0.50 | $1.50 | $0.27 / $0.85 | $0.20 / $0.60 |
Llama 3.3 70B on Bedrock costs $2.65/1M tokens versus $0.88 on Together AI -- a 201% premium. For Llama 8B, the gap is smaller (22% premium) but still meaningful at scale.
For regulated enterprises that must keep data within AWS, the premium is an infrastructure cost, not a model cost. For everyone else, dedicated inference providers offer the same Llama models at a fraction of the price.
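At volume, those per-token gaps translate into real money. A quick comparison sketch using the Llama 3.3 70B input rates from the table above (output rates and any platform fees are ignored):

```python
# Monthly input-token spend for Llama 3.3 70B at each provider,
# using the per-1M input rates from the table above.
RATES_70B = {"Bedrock": 2.65, "Together AI": 0.88, "Groq": 0.59}

monthly_tokens_m = 500  # 500M input tokens/month
for provider, rate in RATES_70B.items():
    print(f"{provider:>12}: ${monthly_tokens_m * rate:,.0f}/month")
# Bedrock: $1,325 vs Together AI: $440 vs Groq: $295
```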
Amazon Nova Models: Pricing Details
Amazon Nova is AWS's first-party model family, available exclusively on Bedrock. Nova models are aggressively priced to drive Bedrock adoption.
Nova Model Pricing (April 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context | Strength |
| --- | --- | --- | --- | --- |
| Nova Micro | $0.035 | $0.14 | 128K | Text-only, fastest |
| Nova Lite | $0.06 | $0.24 | 300K | Multimodal, balanced |
| Nova Pro | $0.80 | $3.20 | 300K | Best quality, multimodal |
| Nova Premier | $2.00 | $8.00 | 1M | Largest, complex tasks |
Nova vs Competing Models
| Comparison | Nova Model | Competitor | Price Advantage |
| --- | --- | --- | --- |
| Budget text | Nova Micro ($0.035 in) | GPT-4o mini ($0.15 in) | Nova is 77% cheaper |
| Mid-tier | Nova Pro ($0.80 in) | Claude 3.5 Haiku ($0.80 in) | Same price |
| High-end | Nova Premier ($2.00 in) | Claude 3.5 Sonnet ($3.00 in) | Nova is 33% cheaper |
Nova Micro is exceptionally cheap -- $0.035 per million input tokens is among the lowest pricing in the market. For high-volume, simple text tasks (classification, extraction, routing), Nova Micro is hard to beat on cost.
The trade-off: Nova models trail Claude and GPT-4o on quality benchmarks. TokenMix.ai testing shows Nova Pro scoring approximately 78.3 on MMLU versus 88.7 for GPT-4o and 88.3 for Claude 3.5 Sonnet. For applications where cost matters more than peak quality, Nova models are compelling.
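One way to exploit Nova's pricing without giving up quality where it matters is tiered routing: send cheap, structured tasks to Nova Micro and reserve Claude for complex ones. A sketch under assumptions: the model IDs are Bedrock's published identifiers (some regions require the geo-prefixed inference profile form, e.g. us.amazon.nova-micro-v1:0), and the simple/complex flag stands in for a real task classifier:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative tiering: cheap Bedrock-exclusive model for simple tasks,
# Claude for anything that needs reasoning quality.
CHEAP = "amazon.nova-micro-v1:0"                       # $0.035/1M input
QUALITY = "anthropic.claude-3-5-sonnet-20241022-v2:0"  # $3.00/1M input

def ask(prompt: str, simple: bool) -> str:
    # A production router would classify the task; this flag is a stand-in.
    response = client.converse(
        modelId=CHEAP if simple else QUALITY,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

label = ask("Classify this ticket as billing/tech/other: 'Card declined'", simple=True)
plan = ask("Draft a migration plan from Postgres 12 to 16.", simple=False)
```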
On-Demand vs Provisioned Throughput
Provisioned Throughput Pricing
Provisioned throughput on Bedrock reserves dedicated model capacity. Pricing is per model unit per hour, with commitment discounts.
| Commitment | Discount vs On-Demand | Minimum |
| --- | --- | --- |
| No commitment (hourly) | 0% (baseline) | 1 hour |
| 1-month commitment | 15-25% savings | 1 model unit |
| 6-month commitment | 30-40% savings | 1 model unit |
When Provisioned Throughput Pays Off
The break-even point depends on model and usage:
Claude 3.5 Sonnet example:
On-demand at moderate usage (10M tokens/day): ~$45/day = ~$1,350/month
Provisioned (1-month commitment): approximately $1,050/month
Savings: ~22%
At low usage (1M tokens/day):
On-demand: ~$4.50/day = ~$135/month
Provisioned minimum: ~$600-800/month
Result: On-demand is cheaper
Rule of thumb tracked by TokenMix.ai: Provisioned throughput becomes cost-effective when your daily spend on a single model consistently exceeds $30-40/day on-demand. Below that, on-demand is cheaper.
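That rule of thumb is easy to check against your own numbers. A sketch, assuming an illustrative provisioned floor of $700/month (the midpoint of the ~$600-800 range above; actual model-unit rates vary by model and region) and the ~22% sustained-usage discount from the example:

```python
# Break-even check: on-demand vs provisioned throughput.
# PROVISIONED_FLOOR is an assumed monthly minimum for one model unit on a
# 1-month commitment; check AWS pricing for your model and region.
PROVISIONED_FLOOR = 700.0
COMMIT_DISCOUNT = 0.22   # ~22% savings at sustained usage

def cheaper_option(on_demand_daily: float) -> str:
    on_demand_monthly = on_demand_daily * 30
    provisioned_monthly = max(PROVISIONED_FLOOR, on_demand_monthly * (1 - COMMIT_DISCOUNT))
    return "provisioned" if provisioned_monthly < on_demand_monthly else "on-demand"

print(cheaper_option(4.50))   # on-demand   ($135/month vs the $700 floor)
print(cheaper_option(45.0))   # provisioned ($1,053/month vs $1,350)
```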
Provisioned vs Direct API
Even with provisioned throughput discounts, Bedrock Llama pricing remains more expensive than direct inference providers:
| Model | Bedrock Provisioned (est.) | Together AI | Gap |
| --- | --- | --- | --- |
| Llama 3.3 70B | ~$2.00/1M tokens | $0.88/1M tokens | +127% |
| Llama 3.1 8B | ~$0.17/1M tokens | $0.18/1M tokens | -6% (parity) |
Only for Llama 8B does Bedrock provisioned pricing approach parity with dedicated providers. For larger models, the premium remains substantial.
Regional Pricing and Cross-Region Inference
Standard Region Pricing
AWS Bedrock pricing is the same across most US regions (us-east-1, us-west-2). However, some regions carry premium pricing:
| Region | Price Modifier |
| --- | --- |
| US East / US West | Base price |
| EU (Frankfurt, Ireland) | +0-5% on select models |
| Asia Pacific (Tokyo, Singapore) | +0-10% on select models |
| GovCloud | +20-30% |
Cross-Region Inference (+10%)
Cross-region inference routes requests to the nearest region with available capacity. This improves availability but adds a flat 10% surcharge on all token pricing.
Example impact on monthly costs (Claude 3.5 Sonnet, 50M input + 50M output tokens/month):
Standard (single region): $150/month (input) + $750/month (output) = $900/month
With cross-region (+10%): $990/month -- an extra $90/month for the same tokens
TokenMix.ai recommendation: Enable cross-region only if you experience regular capacity throttling in your primary region. Most workloads do not need it, and the 10% surcharge is pure overhead otherwise.
Global Inference Endpoints
AWS introduced global inference endpoints in 2026, which automatically route to the optimal region. These carry the same 10% cross-region surcharge. Use them for global applications serving users across continents; avoid them for single-region deployments.
Cost Analysis: Bedrock vs Direct API
Small Team (5M input + 5M output tokens/month, Claude 3.5 Sonnet)
| Setup | Monthly Cost | Notes |
| --- | --- | --- |
| Bedrock on-demand | $90 | AWS billing integration |
| Anthropic direct API | $90 | Same price, lower latency |
| TokenMix.ai | $72-81 | Optimized routing |
At small scale with Claude, Bedrock and direct API cost the same. TokenMix.ai can offer 10-20% savings through smart routing.
Mid-Size Team (100M tokens/month, mixed models)
| Setup | Monthly Cost | Notes |
| --- | --- | --- |
| Bedrock (Claude + Llama 70B, on-demand) | $3,200 | Llama premium significant |
| Direct APIs (Anthropic + Together AI) | $1,900 | 40% cheaper |
| Bedrock (Claude) + TokenMix.ai (Llama) | $2,100 | Optimal split |
Enterprise (1B tokens/month, multi-model)
| Setup | Monthly Cost | Notes |
| --- | --- | --- |
| Bedrock provisioned (all models) | $22,000-28,000 | Commitment discounts applied |
| Direct APIs (all providers) | $14,000-18,000 | 35-40% cheaper |
| Hybrid (Bedrock for compliance + TokenMix.ai for cost) | $16,000-20,000 | Balance of cost and compliance |
The pattern is consistent: Bedrock costs 30-60% more than direct API access for equivalent model usage, with the gap largest on Llama models and negligible on Claude.
When Bedrock Pricing Makes Sense
Despite the premium, Bedrock is the right choice for specific scenarios:
AWS-native infrastructure: If your stack is 100% AWS, Bedrock's IAM, VPC, and CloudWatch integration saves engineering time worth more than the pricing premium.
Compliance requirements: FedRAMP, HIPAA, SOC 2 -- Bedrock inherits AWS compliance certifications. Getting equivalent compliance from direct API providers requires additional work.
Amazon Nova models: Nova Micro ($0.035/1M input) and Nova Lite ($0.06/1M input) are Bedrock exclusives with excellent price-performance for simple tasks.
Managed RAG (Knowledge Bases): Bedrock Knowledge Bases provide turnkey RAG with vector storage, chunking, and retrieval -- no infrastructure to build.
Guardrails for content safety: Bedrock Guardrails offers built-in content filtering, PII detection, and topic blocking that would require custom development on direct APIs.
Conclusion
AWS Bedrock pricing carries a 30-60% premium over direct API access for most model families, with the notable exception of Claude (priced at parity with Anthropic's API) and Nova models (Bedrock exclusives with competitive pricing).
The premium is not irrational -- it buys AWS infrastructure integration, compliance certifications, and managed services like Knowledge Bases and Guardrails. For AWS-native enterprises with compliance requirements, these features justify the cost.
For cost-conscious teams, the optimal strategy is hybrid: run Claude and Nova on Bedrock for compliance and integration benefits, and route Llama and other open-source model inference through TokenMix.ai to dedicated providers at 2-3x lower cost. TokenMix.ai supports this split with a unified API that works alongside Bedrock, providing automatic failover and cost optimization across providers.
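In code, the hybrid split can be as simple as routing on a compliance flag. A sketch that sends regulated traffic to Bedrock and everything else to an OpenAI-compatible provider; Together AI's endpoint is shown, and the Llama model name reflects Together's catalog at the time of writing (verify it, or substitute a gateway such as TokenMix.ai, which would slot in the same way):

```python
import os
import boto3
from openai import OpenAI  # Together AI exposes an OpenAI-compatible API

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
together = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
)

def complete(prompt: str, regulated: bool) -> str:
    if regulated:
        # Compliance-sensitive traffic stays inside AWS (IAM, VPC, CloudTrail).
        response = bedrock.converse(
            modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        return response["output"]["message"]["content"][0]["text"]
    # Cost-sensitive traffic goes to a cheaper dedicated Llama provider.
    chat = together.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return chat.choices[0].message.content
```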
Check real-time AWS Bedrock pricing comparisons and cost calculators at TokenMix.ai.
FAQ
Is Claude cheaper on Bedrock or through Anthropic's direct API?
The per-token pricing is identical. Claude 3.5 Sonnet costs $3.00/$15.00 per million input/output tokens on both Bedrock and Anthropic's API. The difference is in added costs: Bedrock may add latency (50-150ms) and cross-region surcharges (+10% if enabled), while Anthropic's direct API is straightforward per-token billing.
How much does cross-region inference add to AWS Bedrock costs?
Cross-region inference adds a flat 10% surcharge to all token pricing. For a team spending $1,000/month on model inference, that is $100/month in additional costs. Only enable cross-region if you regularly experience capacity throttling in your primary region.
Are Amazon Nova models good enough for production use?
Nova models score 10-12% lower than Claude 3.5 Sonnet and GPT-4o on general benchmarks like MMLU. However, for specific tasks like text classification, data extraction, and simple Q&A, Nova Micro ($0.035/1M tokens) and Nova Lite ($0.06/1M tokens) offer excellent price-performance. TokenMix.ai testing shows Nova Pro is competitive with Claude 3.5 Haiku for mid-complexity tasks at the same price point.
When should I use Bedrock provisioned throughput vs on-demand?
Provisioned throughput saves 15-40% but requires commitment. It becomes cost-effective when your daily spend on a single model exceeds approximately $30-40/day on-demand. Below that volume, on-demand is cheaper because provisioned has a minimum hourly charge regardless of usage.
Can I use both Bedrock and direct APIs through TokenMix.ai?
Yes. TokenMix.ai provides a unified API layer that can route requests to Bedrock, Anthropic direct API, Together AI, Groq, and other providers based on cost, latency, or availability preferences. This enables hybrid strategies where compliance-sensitive workloads run on Bedrock while cost-sensitive workloads route to cheaper providers.
Why is Llama so much more expensive on Bedrock than Together AI?
AWS charges a managed service premium for running open-source models on Bedrock infrastructure. Llama 3.3 70B costs $2.65/1M tokens on Bedrock versus $0.88 on Together AI -- a 201% premium. The premium covers AWS infrastructure (VPC, IAM, compliance), managed serving, and Bedrock platform features. Dedicated inference providers compete primarily on price and optimize specifically for inference cost efficiency.