TokenMix Research Lab · 2026-04-10

AWS Bedrock Pricing Guide: Claude, Llama, and Nova Model Costs Explained (2026)
Last Updated: 2026-04-29
Author: TokenMix Research Lab
Bedrock costs 20-35% more than direct APIs on average. Claude is priced at parity with Anthropic. Llama 3.3 70B carries a 201% premium ($2.65 vs $0.88 on Together AI). The Bedrock-exclusive Nova models (Micro $0.035, Lite $0.06 per 1M input tokens) are aggressively cheap. Cross-region inference adds 10%.
AWS Bedrock pricing is the most complex in the AI API market. Between on-demand pricing, provisioned throughput, cross-region inference surcharges (+10%), and model-specific billing quirks, the actual cost of running AI models on Bedrock can be 30-200% higher than direct API access depending on your configuration. TokenMix.ai cost tracking shows enterprises running Claude on Bedrock pay an average of 20-35% more than those using Anthropic's direct API -- and most do not realize it. Because Claude's per-token rates are identical on both platforms, that gap has to come from added costs such as the cross-region surcharge, not from the model price itself.
This guide breaks down AWS Bedrock pricing for every major model family -- Claude on Bedrock, Llama on Bedrock, Amazon Nova models -- with on-demand vs provisioned comparisons, regional pricing traps, and direct cost comparisons against native APIs.
Table of Contents
- Quick Comparison: Bedrock vs Direct API Pricing
- How AWS Bedrock Pricing Works
- Claude on Bedrock: Pricing Details
- Llama on Bedrock: Pricing Details
- Amazon Nova Models: Pricing Details
- On-Demand vs Provisioned Throughput
- Regional Pricing and Cross-Region Inference
- Cost Analysis: Bedrock vs Direct API
- When Bedrock Pricing Makes Sense
- Which Approach Should You Pick?
- What's the Bottom Line on Bedrock Pricing?
- FAQ
Quick Comparison: Bedrock vs Direct API Pricing
Claude on Bedrock = Claude direct (no premium). Llama on Bedrock = 22-201% premium vs Together. Nova models = Bedrock-exclusive at competitive prices. Cross-region = +10% surcharge.
| Model | AWS Bedrock (On-Demand) | Direct API | Bedrock Premium |
|---|---|---|---|
| Claude 3.5 Sonnet (input/1M) | $3.00 | $3.00 | 0% (same price) |
| Claude 3.5 Sonnet (output/1M) | $15.00 | $15.00 | 0% (same price) |
| Claude 3.5 Haiku (input/1M) | $0.80 | $0.80 | 0% (same price) |
| Llama 3.3 70B (input/1M) | $2.65 | $0.88 (Together) | +201% premium |
| Llama 3.3 8B (input/1M) | $0.22 | $0.18 (Together) | +22% premium |
| Amazon Nova Pro (input/1M) | $0.80 | N/A (Bedrock only) | Bedrock exclusive |
| Amazon Nova Lite (input/1M) | $0.06 | N/A (Bedrock only) | Bedrock exclusive |
Key insight: Claude pricing on Bedrock matches Anthropic's direct API. Llama pricing on Bedrock ranges from 22% higher (8B) to roughly 3x (70B) what Together AI charges. Amazon Nova models are Bedrock exclusives with competitive pricing.
How AWS Bedrock Pricing Works
Three billing models: on-demand (per token, no commitment), provisioned throughput (1-month/6-month commitment, 15-40% off), batch inference (50% off, 24-hour SLA). Plus surcharges: cross-region +10%, GovCloud +20-30%, knowledge bases + guardrails extra.
AWS Bedrock uses three pricing models, and the choice between them significantly impacts your total cost.
1. On-Demand Pricing
Pay per token processed. No commitments. This is how most teams start.
- Billed per 1,000 input tokens and per 1,000 output tokens (this guide normalizes all rates to per 1M tokens)
- No minimum spend
- Immediate access to all available models
- Subject to default throughput quotas
2. Provisioned Throughput
Reserve dedicated inference capacity for consistent performance.
- Billed per model unit per hour
- 1-month or 6-month commitments
- Guaranteed throughput (no throttling)
- 1-month commitment: ~15-25% cheaper than on-demand at sustained usage
- 6-month commitment: ~30-40% cheaper than on-demand
3. Batch Inference
Process large datasets asynchronously at reduced pricing.
- 50% discount compared to on-demand pricing
- Results delivered within 24 hours
- Available for select models (Claude, Llama, Nova)
- Ideal for evaluation pipelines, data processing, and non-real-time workloads
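To make the on-demand versus batch difference concrete, here is a minimal sketch in Python. The function name and the sample workload are illustrative assumptions; the rates are the Claude 3.5 Sonnet figures used in this guide, and the 50% figure is the batch discount quoted above.

```python
BATCH_DISCOUNT = 0.50  # batch inference discount quoted in this guide


def token_cost(input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float,
               batch: bool = False) -> float:
    """Return the cost in USD. Rates are USD per 1M tokens."""
    cost = (input_tokens / 1_000_000) * input_rate
    cost += (output_tokens / 1_000_000) * output_rate
    # Batch inference applies the discount to the whole token charge.
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


# Hypothetical job: 200M input tokens and 20M output tokens on Claude 3.5 Sonnet.
on_demand = token_cost(200_000_000, 20_000_000, 3.00, 15.00)
batched = token_cost(200_000_000, 20_000_000, 3.00, 15.00, batch=True)
print(f"on-demand: ${on_demand:,.2f}  batch: ${batched:,.2f}")
```

Provisioned throughput is deliberately left out of this function because it is billed per model unit per hour rather than per token.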
Billing Components
Beyond per-token charges, Bedrock bills for:
- Model evaluation jobs: Charged per model unit hour
- Knowledge Bases: Storage ($0.023/GB/month) + queries ($0.0035/query)
- Guardrails: Per 1,000 text units processed
- Agent invocations: Per session step
- Cross-region inference: +10% surcharge on all token pricing
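The add-on charges above can be folded into a rough monthly estimate. The sketch below is an assumption-laden illustration, not an AWS calculator: it covers only the components for which this guide quotes a rate (Knowledge Bases and the cross-region surcharge), and the function name and sample volumes are hypothetical.

```python
# Rates quoted in this guide.
KB_STORAGE_PER_GB_MONTH = 0.023  # USD per GB per month
KB_PRICE_PER_QUERY = 0.0035      # USD per query
CROSS_REGION_SURCHARGE = 0.10    # +10% on token charges


def monthly_bill(token_charges: float, kb_storage_gb: float = 0.0,
                 kb_queries: int = 0, cross_region: bool = False) -> dict:
    """Break a monthly Bedrock bill into its quoted components (USD)."""
    surcharge = token_charges * CROSS_REGION_SURCHARGE if cross_region else 0.0
    knowledge_base = (kb_storage_gb * KB_STORAGE_PER_GB_MONTH
                      + kb_queries * KB_PRICE_PER_QUERY)
    return {
        "tokens": token_charges,
        "cross_region_surcharge": surcharge,
        "knowledge_base": knowledge_base,
        "total": token_charges + surcharge + knowledge_base,
    }


# Hypothetical month: $900 in token charges, 100 GB stored, 200,000 queries.
bill = monthly_bill(900.0, kb_storage_gb=100, kb_queries=200_000, cross_region=True)
for component, usd in bill.items():
    print(f"{component:>24}: ${usd:,.2f}")
```

With these hypothetical volumes, Knowledge Base queries account for far more than storage, so query volume is the figure to watch. Guardrails, agent steps, and evaluation jobs would add further line items at rates not listed here.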
Claude on Bedrock: Pricing Details
Same per-token price as Anthropic direct (Sonnet $3/$15, Haiku $0.80/$4, Opus $15/$75). 90% cache discount included. Three reasons to choose Bedrock: AWS ecosystem integration, compliance, consolidated billing. Cost: 50-150ms latency overhead vs direct API.
Claude models are available on Bedrock at the same per-token rates as Anthropic's direct API. This is notable -- Anthropic maintains pricing parity, meaning there is no premium for Bedrock access to Claude.
Claude Model Pricing on Bedrock (April 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context | Prompt Caching |
|---|---|---|---|---|
| Claude 3.5 Sonnet v2 | $3.00 | $15.00 | 200K | $0.30 input (90% off) |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K | $0.08 input (90% off) |
| Claude 3 Opus | $15.00 | $75.00 | 200K | $1.50 input (90% off) |
| Claude Sonnet 4 | $3.00 | $15.00 | 200K | $0.30 input (90% off) |
| Claude Opus 4 | $15.00 | $75.00 | 200K | $1.50 input (90% off) |
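How much the 90% caching discount is worth depends on how much of your input is served from cache. A minimal sketch, assuming a simple weighted average and ignoring any cache-write charges (this guide does not list them); the function name and the 60% hit ratio are illustrative assumptions:

```python
def effective_input_rate(base_rate: float, cached_rate: float,
                         cache_hit_ratio: float) -> float:
    """Blend the normal and cached input rates (USD per 1M tokens)."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return base_rate * (1 - cache_hit_ratio) + cached_rate * cache_hit_ratio


# Claude 3.5 Sonnet: $3.00 normal input, $0.30 cached input (from the table above).
# A 60% cache hit ratio is a hypothetical workload characteristic.
rate = effective_input_rate(3.00, 0.30, 0.60)
print(f"effective input rate: ${rate:.2f} per 1M tokens")
```

The same function applies to the other rows of the table by swapping in their base and cached rates.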
Why Teams Choose Claude on Bedrock vs Direct API
Same price. So why use Bedrock? Three reasons:
AWS ecosystem integration: IAM roles, VPC endpoints, CloudWatch monitoring, CloudTrail logging. If your infrastructure is on AWS, Bedrock integrates natively.
Compliance and data residency: Bedrock runs in specific AWS regions. For teams with data residency requirements (healthcare, government, financial services), Bedrock provides documented compliance frameworks.
Consolidated billing: One AWS bill for all services. No separate Anthropic billing relationship.
The trade-off: Bedrock adds latency. TokenMix.ai benchmarks show Claude on Bedrock adds 50-150ms additional latency versus direct API access due to the proxy layer. For latency-sensitive applications, this matters.
Cross-Region Inference Surcharge
If you enable cross-region inference on Bedrock (routing requests to the nearest available region for capacity), AWS adds a 10% surcharge to all token pricing. Claude 3.5 Sonnet input goes from $3.00 to $3.30 per million tokens. Output from $15.00 to $16.50.
This is a hidden cost that many teams overlook when enabling cross-region for reliability purposes.
Llama on Bedrock: Pricing Details
Llama 70B = $2.65/M (3x Together's $0.88, 4.5x Groq's $0.59). 8B = $0.22 (22% premium). For regulated AWS-native enterprises, premium buys VPC + IAM + compliance. For everyone else: route Llama to dedicated providers.
Llama on Bedrock is where the pricing premium becomes significant. AWS charges substantially more than dedicated inference providers.
Llama Model Pricing on Bedrock (April 2026)
| Model | Bedrock Input (per 1M) | Bedrock Output (per 1M) | Together AI | Groq |
|---|---|---|---|---|
| Llama 3.3 8B | $0.22 | $0.22 | $0.18 | $0.05 |
| Llama 3.3 70B | $2.65 | $2.65 | $0.88 | $0.59 |
| Llama 4 Scout | $0.35 | $1.00 | $0.18 / $0.59 | $0.11 / $0.34 |
| Llama 4 Maverick | $0.50 | $1.50 | $0.27 / $0.85 | $0.20 / $0.60 |
Llama 3.3 70B on Bedrock costs $2.65/1M tokens versus $0.88 on Together AI -- a 201% premium. For Llama 8B, the gap is smaller (22% premium) but still meaningful at scale.
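The premium figures follow directly from the table. As a quick check, the sketch below recomputes them from the listed input prices; the function and dictionary names are illustrative.

```python
def premium_pct(bedrock_price: float, direct_price: float) -> float:
    """Bedrock's premium over a direct provider, in percent."""
    return (bedrock_price / direct_price - 1.0) * 100.0


# Input prices in USD per 1M tokens, taken from the table above.
LLAMA_PRICES = {
    "Llama 3.3 70B": {"bedrock": 2.65, "together": 0.88, "groq": 0.59},
    "Llama 3.3 8B": {"bedrock": 0.22, "together": 0.18, "groq": 0.05},
}

for model, p in LLAMA_PRICES.items():
    vs_together = premium_pct(p["bedrock"], p["together"])
    vs_groq = premium_pct(p["bedrock"], p["groq"])
    print(f"{model}: +{vs_together:.0f}% vs Together AI, +{vs_groq:.0f}% vs Groq")
```

Measured against Groq instead of Together AI, the premium is larger for both models.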
Why Pay the Llama Premium on Bedrock?
The premium buys AWS infrastructure benefits:
- VPC private endpoints (no internet traffic)
- IAM-based access control
- CloudWatch metrics and logging
- AWS compliance certifications (FedRAMP, HIPAA, SOC 2)
- Guardrails integration for content filtering
- Knowledge Bases for managed RAG
For regulated enterprises that must keep data within AWS, the premium is an infrastructure cost, not a model cost. For everyone else, dedicated inference providers offer the same Llama models at a fraction of the price.
Amazon Nova Models: Pricing Details
Bedrock-exclusive AWS models, aggressively priced (input/output per 1M tokens): Micro $0.035/$0.14, Lite $0.06/$0.24, Pro $0.80/$3.20, Premier $2.00/$8.00. Nova Premier is 33% cheaper than Claude 3.5 Sonnet on input. Nova Pro is 73% cheaper but scores about 10 MMLU points lower. Nova Micro is one of the cheapest production-quality models on the market.
Amazon Nova is AWS's first-party model family, available exclusively on Bedrock. Nova models are aggressively priced to drive Bedrock adoption.
Nova Model Pricing (April 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context | Strength |
|---|---|---|---|---|
| Nova Micro | $0.035 | $0.14 | 128K | Text-only, fastest |
| Nova Lite | $0.06 | $0.24 | 300K | Multimodal, balanced |
| Nova Pro | $0.80 | $3.20 | 300K | Best quality, multimodal |
| Nova Premier | $2.00 | $8.00 | 1M | Largest, complex tasks |
Nova vs Competing Models
| Comparison | Nova Model | Competitor | Price Advantage |
|---|---|---|---|
| Budget text | Nova Micro ($0.035 in) | GPT-4o mini ($0.15 in) | Nova is 77% cheaper |
| Mid-tier | Nova Pro ($0.80 in) | Claude 3.5 Haiku ($0.80 in) | Same price |
| High-end | Nova Premier ($2.00 in) | Claude 3.5 Sonnet ($3.00 in) | Nova is 33% cheaper |
Nova Micro is exceptionally cheap -- $0.035 per million input tokens is among the lowest pricing in the market. For high-volume, simple text tasks (classification, extraction, routing), Nova Micro is hard to beat on cost.
The trade-off: Nova models trail Claude and GPT-4o on quality benchmarks. TokenMix.ai testing shows Nova Pro scoring approximately 78.3 on MMLU versus 88.7 for GPT-4o and 88.3 for Claude 3.5 Sonnet. For applications where cost matters more than peak quality, Nova models are compelling.
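One way to act on this trade-off is to send simple requests to Nova Micro and reserve a stronger model for the rest. The sketch below estimates the blended input rate for such a split. The routing shares are hypothetical, and the function is an illustration rather than a description of any particular router.

```python
NOVA_MICRO_INPUT = 0.035  # USD per 1M input tokens (from the table above)
SONNET_INPUT = 3.00       # USD per 1M input tokens, Claude 3.5 Sonnet


def blended_input_rate(simple_share: float) -> float:
    """Input rate when `simple_share` of tokens go to Nova Micro, the rest to Sonnet."""
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    return simple_share * NOVA_MICRO_INPUT + (1 - simple_share) * SONNET_INPUT


# Hypothetical shares of traffic that are simple enough for Nova Micro.
for share in (0.0, 0.5, 0.8):
    print(f"{share:.0%} to Nova Micro -> ${blended_input_rate(share):.3f} per 1M input tokens")
```

The estimate covers input tokens only and says nothing about whether output quality stays acceptable, which has to be validated per task.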
On-Demand vs Provisioned Throughput
Provisioned saves 15-40% but requires 1-6 month commitment. Break-even: ~$30-40/day on a single model on-demand. Below = on-demand wins. Even with provisioned, Llama 70B still 127% pricier than Together. Llama 8B reaches parity ($0.17 vs $0.18).
Provisioned Throughput Pricing
Provisioned throughput on Bedrock reserves dedicated model capacity. Pricing is per model unit per hour, with commitment discounts.
| Commitment | Discount vs On-Demand | Minimum |
|---|---|---|
| No commitment (hourly) | 0% (baseline) | 1 hour |
| 1-month commitment | 15-25% savings | 1 model unit |
| 6-month commitment | 30-40% savings | 1 model unit |
When Provisioned Throughput Pays Off
The break-even point depends on model and usage:
Claude 3.5 Sonnet example (the figures imply a blended rate of about $4.50 per 1M tokens across input and output):
- On-demand at moderate usage (10M tokens/day): ~$45/day = $1,350/month
- Provisioned (1-month commitment): approximately $1,050/month
- Savings: ~22%
At low usage (1M tokens/day):
- On-demand: ~$4.50/day = $135/month
- Provisioned minimum: ~$600-800/month
- Result: On-demand is cheaper
Rule of thumb tracked by TokenMix.ai: Provisioned throughput becomes cost-effective when your daily spend on a single model consistently exceeds $30-40/day on-demand. Below that, on-demand is cheaper.
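The break-even logic can be sketched in a few lines. The monthly provisioned figures are the approximate ones from the examples above, a 30-day month is assumed, and the function names are illustrative.

```python
def break_even_daily_spend(provisioned_monthly: float, days: int = 30) -> float:
    """Daily on-demand spend at which provisioned throughput costs the same."""
    return provisioned_monthly / days


def cheaper_option(on_demand_daily: float, provisioned_monthly: float,
                   days: int = 30) -> str:
    """Compare a month of on-demand spend against a provisioned commitment."""
    on_demand_monthly = on_demand_daily * days
    return "provisioned" if provisioned_monthly < on_demand_monthly else "on-demand"


# Moderate usage: ~$45/day on-demand vs ~$1,050/month provisioned.
print(cheaper_option(45.0, 1050.0))
# Low usage: ~$4.50/day on-demand vs a ~$600/month provisioned minimum.
print(cheaper_option(4.50, 600.0))
# Break-even for the $1,050/month commitment, in USD per day.
print(break_even_daily_spend(1050.0))
```

For the $1,050/month example the break-even works out to $35/day, which sits inside the $30-40/day rule of thumb.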
Provisioned vs Direct API
Even with provisioned throughput discounts, Bedrock Llama pricing remains more expensive than direct inference providers:
| Model | Bedrock Provisioned (est.) | Together AI | Gap |
|---|---|---|---|
| Llama 3.3 70B | ~$2.00/1M tokens | $0.88/1M tokens | +127% |
| Llama 3.3 8B | ~$0.17/1M tokens | $0.18/1M tokens | -6% (parity) |
Only for Llama 8B does Bedrock provisioned pricing approach parity with dedicated providers. For larger models, the premium remains substantial.
Regional Pricing and Cross-Region Inference
US regions = base price. EU +0-5%, Asia-Pacific +0-10%, GovCloud +20-30%. Cross-region inference adds a flat 10% surcharge on all token pricing; only enable it if you are regularly capacity-throttled. Sonnet at 50M input + 50M output tokens/month: +$90/month for cross-region.
Standard Region Pricing
AWS Bedrock pricing is the same across most US regions (us-east-1, us-west-2). However, some regions carry premium pricing:
| Region | Price Modifier |
|---|---|
| US East / US West | Base price |
| EU (Frankfurt, Ireland) | +0-5% on select models |
| Asia Pacific (Tokyo, Singapore) | +0-10% on select models |
| GovCloud | +20-30% |
Cross-Region Inference (+10%)
Cross-region inference routes requests to the nearest region with available capacity. This improves availability but adds a flat 10% surcharge on all token pricing.
Example impact on monthly costs (Claude 3.5 Sonnet, 50M input tokens and 50M output tokens per month):
- Standard (single region): $150/month (input) + $750/month (output) = $900/month
- Cross-region enabled: $165/month (input) + $825/month (output) = $990/month
- Added cost: $90/month (+10%)
TokenMix.ai recommendation: Enable cross-region only if you experience regular capacity throttling in your primary region. Most workloads do not need it, and the 10% surcharge is pure overhead otherwise.
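The example above can be reproduced with a short calculation. This is a minimal sketch: the function name is illustrative, and the volumes are the 50M input and 50M output tokens that the $150 and $750 figures imply at Sonnet's rates.

```python
CROSS_REGION_SURCHARGE = 0.10  # flat surcharge quoted in this guide


def monthly_token_cost(input_millions: float, output_millions: float,
                       input_rate: float, output_rate: float,
                       cross_region: bool = False) -> float:
    """Monthly token cost in USD. Volumes in millions of tokens, rates per 1M."""
    base = input_millions * input_rate + output_millions * output_rate
    return base * (1 + CROSS_REGION_SURCHARGE) if cross_region else base


single_region = monthly_token_cost(50, 50, 3.00, 15.00)
cross_region = monthly_token_cost(50, 50, 3.00, 15.00, cross_region=True)
print(f"single region: ${single_region:,.2f}")
print(f"cross-region:  ${cross_region:,.2f}")
print(f"added cost:    ${cross_region - single_region:,.2f}")
```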
Global Inference Endpoints
AWS introduced global inference endpoints in 2026, which automatically route to the optimal region. These carry the same 10% cross-region surcharge. Use them for global applications serving users across continents; avoid them for single-region deployments.
Cost Analysis: Bedrock vs Direct API
Mid-size 100M tokens/month mixed: Bedrock $3,200 vs Direct APIs $1,900 (Bedrock 68% pricier). Enterprise 1B/month: Bedrock $22-28K vs direct $14-18K. Hybrid (Claude on Bedrock + Llama elsewhere) = $16-20K. Bedrock premium grows with Llama share.
Small Team (5M input + 5M output tokens/month, Claude 3.5 Sonnet)
| Setup | Monthly Cost | Notes |
|---|---|---|
| Bedrock on-demand | $90 | AWS billing integration |
| Anthropic direct API | $90 | Same price, lower latency |
| TokenMix.ai | $72-81 | Optimized routing |
At small scale with Claude, Bedrock and direct API cost the same. TokenMix.ai can offer 10-20% savings through smart routing.
Mid-Size Team (100M tokens/month, mixed models)
| Setup | Monthly Cost | Notes |
|---|---|---|
| Bedrock (Claude + Llama 70B, on-demand) | $3,200 | Llama premium significant |
| Direct APIs (Anthropic + Together AI) | $1,900 | 40% cheaper |
| Bedrock (Claude) + TokenMix.ai (Llama) | $2,100 | Optimal split |
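The same $1,300 gap is described in this guide both as Bedrock being "68% pricier" and as direct APIs being "40% cheaper". Both are correct; they use different baselines. A small sketch of the arithmetic, with illustrative function names:

```python
def pct_more(expensive: float, cheap: float) -> float:
    """How much more the expensive option costs, relative to the cheap one."""
    return (expensive - cheap) / cheap * 100.0


def pct_cheaper(expensive: float, cheap: float) -> float:
    """How much less the cheap option costs, relative to the expensive one."""
    return (expensive - cheap) / expensive * 100.0


# Mid-size scenario from the table above.
bedrock, direct = 3200.0, 1900.0
print(f"Bedrock costs {pct_more(bedrock, direct):.1f}% more than direct APIs")
print(f"Direct APIs cost {pct_cheaper(bedrock, direct):.1f}% less than Bedrock")
```

When comparing figures across sections, check which baseline a percentage uses before treating two numbers as contradictory.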
Enterprise (1B tokens/month, multi-model)
| Setup | Monthly Cost | Notes |
|---|---|---|
| Bedrock provisioned (all models) | $22,000-28,000 | Commitment discounts applied |
| Direct APIs (all providers) | $14,000-18,000 | 35-40% cheaper |
| Hybrid (Bedrock for compliance + TokenMix.ai for cost) | $16,000-20,000 | Balance of cost and compliance |
The pattern is consistent: Bedrock costs 30-60% more than direct API access for equivalent model usage, with the gap largest on Llama models and negligible on Claude.
When Bedrock Pricing Makes Sense
Five scenarios: AWS-native infrastructure (IAM/VPC/CloudWatch saves engineering), compliance (FedRAMP/HIPAA/SOC 2 inherited), Nova exclusives ($0.035/M Micro), managed RAG (Knowledge Bases), Guardrails for content safety. Premium = infrastructure cost not model cost.
Despite the premium, Bedrock is the right choice for specific scenarios:
AWS-native infrastructure: If your stack is 100% AWS, Bedrock's IAM, VPC, and CloudWatch integration saves engineering time worth more than the pricing premium.
Compliance requirements: FedRAMP, HIPAA, SOC 2 -- Bedrock inherits AWS compliance certifications. Getting equivalent compliance from direct API providers requires additional work.
Amazon Nova models: Nova Micro ($0.035/1M input) and Nova Lite ($0.06/1M input) are Bedrock exclusives with excellent price-performance for simple tasks.
Managed RAG (Knowledge Bases): Bedrock Knowledge Bases provide turnkey RAG with vector storage, chunking, and retrieval -- no infrastructure to build.
Guardrails for content safety: Bedrock Guardrails offers built-in content filtering, PII detection, and topic blocking that would require custom development on direct APIs.
Which Approach Should You Pick?
AWS-native + compliance: all Bedrock. AWS-native + cost-sensitive: Bedrock for Claude/Nova + TokenMix.ai for Llama. Cloud-agnostic: skip Bedrock, use TokenMix.ai (30-60% cheaper). Need Nova: must use Bedrock. Llama at scale: skip Bedrock for Together/Groq.
| Your Situation | Recommended Approach | Why |
|---|---|---|
| AWS-native, compliance-required | Bedrock for everything | Infrastructure integration worth the premium |
| AWS-native, cost-sensitive | Bedrock (Claude + Nova) + TokenMix.ai (Llama) | Route Llama to cheaper providers |
| Multi-cloud or cloud-agnostic | Direct APIs via TokenMix.ai | 30-60% cheaper than Bedrock |
| Need Nova models specifically | Bedrock (exclusive) | Only available on Bedrock |
| Budget text processing at scale | Bedrock (Nova Micro) | $0.035/1M is among cheapest available |
| Llama inference at scale | Together AI or Groq | 2-3x cheaper than Bedrock |
| Need managed RAG | Bedrock Knowledge Bases | Turnkey solution, reasonable pricing |
| Global low-latency deployment | Direct APIs | Bedrock cross-region adds 10% surcharge |
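For teams that want to encode this table in tooling, the core rows can be expressed as a small rule function. This is only a sketch: the `Situation` fields, the rule order (hard constraints before cost), and the treatment of "AWS-native without compliance needs" as the cost-sensitive case are assumptions layered on top of the table.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    aws_native: bool = False
    compliance_required: bool = False
    needs_nova: bool = False
    llama_at_scale: bool = False


def recommend(s: Situation) -> str:
    """Map a situation to the recommended approach from the table above."""
    # Hard constraints first (assumed priority), then cost considerations.
    if s.aws_native and s.compliance_required:
        return "Bedrock for everything"
    if s.needs_nova:
        return "Bedrock (exclusive)"
    if s.llama_at_scale:
        return "Together AI or Groq"
    if s.aws_native:
        return "Bedrock (Claude + Nova) + TokenMix.ai (Llama)"
    return "Direct APIs via TokenMix.ai"


print(recommend(Situation(aws_native=True, compliance_required=True)))
print(recommend(Situation(llama_at_scale=True)))
print(recommend(Situation()))
```

Rows that depend on features rather than cost, such as managed RAG, are left out to keep the sketch short.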
Related: Compare all model pricing in our complete LLM API pricing comparison
What's the Bottom Line on Bedrock Pricing?
30-60% premium over direct APIs. Worth it for AWS-native + compliance use cases. Hybrid strategy wins: Claude + Nova on Bedrock for compliance, Llama on Together/Groq via TokenMix.ai for cost. Don't run Llama on Bedrock at scale unless compliance forces it.
AWS Bedrock pricing carries a 30-60% premium over direct API access for most model families, with two notable exceptions: Claude (priced at parity with Anthropic's API) and the Nova models (Bedrock exclusives with competitive pricing).
The premium is not irrational -- it buys AWS infrastructure integration, compliance certifications, and managed services like Knowledge Bases and Guardrails. For AWS-native enterprises with compliance requirements, these features justify the cost.
For cost-conscious teams, the optimal strategy is hybrid: run Claude and Nova on Bedrock for compliance and integration benefits, and route Llama and other open-source model inference through TokenMix.ai to dedicated providers at 2-3x lower cost. TokenMix.ai supports this split with a unified API that works alongside Bedrock, providing automatic failover and cost optimization across providers.
Check real-time AWS Bedrock pricing comparisons and cost calculators at TokenMix.ai.
FAQ
Is Claude cheaper on Bedrock or through Anthropic's direct API?
The per-token pricing is identical. Claude 3.5 Sonnet costs $3.00/$15.00 per million input/output tokens on both Bedrock and Anthropic's API. The difference is in added costs: Bedrock may add latency (50-150ms) and cross-region surcharges (+10% if enabled), while Anthropic's direct API is straightforward per-token billing.
How much does cross-region inference add to AWS Bedrock costs?
Cross-region inference adds a flat 10% surcharge to all token pricing. For a team spending $1,000/month on model inference, that is $100/month in additional costs. Only enable cross-region if you regularly experience capacity throttling in your primary region.
Are Amazon Nova models good enough for production use?
Nova Pro scores about 10 points lower than Claude 3.5 Sonnet and GPT-4o on MMLU (78.3 versus 88.3 and 88.7), a gap of roughly 10-12%. However, for specific tasks like text classification, data extraction, and simple Q&A, Nova Micro ($0.035/1M tokens) and Nova Lite ($0.06/1M tokens) offer excellent price-performance. TokenMix.ai testing shows Nova Pro is competitive with Claude 3.5 Haiku for mid-complexity tasks at the same price point.
When should I use Bedrock provisioned throughput vs on-demand?
Provisioned throughput saves 15-40% but requires commitment. It becomes cost-effective when your daily spend on a single model exceeds approximately $30-40/day on-demand. Below that volume, on-demand is cheaper because provisioned has a minimum hourly charge regardless of usage.
Can I use both Bedrock and direct APIs through TokenMix.ai?
Yes. TokenMix.ai provides a unified API layer that can route requests to Bedrock, Anthropic direct API, Together AI, Groq, and other providers based on cost, latency, or availability preferences. This enables hybrid strategies where compliance-sensitive workloads run on Bedrock while cost-sensitive workloads route to cheaper providers.
Why is Llama so much more expensive on Bedrock than Together AI?
AWS charges a managed service premium for running open-source models on Bedrock infrastructure. Llama 3.3 70B costs $2.65/1M tokens on Bedrock versus $0.88 on Together AI -- a 201% premium. The premium covers AWS infrastructure (VPC, IAM, compliance), managed serving, and Bedrock platform features. Dedicated inference providers compete primarily on price and optimize specifically for inference cost efficiency.
Data Source: AWS Bedrock Pricing, Anthropic Pricing, Together AI Pricing + TokenMix.ai