TokenMix Research Lab · 2026-04-10

AWS Bedrock Pricing Guide: Claude, Llama, and Nova Model Costs Explained (2026)

AWS Bedrock pricing is the most complex in the AI API market. Between on-demand pricing, provisioned throughput, cross-region inference surcharges (+10%), and model-specific billing quirks, the actual cost of running AI models on Bedrock can be 30-200% higher than direct API access depending on your configuration. TokenMix.ai cost tracking shows enterprises running Claude on Bedrock pay an average of 20-35% more than those using Anthropic's direct API once cross-region surcharges and other configuration overheads are counted -- and most do not realize it.

This guide breaks down AWS Bedrock pricing for every major model family -- Claude on Bedrock, Llama on Bedrock, Amazon Nova models -- with on-demand vs provisioned comparisons, regional pricing traps, and direct cost comparisons against native APIs.

Quick Comparison: Bedrock vs Direct API Pricing

Model AWS Bedrock (On-Demand) Direct API Bedrock Premium
Claude 3.5 Sonnet (input/1M) $3.00 $3.00 0% (same price)
Claude 3.5 Sonnet (output/1M) $15.00 $15.00 0% (same price)
Claude 3.5 Haiku (input/1M) $0.80 $0.80 0% (same price)
Llama 3.3 70B (input/1M) $2.65 $0.88 (Together) +201% premium
Llama 3.3 8B (input/1M) $0.22 $0.18 (Together) +22% premium
Amazon Nova Pro (input/1M) $0.80 N/A (Bedrock only) Bedrock exclusive
Amazon Nova Lite (input/1M) $0.06 N/A (Bedrock only) Bedrock exclusive

Key insight: Claude pricing on Bedrock matches Anthropic's direct API. Llama pricing on Bedrock is 2-3x more expensive than dedicated inference providers. Amazon Nova models are Bedrock exclusives with competitive pricing.
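As a sanity check, the premiums in the table can be recomputed from the listed prices. This is a toy helper using the guide's figures, not live pricing data:

```python
def bedrock_premium(bedrock_price: float, direct_price: float) -> float:
    """Percentage markup of a Bedrock per-1M-token rate over a direct API rate."""
    return (bedrock_price / direct_price - 1) * 100

# Llama 3.3 70B: $2.65 on Bedrock vs $0.88 on Together AI
print(f"{bedrock_premium(2.65, 0.88):.0f}%")  # 201%
# Llama 3.3 8B: $0.22 vs $0.18
print(f"{bedrock_premium(0.22, 0.18):.0f}%")  # 22%
```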

How AWS Bedrock Pricing Works

AWS Bedrock uses three pricing models, and the choice between them significantly impacts your total cost.

1. On-Demand Pricing

Pay per token processed. No commitments. This is how most teams start.

2. Provisioned Throughput

Reserve dedicated inference capacity for consistent performance.

3. Batch Inference

Process large datasets asynchronously at reduced pricing.
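Most teams start on the on-demand path. As a minimal sketch, the snippet below builds an Anthropic Messages-format request body for invoking Claude on Bedrock; the model ID and the commented-out `invoke_model` call are assumptions based on the Bedrock runtime API, not verified against a live account:

```python
import json

def build_claude_body(prompt: str, max_tokens: int = 512) -> str:
    """Anthropic Messages-format body for Bedrock's invoke_model call."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = build_claude_body("Summarize our Q1 cloud spend.")

# In production (requires AWS credentials and the boto3 package):
# import boto3
# client = boto3.client("bedrock-runtime")
# resp = client.invoke_model(
#     modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", body=body)
```

With on-demand billing, every request like this is metered purely on the input and output tokens it consumes.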

Billing Components

Beyond per-token charges, Bedrock bills for provisioned throughput (per model unit, per hour), batch inference jobs, the cross-region inference surcharge (+10% when enabled), and managed features such as Knowledge Bases and Guardrails.

Claude on Bedrock: Pricing Details

Claude models are available on Bedrock at the same per-token rates as Anthropic's direct API. This is notable -- Anthropic maintains pricing parity, meaning there is no premium for Bedrock access to Claude.

Claude Model Pricing on Bedrock (April 2026)

Model Input (per 1M tokens) Output (per 1M tokens) Context Prompt Caching
Claude 3.5 Sonnet v2 $3.00 $15.00 200K $0.30 input (90% off)
Claude 3.5 Haiku $0.80 $4.00 200K $0.08 input (90% off)
Claude 3 Opus $15.00 $75.00 200K $1.50 input (90% off)
Claude 4 Sonnet $3.00 $15.00 200K $0.30 input (90% off)
Claude 4 Opus $15.00 $75.00 200K $1.50 input (90% off)
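The caching column translates into a simple blended rate. A minimal sketch, assuming cached reads bill at 90% off the base input rate and ignoring any cache-write surcharge:

```python
def blended_input_rate(base_rate: float, cache_hit_rate: float) -> float:
    """Effective input cost per 1M tokens given a prompt-cache hit rate.

    Cached tokens bill at 10% of the base input rate (the 90% discount
    in the table above); cache-write costs are ignored for simplicity.
    """
    return cache_hit_rate * (base_rate * 0.10) + (1 - cache_hit_rate) * base_rate

# Claude 3.5 Sonnet ($3.00/1M input) with 80% of input tokens cached:
print(round(blended_input_rate(3.00, 0.80), 2))  # 0.84
```

For chat applications with long, stable system prompts, hit rates this high are realistic, which is why caching can dominate the effective input price.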

Why Teams Choose Claude on Bedrock vs Direct API

Same price. So why use Bedrock? Three reasons:

  1. AWS ecosystem integration: IAM roles, VPC endpoints, CloudWatch monitoring, CloudTrail logging. If your infrastructure is on AWS, Bedrock integrates natively.

  2. Compliance and data residency: Bedrock runs in specific AWS regions. For teams with data residency requirements (healthcare, government, financial services), Bedrock provides documented compliance frameworks.

  3. Consolidated billing: One AWS bill for all services. No separate Anthropic billing relationship.

The trade-off: Bedrock adds latency. TokenMix.ai benchmarks show Claude on Bedrock adds 50-150ms additional latency versus direct API access due to the proxy layer. For latency-sensitive applications, this matters.

Cross-Region Inference Surcharge

If you enable cross-region inference on Bedrock (routing requests to the nearest available region for capacity), AWS adds a 10% surcharge to all token pricing. Claude 3.5 Sonnet input goes from $3.00 to $3.30 per million tokens; output from $15.00 to $16.50.

This is a hidden cost that many teams overlook when enabling cross-region for reliability purposes.
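The surcharge math is simple but worth making explicit; the rates below are this guide's figures:

```python
CROSS_REGION_SURCHARGE = 0.10  # flat 10% on all token pricing

def cross_region_rate(base_rate_per_1m: float) -> float:
    """Per-1M-token rate once cross-region inference is enabled."""
    return round(base_rate_per_1m * (1 + CROSS_REGION_SURCHARGE), 2)

print(cross_region_rate(3.00))   # 3.3  (Claude 3.5 Sonnet input)
print(cross_region_rate(15.00))  # 16.5 (Claude 3.5 Sonnet output)
```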

Llama on Bedrock: Pricing Details

Llama on Bedrock is where the pricing premium becomes significant. AWS charges substantially more than dedicated inference providers.

Llama Model Pricing on Bedrock (April 2026)

Model Bedrock Input (per 1M) Bedrock Output (per 1M) Together AI Groq
Llama 3.3 8B $0.22 $0.22 $0.18 $0.05
Llama 3.3 70B $2.65 $2.65 $0.88 $0.59
Llama 4 Scout $0.35 $1.00 $0.18 / $0.59 $0.11 / $0.34
Llama 4 Maverick $0.50 $1.50 $0.27 / $0.85 $0.20 / $0.60

Llama 3.3 70B on Bedrock costs $2.65/1M tokens versus $0.88 on Together AI -- a 201% premium. For Llama 8B, the gap is smaller (22% premium) but still meaningful at scale.

Why Pay the Llama Premium on Bedrock?

The premium buys AWS infrastructure benefits: IAM-based access control, VPC endpoints, CloudWatch monitoring and CloudTrail logging, and AWS compliance certifications (FedRAMP, HIPAA, SOC 2).

For regulated enterprises that must keep data within AWS, the premium is an infrastructure cost, not a model cost. For everyone else, dedicated inference providers offer the same Llama models at a fraction of the price.

Amazon Nova Models: Pricing Details

Amazon Nova is AWS's first-party model family, available exclusively on Bedrock. Nova models are aggressively priced to drive Bedrock adoption.

Nova Model Pricing (April 2026)

Model Input (per 1M tokens) Output (per 1M tokens) Context Strength
Nova Micro $0.035 $0.14 128K Text-only, fastest
Nova Lite $0.06 $0.24 300K Multimodal, balanced
Nova Pro $0.80 $3.20 300K Best quality, multimodal
Nova Premier $2.00 $8.00 1M Largest, complex tasks

Nova vs Competing Models

Comparison Nova Model Competitor Price Advantage
Budget text Nova Micro ($0.035 in) GPT-4o mini ($0.15 in) Nova is 77% cheaper
Mid-tier Nova Pro ($0.80 in) Claude 3.5 Haiku ($0.80 in) Same price
High-end Nova Premier ($2.00 in) Claude 3.5 Sonnet ($3.00 in) Nova is 33% cheaper

Nova Micro is exceptionally cheap -- $0.035 per million input tokens is among the lowest pricing in the market. For high-volume, simple text tasks (classification, extraction, routing), Nova Micro is hard to beat on cost.

The trade-off: Nova models trail Claude and GPT-4o on quality benchmarks. TokenMix.ai testing shows Nova Pro scoring approximately 78.3 on MMLU versus 88.7 for GPT-4o and 88.3 for Claude 3.5 Sonnet. For applications where cost matters more than peak quality, Nova models are compelling.
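To see the price-performance trade concretely, here is a toy monthly-cost comparison. The 500M/100M token volumes and the GPT-4o mini output rate ($0.60/1M) are illustrative assumptions, not figures from this guide:

```python
def monthly_cost(input_millions: float, output_millions: float,
                 input_rate: float, output_rate: float) -> float:
    """Dollars per month for token volumes expressed in millions of tokens."""
    return input_millions * input_rate + output_millions * output_rate

# High-volume classification traffic: 500M input + 100M output tokens/month
nova_micro = monthly_cost(500, 100, 0.035, 0.14)  # workload shape is assumed
gpt4o_mini = monthly_cost(500, 100, 0.15, 0.60)   # $0.60 output is assumed
print(round(nova_micro, 2), round(gpt4o_mini, 2))  # 31.5 135.0
```

At this volume the quality gap may be irrelevant: if a classifier hits its accuracy target on Nova Micro, the cheaper model simply wins.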

On-Demand vs Provisioned Throughput

Provisioned Throughput Pricing

Provisioned throughput on Bedrock reserves dedicated model capacity. Pricing is per model unit per hour, with commitment discounts.

Commitment Discount vs On-Demand Minimum
No commitment (hourly) 0% (baseline) 1 hour
1-month commitment 15-25% savings 1 model unit
6-month commitment 30-40% savings 1 model unit

When Provisioned Throughput Pays Off

The break-even point depends on model and usage.

Claude 3.5 Sonnet example, at low usage (1M tokens/day): assuming an 80/20 input/output split, on-demand costs roughly (0.8 × $3.00) + (0.2 × $15.00) ≈ $5.40/day, well below the $30-40/day break-even threshold, so on-demand is clearly cheaper.

Rule of thumb tracked by TokenMix.ai: Provisioned throughput becomes cost-effective when your daily spend on a single model consistently exceeds $30-40/day on-demand. Below that, on-demand is cheaper.
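The rule of thumb can be sketched as a comparison against one model unit's daily cost. The $1.50/hour unit rate below is a placeholder, not a published Bedrock price; AWS lists unit rates per model:

```python
def provisioned_pays_off(daily_on_demand_spend: float,
                         unit_rate_per_hour: float,
                         commitment_discount: float = 0.0) -> bool:
    """True when one committed model unit costs less than a day of on-demand.

    unit_rate_per_hour is the per-model-unit hourly list price; the
    $1.50/hour used below is a hypothetical figure for illustration.
    """
    provisioned_daily = unit_rate_per_hour * 24 * (1 - commitment_discount)
    return provisioned_daily < daily_on_demand_spend

print(provisioned_pays_off(20.0, 1.50))  # False -> stay on-demand
print(provisioned_pays_off(50.0, 1.50))  # True  -> provisioned wins
```

With a hypothetical $1.50/hour unit ($36/day), the crossover lands right in the $30-40/day band the rule of thumb describes.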

Provisioned vs Direct API

Even with provisioned throughput discounts, Bedrock Llama pricing remains more expensive than direct inference providers:

Model Bedrock Provisioned (est.) Together AI Gap
Llama 3.3 70B ~$2.00/1M tokens $0.88/1M tokens +127%
Llama 3.3 8B ~$0.17/1M tokens $0.18/1M tokens -6% (parity)

Only for Llama 8B does Bedrock provisioned pricing approach parity with dedicated providers. For larger models, the premium remains substantial.

Regional Pricing and Cross-Region Inference

Standard Region Pricing

AWS Bedrock pricing is the same across most US regions (us-east-1, us-west-2). However, some regions carry premium pricing:

Region Price Modifier
US East / US West Base price
EU (Frankfurt, Ireland) +0-5% on select models
Asia Pacific (Tokyo, Singapore) +0-10% on select models
GovCloud +20-30%

Cross-Region Inference (+10%)

Cross-region inference routes requests to the nearest region with available capacity. This improves availability but adds a flat 10% surcharge on all token pricing.

Example impact on monthly costs (Claude 3.5 Sonnet, 50M tokens/month): with, say, 40M input + 10M output tokens, on-demand costs (40 × $3.00) + (10 × $15.00) = $270/month. The 10% surcharge raises that to $297/month -- $27/month of pure overhead.

TokenMix.ai recommendation: Enable cross-region only if you experience regular capacity throttling in your primary region. Most workloads do not need it, and the 10% surcharge is pure overhead otherwise.

Global Inference Endpoints

AWS introduced global inference endpoints in 2026, which automatically route to the optimal region. These carry the same 10% cross-region surcharge. Use them for global applications serving users across continents; avoid them for single-region deployments.

Cost Analysis: Bedrock vs Direct API

Small Team (5M tokens/month, Claude 3.5 Sonnet)

Setup Monthly Cost Notes
Bedrock on-demand $90 AWS billing integration
Anthropic direct API $90 Same price, lower latency
TokenMix.ai $72-81 Optimized routing

At small scale with Claude, Bedrock and direct API cost the same. TokenMix.ai can offer 10-20% savings through smart routing.

Mid-Size Team (100M tokens/month, mixed models)

Setup Monthly Cost Notes
Bedrock (Claude + Llama 70B, on-demand) $3,200 Llama premium significant
Direct APIs (Anthropic + Together AI) $1,900 40% cheaper
Bedrock (Claude) + TokenMix.ai (Llama) $2,100 Optimal split

Enterprise (1B tokens/month, multi-model)

Setup Monthly Cost Notes
Bedrock provisioned (all models) $22,000-28,000 Commitment discounts applied
Direct APIs (all providers) $14,000-18,000 35-40% cheaper
Hybrid (Bedrock for compliance + TokenMix.ai for cost) $16,000-20,000 Balance of cost and compliance

The pattern is consistent: Bedrock costs 30-60% more than direct API access for equivalent model usage, with the gap largest on Llama models and negligible on Claude.

When Bedrock Pricing Makes Sense

Despite the premium, Bedrock is the right choice for specific scenarios:

  1. AWS-native infrastructure: If your stack is 100% AWS, Bedrock's IAM, VPC, and CloudWatch integration saves engineering time worth more than the pricing premium.

  2. Compliance requirements: FedRAMP, HIPAA, SOC 2 -- Bedrock inherits AWS compliance certifications. Getting equivalent compliance from direct API providers requires additional work.

  3. Amazon Nova models: Nova Micro ($0.035/1M input) and Nova Lite ($0.06/1M input) are Bedrock exclusives with excellent price-performance for simple tasks.

  4. Managed RAG (Knowledge Bases): Bedrock Knowledge Bases provide turnkey RAG with vector storage, chunking, and retrieval -- no infrastructure to build.

  5. Guardrails for content safety: Bedrock Guardrails offers built-in content filtering, PII detection, and topic blocking that would require custom development on direct APIs.

How to Choose: Decision Guide

Your Situation Recommended Approach Why
AWS-native, compliance-required Bedrock for everything Infrastructure integration worth the premium
AWS-native, cost-sensitive Bedrock (Claude + Nova) + TokenMix.ai (Llama) Route Llama to cheaper providers
Multi-cloud or cloud-agnostic Direct APIs via TokenMix.ai 30-60% cheaper than Bedrock
Need Nova models specifically Bedrock (exclusive) Only available on Bedrock
Budget text processing at scale Bedrock (Nova Micro) $0.035/1M is among cheapest available
Llama inference at scale Together AI or Groq 2-3x cheaper than Bedrock
Need managed RAG Bedrock Knowledge Bases Turnkey solution, reasonable pricing
Global low-latency deployment Direct APIs Bedrock cross-region adds 10% surcharge
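The table above can be condensed into a small routing helper. This is a toy rule distilled from the guide's recommendations; the provider labels are illustrative and do not reflect any real TokenMix.ai API:

```python
def pick_provider(model_family: str, aws_native: bool, compliance: bool) -> str:
    """Toy routing rule mirroring the decision guide above."""
    if compliance or model_family == "nova":
        return "bedrock"            # compliance workloads and Nova exclusives
    if model_family == "llama":
        return "together-or-groq"   # 2-3x cheaper than Bedrock for Llama
    if model_family == "claude":
        # Price parity: choose by infrastructure fit, not cost
        return "bedrock" if aws_native else "anthropic-direct"
    return "direct-api"

print(pick_provider("llama", aws_native=True, compliance=False))   # together-or-groq
print(pick_provider("claude", aws_native=False, compliance=False)) # anthropic-direct
```

The same logic generalizes: route by compliance first, then by the size of the Bedrock premium for the model family in question.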

Related: Compare all model pricing in our complete LLM API pricing comparison

Conclusion

AWS Bedrock pricing carries a 30-60% premium over direct API access for most model families, with the notable exception of Claude (priced at parity with Anthropic's API) and Nova models (Bedrock exclusives with competitive pricing).

The premium is not irrational -- it buys AWS infrastructure integration, compliance certifications, and managed services like Knowledge Bases and Guardrails. For AWS-native enterprises with compliance requirements, these features justify the cost.

For cost-conscious teams, the optimal strategy is hybrid: run Claude and Nova on Bedrock for compliance and integration benefits, and route Llama and other open-source model inference through TokenMix.ai to dedicated providers at 2-3x lower cost. TokenMix.ai supports this split with a unified API that works alongside Bedrock, providing automatic failover and cost optimization across providers.

Check real-time AWS Bedrock pricing comparisons and cost calculators at TokenMix.ai.

FAQ

Is Claude cheaper on Bedrock or through Anthropic's direct API?

The per-token pricing is identical. Claude 3.5 Sonnet costs $3.00/$15.00 per million input/output tokens on both Bedrock and Anthropic's API. The difference is in added costs: Bedrock may add latency (50-150ms) and cross-region surcharges (+10% if enabled), while Anthropic's direct API is straightforward per-token billing.

How much does cross-region inference add to AWS Bedrock costs?

Cross-region inference adds a flat 10% surcharge to all token pricing. For a team spending $1,000/month on model inference, that is $100/month in additional costs. Only enable cross-region if you regularly experience capacity throttling in your primary region.

Are Amazon Nova models good enough for production use?

Nova models score 10-12% lower than Claude 3.5 Sonnet and GPT-4o on general benchmarks like MMLU. However, for specific tasks like text classification, data extraction, and simple Q&A, Nova Micro ($0.035/1M tokens) and Nova Lite ($0.06/1M tokens) offer excellent price-performance. TokenMix.ai testing shows Nova Pro is competitive with Claude 3.5 Haiku for mid-complexity tasks at the same price point.

When should I use Bedrock provisioned throughput vs on-demand?

Provisioned throughput saves 15-40% but requires commitment. It becomes cost-effective when your daily spend on a single model exceeds approximately $30-40/day on-demand. Below that volume, on-demand is cheaper because provisioned has a minimum hourly charge regardless of usage.

Can I use both Bedrock and direct APIs through TokenMix.ai?

Yes. TokenMix.ai provides a unified API layer that can route requests to Bedrock, Anthropic direct API, Together AI, Groq, and other providers based on cost, latency, or availability preferences. This enables hybrid strategies where compliance-sensitive workloads run on Bedrock while cost-sensitive workloads route to cheaper providers.

Why is Llama so much more expensive on Bedrock than Together AI?

AWS charges a managed service premium for running open-source models on Bedrock infrastructure. Llama 3.3 70B costs $2.65/1M tokens on Bedrock versus $0.88 on Together AI -- a 201% premium. The premium covers AWS infrastructure (VPC, IAM, compliance), managed serving, and Bedrock platform features. Dedicated inference providers compete primarily on price and optimize specifically for inference cost efficiency.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: AWS Bedrock Pricing, Anthropic Pricing, Together AI Pricing + TokenMix.ai