AWS Bedrock Pricing Deep Dive: Real Per-Model Cost Analysis (2026)
AWS Bedrock's pricing is more complex than it looks. For some models (notably Claude), Bedrock matches direct provider pricing exactly. For others (Llama, Titan), Bedrock adds a 10-70% markup. Three billing modes compound the complexity: on-demand, batch (50% off), and provisioned throughput (15-40% off with commitment). This guide gives real 2026 pricing, explains when Bedrock is cost-effective versus direct API access, and covers the workload patterns where provisioned throughput breaks even. All data verified against AWS official pricing as of April 2026.
AWS Bedrock offers three distinct billing approaches:
1. On-Demand: pay per token, no commitment. Matches direct provider pricing for most models. Best for sporadic or unpredictable usage.
2. Batch: 50% discount on on-demand rates. Requests processed asynchronously within 24 hours. Best for non-real-time bulk processing.
3. Provisioned Throughput: reserved capacity at 15-40% discount. Requires commitment. Breaks even at approximately $30-40/day on-demand spend per model.
Choose based on your traffic pattern, not blanket recommendations.
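To make the choice concrete, here is a minimal sketch comparing the three modes for a given daily on-demand spend. The discount figures come from the list above; the ~$35/day provisioned floor is the midpoint of the article's $30-40 break-even range, and the flat ~20% provisioned discount is an assumption within the stated 15-40% band:

```python
def cheapest_mode(daily_on_demand_usd: float, latency_tolerant: bool) -> str:
    """Pick a billing mode using the discounts described above."""
    batch = daily_on_demand_usd * 0.50        # batch: 50% off, async within 24h
    provisioned = daily_on_demand_usd * 0.80  # assume ~20% off (15-40% range)
    if latency_tolerant:
        return f"batch (${batch:.2f}/day)"
    if daily_on_demand_usd >= 35:             # ~$30-40/day break-even floor
        return f"provisioned (${provisioned:.2f}/day, requires commitment)"
    return f"on-demand (${daily_on_demand_usd:.2f}/day, no commitment)"

print(cheapest_mode(50.0, latency_tolerant=False))
```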
On-Demand Pricing by Model
Current Bedrock per-MTok pricing (April 2026):
Anthropic Claude family:

| Model | Input (per MTok) | Output (per MTok) | Notes |
|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | Matches Anthropic direct |
| Claude Opus 4.6 | $5.00 | $25.00 | Matches direct |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Matches direct |
| Claude Haiku 4.5 | $0.80 | $4.00 | Matches direct |
Meta Llama family:

| Model | Input (per MTok) | Output (per MTok) | Notes |
|---|---|---|---|
| Llama 3 70B | $2.65 | $3.50 | Bedrock premium vs alternatives |
| Llama 4 Scout | varies | varies | Check current Bedrock pricing |
| Llama 4 Maverick | varies | varies | Similar premium pattern |
Amazon Titan family (Bedrock-native):

| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Titan Text Express | lower tier | lower tier |
| Titan Text Premier | mid tier | mid tier |
Mistral family:

| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Mistral Large 2 | varies | varies |
| Mistral Small | cheap tier | cheap tier |
The Llama Premium
Bedrock charges ~10-70% more for Llama models than alternative hosts. The premium pays for:
AWS integration (IAM, VPC, CloudWatch)
SOC 2 Type 2 / HIPAA / FedRAMP compliance included
On billing mode, the same logic as above applies: for stable, predictable workloads on a single model, Provisioned Throughput saves meaningfully; for dynamic multi-model routing, on-demand stays more flexible.
Supported LLM Providers and Model Routing
Bedrock is one option; alternatives:
AWS Bedrock: comprehensive AWS integration, matches some providers' pricing
Azure OpenAI: for GPT family
Google Vertex AI: for Gemini
Direct provider APIs: often match Bedrock pricing
OpenAI-compatible aggregators: TokenMix.ai, OpenRouter, Together AI
Through TokenMix.ai, you get access to Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, DeepSeek V4-Pro, Kimi K2.6, Gemini 3.1 Pro, Llama 4, plus 300+ other models through a single OpenAI-compatible API key. For teams that don't need AWS-specific integrations, aggregators often provide more flexibility and pay-per-token pricing across all providers without the Llama premium.
Basic aggregator usage:
```python
from openai import OpenAI

# Aggregators expose an OpenAI-compatible endpoint, so the standard SDK works
client = OpenAI(
    api_key="your-tokenmix-key",
    base_url="https://api.tokenmix.ai/v1",
)

response = client.chat.completions.create(
    model="claude-opus-4-7",  # or any of the 300+ available models
    messages=[{"role": "user", "content": "..."}],
)
print(response.choices[0].message.content)
```
Hidden Costs to Watch
Beyond per-token pricing (a rough estimator follows this list):
1. Data transfer (egress). If your Bedrock region differs from your application, you pay for cross-region data transfer.
2. Knowledge Bases / RAG. Bedrock Knowledge Bases has separate vector DB costs on top of LLM invocations.
3. Agents for Bedrock. Bedrock Agents adds session, action, and orchestration costs.
4. Guardrails. Bedrock Guardrails for content filtering adds per-character fees.
5. Continuous pre-training / fine-tuning. Per-hour training fees plus resulting model hosting costs.
6. Provisioned throughput over-commitment. If you over-estimate capacity needs, you pay for unused reserved capacity.
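None of these show up in per-token math, so it helps to budget them explicitly next to token spend. A minimal sketch; every rate below is a placeholder for illustration, not a quoted AWS price, so substitute real numbers from the AWS pricing pages or Cost Explorer:

```python
# All rates below are PLACEHOLDERS, not quoted AWS prices;
# pull real numbers from the AWS pricing pages or Cost Explorer.
llm_spend = 1_050.00                  # monthly on-demand token spend (example)
egress_gb, egress_rate = 200, 0.09    # cross-region transfer, $/GB (placeholder)
guardrail_units, guardrail_rate = 500_000, 0.0002  # content filtering (placeholder)
vector_db = 150.00                    # Knowledge Bases vector store (placeholder)

total = llm_spend + egress_gb * egress_rate + guardrail_units * guardrail_rate + vector_db
print(f"Token spend: ${llm_spend:,.2f} of ${total:,.2f} total "
      f"({llm_spend / total:.0%})")   # tokens are only part of the bill
```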
Bedrock vs Direct Provider Access
When Bedrock wins:
AWS ecosystem integration (IAM, VPC, CloudWatch) is critical
Compliance certifications (FedRAMP, HIPAA) required in AWS
Team already on AWS for infrastructure
Provisioned capacity needed for SLA-critical workloads
Multi-model through one AWS billing makes sense (see the invocation sketch after these lists)
When direct provider access wins:
Want latest model versions immediately (Bedrock sometimes lags)
Don't need AWS integration
Pricing-sensitive on Llama or non-Anthropic models
Want provider-specific features not exposed via Bedrock
When aggregators win:
Want one API key for many providers (including Claude, GPT, DeepSeek, Kimi, etc.)
Multi-cloud neutrality
Pay-per-token without per-provider contracts
Unified billing across providers
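If Bedrock does win for your case, the native call path looks like the following. A minimal boto3 sketch using Bedrock's Converse API; the region and model ID are placeholders, so check the Bedrock model catalog for exact identifiers:

```python
import boto3

# Credentials come from the standard AWS chain (IAM role, env vars, etc.)
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-sonnet-4-6",  # placeholder; use the catalog ID
    messages=[{"role": "user", "content": [{"text": "..."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```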
Cost-Optimization Patterns
Pattern 1 — Tier routing:
Route requests by task complexity:
```python
def select_model(task) -> str:
    # The is_* predicates are app-specific stubs you define for your workload
    if is_simple_classification(task):
        return "claude-haiku-4-5"     # cheap tier
    elif is_standard_reasoning(task):
        return "claude-sonnet-4-6"    # mid tier
    else:
        return "claude-opus-4-7"      # frontier
```
Saves 40-70% vs always using frontier.
Pattern 2 — Batch for non-real-time:
Identify workloads that can accept 24-hour latency. Move to Batch API. Immediate 50% savings.
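In practice, Bedrock batch jobs read a JSONL file of requests from S3 and write results back to S3. A minimal boto3 sketch; the job name, role ARN, model ID, and bucket paths are placeholders:

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Submit an async batch job: input records staged as JSONL in S3
job = bedrock.create_model_invocation_job(
    jobName="nightly-bulk-run",                             # placeholder
    roleArn="arn:aws:iam::123456789012:role/BedrockBatch",  # placeholder
    modelId="anthropic.claude-haiku-4-5",                   # placeholder model ID
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-bucket/batch-in/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/batch-out/"}},
)
print(job["jobArn"])  # poll get_model_invocation_job(jobIdentifier=...) for status
```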
Pattern 3 — Provisioned for predictable:
If you have a single model consuming >$30/day on-demand consistently, evaluate Provisioned Throughput. Usually 15-25% savings.
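As a worked example with the article's numbers: $35/day on-demand is about $1,050/month, so a 20% provisioned discount would save roughly $210/month, but only while the reserved capacity stays utilized. Idle commitment erases the saving, which is why the break-even floor matters.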
Pattern 4 — Avoid Bedrock Llama premium:
For Llama workloads at scale, use direct Meta partners (Together AI, Fireworks) or Groq for speed.
Pattern 5 — Mix providers via aggregator:
If AWS integration isn't critical, TokenMix.ai or OpenRouter let you route per-task to the cheapest provider, without Bedrock's Llama premium or AWS integration overhead.
FAQ
Does Bedrock match Anthropic direct pricing exactly?
Yes for Claude models. Pricing is identical per-token. Bedrock adds value via AWS integration, not cost savings.
Why is Bedrock more expensive for Llama?
AWS pays Meta royalties for Llama hosting under their partnership terms, and that cost is passed through to customers. Direct Meta partners (Together AI, etc.) have different cost structures.
Is Bedrock Provisioned worth it?
Only if you have stable, predictable daily spend on a single model exceeding ~$30-40. Below that, on-demand is more flexible.
Can I use Bedrock with Knowledge Bases for RAG?
Yes, but watch the costs. Knowledge Bases includes vector DB, retrieval, and orchestration fees on top of LLM inference. Often cheaper to run your own vector DB (Qdrant self-hosted) + Bedrock for LLM only.
Does Bedrock have GPT-5.5?
No. GPT models are not available on Bedrock — AWS and OpenAI don't have that partnership. GPT is on Azure OpenAI, direct OpenAI, or aggregators like TokenMix.ai.
Does Bedrock charge for failed requests?
Only partially. Requests that return errors due to your configuration are typically charged for the tokens already processed. Server errors (AWS side) usually aren't charged.
What about cross-region Bedrock?
Some regions have different pricing. Cross-region data transfer adds cost. For cost optimization, deploy Bedrock in the region closest to your application.
How does Bedrock Batch compare to OpenAI Batch?
Similar concept: async processing at 50% discount, up to 24-hour latency. OpenAI's Batch may have different limits; check current offerings for comparison.
Can I use Bedrock and aggregators together?
Yes. Some teams use Bedrock for Claude (SLA-critical AWS integration) and aggregators for everything else (cost flexibility on Llama, GPT, DeepSeek, Kimi, etc.).
Where can I compare Bedrock vs aggregator pricing for my specific workload?
Use Bedrock's cost calculator alongside an aggregator's pricing page. For direct comparison on actual API calls, TokenMix.ai provides per-model pricing and usage analytics across Claude, GPT, DeepSeek, Kimi, and more through one account.