AWS Bedrock Pricing Deep Dive: Real Per-Model Cost Analysis (2026)
AWS Bedrock's pricing is more complex than it looks. For some models (notably Claude), Bedrock matches direct provider pricing exactly. For others (Llama, Titan), Bedrock adds a 10-70% markup. Three billing modes compound the complexity: on-demand, batch (50% off), and provisioned throughput (15-40% off with commitment). This guide gives real 2026 pricing, explains when Bedrock is cost-effective versus direct API access, and covers the workload patterns where provisioned throughput breaks even. All data verified against AWS official pricing as of April 2026.
AWS Bedrock offers three distinct billing approaches:
1. On-Demand: pay per token, no commitment. Matches direct provider pricing for most models. Best for sporadic or unpredictable usage.
2. Batch: 50% discount on on-demand rates. Requests processed asynchronously within 24 hours. Best for non-real-time bulk processing.
3. Provisioned Throughput: reserved capacity at 15-40% discount. Requires commitment. Breaks even at approximately $30-40/day on-demand spend per model.
Choose based on your traffic pattern, not blanket recommendations.
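To make the choice concrete, here is a minimal sketch comparing the three modes for a given daily on-demand spend. The discount figures come from the list above; the ~$35/day provisioned floor is the midpoint of the article's $30-40 break-even range, and the flat ~20% provisioned discount is an assumption within the stated 15-40% band:

```python
def cheapest_mode(daily_on_demand_usd: float, latency_tolerant: bool) -> str:
    """Pick a billing mode using the discounts described above."""
    batch = daily_on_demand_usd * 0.50        # batch: 50% off, async within 24h
    provisioned = daily_on_demand_usd * 0.80  # assume ~20% off (15-40% range)
    if latency_tolerant:
        return f"batch (${batch:.2f}/day)"
    if daily_on_demand_usd >= 35:             # ~$30-40/day break-even floor
        return f"provisioned (${provisioned:.2f}/day, requires commitment)"
    return f"on-demand (${daily_on_demand_usd:.2f}/day, no commitment)"

print(cheapest_mode(50.0, latency_tolerant=False))
```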
On-Demand Pricing by Model
Current Bedrock per-MTok pricing (April 2026):
Anthropic Claude family:

| Model | Input (per MTok) | Output (per MTok) | Notes |
|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | Matches Anthropic direct |
| Claude Opus 4.6 | $5.00 | $25.00 | Matches direct |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Matches direct |
| Claude Haiku 4.5 | $0.80 | $4.00 | Matches direct |
Meta Llama family:

| Model | Input (per MTok) | Output (per MTok) | Notes |
|---|---|---|---|
| Llama 3 70B | $2.65 | $3.50 | Bedrock premium vs alternatives |
| Llama 4 Scout | varies | varies | Check current Bedrock pricing |
| Llama 4 Maverick | varies | varies | Similar premium pattern |
Amazon Titan family (Bedrock-native):

| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Titan Text Express | lower tier | lower tier |
| Titan Text Premier | mid tier | mid tier |
Mistral family:

| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Mistral Large 2 | varies | varies |
| Mistral Small | cheap tier | cheap tier |
The Llama Premium
Bedrock charges ~10-70% more for Llama models than alternative hosts. The premium pays for:
AWS integration (IAM, VPC, CloudWatch)
SOC 2 Type 2 / HIPAA / FedRAMP compliance included
On billing mode, the same logic as above applies: for stable, predictable workloads on a single model, Provisioned Throughput saves meaningfully; for dynamic multi-model routing, on-demand stays more flexible.
Supported LLM Providers and Model Routing
Bedrock is one option; alternatives:
AWS Bedrock: comprehensive AWS integration, matches some providers' pricing
Azure OpenAI: for GPT family
Google Vertex AI: for Gemini
Direct provider APIs: often match Bedrock pricing
OpenAI-compatible aggregators: TokenMix.ai, OpenRouter, Together AI
Through TokenMix.ai, you get access to Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, DeepSeek V4-Pro, Kimi K2.6, Gemini 3.1 Pro, Llama 4, plus 300+ other models through a single OpenAI-compatible API key. For teams that don't need AWS-specific integrations, aggregators often provide more flexibility and pay-per-token pricing across all providers without the Llama premium.
Basic aggregator usage:
```python
from openai import OpenAI

# Aggregators expose an OpenAI-compatible endpoint, so the standard SDK works
client = OpenAI(
    api_key="your-tokenmix-key",
    base_url="https://api.tokenmix.ai/v1",
)

response = client.chat.completions.create(
    model="claude-opus-4-7",  # or any of the 300+ available models
    messages=[{"role": "user", "content": "..."}],
)
print(response.choices[0].message.content)
```
Hidden Costs to Watch
Beyond per-token pricing (a rough estimator follows this list):
1. Data transfer (egress). If your Bedrock region differs from your application, you pay for cross-region data transfer.
2. Knowledge Bases / RAG. Bedrock Knowledge Bases has separate vector DB costs on top of LLM invocations.
3. Agents for Bedrock. Bedrock Agents adds session, action, and orchestration costs.
4. Guardrails. Bedrock Guardrails for content filtering adds per-character fees.
5. Continuous pre-training / fine-tuning. Per-hour training fees plus resulting model hosting costs.
6. Provisioned throughput over-commitment. If you over-estimate capacity needs, you pay for unused reserved capacity.
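None of these show up in per-token math, so it helps to budget them explicitly next to token spend. A minimal sketch; every rate below is a placeholder for illustration, not a quoted AWS price, so substitute real numbers from the AWS pricing pages or Cost Explorer:

```python
# All rates below are PLACEHOLDERS, not quoted AWS prices;
# pull real numbers from the AWS pricing pages or Cost Explorer.
llm_spend = 1_050.00                  # monthly on-demand token spend (example)
egress_gb, egress_rate = 200, 0.09    # cross-region transfer, $/GB (placeholder)
guardrail_units, guardrail_rate = 500_000, 0.0002  # content filtering (placeholder)
vector_db = 150.00                    # Knowledge Bases vector store (placeholder)

total = llm_spend + egress_gb * egress_rate + guardrail_units * guardrail_rate + vector_db
print(f"Token spend: ${llm_spend:,.2f} of ${total:,.2f} total "
      f"({llm_spend / total:.0%})")   # tokens are only part of the bill
```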
Bedrock vs Direct Provider Access
When Bedrock wins:
AWS ecosystem integration (IAM, VPC, CloudWatch) is critical
Compliance certifications (FedRAMP, HIPAA) required in AWS
Team already on AWS for infrastructure
Provisioned capacity needed for SLA-critical workloads
Multi-model through one AWS billing makes sense (see the invocation sketch after these lists)
When direct provider access wins:
Want latest model versions immediately (Bedrock sometimes lags)
Don't need AWS integration
Pricing-sensitive on Llama or non-Anthropic models
Want provider-specific features not exposed via Bedrock
When aggregators win:
Want one API key for many providers (including Claude, GPT, DeepSeek, Kimi, etc.)
Multi-cloud neutrality
Pay-per-token without per-provider contracts
Unified billing across providers
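If Bedrock does win for your case, the native call path looks like the following. A minimal boto3 sketch using Bedrock's Converse API; the region and model ID are placeholders, so check the Bedrock model catalog for exact identifiers:

```python
import boto3

# Credentials come from the standard AWS chain (IAM role, env vars, etc.)
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-sonnet-4-6",  # placeholder; use the catalog ID
    messages=[{"role": "user", "content": [{"text": "..."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```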
Cost-Optimization Patterns
Pattern 1 — Tier routing:
Route requests by task complexity:
```python
def select_model(task) -> str:
    # The is_* predicates are app-specific stubs you define for your workload
    if is_simple_classification(task):
        return "claude-haiku-4-5"     # cheap tier
    elif is_standard_reasoning(task):
        return "claude-sonnet-4-6"    # mid tier
    else:
        return "claude-opus-4-7"      # frontier
```
Saves 40-70% vs always using frontier.
Pattern 2 — Batch for non-real-time:
Identify workloads that can accept 24-hour latency. Move to Batch API. Immediate 50% savings.
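In practice, Bedrock batch jobs read a JSONL file of requests from S3 and write results back to S3. A minimal boto3 sketch; the job name, role ARN, model ID, and bucket paths are placeholders:

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Submit an async batch job: input records staged as JSONL in S3
job = bedrock.create_model_invocation_job(
    jobName="nightly-bulk-run",                             # placeholder
    roleArn="arn:aws:iam::123456789012:role/BedrockBatch",  # placeholder
    modelId="anthropic.claude-haiku-4-5",                   # placeholder model ID
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-bucket/batch-in/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/batch-out/"}},
)
print(job["jobArn"])  # poll get_model_invocation_job(jobIdentifier=...) for status
```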
Pattern 3 — Provisioned for predictable:
If you have a single model consuming >$30/day on-demand consistently, evaluate Provisioned Throughput. Usually 15-25% savings.
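As a worked example with the article's numbers: $35/day on-demand is about $1,050/month, so a 20% provisioned discount would save roughly $210/month, but only while the reserved capacity stays utilized. Idle commitment erases the saving, which is why the break-even floor matters.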
Pattern 4 — Avoid Bedrock Llama premium:
For Llama workloads at scale, use direct Meta partners (Together AI, Fireworks) or Groq for speed.
Pattern 5 — Mix providers via aggregator:
If AWS integration isn't critical, TokenMix.ai or OpenRouter let you route per-task to the cheapest provider, without Bedrock's Llama premium or AWS integration overhead.
FAQ
Does Bedrock match Anthropic direct pricing exactly?
Yes for Claude models. Pricing is identical per-token. Bedrock adds value via AWS integration, not cost savings.
Why is Bedrock more expensive for Llama?
AWS pays Meta royalties for Llama hosting under their partnership terms, and that cost is passed through to customers. Direct Meta partners (Together AI, etc.) have different cost structures.
Is Bedrock Provisioned worth it?
Only if you have stable, predictable daily spend on a single model exceeding ~$30-40. Below that, on-demand is more flexible.
Can I use Bedrock with Knowledge Bases for RAG?
Yes, but watch the costs. Knowledge Bases includes vector DB, retrieval, and orchestration fees on top of LLM inference. Often cheaper to run your own vector DB (Qdrant self-hosted) + Bedrock for LLM only.
Does Bedrock have GPT-5.5?
No. GPT models are not available on Bedrock — AWS and OpenAI don't have that partnership. GPT is on Azure OpenAI, direct OpenAI, or aggregators like TokenMix.ai.
Does Bedrock charge for failed requests?
Only partially. Requests that return errors due to your configuration are typically charged for the tokens already processed. Server errors (AWS side) usually aren't charged.
What about cross-region Bedrock?
Some regions have different pricing. Cross-region data transfer adds cost. For cost optimization, deploy Bedrock in the region closest to your application.
How does Bedrock Batch compare to OpenAI Batch?
Similar concept: async processing at 50% discount, up to 24-hour latency. OpenAI's Batch may have different limits; check current offerings for comparison.
Can I use Bedrock and aggregators together?
Yes. Some teams use Bedrock for Claude (SLA-critical AWS integration) and aggregators for everything else (cost flexibility on Llama, GPT, DeepSeek, Kimi, etc.).
Where can I compare Bedrock vs aggregator pricing for my specific workload?
Use Bedrock's cost calculator alongside an aggregator's pricing page. For direct comparison on actual API calls, TokenMix.ai provides per-model pricing and usage analytics across Claude, GPT, DeepSeek, Kimi, and more through one account.