TokenMix Research Lab · 2026-04-12

Claude API Tutorial 2026: Sonnet 4.6, Cache, Tools, Routing
Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30
Start new Claude API projects on Sonnet 4.6. Use Haiku 4.5 for cheap routing, Opus 4.7 for hard tasks, prompt caching for repeated context, and batch for async jobs.
The Claude API is no longer a three-model "Haiku 3.5 / Sonnet 4 / Opus 4" stack. Current developer work should start from the live Claude Platform: Messages API, model deprecations, pricing, prompt caching, tool use, streaming, OpenAI SDK compatibility, and rate limits. This tutorial gives the minimum production path without old model IDs.
For broader budget planning, compare Claude against OpenAI, Gemini, DeepSeek, Grok, and Kimi in the LLM API Pricing 2026 guide.
Table of Contents
- Quick Verdict
- Current Claude API Model Table
- Step 1: Create A Console Project And API Key
- Step 2: First Python Call
- Step 3: First TypeScript Call
- Step 4: Messages API Format
- Step 5: Streaming
- Step 6: Tool Use
- Step 7: Prompt Caching
- Step 8: OpenAI SDK Compatibility
- Step 9: Use TokenMix.ai For Routing
- Cost Control Checklist
- Common Errors
- Final Recommendation
- FAQ
- Related Articles
- Sources
Quick Verdict
Use the native Anthropic SDK when you need Claude-specific features. Use OpenAI SDK compatibility or TokenMix.ai when migration speed, multi-model routing, and fallback matter more.
| Need | Best path |
|---|---|
| Best default Claude model | claude-sonnet-4-6 |
| Cheapest Claude smoke tests | claude-haiku-4-5 |
| Hard reasoning and code review | claude-opus-4-7 |
| Repeated long context | Prompt caching |
| Async bulk jobs | Message Batches API |
| Long user-facing answers | Streaming |
| Function/tool calls | Claude tool use |
| Existing OpenAI SDK app | Claude OpenAI SDK compatibility or TokenMix.ai |
| Multi-provider fallback | TokenMix.ai gateway |
Current Claude API Model Table
| Model | Current role | API price signal | Best use |
|---|---|---|---|
| Claude Opus 4.7 | Premium reasoning | $5/MTok input, $25/MTok output | Hard reasoning, code review, high-value agents |
| Claude Opus 4.6 | Stable premium route | $5/MTok input, $25/MTok output | Opus workloads where 4.7 migration risk matters |
| Claude Sonnet 4.6 | Default workhorse | $3/MTok input, $15/MTok output | Coding, analysis, writing, agents |
| Claude Sonnet 4.5 | Regression bridge | $3/MTok input, $15/MTok output | Older Sonnet workflow migration |
| Claude Haiku 4.5 | Fast low-cost route | $1/MTok input, $5/MTok output | Classification, extraction, simple routing |
Avoid retired/deprecated defaults: Claude 3.7 Sonnet is retired, Sonnet 4 is deprecated, and Haiku 3.5 is retired. If an old tutorial tells you to start with those models, update it before shipping.
Step 1: Create A Console Project And API Key
| Step | Action |
|---|---|
| 1 | Go to Claude Console |
| 2 | Create or select an organization/workspace |
| 3 | Open billing/usage and confirm your credit or billing state |
| 4 | Create an API key and store it safely |
| 5 | Set ANTHROPIC_API_KEY in your environment |
| 6 | Start with Haiku or Sonnet, not Opus |
Do not assume every new account receives a fixed free credit amount. Check the balance shown in Console. For more on credits, see our Claude API free tier guide.
Install SDKs:
```bash
pip install anthropic
npm install @anthropic-ai/sdk
```
Set the key:
```bash
export ANTHROPIC_API_KEY="sk-ant-your-key"
```
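`Anthropic()` with no arguments reads `ANTHROPIC_API_KEY` from the environment, so a missing key otherwise surfaces only on the first request. A minimal fail-fast sketch, assuming you want startup to stop with a clear message instead; `require_api_key` is a hypothetical helper name.

```python
import os


def require_api_key(var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the key from the environment, or fail at startup with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Never log the key itself; only report which variable is missing.
        raise RuntimeError(f"{var_name} is not set; export it before starting the app.")
    return key
```

Call it once at startup, for example `Anthropic(api_key=require_api_key())`, so misconfigured deployments fail before serving traffic.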
Step 2: First Python Call
```python
from anthropic import Anthropic
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Explain prompt caching in two practical bullets."}
    ],
)
print(message.content[0].text)
```
Use Sonnet 4.6 for the first real quality test. Use Haiku 4.5 for basic integration checks where answer quality does not matter.
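`message.content` is a list of content blocks, and `content[0]` is not guaranteed to be text (for example, the response can contain a `tool_use` block when tools are enabled). A small sketch of a safer read: a hypothetical `collect_text` helper that joins every text block. The `SimpleNamespace` objects only stand in for SDK blocks so the example runs offline.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(content: Iterable) -> str:
    """Concatenate the text of every block whose type is "text", skipping tool_use and others."""
    return "".join(block.text for block in content if getattr(block, "type", None) == "text")


# Offline stand-ins for SDK content blocks, which expose .type and .text the same way.
fake_content = [
    SimpleNamespace(type="text", text="Prompt caching stores a repeated prefix. "),
    SimpleNamespace(type="tool_use", name="lookup", input={}),
    SimpleNamespace(type="text", text="Later calls read it at a discount."),
]
print(collect_text(fake_content))
```

With a real response, use `print(collect_text(message.content))` in place of indexing the first block.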
Step 3: First TypeScript Call
```typescript
import Anthropic from "@anthropic-ai/sdk";
// Reads ANTHROPIC_API_KEY from the environment. Top-level await requires an ES module.
const client = new Anthropic();
const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 512,
  messages: [
    { role: "user", content: "Write a concise API onboarding checklist." },
  ],
});
const first = message.content[0];
if (first.type === "text") {
  console.log(first.text);
}
```
Use environment variables in production. Do not commit API keys or paste them into client-side code.
Step 4: Messages API Format
Claude's native Messages API is close to chat-completions patterns, but not identical.
| Concept | Claude Messages API | Common OpenAI Chat pattern |
|---|---|---|
| System prompt | Separate system parameter | System role message |
| User/assistant turns | messages array | messages array |
| Output budget | max_tokens (required) | Often max_tokens or max_completion_tokens |
| Response text | message.content[0].text for text blocks | choices[0].message.content |
| Tool calls | tool_use and tool_result blocks | Tool calls in assistant message |
| Versioning | anthropic-version header for raw HTTP | Provider-specific |
| Rate limit headers | Anthropic-specific headers | Provider-specific |
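The first rows of this table are most of the porting work. A minimal sketch, assuming plain-string message content only: the hypothetical `to_claude_request` moves OpenAI-style `system` messages into Claude's separate `system` parameter and always sets `max_tokens`, which the Messages API requires. Tool calls and image parts are deliberately not handled.

```python
from typing import Any


def to_claude_request(model: str, openai_messages: list[dict[str, Any]],
                      max_tokens: int = 1024) -> dict[str, Any]:
    """Convert OpenAI-style chat messages (string content only) into Messages API kwargs."""
    system_parts = [m["content"] for m in openai_messages if m["role"] == "system"]
    turns = [
        {"role": m["role"], "content": m["content"]}
        for m in openai_messages
        if m["role"] in ("user", "assistant")
    ]
    request: dict[str, Any] = {"model": model, "max_tokens": max_tokens, "messages": turns}
    if system_parts:
        # Claude takes the system prompt as a top-level parameter, not as a message.
        request["system"] = "\n\n".join(system_parts)
    return request


req = to_claude_request(
    "claude-sonnet-4-6",
    [
        {"role": "system", "content": "You are a senior backend engineer."},
        {"role": "user", "content": "How should I retry 429 errors?"},
    ],
)
print(req["system"])
print(req["messages"])
```

The result can be passed straight to the SDK as `client.messages.create(**req)`.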
System prompt example:
```python
from anthropic import Anthropic
client = Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=700,
    system="You are a senior backend engineer. Answer with implementation details.",
    messages=[{"role": "user", "content": "How should I retry 429 errors?"}],
)
```
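Without an SDK, the `anthropic-version` header from the table is required on every call. This standard-library sketch builds, but does not send, a Messages request using Anthropic's documented raw HTTP shape: `POST https://api.anthropic.com/v1/messages` with `x-api-key`, `anthropic-version: 2023-06-01`, and a JSON body. `build_messages_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_messages_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for the Messages API without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",  # API version header required for raw HTTP
            "content-type": "application/json",
        },
        method="POST",
    )


req = build_messages_request(
    "sk-ant-your-key",
    {
        "model": "claude-sonnet-4-6",
        "max_tokens": 700,
        "system": "You are a senior backend engineer. Answer with implementation details.",
        "messages": [{"role": "user", "content": "How should I retry 429 errors?"}],
    },
)
print(req.get_method(), req.full_url)
# Sending it with a real key would be: urllib.request.urlopen(req)
```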
Step 5: Streaming
Use streaming when answers may be long or user-facing latency matters.
```python
from anthropic import Anthropic
client = Anthropic()
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1200,
    messages=[{"role": "user", "content": "Explain API gateway fallback design."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()
print(f"\nOutput tokens: {final.usage.output_tokens}")
```
| Use streaming when... | Avoid streaming when... |
|---|---|
| User is waiting in a UI | You need only a tiny classification |
| Response can take more than a few seconds | The job is async and can be batched |
| You want partial output | You need one atomic JSON object |
| Long outputs may timeout | You can process later via batch |
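Under the SDK, streaming is server-sent events (SSE). A minimal offline sketch of what `stream.text_stream` does for you, assuming the documented event shape where text arrives in `content_block_delta` events carrying a `text_delta`. The sample lines are hand-written in that format, not captured output, and `iter_text_deltas` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from raw Messages API SSE lines."""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta["text"]


# Hand-written sample in the documented event format (not real captured output).
sample = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"id": "msg_x", "content": []}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Gateway "}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "fallback"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
print("".join(iter_text_deltas(sample)))
```

Other event types, such as `ping` or `message_delta`, fall through the type checks and are ignored.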
Step 6: Tool Use
Claude can choose tools by emitting tool_use blocks. Your application executes the tool, then sends the output back as a tool_result block in the next user message.
```python
from anthropic import Anthropic
client = Anthropic()
tools = [
    {
        "name": "get_order_status",
        "description": "Look up an order by ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order ID"}
            },
            "required": ["order_id"],
        },
    }
]
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=600,
    tools=tools,
    messages=[{"role": "user", "content": "Where is order A123?"}],
)
# response.stop_reason == "tool_use" means Claude is waiting for your tool output.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```
| Tool pattern | Best practice |
|---|---|
| Tool names | Use clear verbs and nouns |
| Input schema | Keep required fields explicit |
| Tool results | Return compact structured data |
| Safety | Validate all tool inputs server-side |
| Logging | Store tool name, tool input, and model ID |
| Retries | Keep tool retries outside the model loop where possible |
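The example above stops at printing the `tool_use` block. A minimal sketch of the next step, assuming a hypothetical local `get_order_status` implementation: validate the input server-side, run the tool, and return the assistant turn plus a user turn of `tool_result` blocks keyed by `tool_use_id` (with `is_error` set when validation fails). `SimpleNamespace` stands in for the SDK response so it runs offline.

```python
import json
from types import SimpleNamespace
from typing import Any


def get_order_status(order_id: str) -> dict[str, Any]:
    """Hypothetical tool implementation; replace with your real order lookup."""
    return {"order_id": order_id, "status": "shipped"}


def tool_followup_messages(response: Any) -> list[dict[str, Any]]:
    """Build the assistant turn plus a user turn of tool_result blocks for a tool_use response."""
    results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        order_id = block.input.get("order_id")
        if block.name == "get_order_status" and isinstance(order_id, str) and order_id:
            output = json.dumps(get_order_status(order_id))  # compact structured result
            results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
        else:
            # Server-side validation failed: tell the model instead of executing anything.
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": "Invalid tool call.", "is_error": True})
    return [
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": results},
    ]


# Offline stand-in for a response whose stop_reason is "tool_use".
fake_response = SimpleNamespace(
    stop_reason="tool_use",
    content=[SimpleNamespace(type="tool_use", id="toolu_01", name="get_order_status",
                             input={"order_id": "A123"})],
)
if fake_response.stop_reason == "tool_use":
    followup = tool_followup_messages(fake_response)
    print(followup[1]["content"])
```

With a real client, append these two entries to `messages` and call `client.messages.create(...)` again with the same `tools`, repeating until `stop_reason` is no longer `"tool_use"`.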
Step 7: Prompt Caching
Prompt caching is one of Claude's best cost controls. Use it when a large prefix repeats across requests: policy text, tool schemas, long documents, project context, or retrieval bundles.
For exact cache-read, 5-minute write, and 1-hour write cost math, use the Claude API cache pricing guide.
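```python
from anthropic import Anthropic
client = Anthropic()
# The cached block must meet the model's minimum cacheable length (see the prompt caching docs);
# shorter prefixes are processed without caching.
POLICY = "Long repeated policy or product documentation..."
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=600,
    system=[
        {
            "type": "text",
            "text": POLICY,
            "cache_control": {"type": "ephemeral"},  # mark this prefix as cacheable
        }
    ],
    messages=[{"role": "user", "content": "Answer using the policy."}],
)
usage = message.usage
print("cache write", usage.cache_creation_input_tokens)  # tokens written to the cache on this call
print("cache read", usage.cache_read_input_tokens)       # tokens served from the cache on this call
```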
| Cache term | Meaning | Cost impact |
|---|---|---|
| Cache write | First time a cacheable block is stored | 1.25x base input (5-minute TTL) or 2x (1-hour TTL) |
| Cache read | Reusing cached content | 10% of base input price |
| Cache hit | Repeated prefix matched | Lower cost and lower ITPM pressure |
| Cache miss | Prefix changed or expired | You pay write/base input again |
| Good cache block | Stable, large, reused context | Best savings |
| Bad cache block | Tiny or constantly changing text | Little value |
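A back-of-envelope sketch of when caching pays off, assuming Anthropic's published multipliers: 1.25x base input for a 5-minute cache write and 0.1x for a cache read. It prices only the repeated prefix, and it assumes every request after the first hits the cache; real hit rates are lower, and uncached tokens and output are excluded. The function name and the example sizes are illustrative.

```python
def prefix_cost_usd(prefix_tokens: int, requests: int, base_per_mtok: float, cached: bool) -> float:
    """Cost of sending the same prefix `requests` times, with or without prompt caching."""
    if requests <= 0:
        return 0.0
    mtok = prefix_tokens / 1_000_000
    if not cached:
        return requests * mtok * base_per_mtok
    write = mtok * base_per_mtok * 1.25                   # first request writes the cache (5-minute TTL)
    reads = (requests - 1) * mtok * base_per_mtok * 0.10  # later requests read it at 10% of base
    return write + reads


# Example: a 50k-token policy prefix on Sonnet 4.6 ($3/MTok input), reused 100 times within the TTL.
no_cache = prefix_cost_usd(50_000, 100, 3.0, cached=False)
with_cache = prefix_cost_usd(50_000, 100, 3.0, cached=True)
print(f"no cache: ${no_cache:.2f}, with cache: ${with_cache:.4f}")
```

Under these assumptions the prefix drops from $15.00 to about $1.67. If the prefix changes often, every miss pays the 1.25x write again, which is why the table flags constantly changing text as a bad cache block.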
For deeper economics, use our Anthropic API pricing guide.
Step 8: OpenAI SDK Compatibility
Anthropic now documents an OpenAI SDK compatibility layer for testing Claude with the OpenAI SDK. This is useful for migration experiments, but the native Anthropic SDK still exposes Claude-specific behavior more directly.
| Option | Use when |
|---|---|
| Native Anthropic SDK | You need Claude-native messages, caching, tools, headers, and docs parity |
| Anthropic OpenAI SDK compatibility | You want quick migration tests with OpenAI-style code |
| TokenMix.ai OpenAI-compatible gateway | You want Claude plus many other models behind one interface |
If you need multi-provider routing, gateway observability, and fallback, TokenMix.ai is the more complete production pattern.
Step 9: Use TokenMix.ai For Routing
TokenMix.ai lets you call Claude and 300+ other models through an OpenAI-compatible endpoint. Use it when the product needs model choice, not one-provider purity.
```python
from openai import OpenAI
client = OpenAI(
    api_key="your-tokenmix-key",  # placeholder; load it from an environment variable in production
    base_url="https://api.tokenmix.ai/v1",
)
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a concise API tutor."},
        {"role": "user", "content": "Explain retries for 529 errors."},
    ],
)
print(response.choices[0].message.content)
```
| TokenMix.ai helps when... | Why |
|---|---|
| You need Claude plus GPT/Gemini/DeepSeek/Kimi | One API surface |
| Claude hits 429 or 529 | Add fallback policy |
| Some tasks are too cheap for Sonnet | Route to Haiku or non-Claude models |
| Finance wants one bill | Unified usage tracking |
| You are migrating from OpenAI SDK | OpenAI-compatible call shape |
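A minimal client-side fallback sketch for the 429/529 row, assuming any OpenAI-compatible client whose errors expose a `status_code` attribute (the OpenAI Python SDK's `APIStatusError` does). `call_with_fallback`, the model order, and the `claude-haiku-4-5` gateway model name are illustrative assumptions, not a documented TokenMix.ai feature; check the gateway docs for any built-in fallback. A fake `call` keeps it runnable offline.

```python
from typing import Callable, Optional, Sequence

RETRYABLE = {429, 500, 529}


class FakeStatusError(Exception):
    """Offline stand-in for an SDK error carrying an HTTP status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_fallback(call: Callable[[str], str], models: Sequence[str]) -> tuple[str, str]:
    """Try each model in order; move on only for retryable statuses. Returns (model, output)."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:
            if getattr(exc, "status_code", None) not in RETRYABLE:
                raise  # 400/401/404 would fail on every model, so surface them immediately
            last_error = exc
    raise RuntimeError("all models in the fallback chain failed") from last_error


def fake_call(model: str) -> str:
    if model == "claude-sonnet-4-6":
        raise FakeStatusError(529)  # simulate Anthropic overload
    return f"answer from {model}"


print(call_with_fallback(fake_call, ["claude-sonnet-4-6", "claude-haiku-4-5"]))
```

In a real app, `call` would wrap `client.chat.completions.create(model=model, ...)` and return the message content.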
Cost Control Checklist
| Control | Why it matters |
|---|---|
| Start with Sonnet 4.6 | Best default balance |
| Use Haiku 4.5 for simple tasks | Cuts cost on routing/classification |
| Reserve Opus 4.7 for hard tasks | Avoid premium price on routine calls |
| Use prompt caching | Repeated context gets cheaper |
| Use Batch API | Async jobs get 50% token discount |
| Lower unnecessary output budgets | Output tokens cost 5x input tokens on every model above |
| Log model ID and token usage | Debug cost drift |
| Monitor 429 and 529 separately | Rate limits and overload are different |
| Add fallback chains | Prevent one provider from becoming downtime |
Example monthly cost for 100M input tokens and 30M output tokens at the list prices above:
| Route | Approx cost |
|---|---|
| All Sonnet 4.6 | $750 |
| All Haiku 4.5 | $250 |
| All Opus 4.7 | $1,250 |
| 70% Haiku + 25% Sonnet + 5% Opus | About $425 |
| Same routed mix with half the traffic on Batch API (50% off) | About $320 |
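The same arithmetic as a small script you can rerun with your own volumes. Prices are the per-MTok list prices from the model table; the 70/25/5 mix and the 50% Batch discount on half the traffic are the same assumptions the cost table uses. `route_cost` is an illustrative name.

```python
PRICES = {  # USD per million tokens: (input, output), from the model table
    "haiku-4.5": (1.0, 5.0),
    "sonnet-4.6": (3.0, 15.0),
    "opus-4.7": (5.0, 25.0),
}


def route_cost(input_mtok: float, output_mtok: float, mix: dict[str, float],
               batch_share: float = 0.0) -> float:
    """Monthly cost for a traffic mix; batch_share of all traffic gets the 50% Batch discount."""
    full_price = sum(
        share * (input_mtok * PRICES[model][0] + output_mtok * PRICES[model][1])
        for model, share in mix.items()
    )
    return full_price * (1 - batch_share) + full_price * batch_share * 0.5


routed = {"haiku-4.5": 0.70, "sonnet-4.6": 0.25, "opus-4.7": 0.05}
print(route_cost(100, 30, {"sonnet-4.6": 1.0}))      # all Sonnet
print(route_cost(100, 30, routed))                   # routed mix
print(route_cost(100, 30, routed, batch_share=0.5))  # routed mix, half on Batch API
```

The batched routed mix works out to $318.75, which the table rounds to about $320.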
Common Errors
| Error | Likely cause | Fix |
|---|---|---|
| 401 authentication_error | Bad or missing key | Regenerate key and check environment |
| 400 invalid_request_error | Wrong schema or unsupported field | Compare with Messages API docs |
| 404 not_found_error for model | Old or unavailable model ID | Check model deprecations and current model catalog |
| 413 request_too_large | Body exceeds endpoint size | Split files or use Files API where appropriate |
| 429 rate_limit_error | RPM/ITPM/OTPM/spend/workspace limit | Backoff, cache, batch, route, or raise limits |
| 500 api_error | Server error | Bounded retry and request ID logging |
| 504 timeout_error | Long blocking request | Stream or use Message Batches |
| 529 overloaded_error | Anthropic overload | Retry and fail over |
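A minimal retry sketch for the retryable rows (429, 500, 529): bounded attempts, capped exponential backoff with full jitter, and a server-supplied wait honored when present (Anthropic's 429 responses include a `retry-after` header). `StatusError` and the injectable `sleep` are hypothetical scaffolding so the sketch runs offline. The Anthropic SDK also retries some errors itself (see its `max_retries` client option), so tune the two layers together rather than stacking them blindly.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
RETRYABLE_STATUS = {429, 500, 529}


class StatusError(Exception):
    """Hypothetical error carrying an HTTP status and an optional retry-after delay in seconds."""

    def __init__(self, status_code: int, retry_after: Optional[float] = None):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
        self.retry_after = retry_after


def retry_call(fn: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
               max_delay: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying 429/500/529 with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except StatusError as exc:
            if exc.status_code not in RETRYABLE_STATUS or attempt == max_attempts - 1:
                raise  # non-retryable or out of attempts: let the caller log it or fail over
            if exc.retry_after is not None:
                delay = exc.retry_after  # the server said how long to wait
            else:
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
    raise RuntimeError("unreachable")


# Offline demo: a 529, then a 429 with retry-after, then success. Delays are recorded, not slept.
outcomes: list = [StatusError(529), StatusError(429, retry_after=2.0), "ok"]
delays: list[float] = []


def flaky() -> str:
    item = outcomes.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


result = retry_call(flaky, sleep=delays.append)
print(result, delays)
```

When `retry_call` finally raises on a 529, that is the point to hand off to a fallback model or provider.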
Use our Claude Rate Exceeded guide and Claude 529 guide for production retry design.
Final Recommendation
Build new Claude API apps on Sonnet 4.6, cache repeated context, batch async work, and route simple tasks away from Sonnet. Use TokenMix.ai when fallback and multi-model cost control matter.
FAQ
What Claude model should I use first?
Use Sonnet 4.6 for real quality tests. Use Haiku 4.5 for cheap SDK smoke tests. Use Opus 4.7 only when Sonnet fails or the task is high-value.
Does Claude API have free credits?
Sometimes an account may show onboarding or promotional credits, but there is no permanent free Claude API tier. Check your actual Claude Console balance.
Is claude-sonnet-4-20250514 still a good tutorial model?
No. It is deprecated and scheduled to retire. Use claude-sonnet-4-6 for new examples.
Can I use the OpenAI SDK with Claude?
Yes, for compatibility testing through Anthropic's documented compatibility layer, and also through TokenMix.ai's OpenAI-compatible gateway. Use the native Anthropic SDK for the most Claude-specific features.
How much does prompt caching save?
Cache reads are priced at 10% of base input tokens. Actual savings depend on cache hit rate, context size, and how often the repeated prefix changes.
Should I use Batch API?
Yes for async workloads such as bulk summarization, enrichment, evaluation, and offline report generation. Batch pricing gives a 50% token discount.
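A sketch of the request list the Message Batches API expects, assuming the documented shape where each entry pairs a unique `custom_id` with ordinary Messages `params`. Building the list is pure Python; submitting it with the Anthropic SDK would be `client.messages.batches.create(requests=...)`. The document IDs, prompt, and `build_batch_requests` name are illustrative.

```python
from typing import Any


def build_batch_requests(docs: dict[str, str], model: str = "claude-haiku-4-5",
                         max_tokens: int = 300) -> list[dict[str, Any]]:
    """One summarization request per document, keyed by a unique custom_id."""
    return [
        {
            "custom_id": doc_id,  # must be unique within the batch; used to match results
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": f"Summarize in three bullets:\n\n{text}"}],
            },
        }
        for doc_id, text in docs.items()
    ]


batch_requests = build_batch_requests({"doc-001": "Quarterly report text...",
                                       "doc-002": "Support ticket text..."})
print(len(batch_requests), batch_requests[0]["custom_id"])
# Submit with: client.messages.batches.create(requests=batch_requests)
```

Batch results are not guaranteed to come back in request order, so join them to your data by `custom_id`, not by position.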
What is the biggest Claude API cost mistake?
Sending every task to Sonnet or Opus. Route simple work to Haiku or cheaper non-Claude models, and reserve premium models for tasks where quality changes the outcome.
How does TokenMix.ai fit into a Claude API stack?
TokenMix.ai is the routing layer. Use it to call Claude, compare alternatives, track spend, and fail over when Claude is rate-limited or overloaded.
Related Articles
- Claude API Cache Pricing 2026: 90% Input Savings Explained
- Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared
- Anthropic API Pricing 2026: Cache, Batch, Data Residency Fees
- Claude Rate Exceeded Error 2026: 5 Fixes for 429 Limits
- Claude API Error 529 2026: Overload Retry and Failover Guide
- Claude Sonnet vs Opus 2026: Pricing, Quality, Routing Guide
- Claude Haiku vs Sonnet 2026: Cost, Quality, Routing Rules
- OpenAI-Compatible API Gateway: 9 Providers, One SDK Guide