TokenMix Research Lab · 2026-04-12

Claude API Tutorial 2026: Sonnet 4.6, Cache, Tools, Routing

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

Start new Claude API projects on Sonnet 4.6. Use Haiku 4.5 for cheap routing, Opus 4.7 for hard tasks, prompt caching for repeated context, and batch for async jobs.

The Claude API is no longer a three-model "Haiku 3.5 / Sonnet 4 / Opus 4" stack. Current developer work should start from the live Claude Platform: Messages API, model deprecations, pricing, prompt caching, tool use, streaming, OpenAI SDK compatibility, and rate limits. This tutorial gives the minimum production path without old model IDs.

For broader budget planning, compare Claude against OpenAI, Gemini, DeepSeek, Grok, and Kimi in the LLM API Pricing 2026 guide.

Quick Verdict
Current Claude API Model Table
Step 1: Create A Console Project And API Key
Step 2: First Python Call
Step 3: First TypeScript Call
Step 4: Messages API Format
Step 5: Streaming
Step 6: Tool Use
Step 7: Prompt Caching
Step 8: OpenAI SDK Compatibility
Step 9: Use TokenMix.ai For Routing
Cost Control Checklist
Common Errors
Final Recommendation
FAQ
Related Articles
Sources

Quick Verdict

Use the native Anthropic SDK when you need Claude-specific features. Use OpenAI SDK compatibility or TokenMix.ai when migration speed, multi-model routing, and fallback matter more.

Need	Best path
Best default Claude model	`claude-sonnet-4-6`
Cheapest Claude smoke tests	`claude-haiku-4-5`
Hard reasoning and code review	`claude-opus-4-7`
Repeated long context	Prompt caching
Async bulk jobs	Message Batches API
Long user-facing answers	Streaming
Function/tool calls	Claude tool use
Existing OpenAI SDK app	Claude OpenAI SDK compatibility or TokenMix.ai
Multi-provider fallback	TokenMix.ai gateway

Current Claude API Model Table

Model	Current role	API price signal	Best use
Claude Opus 4.7	Premium reasoning	$5/MTok input, $25/MTok output	Hard reasoning, code review, high-value agents
Claude Opus 4.6	Stable premium route	$5/MTok input, $25/MTok output	Opus workloads where 4.7 migration risk matters
Claude Sonnet 4.6	Default workhorse	$3/MTok input, $15/MTok output	Coding, analysis, writing, agents
Claude Sonnet 4.5	Regression bridge	$3/MTok input, $15/MTok output	Older Sonnet workflow migration
Claude Haiku 4.5	Fast low-cost route	$1/MTok input, $5/MTok output	Classification, extraction, simple routing

Avoid retired/deprecated defaults: Claude 3.7 Sonnet is retired, Sonnet 4 is deprecated, and Haiku 3.5 is retired. If an old tutorial tells you to start with those models, update it before shipping.
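
One way to keep old IDs out of a deploy is a startup check on the configured model. The sketch below is illustrative only: `BLOCKED_MODEL_IDS` and `check_model_id` are hypothetical names, and the dated IDs and statuses are assumptions that should be confirmed against Anthropic's model deprecations page.

```python
# Hypothetical startup guard: fail fast if config still points at an old model ID.
# The IDs and statuses below are assumptions for illustration; confirm them
# against Anthropic's model deprecations page before relying on this list.
BLOCKED_MODEL_IDS = {
    "claude-3-7-sonnet-20250219": "retired",
    "claude-3-5-haiku-20241022": "retired",
    "claude-sonnet-4-20250514": "deprecated",
}

DEFAULT_MODEL = "claude-sonnet-4-6"


def check_model_id(model_id: str) -> str:
    """Return model_id unchanged, or raise if it is on the blocked list."""
    status = BLOCKED_MODEL_IDS.get(model_id)
    if status is not None:
        raise ValueError(
            f"{model_id} is {status}; switch to a current model such as {DEFAULT_MODEL}"
        )
    return model_id
```

Calling `check_model_id` once at startup turns a runtime `404 not_found_error` into a configuration error you see before traffic arrives.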

Step 1: Create A Console Project And API Key

Step	Action
1	Go to Claude Console
2	Create or select an organization/workspace
3	Open billing/usage and confirm your credit or billing state
4	Create an API key and store it safely
5	Set `ANTHROPIC_API_KEY` in your environment
6	Start with Haiku or Sonnet, not Opus

Do not assume every new account receives a fixed free credit amount. Check the balance shown in Console. For more on credits, see our Claude API free tier guide.

Install SDKs:

pip install anthropic
npm install @anthropic-ai/sdk

Set the key:

export ANTHROPIC_API_KEY="sk-ant-your-key"
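
The SDK clients in the next steps read `ANTHROPIC_API_KEY` from the environment when no key is passed. A small startup check can make a missing key fail with a clearer message than a `401`. This is a minimal sketch; `require_api_key` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Anthropic API key from env, or raise with a clear message."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; export it before starting")
    return key
```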

Step 2: First Python Call

from anthropic import Anthropic

client = Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Explain prompt caching in two practical bullets."}
    ],
)

print(message.content[0].text)

Use Sonnet 4.6 for the first real quality test. Use Haiku 4.5 for basic integration checks where answer quality does not matter.

Step 3: First TypeScript Call

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 512,
  messages: [
    { role: "user", content: "Write a concise API onboarding checklist." },
  ],
});

const first = message.content[0];
if (first.type === "text") {
  console.log(first.text);
}

Use environment variables in production. Do not commit API keys or paste them into client-side code.

Step 4: Messages API Format

Claude's native Messages API is close to chat-completions patterns, but not identical.

Concept	Claude Messages API	Common OpenAI Chat pattern
System prompt	Separate `system` parameter	System role message
User/assistant turns	`messages` array	`messages` array
Output budget	`max_tokens`	Often `max_tokens` or `max_completion_tokens`
Response text	`message.content[0].text` for text blocks	`choices[0].message.content`
Tool calls	`tool_use` and `tool_result` blocks	Tool calls in assistant message
Versioning	`anthropic-version` header for raw HTTP	Provider-specific
Rate limit headers	Anthropic-specific headers	Provider-specific

System prompt example:

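from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The system prompt is a top-level parameter, not a message with role "system".
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=700,
    system="You are a senior backend engineer. Answer with implementation details.",
    messages=[{"role": "user", "content": "How should I retry 429 errors?"}],
)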
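
If existing code builds OpenAI-style message lists, the main structural change is lifting system-role messages into the separate `system` parameter. The sketch below shows one way to do that; `to_claude_request` is a hypothetical helper, and it assumes plain-text content with only `system`, `user`, and `assistant` roles (tool messages are not handled).

```python
from typing import Any


def to_claude_request(openai_messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Split OpenAI-style messages into Claude's `system` string and `messages` list."""
    system_parts: list[str] = []
    turns: list[dict[str, Any]] = []
    for msg in openai_messages:
        if msg["role"] == "system":
            # Claude takes system text as a top-level parameter.
            system_parts.append(msg["content"])
        else:
            turns.append({"role": msg["role"], "content": msg["content"]})
    request: dict[str, Any] = {"messages": turns}
    if system_parts:
        request["system"] = "\n\n".join(system_parts)
    return request
```

The result can be unpacked into a native call, for example `client.messages.create(model="claude-sonnet-4-6", max_tokens=700, **to_claude_request(msgs))`.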

Step 5: Streaming

Use streaming when answers may be long or user-facing latency matters.

from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1200,
    messages=[{"role": "user", "content": "Explain API gateway fallback design."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    final = stream.get_final_message()
    print(f"\nOutput tokens: {final.usage.output_tokens}")

Use streaming when...	Avoid streaming when...
User is waiting in a UI	You need only a tiny classification
Response can exceed a few seconds	The job is async and can be batched
You want partial output	You need one atomic JSON object
Long outputs may time out	You can process later via batch

Step 6: Tool Use

Claude can choose tools by emitting `tool_use` blocks. Your application executes the tool, then returns a `tool_result`.

from anthropic import Anthropic

client = Anthropic()

tools = [
    {
        "name": "get_order_status",
        "description": "Look up an order by ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order ID"}
            },
            "required": ["order_id"],
        },
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=600,
    tools=tools,
    messages=[{"role": "user", "content": "Where is order A123?"}],
)

for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)

Tool pattern	Best practice
Tool names	Use clear verbs and nouns
Input schema	Keep required fields explicit
Tool results	Return compact structured data
Safety	Validate all tool inputs server-side
Logging	Store tool name, tool input, and model ID
Retries	Keep tool retries outside the model loop where possible
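
The second half of the loop, executing the tool and returning a `tool_result`, can be sketched as follows. This is an illustration under stated assumptions: `HANDLERS`, `build_tool_results`, and the stubbed `get_order_status` handler are hypothetical, and the blocks are plain dicts. SDK responses contain objects rather than dicts, so convert them first in whatever way your SDK version supports.

```python
import json
from typing import Any, Callable


def get_order_status(order_id: str) -> dict[str, str]:
    # Stand-in for a real database or API lookup.
    return {"order_id": order_id, "status": "shipped"}


# Hypothetical registry mapping tool names to server-side handlers.
HANDLERS: dict[str, Callable[..., Any]] = {"get_order_status": get_order_status}


def build_tool_results(content_blocks: list[dict[str, Any]]) -> dict[str, Any]:
    """Run each tool_use block and return the follow-up user message."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text blocks
        result: dict[str, Any] = {"type": "tool_result", "tool_use_id": block["id"]}
        try:
            handler = HANDLERS.get(block["name"])
            if handler is None:
                raise ValueError(f"unknown tool: {block['name']}")
            # Unexpected or missing arguments raise TypeError, a crude input check.
            result["content"] = json.dumps(handler(**block["input"]))
        except (TypeError, ValueError) as exc:
            result["content"] = str(exc)
            result["is_error"] = True
        results.append(result)
    return {"role": "user", "content": results}
```

To continue the conversation, append the assistant's response content and then this user message to `messages`, and call the API again with the same `tools` list. Real handlers should validate inputs more strictly than the `TypeError` check used here.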

Step 7: Prompt Caching

For exact cache-read, 5-minute write, and 1-hour write cost math, use the Claude API cache pricing guide.

Prompt caching is one of Claude's best cost controls. Use it when a large prefix repeats across requests: policy text, tool schemas, long documents, project context, or retrieval bundles.

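from anthropic import Anthropic

client = Anthropic()

# Caching only applies when the prefix meets the model's minimum cacheable
# length, so this short placeholder would not be cached as written.
POLICY = "Long repeated policy or product documentation..."

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=600,
    system=[
        {
            "type": "text",
            "text": POLICY,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Answer using the policy."}],
)

# Both counters are 0 when nothing was written to or read from the cache.
usage = message.usage
print("cache write", usage.cache_creation_input_tokens)
print("cache read", usage.cache_read_input_tokens)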

Cache term	Meaning	Cost impact
Cache write	First time a cacheable block is stored	Higher than base input
Cache read	Reusing cached content	10% of base input price
Cache hit	Repeated prefix matched	Lower cost and lower ITPM pressure
Cache miss	Prefix changed or expired	You pay write/base input again
Good cache block	Stable, large, reused context	Best savings
Bad cache block	Tiny or constantly changing text	Little value
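
To see how the write premium and read discount interact, here is a rough cost sketch for one repeated prefix. `input_cost_usd` is a hypothetical helper. The 10% read rate comes from the table above; the 1.25x write multiplier is an assumption for a 5-minute cache write and should be checked against current pricing. The function also assumes every request after the first is a cache hit.

```python
def input_cost_usd(
    prefix_tokens: int,
    requests: int,
    base_price_per_mtok: float,
    cached: bool,
    write_multiplier: float = 1.25,  # assumption: 5-minute cache write premium
    read_multiplier: float = 0.10,   # from the table: reads cost 10% of base input
) -> float:
    """Input cost of sending the same prefix `requests` times."""
    per_token = base_price_per_mtok / 1_000_000
    if not cached or requests == 0:
        return prefix_tokens * requests * per_token
    # First request writes the cache; the rest are assumed to read from it.
    write = prefix_tokens * per_token * write_multiplier
    reads = prefix_tokens * (requests - 1) * per_token * read_multiplier
    return write + reads
```

For a 50,000-token prefix sent 100 times at Sonnet's $3/MTok input price, this gives about $15.00 uncached versus about $1.67 cached under those assumptions. Real savings are lower whenever the cache expires or the prefix changes between requests.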

For deeper economics, use our Anthropic API pricing guide.

Step 8: OpenAI SDK Compatibility

Anthropic now documents an OpenAI SDK compatibility layer that lets you call Claude from existing OpenAI SDK code. This is useful for migration experiments, but the native Anthropic SDK still exposes Claude-specific behavior more directly.

Option	Use when
Native Anthropic SDK	You need Claude-native messages, caching, tools, headers, and docs parity
Anthropic OpenAI SDK compatibility	You want quick migration tests with OpenAI-style code
TokenMix.ai OpenAI-compatible gateway	You want Claude plus many other models behind one interface

If you need multi-provider routing, gateway observability, and fallback, TokenMix.ai is the more complete production pattern.

Step 9: Use TokenMix.ai For Routing

TokenMix.ai lets you call Claude and 300+ other models through an OpenAI-compatible endpoint. Use it when the product needs a choice of models rather than a single provider.

from openai import OpenAI

client = OpenAI(
    api_key="your-tokenmix-key",
    base_url="https://api.tokenmix.ai/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a concise API tutor."},
        {"role": "user", "content": "Explain retries for 529 errors."},
    ],
)

print(response.choices[0].message.content)

TokenMix.ai helps when...	Why
You need Claude plus GPT/Gemini/DeepSeek/Kimi	One API surface
Claude hits 429 or 529	Add fallback policy
Some tasks are too cheap for Sonnet	Route to Haiku or non-Claude models
Finance wants one bill	Unified usage tracking
You are migrating from OpenAI SDK	OpenAI-compatible call shape
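
The fallback policy in the table can be sketched independently of any SDK. In this illustration `ProviderError`, `RETRYABLE_STATUSES`, and `call_with_fallback` are hypothetical names; in real code you would map your client library's exceptions onto a status code before using a loop like this.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


# Only rate limits (429) and overload (529) trigger a fallback.
RETRYABLE_STATUSES = {429, 529}


def call_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, response_text)."""
    last_error: Optional[ProviderError] = None
    for model in models:
        try:
            return model, call_model(model)
        except ProviderError as exc:
            if exc.status not in RETRYABLE_STATUSES:
                raise  # auth or schema errors will not improve on another model
            last_error = exc
    raise RuntimeError("all models in the fallback chain failed") from last_error
```

Errors such as `401` or `400` are re-raised immediately because switching models does not fix a bad key or a malformed request.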

Cost Control Checklist

Control	Why it matters
Start with Sonnet 4.6	Best default balance
Use Haiku 4.5 for simple tasks	Cuts cost on routing/classification
Reserve Opus 4.7 for hard tasks	Avoid premium price on routine calls
Use prompt caching	Repeated context gets cheaper
Use Batch API	Async jobs get 50% token discount
Lower unnecessary output budgets	Output tokens cost 5x input tokens on every model above
Log model ID and token usage	Debug cost drift
Monitor 429 and 529 separately	Rate limits and overload are different
Add fallback chains	Prevent one provider from becoming downtime
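
The first three controls amount to a routing rule. A minimal sketch, assuming your application already labels requests by task type; the labels in `SIMPLE_TASKS` and `HARD_TASKS` and the `pick_model` helper are hypothetical.

```python
MODEL_BY_TIER = {
    "simple": "claude-haiku-4-5",
    "default": "claude-sonnet-4-6",
    "hard": "claude-opus-4-7",
}

# Hypothetical task labels; replace with whatever your application tracks.
SIMPLE_TASKS = {"classification", "extraction", "routing"}
HARD_TASKS = {"code_review", "hard_reasoning"}


def pick_model(task_type: str) -> str:
    """Map a task label to a model ID, defaulting to Sonnet."""
    if task_type in SIMPLE_TASKS:
        return MODEL_BY_TIER["simple"]
    if task_type in HARD_TASKS:
        return MODEL_BY_TIER["hard"]
    return MODEL_BY_TIER["default"]
```

Logging the returned model ID alongside token usage makes it easier to spot when too much traffic drifts into the premium tier.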

Example monthly cost for 100M input and 30M output:

Route	Approx cost
All Sonnet 4.6	$750
All Haiku 4.5	$250
All Opus 4.7	$1,250
70% Haiku + 25% Sonnet + 5% Opus	About $425
Same routed mix with half async batch	About $320
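
The table's arithmetic can be reproduced with a few lines. This sketch uses the per-MTok prices from the model table and assumes the routing percentages apply equally to input and output tokens, and that batch applies a flat 50% discount to the batched share. The function names are illustrative.

```python
# Prices in USD per million tokens (input, output), from the model table.
PRICES = {
    "sonnet-4.6": (3.0, 15.0),
    "haiku-4.5": (1.0, 5.0),
    "opus-4.7": (5.0, 25.0),
}

INPUT_MTOK = 100   # monthly input volume, millions of tokens
OUTPUT_MTOK = 30   # monthly output volume, millions of tokens


def monthly_cost(model: str, share: float = 1.0) -> float:
    """Cost of sending `share` of all traffic to one model."""
    input_price, output_price = PRICES[model]
    return share * (INPUT_MTOK * input_price + OUTPUT_MTOK * output_price)


def routed_cost(
    mix: dict[str, float],
    batch_share: float = 0.0,
    batch_discount: float = 0.5,
) -> float:
    """Cost of a routed mix, with `batch_share` of traffic at the batch discount."""
    total = sum(monthly_cost(model, share) for model, share in mix.items())
    return total * (1 - batch_share * batch_discount)
```

With `mix = {"haiku-4.5": 0.70, "sonnet-4.6": 0.25, "opus-4.7": 0.05}`, the routed total is $175 + $187.50 + $62.50 = $425. Moving half of that traffic to batch gives $425 × 0.75 = $318.75, which the table rounds to about $320.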

Common Errors

Error	Likely cause	Fix
`401 authentication_error`	Bad or missing key	Regenerate key and check environment
`400 invalid_request_error`	Wrong schema or unsupported field	Compare with Messages API docs
`404 not_found_error` for model	Old or unavailable model ID	Check model deprecations and current model catalog
`413 request_too_large`	Request body exceeds the endpoint's size limit	Split files or use Files API where appropriate
`429 rate_limit_error`	RPM/ITPM/OTPM/spend/workspace limit	Backoff, cache, batch, route, or raise limits
`500 api_error`	Server error	Bounded retry and request ID logging
`504 timeout_error`	Long blocking request	Stream or use Message Batches
`529 overloaded_error`	Anthropic overload	Retry and fail over

Use our Claude Rate Exceeded guide and Claude 529 guide for production retry design.
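
As a starting point for "bounded retry", the sketch below computes a capped exponential backoff with full jitter and stops on non-retryable statuses. `retry_delay` and its defaults are assumptions chosen for illustration, not values from Anthropic's documentation. If a response carries a `retry-after` header, prefer that value over a computed delay.

```python
import random
from typing import Callable, Optional

# Statuses from the table where a retry can plausibly succeed.
RETRYABLE = {429, 500, 529}


def retry_delay(
    status: int,
    attempt: int,
    base_seconds: float = 1.0,
    cap_seconds: float = 30.0,
    max_attempts: int = 5,
    rng: Callable[[], float] = random.random,
) -> Optional[float]:
    """Return seconds to wait before the next attempt, or None to stop.

    `attempt` is the number of failed attempts so far, starting at 0.
    """
    if status not in RETRYABLE or attempt >= max_attempts:
        return None
    # Exponential ceiling (1s, 2s, 4s, ...) capped at cap_seconds.
    ceiling = min(cap_seconds, base_seconds * (2 ** attempt))
    # Full jitter: pick a random delay in [0, ceiling).
    return rng() * ceiling
```

Tracking 429 and 529 counts separately in your logs still matters: a rising 429 rate points at your own limits, while 529 points at provider load and is the better trigger for failover.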

Final Recommendation

Build new Claude API apps on Sonnet 4.6, cache repeated context, batch async work, and route simple tasks away from Sonnet. Use TokenMix.ai when fallback and multi-model cost control matter.

FAQ

What Claude model should I use first?

Use Sonnet 4.6 for real quality tests. Use Haiku 4.5 for cheap SDK smoke tests. Use Opus 4.7 only when Sonnet fails or the task is high-value.

Does Claude API have free credits?

Sometimes an account may show onboarding or promotional credits, but there is no permanent free Claude API tier. Check your actual Claude Console balance.

Is `claude-sonnet-4-20250514` still a good tutorial model?

No. It is deprecated and scheduled to retire. Use `claude-sonnet-4-6` for new examples.

Can I use the OpenAI SDK with Claude?

Yes, for compatibility testing through Anthropic's documented compatibility layer, and also through TokenMix.ai's OpenAI-compatible gateway. Use the native Anthropic SDK when you need full access to Claude-specific features.

How much does prompt caching save?

Cache reads are priced at 10% of base input tokens. Actual savings depend on cache hit rate, context size, and how often the repeated prefix changes.

Should I use Batch API?

Yes, for async workloads such as bulk summarization, enrichment, evaluation, and offline report generation. Batch pricing gives a 50% token discount.
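
A batch is a list of independent message requests, each tagged with your own ID so results can be matched back later. The sketch below only builds that list; `build_batch_requests` is a hypothetical helper, and the entry shape (`custom_id` plus `params`) should be checked against the current Message Batches API reference before use.

```python
from typing import Any


def build_batch_requests(
    prompts: dict[str, str],
    model: str = "claude-haiku-4-5",
    max_tokens: int = 300,
) -> list[dict[str, Any]]:
    """Turn {custom_id: prompt} into Message Batches request entries."""
    return [
        {
            "custom_id": custom_id,  # your own key for matching results later
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]
```

With the Python SDK, the list would typically be submitted with something like `client.messages.batches.create(requests=build_batch_requests(prompts))`, after which you poll the batch until processing ends and then read the results.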

What is the biggest Claude API cost mistake?

Sending every task to Sonnet or Opus. Route simple work to Haiku or cheaper non-Claude models, and reserve premium models for tasks where quality changes the outcome.

How does TokenMix.ai fit into a Claude API stack?

TokenMix.ai is the routing layer. Use it to call Claude, compare alternatives, track spend, and fail over when Claude is rate-limited or overloaded.