TokenMix Research Lab · 2026-04-12

Claude API Tutorial 2026: Sonnet 4.6, Cache, Tools, Routing

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

Start new Claude API projects on Sonnet 4.6. Use Haiku 4.5 for cheap routing, Opus 4.7 for hard tasks, prompt caching for repeated context, and batch for async jobs.

The Claude API is no longer a three-model "Haiku 3.5 / Sonnet 4 / Opus 4" stack. Current developer work should start from the live Claude Platform: Messages API, model deprecations, pricing, prompt caching, tool use, streaming, OpenAI SDK compatibility, and rate limits. This tutorial gives the minimum production path without old model IDs.

For broader budget planning, compare Claude against OpenAI, Gemini, DeepSeek, Grok, and Kimi in the LLM API Pricing 2026 guide.

Quick Verdict

Use the native Anthropic SDK when you need Claude-specific features. Use Anthropic's OpenAI SDK compatibility layer or TokenMix.ai when migration speed, multi-model routing, and fallback matter more.

| Need | Best path |
| --- | --- |
| Best default Claude model | claude-sonnet-4-6 |
| Cheapest Claude smoke tests | claude-haiku-4-5 |
| Hard reasoning and code review | claude-opus-4-7 |
| Repeated long context | Prompt caching |
| Async bulk jobs | Message Batches API |
| Long user-facing answers | Streaming |
| Function/tool calls | Claude tool use |
| Existing OpenAI SDK app | Claude OpenAI SDK compatibility or TokenMix.ai |
| Multi-provider fallback | TokenMix.ai gateway |

Current Claude API Model Table

| Model | Current role | API price signal | Best use |
| --- | --- | --- | --- |
| Claude Opus 4.7 | Premium reasoning | $5/MTok input, $25/MTok output | Hard reasoning, code review, high-value agents |
| Claude Opus 4.6 | Stable premium route | $5/MTok input, $25/MTok output | Opus workloads where 4.7 migration risk matters |
| Claude Sonnet 4.6 | Default workhorse | $3/MTok input, $15/MTok output | Coding, analysis, writing, agents |
| Claude Sonnet 4.5 | Regression bridge | $3/MTok input, $15/MTok output | Older Sonnet workflow migration |
| Claude Haiku 4.5 | Fast low-cost route | $1/MTok input, $5/MTok output | Classification, extraction, simple routing |

Avoid retired/deprecated defaults: Claude 3.7 Sonnet is retired, Sonnet 4 is deprecated, and Haiku 3.5 is retired. If an old tutorial tells you to start with those models, update it before shipping.
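
The routing advice in the tables above can be sketched as a small lookup. This is a minimal illustration, not an official routing scheme: the task labels and the `pick_model` helper are hypothetical names, and only the model IDs come from this tutorial.

```python
# Hypothetical task labels mapped to the model IDs named in this tutorial.
ROUTES = {
    "classification": "claude-haiku-4-5",
    "extraction": "claude-haiku-4-5",
    "smoke_test": "claude-haiku-4-5",
    "coding": "claude-sonnet-4-6",
    "analysis": "claude-sonnet-4-6",
    "writing": "claude-sonnet-4-6",
    "hard_reasoning": "claude-opus-4-7",
    "code_review": "claude-opus-4-7",
}

# Unknown task labels fall back to the default workhorse model.
DEFAULT_MODEL = "claude-sonnet-4-6"


def pick_model(task: str) -> str:
    """Return a model ID for a task label, falling back to the default."""
    return ROUTES.get(task, DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("classification", "code_review", "unknown_task"):
        print(task, "->", pick_model(task))
```

Keeping the mapping in one place makes it easier to swap a model ID when a deprecation notice lands.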

Step 1: Create A Console Project And API Key

| Step | Action |
| --- | --- |
| 1 | Go to Claude Console |
| 2 | Create or select an organization/workspace |
| 3 | Open billing/usage and confirm your credit or billing state |
| 4 | Create an API key and store it safely |
| 5 | Set ANTHROPIC_API_KEY in your environment |
| 6 | Start with Haiku or Sonnet, not Opus |

Do not assume every new account receives a fixed free credit amount. Check the balance shown in Console. For more on credits, see our Claude API free tier guide.

Install SDKs:

pip install anthropic
npm install @anthropic-ai/sdk

Set the key:

export ANTHROPIC_API_KEY="sk-ant-your-key"

Step 2: First Python Call

from anthropic import Anthropic

client = Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Explain prompt caching in two practical bullets."}
    ],
)

print(message.content[0].text)

Use Sonnet 4.6 for the first real quality test. Use Haiku 4.5 for basic integration checks where answer quality does not matter.

Step 3: First TypeScript Call

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 512,
  messages: [
    { role: "user", content: "Write a concise API onboarding checklist." },
  ],
});

const first = message.content[0];
if (first.type === "text") {
  console.log(first.text);
}

Use environment variables in production. Do not commit API keys or paste them into client-side code.

Step 4: Messages API Format

Claude's native Messages API is close to chat-completions patterns, but not identical.

| Concept | Claude Messages API | Common OpenAI Chat pattern |
| --- | --- | --- |
| System prompt | Separate system parameter | System role message |
| User/assistant turns | messages array | messages array |
| Output budget | max_tokens | Often max_tokens or max_completion_tokens |
| Response text | message.content[0].text for text blocks | choices[0].message.content |
| Tool calls | tool_use and tool_result blocks | Tool calls in assistant message |
| Versioning | anthropic-version header for raw HTTP | Provider-specific |
| Rate limit headers | Anthropic-specific headers | Provider-specific |

System prompt example:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=700,
    system="You are a senior backend engineer. Answer with implementation details.",
    messages=[{"role": "user", "content": "How should I retry 429 errors?"}],
)
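
When porting OpenAI-style code, the system-prompt difference in the table above is usually the first thing to trip on. The sketch below shows one way to move system-role messages into Claude's separate `system` parameter. The `to_claude_request` helper is a hypothetical name, and it assumes every message has plain string content (no tool calls or images).

```python
from typing import Any


def to_claude_request(
    openai_messages: list[dict[str, Any]], model: str, max_tokens: int
) -> dict[str, Any]:
    """Split system-role messages out into a top-level `system` field."""
    # Collect every system message; Claude takes these outside `messages`.
    system_parts = [m["content"] for m in openai_messages if m["role"] == "system"]

    # Keep only user/assistant turns in the messages array.
    turns = [
        {"role": m["role"], "content": m["content"]}
        for m in openai_messages
        if m["role"] in ("user", "assistant")
    ]

    request: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,  # required on the Claude side
        "messages": turns,
    }
    if system_parts:
        request["system"] = "\n\n".join(system_parts)
    return request


if __name__ == "__main__":
    print(
        to_claude_request(
            [
                {"role": "system", "content": "You are a concise API tutor."},
                {"role": "user", "content": "How should I retry 429 errors?"},
            ],
            model="claude-sonnet-4-6",
            max_tokens=700,
        )
    )
```

The resulting dictionary is shaped so it could be unpacked into `client.messages.create(**request)` with the client from Step 2.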

Step 5: Streaming

Use streaming when answers may be long or user-facing latency matters.

from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1200,
    messages=[{"role": "user", "content": "Explain API gateway fallback design."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    final = stream.get_final_message()
    print(f"\nOutput tokens: {final.usage.output_tokens}")

| Use streaming when... | Avoid streaming when... |
| --- | --- |
| User is waiting in a UI | You need only a tiny classification |
| Response can exceed a few seconds | The job is async and can be batched |
| You want partial output | You need one atomic JSON object |
| Long outputs may time out | You can process later via batch |

Step 6: Tool Use

Claude can choose tools by emitting tool_use blocks. Your application executes the tool, then returns a tool_result.

from anthropic import Anthropic

client = Anthropic()

tools = [
    {
        "name": "get_order_status",
        "description": "Look up an order by ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order ID"}
            },
            "required": ["order_id"],
        },
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=600,
    tools=tools,
    messages=[{"role": "user", "content": "Where is order A123?"}],
)

for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)

| Tool pattern | Best practice |
| --- | --- |
| Tool names | Use clear verbs and nouns |
| Input schema | Keep required fields explicit |
| Tool results | Return compact structured data |
| Safety | Validate all tool inputs server-side |
| Logging | Store tool name, tool input, and model ID |
| Retries | Keep tool retries outside the model loop where possible |
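
The second half of the tool loop, executing the tool and returning a `tool_result`, might look like the sketch below. It validates inputs server-side and returns compact JSON, as the table suggests. The registry, the `run_tool` helper, and the placeholder order lookup are all hypothetical; only the `tool_result` block shape (`type`, `tool_use_id`, `content`, optional `is_error`) follows the Messages API.

```python
import json
from typing import Any, Callable


def get_order_status(order_id: str) -> dict[str, str]:
    """Stand-in for a real lookup; returns placeholder data."""
    return {"order_id": order_id, "status": "unknown"}


# Hypothetical registry: tool name -> (handler, required string fields).
TOOL_REGISTRY: dict[str, tuple[Callable[..., Any], tuple[str, ...]]] = {
    "get_order_status": (get_order_status, ("order_id",)),
}


def run_tool(name: str, tool_input: Any, tool_use_id: str) -> dict[str, Any]:
    """Validate the input, run the handler, and build a tool_result block."""

    def error(message: str) -> dict[str, Any]:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": message,
            "is_error": True,
        }

    if name not in TOOL_REGISTRY:
        return error(f"Unknown tool: {name}")
    if not isinstance(tool_input, dict):
        return error("Tool input must be an object")

    handler, required = TOOL_REGISTRY[name]
    # Never trust model-generated arguments: check each required field.
    missing = [
        field
        for field in required
        if not isinstance(tool_input.get(field), str) or not tool_input[field]
    ]
    if missing:
        return error(f"Missing or invalid fields: {', '.join(missing)}")

    result = handler(**{field: tool_input[field] for field in required})
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result),  # compact structured data
    }


if __name__ == "__main__":
    print(run_tool("get_order_status", {"order_id": "A123"}, "toolu_example"))
```

In a real loop, the returned block would go back to Claude inside a user-role message, placed after the assistant turn that contained the matching `tool_use` block.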

Step 7: Prompt Caching

For exact cache-read, 5-minute write, and 1-hour write cost math, use the Claude API cache pricing guide.

Prompt caching is one of Claude's best cost controls. Use it when a large prefix repeats across requests: policy text, tool schemas, long documents, project context, or retrieval bundles.

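from anthropic import Anthropic

client = Anthropic()

# Placeholder only: a real cached block must be a large, stable prefix.
# Very short text falls below the minimum cacheable length and is not cached.
POLICY = "Long repeated policy or product documentation..."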

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=600,
    system=[
        {
            "type": "text",
            "text": POLICY,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Answer using the policy."}],
)

usage = message.usage
print("cache write", usage.cache_creation_input_tokens)
print("cache read", usage.cache_read_input_tokens)

| Cache term | Meaning | Cost impact |
| --- | --- | --- |
| Cache write | First time a cacheable block is stored | Higher than base input |
| Cache read | Reusing cached content | 10% of base input price |
| Cache hit | Repeated prefix matched | Lower cost and lower ITPM pressure |
| Cache miss | Prefix changed or expired | You pay write/base input again |
| Good cache block | Stable, large, reused context | Best savings |
| Bad cache block | Tiny or constantly changing text | Little value |
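
To see how the write and read rates in the table interact, here is a rough cost sketch for a repeated prefix. The 10% read rate comes from this article. The write multiplier of 1.25 is an assumption that should be checked against current pricing, since the article only says writes cost more than base input. The sketch also assumes one cache write followed by hits on every later request, which only holds if requests arrive before the cache expires.

```python
def uncached_input_cost(
    prefix_tokens: int, requests: int, base_price_per_mtok: float
) -> float:
    """USD cost of sending the same prefix on every request with no caching."""
    return prefix_tokens * requests * base_price_per_mtok / 1_000_000


def cached_input_cost(
    prefix_tokens: int,
    requests: int,
    base_price_per_mtok: float,
    write_multiplier: float = 1.25,  # ASSUMPTION: verify against current pricing
    read_multiplier: float = 0.10,   # 10% of base input, per the table above
) -> float:
    """USD cost with one cache write followed by cache reads."""
    if requests < 1:
        raise ValueError("requests must be at least 1")
    per_token = base_price_per_mtok / 1_000_000
    write_cost = prefix_tokens * per_token * write_multiplier
    read_cost = prefix_tokens * per_token * read_multiplier * (requests - 1)
    return write_cost + read_cost


if __name__ == "__main__":
    # 50,000-token prefix, 100 requests, Sonnet 4.6 input at $3/MTok.
    print(round(uncached_input_cost(50_000, 100, 3.0), 4))
    print(round(cached_input_cost(50_000, 100, 3.0), 4))
```

Under those assumptions the prefix costs about $1.67 cached versus $15.00 uncached. Real savings will be lower whenever the prefix changes or the cache expires between requests.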

For deeper economics, use our Anthropic API pricing guide.

Step 8: OpenAI SDK Compatibility

Anthropic now documents an OpenAI SDK compatibility layer for testing Claude with the OpenAI SDK. This is useful for migration experiments, but the native Anthropic SDK still exposes Claude-specific behavior more directly.

| Option | Use when |
| --- | --- |
| Native Anthropic SDK | You need Claude-native messages, caching, tools, headers, and docs parity |
| Anthropic OpenAI SDK compatibility | You want quick migration tests with OpenAI-style code |
| TokenMix.ai OpenAI-compatible gateway | You want Claude plus many other models behind one interface |

If you need multi-provider routing, gateway observability, and fallback, TokenMix.ai is the more complete production pattern.

Step 9: Use TokenMix.ai For Routing

TokenMix.ai lets you call Claude and 300+ other models through an OpenAI-compatible endpoint. Use it when the product needs model choice, not one-provider purity.

from openai import OpenAI

client = OpenAI(
    api_key="your-tokenmix-key",
    base_url="https://api.tokenmix.ai/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a concise API tutor."},
        {"role": "user", "content": "Explain retries for 529 errors."},
    ],
)

print(response.choices[0].message.content)

| TokenMix.ai helps when... | Why |
| --- | --- |
| You need Claude plus GPT/Gemini/DeepSeek/Kimi | One API surface |
| Claude hits 429 or 529 | Add fallback policy |
| Some tasks are too cheap for Sonnet | Route to Haiku or non-Claude models |
| Finance wants one bill | Unified usage tracking |
| You are migrating from OpenAI SDK | OpenAI-compatible call shape |
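
A fallback policy for 429 and 529 responses can be sketched independently of any SDK. In the example below, `ProviderError` and `call_with_fallback` are hypothetical names, and the `call` argument stands in for whatever function actually sends the request. This illustrates the pattern and makes no claim about how TokenMix.ai implements it.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


# Only rate limits (429) and overload (529) trigger a move to the next model.
FALLBACK_STATUSES = {429, 529}


def call_with_fallback(
    models: Sequence[str], call: Callable[[str], str]
) -> tuple[str, str]:
    """Try each model in order and return (model_used, response_text)."""
    last_error: Optional[ProviderError] = None
    for model in models:
        try:
            return model, call(model)
        except ProviderError as exc:
            if exc.status not in FALLBACK_STATUSES:
                raise  # auth or schema errors will not improve on another model
            last_error = exc
    if last_error is None:
        raise ValueError("models must not be empty")
    raise last_error


if __name__ == "__main__":
    def fake_call(model: str) -> str:
        # Simulate Sonnet being overloaded while Haiku answers.
        if model == "claude-sonnet-4-6":
            raise ProviderError(529)
        return f"answer from {model}"

    print(call_with_fallback(["claude-sonnet-4-6", "claude-haiku-4-5"], fake_call))
```

Returning the model that actually answered keeps the logs honest, which matters when cost or quality drifts after a fallback event.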

Cost Control Checklist

| Control | Why it matters |
| --- | --- |
| Start with Sonnet 4.6 | Best default balance |
| Use Haiku 4.5 for simple tasks | Cuts cost on routing/classification |
| Reserve Opus 4.7 for hard tasks | Avoid premium price on routine calls |
| Use prompt caching | Repeated context gets cheaper |
| Use Batch API | Async jobs get 50% token discount |
| Lower unnecessary output budgets | Output tokens dominate cost |
| Log model ID and token usage | Debug cost drift |
| Monitor 429 and 529 separately | Rate limits and overload are different |
| Add fallback chains | Prevent one provider from becoming downtime |

Example monthly cost for 100M input tokens and 30M output tokens, at the list prices in the model table above:

| Route | Approx cost |
| --- | --- |
| All Sonnet 4.6 | $750 |
| All Haiku 4.5 | $250 |
| All Opus 4.7 | $1,250 |
| 70% Haiku + 25% Sonnet + 5% Opus | About $425 |
| Same routed mix with half async batch | About $320 |
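
The figures above can be reproduced with a few lines of arithmetic. The sketch below uses the per-MTok prices from the model table and assumes the routing percentages apply equally to input and output tokens, which is the reading that matches the table. The `monthly_cost` helper and the short model keys are hypothetical names.

```python
# USD per million tokens as (input, output), from the model table.
PRICES = {
    "haiku": (1.0, 5.0),
    "sonnet": (3.0, 15.0),
    "opus": (5.0, 25.0),
}


def monthly_cost(
    input_mtok: float,
    output_mtok: float,
    mix: dict[str, float],
    batch_share: float = 0.0,
    batch_discount: float = 0.5,  # Batch API discount stated in this article
) -> float:
    """Blend per-model costs by traffic share, then apply the batch discount."""
    full_price = sum(
        share * (input_mtok * PRICES[model][0] + output_mtok * PRICES[model][1])
        for model, share in mix.items()
    )
    # Only the batched share of traffic receives the discount.
    return full_price * (1 - batch_share * batch_discount)


if __name__ == "__main__":
    routed = {"haiku": 0.70, "sonnet": 0.25, "opus": 0.05}
    print(round(monthly_cost(100, 30, {"sonnet": 1.0}), 2))
    print(round(monthly_cost(100, 30, routed), 2))
    print(round(monthly_cost(100, 30, routed, batch_share=0.5), 2))
```

The routed mix works out to $175 + $187.50 + $62.50 = $425. Batching half of it gives $425 × 0.75 = $318.75, which the table rounds to about $320.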

Common Errors

| Error | Likely cause | Fix |
| --- | --- | --- |
| 401 authentication_error | Bad or missing key | Regenerate key and check environment |
| 400 invalid_request_error | Wrong schema or unsupported field | Compare with Messages API docs |
| 404 not_found_error for model | Old or unavailable model ID | Check model deprecations and current model catalog |
| 413 request_too_large | Body exceeds endpoint size | Split files or use Files API where appropriate |
| 429 rate_limit_error | RPM/ITPM/OTPM/spend/workspace limit | Backoff, cache, batch, route, or raise limits |
| 500 api_error | Server error | Bounded retry and request ID logging |
| 504 timeout_error | Long blocking request | Stream or use Message Batches |
| 529 overloaded_error | Anthropic overload | Retry and fail over |

Use our Claude Rate Exceeded guide and Claude 529 guide for production retry design.
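
As a starting point for the bounded retry mentioned in the table, the sketch below retries only 429, 500, and 529, honors a server-supplied retry-after hint when one is available, and otherwise uses capped exponential backoff with jitter. `ApiStatusError`, `backoff_delay`, and `call_with_retry` are hypothetical names that stand in for whatever error type and client call your code uses. The base delay, cap, and attempt count are illustrative defaults, not recommended values.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Statuses treated as transient here; 400/401/404/413 are never retried.
RETRYABLE = {429, 500, 529}


class ApiStatusError(Exception):
    """Hypothetical stand-in for an SDK error that exposes an HTTP status."""

    def __init__(self, status: int, retry_after: Optional[float] = None) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def backoff_delay(
    attempt: int, retry_after: Optional[float], base: float = 1.0, cap: float = 30.0
) -> float:
    """Prefer the server's hint; otherwise use full-jitter exponential backoff."""
    if retry_after is not None:
        return min(retry_after, cap)
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient errors up to `max_attempts` times."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ApiStatusError as exc:
            is_last = attempt == max_attempts - 1
            if exc.status not in RETRYABLE or is_last:
                raise
            sleep(backoff_delay(attempt, exc.retry_after))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real delays. A production version would also log the status and request ID on each attempt so 429 and 529 rates can be tracked separately.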

Final Recommendation

Build new Claude API apps on Sonnet 4.6, cache repeated context, batch async work, and route simple tasks away from Sonnet. Use TokenMix.ai when fallback and multi-model cost control matter.

FAQ

What Claude model should I use first?

Use Sonnet 4.6 for real quality tests. Use Haiku 4.5 for cheap SDK smoke tests. Use Opus 4.7 only when Sonnet fails or the task is high-value.

Does Claude API have free credits?

Sometimes an account may show onboarding or promotional credits, but there is no permanent free Claude API tier. Check your actual Claude Console balance.

Is claude-sonnet-4-20250514 still a good tutorial model?

No. It is deprecated and scheduled to retire. Use claude-sonnet-4-6 for new examples.

Can I use the OpenAI SDK with Claude?

Yes for compatibility testing through Anthropic's documented compatibility layer, and also through TokenMix.ai's OpenAI-compatible gateway. Use the native Anthropic SDK for the most Claude-specific features.

How much does prompt caching save?

Cache reads are priced at 10% of base input tokens. Actual savings depend on cache hit rate, context size, and how often the repeated prefix changes.

Should I use Batch API?

Yes for async workloads such as bulk summarization, enrichment, evaluation, and offline report generation. Batch pricing gives a 50% token discount.
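
A batch job is essentially a list of ordinary Messages requests, each tagged with an ID so results can be matched back later. The sketch below builds that list for a bulk summarization job. The `build_batch_requests` helper, the prompt wording, and the default model choice are assumptions for illustration. The `custom_id` plus `params` entry shape follows the Message Batches API as I understand it and should be confirmed against the current docs.

```python
from typing import Any


def build_batch_requests(
    documents: dict[str, str],
    model: str = "claude-haiku-4-5",
    max_tokens: int = 300,
) -> list[dict[str, Any]]:
    """Build one batch entry per document, keyed by a caller-chosen custom_id."""
    return [
        {
            "custom_id": doc_id,  # used to match results back to inputs
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {
                        "role": "user",
                        "content": f"Summarize in two sentences:\n\n{text}",
                    }
                ],
            },
        }
        for doc_id, text in documents.items()
    ]


if __name__ == "__main__":
    requests = build_batch_requests({"doc-1": "First report.", "doc-2": "Second report."})
    print(len(requests), requests[0]["custom_id"])
```

With the Anthropic Python SDK, a list like this would typically be submitted through `client.messages.batches.create(requests=...)`. Results arrive asynchronously, so store the `custom_id` values alongside your source records.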

What is the biggest Claude API cost mistake?

Sending every task to Sonnet or Opus. Route simple work to Haiku or cheaper non-Claude models, and reserve premium models for tasks where quality changes the outcome.

How does TokenMix.ai fit into a Claude API stack?

TokenMix.ai is the routing layer. Use it to call Claude, compare alternatives, track spend, and fail over when Claude is rate-limited or overloaded.
