TokenMix Research Lab · 2026-04-12

Claude API Tutorial: Getting Started With Anthropic's API, Prompt Caching, and Tool Use (2026)

The Claude API gives you programmatic access to Anthropic's Claude models -- the same models behind Claude.ai but with full control over prompts, parameters, and outputs. Claude excels at three things competitors struggle with: long-context analysis (200K tokens), prompt caching (90% cost reduction on repeated content), and reliable tool use. This tutorial takes you from zero to production-ready: console signup, API key, first call, prompt caching implementation, streaming, and tool use. Python and TypeScript examples for every step. All code tested against the live Anthropic API by TokenMix.ai in April 2026.

Quick Reference: Claude API Models and Pricing

| Model | Model ID | Input $/M | Output $/M | Cache Write $/M | Cache Read $/M | Context | Best For |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | claude-opus-4-20250514 | $15.00 | $75.00 | $18.75 | $1.50 | 200K | Complex analysis |
| Claude Sonnet 4 | claude-sonnet-4-20250514 | $3.00 | $15.00 | $3.75 | $0.30 | 200K | Balanced quality/cost |
| Claude Haiku 3.5 | claude-3-5-haiku-20241022 | $0.80 | $4.00 | $1.00 | $0.08 | 200K | Fast, affordable |

Why Choose the Claude API

Three technical advantages set Claude apart from competitors.

Prompt caching. Anthropic's prompt caching is the most powerful cost optimization in the API market. Mark sections of your prompt for caching. On subsequent requests, cached tokens cost 90% less. For applications with long system prompts, RAG contexts, or repeated instructions, this is transformative. TokenMix.ai data shows Claude with caching often costs less than GPT-4.1 mini despite higher base pricing.

Long context. All Claude models support 200K tokens of context. Unlike some providers that charge extra for long inputs, Claude's pricing is flat regardless of context length. Process entire codebases, long documents, or multi-hour transcripts in a single call.
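Because pricing is flat regardless of input length, the only pre-flight question is whether your document fits. A rough sketch of such a check (the ~4-characters-per-token ratio is a heuristic for English prose, not Anthropic's actual tokenizer; leave generous headroom, or use a token-counting endpoint if your SDK version provides one):

```python
# Rough pre-flight check that a document fits in Claude's 200K context.
# ~4 characters per token is an approximation for English text only.

CONTEXT_WINDOW = 200_000

def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 chars/token for English prose)."""
    return len(text) // 4

def fits_in_context(document: str, max_output_tokens: int = 1024) -> bool:
    """Check whether the document plus the reserved output budget fits."""
    return estimate_tokens(document) + max_output_tokens <= CONTEXT_WINDOW

print(fits_in_context("word " * 1000))  # small doc -> True
```

This is only a guard rail; for billing-accurate counts, rely on the `usage` object returned with each response.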

Reliable tool use. Claude's tool calling implementation is among the most reliable in the industry. It follows tool schemas precisely, handles complex multi-tool scenarios well, and provides clean structured outputs.


Getting Started: Console Signup and API Key

Step 1: Create an Anthropic Account

Go to console.anthropic.com. Click "Sign up." You can register with email or Google account.

Step 2: Add Payment Method

Anthropic requires a payment method before issuing API keys. New accounts receive $5 in free credits. Add a credit card on the "Billing" page.

Step 3: Generate Your API Key

Navigate to "API Keys" in the console sidebar. Click "Create Key." Give it a name (e.g., "my-project"). Copy the key immediately -- it starts with sk-ant- and is shown only once.

Step 4: Install the SDK

# Python
pip install anthropic

# Node.js / TypeScript
npm install @anthropic-ai/sdk

Step 5: Set Your API Key

export ANTHROPIC_API_KEY="sk-ant-your-key-here"

Step 6: Verify With curl

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-5-haiku-20241022",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello, Claude."}]
  }'

If you receive a JSON response with a "content" array, your setup is complete.


Your First Claude API Call in Python

Basic Message

from anthropic import Anthropic

client = Anthropic()  # Uses ANTHROPIC_API_KEY env var

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is a REST API? Explain in 3 sentences."}
    ]
)

print(response.content[0].text)

Key difference from OpenAI: max_tokens is required. Anthropic does not have a default -- you must specify how many tokens the model can generate.

With System Prompt

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are an expert Python developer. Give concise, practical answers with code examples.",
    messages=[
        {"role": "user", "content": "How do I read a JSON file in Python?"}
    ]
)

print(response.content[0].text)

In Claude's API, the system prompt is a separate parameter, not a message in the conversation array. This differs from OpenAI's API, where system is a message role.
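If you are porting code from OpenAI, a small adapter makes the difference concrete. This helper is illustrative (not part of either SDK): it moves any "system" messages into the separate system parameter Claude expects.

```python
# Illustrative adapter: convert an OpenAI-style message list into the
# kwargs shape Claude's Messages API expects, lifting "system" messages
# out of the conversation array into the separate `system` parameter.

def to_claude_kwargs(openai_messages: list[dict]) -> dict:
    system_parts = [m["content"] for m in openai_messages if m["role"] == "system"]
    return {
        # None means "omit the system parameter entirely" in real calls
        "system": "\n\n".join(system_parts) if system_parts else None,
        "messages": [m for m in openai_messages if m["role"] != "system"],
    }

kwargs = to_claude_kwargs([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
])
print(kwargs["system"])    # You are helpful.
print(kwargs["messages"])  # [{'role': 'user', 'content': 'Hello'}]
```

When `system` comes back None, drop the key before calling `client.messages.create` rather than passing None explicitly.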

Multi-Turn Conversation

messages = [
    {"role": "user", "content": "What is Docker?"},
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=messages
)

# Add assistant response to continue the conversation
messages.append({"role": "assistant", "content": response.content[0].text})
messages.append({"role": "user", "content": "How do I create a Dockerfile for a Python app?"})

response2 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=messages
)

print(response2.content[0].text)

Async Usage

import asyncio
from anthropic import AsyncAnthropic

async def main():
    client = AsyncAnthropic()
    
    response = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello, Claude."}]
    )
    
    print(response.content[0].text)

asyncio.run(main())

Your First Claude API Call in TypeScript

Basic Message

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // Uses ANTHROPIC_API_KEY env var

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "What is a REST API? Explain in 3 sentences." },
  ],
});

if (response.content[0].type === "text") {
  console.log(response.content[0].text);
}

With System Prompt

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  system:
    "You are an expert TypeScript developer. Give concise answers with code.",
  messages: [
    { role: "user", content: "How do I make an HTTP request in Node.js?" },
  ],
});

Type-Safe Content Blocks

Anthropic's TypeScript SDK uses discriminated unions for content blocks. This means TypeScript knows exactly what properties are available based on the block type:

for (const block of response.content) {
  switch (block.type) {
    case "text":
      console.log(block.text); // TypeScript knows this is string
      break;
    case "tool_use":
      console.log(block.name); // TypeScript knows this is tool name
      console.log(block.input); // TypeScript knows this is tool input
      break;
  }
}

Understanding Claude's Message Format

Claude's API format differs from OpenAI's in several important ways.

Request Format Comparison

# OpenAI format
response = openai_client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello"}
    ]
    # max_tokens is optional
)
text = response.choices[0].message.content

# Claude format
response = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",
    system="You are helpful.",           # Separate parameter
    max_tokens=1024,                     # Required
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)
text = response.content[0].text          # Different response path

Key Format Differences

| Aspect | OpenAI | Claude |
|---|---|---|
| System prompt | Message with role "system" | Separate system parameter |
| max_tokens | Optional (has default) | Required |
| Response text | response.choices[0].message.content | response.content[0].text |
| Streaming | stream=True parameter | .stream() method or stream=True |
| Token usage | response.usage.prompt_tokens | response.usage.input_tokens |
| API header | Authorization: Bearer sk-... | x-api-key: sk-ant-... |
| Versioning | URL-based | anthropic-version header |

Prompt Caching: Cut Costs by 90%

Prompt caching is Claude's most powerful cost optimization feature. It is the single best reason to choose Claude for applications with long, repeated prompts.

How It Works

  1. Mark sections of your prompt with cache_control: {"type": "ephemeral"}
  2. On the first request, Anthropic caches the marked content (cache write cost: 1.25x base price)
  3. On subsequent requests with the same prefix, cached tokens are read at 10% of base price
  4. Cache TTL is approximately 5 minutes (refreshed on each use)
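The two multipliers above fully determine cache pricing from a model's base input price. A quick sketch, using the $/M input prices from the pricing table:

```python
# Derive cache-write and cache-read prices ($/M tokens) from a model's
# base input price, using the 1.25x write / 0.10x read multipliers above.

CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10

def cache_prices(base_input_per_million: float) -> dict:
    return {
        "write": round(base_input_per_million * CACHE_WRITE_MULT, 2),
        "read": round(base_input_per_million * CACHE_READ_MULT, 2),
    }

print(cache_prices(3.00))  # Sonnet 4 -> {'write': 3.75, 'read': 0.3}
```

In other words: a cached prefix has to be reused only a couple of times before the 25% write premium is repaid many times over by 90%-discounted reads.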

Python Implementation

from anthropic import Anthropic

client = Anthropic()

# Long system prompt that stays the same across requests
KNOWLEDGE_BASE = """[Insert your 5,000-token knowledge base here]
Product documentation, FAQ answers, company policies, etc.
The longer this is, the more you save from caching."""

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": KNOWLEDGE_BASE,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "What is your return policy?"}
    ]
)

# Check cache performance
usage = response.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cache write tokens: {usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {usage.cache_read_input_tokens}")

First request: Cache miss. You pay 1.25x for the cached section (cache write). Subsequent requests: Cache hit. You pay 0.10x for the cached section (cache read).
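You can turn the usage counters into dollars directly. A sketch using Sonnet 4's prices from the table above; the three token counts correspond to the usage fields printed in the previous example (uncached input, cache writes, cache reads):

```python
# Estimate the input cost of one request from its usage counters,
# priced at Sonnet 4 rates ($/M tokens) from the pricing table.

PRICES = {"input": 3.00, "cache_write": 3.75, "cache_read": 0.30}

def input_cost_usd(input_tokens: int, cache_creation: int, cache_read: int) -> float:
    return (
        input_tokens * PRICES["input"]
        + cache_creation * PRICES["cache_write"]
        + cache_read * PRICES["cache_read"]
    ) / 1_000_000

# Cache miss: 5,000-token knowledge base written, 100 uncached tokens
print(f"{input_cost_usd(100, 5000, 0):.6f}")   # 0.019050
# Cache hit: the same 5,000 tokens read back from cache
print(f"{input_cost_usd(100, 0, 5000):.6f}")   # 0.001800
```

The hit case costs roughly a tenth of the miss case, which is exactly the 90% discount the headline promises.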

TypeScript Implementation

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: KNOWLEDGE_BASE,
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "What is your return policy?" }],
});

console.log(`Cache read: ${response.usage.cache_read_input_tokens} tokens`);
console.log(
  `Cache write: ${response.usage.cache_creation_input_tokens} tokens`
);

Cost Savings Example

Scenario: Customer support bot with a 5,000-token knowledge base, 10,000 requests/day.

| | Without Caching | With Caching (90% hit) |
|---|---|---|
| System prompt tokens/month | 1.5B input tokens | 150M cache read + 150M cache write |
| System prompt cost (Sonnet 4) | $4,500 | $495 |
| User message cost (avg 200 tok) | $180 | $180 |
| Total monthly cost | $4,680 | $675 |
| Savings | -- | $4,005 (86%) |

This is why TokenMix.ai tracks prompt caching as the single highest-impact cost optimization available in the API market.


Streaming Responses

Python Streaming

from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain machine learning in detail."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

TypeScript Streaming

const stream = client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Explain machine learning in detail." },
  ],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}

Streaming With Usage Data

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    
    # Access final message after stream completes
    final = stream.get_final_message()
    print(f"\nTokens used: {final.usage.input_tokens} in, {final.usage.output_tokens} out")

Tool Use: Let Claude Call Your Functions

Claude's tool use is among the most reliable in the industry. It follows schemas precisely and handles multi-tool scenarios well.

Python Tool Use

from anthropic import Anthropic
import json

client = Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Use this when the user asks about weather.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g., 'London'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit"}
            },
            "required": ["city"]
        }
    }
]

# Step 1: Send request with tools
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

# Step 2: Check if Claude wants to use a tool
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}")
        print(f"Input: {block.input}")
        # {"city": "Tokyo"}
        
        # Step 3: Execute the tool and return results
        tool_result = {"temperature": 22, "condition": "Partly cloudy", "unit": "celsius"}
        
        # Step 4: Send tool result back to Claude
        final_response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather in Tokyo?"},
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": json.dumps(tool_result)
                        }
                    ]
                }
            ]
        )
        
        print(final_response.content[0].text)
        # "The weather in Tokyo is currently 22C and partly cloudy."
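Once Claude returns a tool_use block, you need to route it to real code. A hypothetical dispatcher pattern (the get_weather stub and TOOL_REGISTRY names here are illustrative, not part of the Anthropic SDK) keeps that routing in one place:

```python
# Illustrative tool dispatcher: map tool names to local functions and
# execute whatever tool_use block Claude returns. get_weather is a stub
# standing in for a real implementation (e.g., a weather API call).
import json

def get_weather(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "temperature": 22, "unit": unit}  # stubbed result

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool(name: str, tool_input: dict) -> str:
    """Run a registered tool and serialize its result for a tool_result block."""
    if name not in TOOL_REGISTRY:
        # Returning an error payload lets Claude recover gracefully
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(TOOL_REGISTRY[name](**tool_input))

print(execute_tool("get_weather", {"city": "Tokyo"}))
# {"city": "Tokyo", "temperature": 22, "unit": "celsius"}
```

The string it returns drops straight into the "content" field of the tool_result block in the previous example.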

TypeScript Tool Use

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  tools: [
    {
      name: "get_weather",
      description: "Get weather for a city",
      input_schema: {
        type: "object" as const,
        properties: {
          city: { type: "string", description: "City name" },
        },
        required: ["city"],
      },
    },
  ],
  messages: [{ role: "user", content: "Weather in Tokyo?" }],
});

for (const block of response.content) {
  if (block.type === "tool_use") {
    console.log(`Tool: ${block.name}, Input: ${JSON.stringify(block.input)}`);
  }
}

Structured Output With Claude

Using tool_choice for Guaranteed Structure

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "format_country",
            "description": "Format country information into structured data",
            "input_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "capital": {"type": "string"},
                    "population": {"type": "integer"},
                    "continent": {"type": "string"}
                },
                "required": ["name", "capital", "population", "continent"]
            }
        }
    ],
    tool_choice={"type": "tool", "name": "format_country"},
    messages=[{"role": "user", "content": "Tell me about Japan"}]
)

for block in response.content:
    if block.type == "tool_use":
        data = block.input
        print(data)
        # {"name": "Japan", "capital": "Tokyo", "population": 125000000, "continent": "Asia"}

This guarantees structured JSON output matching your schema -- more reliable than asking the model to output JSON in plain text.
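The schema guarantees shape, but it still pays to sanity-check the values before trusting them downstream. A minimal sketch (a hand-rolled check; in production you might prefer a library such as Pydantic):

```python
# Lightweight validation of the tool input Claude returns: confirm
# required keys exist and carry the expected basic types.

def validate_country(data: dict) -> bool:
    checks = {
        "name": str,
        "capital": str,
        "population": int,
        "continent": str,
    }
    return all(
        key in data and isinstance(data[key], expected)
        for key, expected in checks.items()
    )

print(validate_country(
    {"name": "Japan", "capital": "Tokyo",
     "population": 125_000_000, "continent": "Asia"}
))  # True
print(validate_country({"name": "Japan"}))  # False
```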


Using Claude Through TokenMix.ai

TokenMix.ai provides access to Claude models through the OpenAI-compatible format, simplifying integration for teams already using the OpenAI SDK.

from openai import OpenAI

client = OpenAI(
    api_key="tmx-your-key",
    base_url="https://api.tokenmix.ai/v1"
)

# Access Claude through OpenAI-format API
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello from TokenMix.ai"}
    ]
)

print(response.choices[0].message.content)

Benefits of using Claude through TokenMix.ai:

- One OpenAI-format integration covering Claude alongside other providers
- Unified billing and real-time cost tracking across models
- Failover to alternative models when a provider is unavailable

For applications that need Claude-specific features like prompt caching, use the native Anthropic SDK alongside the TokenMix.ai connection.


Cost Optimization From Day One

Strategy 1: Choose the Right Model

| Task | Best Claude Model | Cost per 100M tokens |
|---|---|---|
| Simple Q&A, classification | Haiku 3.5 | $208 |
| General analysis, coding, writing | Sonnet 4 | $780 |
| Complex reasoning, research | Opus 4.6 | $3,750 |

Start with Haiku 3.5. Move up to Sonnet only when Haiku fails on your task.

Strategy 2: Implement Prompt Caching Immediately

Do not wait for production. Set up caching in development. The 90% savings on cached tokens is the largest single optimization available.

Strategy 3: Use Batch API for Async Workloads

Anthropic's Message Batches API processes requests asynchronously at 50% discount. Any workload that does not need real-time responses qualifies.

Strategy 4: Optimize max_tokens

Do not set max_tokens=4096 by default. Match it to your expected output length. You only pay for tokens actually generated, but a right-sized max_tokens caps your worst-case cost per request and cuts off runaway generations early.

Strategy 5: Track Costs With TokenMix.ai

TokenMix.ai's real-time cost dashboard shows your Claude spending alongside other providers. Compare Claude costs to alternatives in real-time and identify workloads that could be served by cheaper models.


Common Errors and Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| 401 authentication_error | Invalid API key | Check the key starts with sk-ant-, regenerate if needed |
| 400 invalid_request_error: max_tokens | Missing max_tokens | Always provide the max_tokens parameter |
| 400 invalid_request_error: messages | Wrong message format | System prompt must be a separate parameter, not a message |
| 429 rate_limit_error | Too many requests | Implement backoff, check your tier limits |
| 529 overloaded_error | Anthropic servers busy | Retry after 30-60 seconds |
| 400 invalid_request_error: model | Wrong model ID | Use the full ID: claude-sonnet-4-20250514 |
| Empty content array | Content filtered | Rephrase the prompt, check Anthropic's usage policy |

Retry Pattern

import time
from anthropic import Anthropic, RateLimitError, APIStatusError

client = Anthropic()

def call_claude(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages
            )
        except RateLimitError:
            wait = 2 ** attempt
            time.sleep(wait)
        except APIStatusError as e:
            if e.status_code == 529:  # Overloaded
                time.sleep(10)
            else:
                raise
    raise Exception("Max retries exceeded")

Claude API vs OpenAI API: Key Differences

| Feature | Claude API | OpenAI API |
|---|---|---|
| System prompt | Separate parameter | Message with "system" role |
| max_tokens | Required | Optional |
| Response format | content[0].text | choices[0].message.content |
| Prompt caching | Manual, 90% savings | Automatic, 50% savings |
| Context window | 200K (all models) | Varies by model |
| Batch API | 50% off | 50% off |
| Auth header | x-api-key | Authorization: Bearer |
| API format | Anthropic Messages API | OpenAI Chat Completions |
| OpenAI SDK compatible | No (own SDK) | Yes (native) |
| Via TokenMix.ai | OpenAI format supported | Native |

Conclusion

The Claude API offers three genuine advantages: prompt caching that cuts costs by up to 90%, reliable tool use for agent applications, and a 200K context window for long-document processing. The TypeScript SDK, with its discriminated-union content types, is among the best-typed AI SDKs available.

Getting started takes under 10 minutes: sign up, add payment, generate a key, install the SDK, make your first call. Implement prompt caching on day one -- the cost savings compound immediately.

For teams using multiple providers, TokenMix.ai provides access to Claude alongside all other models through a single endpoint. Use the native Anthropic SDK for Claude-specific features like advanced caching, and TokenMix.ai for unified billing and failover.

Start with Haiku 3.5 for cost-efficient prototyping. Move to Sonnet 4 for production quality. Reserve Opus 4.6 for tasks that genuinely need frontier reasoning. This tiered approach keeps costs manageable while accessing Claude's full capabilities.


FAQ

How do I get started with the Claude API?

Sign up at console.anthropic.com, add a payment method (you get $5 free credit), generate an API key, and install the SDK (pip install anthropic for Python or npm install @anthropic-ai/sdk for Node.js). Set your key as the ANTHROPIC_API_KEY environment variable. Your first API call takes under 10 minutes from signup. See the step-by-step guide in this article.

How much does the Claude API cost?

Claude Haiku 3.5 costs $0.80/M input and $4.00/M output tokens. Claude Sonnet 4 costs $3.00/M input and $15.00/M output. Claude Opus 4.6 costs $15.00/M input and $75.00/M output. With prompt caching (90% discount on cached tokens), effective costs drop dramatically -- a Sonnet 4 application with an 80% cache hit rate pays roughly $0.84/M effective input cost (0.8 × $0.30 + 0.2 × $3.00). Track actual costs in real-time on TokenMix.ai.

What is Claude prompt caching and how much does it save?

Prompt caching lets you mark parts of your prompt for server-side caching. On subsequent requests, cached tokens cost 90% less (e.g., Sonnet 4 cached reads cost $0.30/M instead of $3.00/M). For applications with long system prompts or repeated context, this saves 50-90% on input costs. Cache TTL is approximately 5 minutes, refreshed on each use.

Can I use the Claude API with the OpenAI SDK?

Not directly. Claude uses a different API format (the Messages API). You need the anthropic SDK for native access. However, TokenMix.ai provides an OpenAI-compatible endpoint for Claude models, letting you use the openai SDK with base_url="https://api.tokenmix.ai/v1" and model="claude-sonnet-4". This simplifies multi-provider architectures.

What is the difference between Claude Haiku, Sonnet, and Opus?

Haiku 3.5 is the fastest and cheapest -- use it for simple tasks, classification, and high-volume workloads. Sonnet 4 is the balanced option for general-purpose coding, analysis, and writing. Opus 4.6 is the most capable model for complex reasoning, research, and tasks requiring deep analysis. Start with Haiku, upgrade to Sonnet when quality demands it.

How does Claude tool use work?

Define tools with a name, description, and JSON schema for inputs. Send the tool definitions with your request. Claude responds with a tool_use content block containing the tool name and arguments. Execute the tool in your code, then send the result back as a tool_result message. Claude incorporates the result into its final response. See the complete code examples in the Tool Use section above.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: Anthropic API Documentation, Anthropic Pricing, Anthropic Cookbook + TokenMix.ai