Claude API Tutorial: Getting Started With Anthropic's API, Prompt Caching, and Tool Use (2026)
The Claude API gives you programmatic access to Anthropic's Claude models -- the same models behind Claude.ai but with full control over prompts, parameters, and outputs. Claude excels at three things competitors struggle with: long-context analysis (200K tokens), prompt caching (90% cost reduction on repeated content), and reliable tool use. This tutorial takes you from zero to production-ready: console signup, API key, first call, prompt caching implementation, streaming, and tool use. Python and TypeScript examples for every step. All code tested against the live Anthropic API by TokenMix.ai in April 2026.
Table of Contents
[Quick Reference: Claude API Models and Pricing]
[Why Choose the Claude API]
[Getting Started: Console Signup and API Key]
[Your First Claude API Call in Python]
[Your First Claude API Call in TypeScript]
[Understanding Claude's Message Format]
[Prompt Caching: Cut Costs by 90%]
[Streaming Responses]
[Tool Use: Let Claude Call Your Functions]
[Structured Output With Claude]
[Using Claude Through TokenMix.ai]
[Cost Optimization From Day One]
[Common Errors and Troubleshooting]
[Claude API vs OpenAI API: Key Differences]
[Conclusion]
[FAQ]
Quick Reference: Claude API Models and Pricing
| Model | Model ID | Input $/M | Output $/M | Cache Write $/M | Cache Read $/M | Context | Best For |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | claude-opus-4-20250514 | $15.00 | $75.00 | $18.75 | $1.50 | 200K | Complex analysis |
| Claude Sonnet 4 | claude-sonnet-4-20250514 | $3.00 | $15.00 | $3.75 | $0.30 | 200K | Balanced quality/cost |
| Claude Haiku 3.5 | claude-haiku-3-5-20241022 | $0.80 | $4.00 | $1.00 | $0.08 | 200K | Fast, affordable |
Why Choose the Claude API
Three technical advantages set Claude apart from competitors.
Prompt caching. Anthropic's prompt caching is the most powerful cost optimization in the API market. Mark sections of your prompt for caching. On subsequent requests, cached tokens cost 90% less. For applications with long system prompts, RAG contexts, or repeated instructions, this is transformative. TokenMix.ai data shows Claude with caching often costs less than GPT-4.1 mini despite higher base pricing.
Long context. All Claude models support 200K tokens of context. Unlike some providers that charge extra for long inputs, Claude's pricing is flat regardless of context length. Process entire codebases, long documents, or multi-hour transcripts in a single call.
Reliable tool use. Claude's tool calling implementation is among the most reliable in the industry. It follows tool schemas precisely, handles complex multi-tool scenarios well, and provides clean structured outputs.
Getting Started: Console Signup and API Key
Step 1: Create an Anthropic Account
Go to console.anthropic.com. Click "Sign up." You can register with email or Google account.
Step 2: Add Payment Method
Anthropic requires a payment method before issuing API keys. New accounts receive $5 in free credits. Add a credit card on the "Billing" page.
Step 3: Generate Your API Key
Navigate to "API Keys" in the console sidebar. Click "Create Key." Give it a name (e.g., "my-project"). Copy the key immediately -- it starts with sk-ant- and is shown only once.
To verify your setup, send a minimal test request. If you receive a JSON response with a "content" array, your setup is complete.
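That verification can be run from the terminal with curl (assuming your key is exported as ANTHROPIC_API_KEY):

```shell
# Minimal test request against the Messages API
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```

A successful response is a JSON object whose "content" array holds a text block with Claude's reply.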
Your First Claude API Call in Python
Basic Message
from anthropic import Anthropic

client = Anthropic()  # Uses ANTHROPIC_API_KEY env var

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is a REST API? Explain in 3 sentences."}
    ]
)
print(response.content[0].text)
Key difference from OpenAI: max_tokens is required. Anthropic does not set a default -- you must specify the maximum number of tokens the model can generate.
With System Prompt
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are an expert Python developer. Give concise, practical answers with code examples.",
    messages=[
        {"role": "user", "content": "How do I read a JSON file in Python?"}
    ]
)
print(response.content[0].text)
In Claude's API, the system prompt is a separate parameter, not a message in the conversation array. This is different from OpenAI where system is a message role.
Multi-Turn Conversation
messages = [
    {"role": "user", "content": "What is Docker?"},
]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=messages
)

# Add the assistant response to continue the conversation
messages.append({"role": "assistant", "content": response.content[0].text})
messages.append({"role": "user", "content": "How do I create a Dockerfile for a Python app?"})

response2 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=messages
)
print(response2.content[0].text)
Your First Claude API Call in TypeScript
Basic Message
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // Uses ANTHROPIC_API_KEY env var

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "What is a REST API? Explain in 3 sentences." },
  ],
});

if (response.content[0].type === "text") {
  console.log(response.content[0].text);
}
With System Prompt
const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  system:
    "You are an expert TypeScript developer. Give concise answers with code.",
  messages: [
    { role: "user", content: "How do I make an HTTP request in Node.js?" },
  ],
});
Type-Safe Content Blocks
Anthropic's TypeScript SDK uses discriminated unions for content blocks. This means TypeScript knows exactly what properties are available based on the block type:
for (const block of response.content) {
  switch (block.type) {
    case "text":
      console.log(block.text); // TypeScript knows this is a string
      break;
    case "tool_use":
      console.log(block.name); // TypeScript knows this is the tool name
      console.log(block.input); // TypeScript knows this is the tool input
      break;
  }
}
Understanding Claude's Message Format
Claude's API format differs from OpenAI's in several important ways.
Request Format Comparison
# OpenAI format
response = openai_client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello"}
    ]
    # max_tokens is optional
)
text = response.choices[0].message.content

# Claude format
response = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",
    system="You are helpful.",  # Separate parameter
    max_tokens=1024,            # Required
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)
text = response.content[0].text  # Different response path
Key Format Differences
| Aspect | OpenAI | Claude |
|---|---|---|
| System prompt | Message with role "system" | Separate system parameter |
| max_tokens | Optional (has default) | Required |
| Response text | response.choices[0].message.content | response.content[0].text |
| Streaming | stream=True parameter | .stream() method or stream=True |
| Token usage | response.usage.prompt_tokens | response.usage.input_tokens |
| API header | Authorization: Bearer sk-... | x-api-key: sk-ant-... |
| Versioning | URL-based | anthropic-version header |
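For teams migrating existing OpenAI-style code, the format differences reduce to a small mechanical conversion. A sketch (the helper name is mine, not part of either SDK):

```python
def to_claude_format(openai_messages):
    """Split an OpenAI-style message list into Claude's (system, messages) pair.

    System messages become the separate `system` parameter; all other
    roles pass through unchanged.
    """
    system_parts = [m["content"] for m in openai_messages if m["role"] == "system"]
    messages = [m for m in openai_messages if m["role"] != "system"]
    return "\n".join(system_parts), messages

system, messages = to_claude_format([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
])
print(system)    # → You are helpful.
print(messages)  # → [{'role': 'user', 'content': 'Hello'}]
```

The returned pair plugs directly into `client.messages.create(system=..., messages=...)`; remember that Claude still requires an explicit max_tokens.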
Prompt Caching: Cut Costs by 90%
Prompt caching is Claude's most powerful cost optimization feature. It is the single best reason to choose Claude for applications with long, repeated prompts.
How It Works
Mark sections of your prompt with cache_control: {"type": "ephemeral"}
On the first request, Anthropic caches the marked content (cache write cost: 1.25x base price)
On subsequent requests with the same prefix, cached tokens are read at 10% of base price
Cache TTL is approximately 5 minutes (refreshed on each use)
Python Implementation
from anthropic import Anthropic

client = Anthropic()

# Long system prompt that stays the same across requests
KNOWLEDGE_BASE = """[Insert your 5,000-token knowledge base here]
Product documentation, FAQ answers, company policies, etc.
The longer this is, the more you save from caching."""

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": KNOWLEDGE_BASE,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "What is your return policy?"}
    ]
)

# Check cache performance
usage = response.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cache write tokens: {usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {usage.cache_read_input_tokens}")
First request: Cache miss. You pay 1.25x for the cached section (cache write).
Subsequent requests: Cache hit. You pay 0.10x for the cached section (cache read).
Scenario: Customer support bot with a 5,000-token knowledge base, 10,000 requests/day (about 1.5B system-prompt tokens per month).

| | Without Caching | With Caching (90% hit) |
|---|---|---|
| System prompt tokens/month | 1.5B input tokens | 1.35B cache read + 150M cache write |
| System prompt cost (Sonnet 4) | $4,500 | $968 |
| User message cost (avg 200 tok) | $180 | $180 |
| Total monthly cost | $4,680 | $1,148 |
| Savings | -- | $3,532 (75%) |
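The arithmetic behind scenarios like this generalizes to a one-line blended price. A small estimator (my own helper, assuming misses are billed at the 1.25x cache-write rate and hits at the 0.1x cache-read rate):

```python
def effective_input_price(base_price_per_m, cache_hit_rate):
    """Blended $/M input price for the cacheable prefix of a prompt.

    Cache hits are billed at 10% of the base price (cache read);
    misses at 125% (cache write).
    """
    return (cache_hit_rate * 0.10 * base_price_per_m
            + (1 - cache_hit_rate) * 1.25 * base_price_per_m)

# Sonnet 4 ($3.00/M base) at a 90% hit rate
print(f"${effective_input_price(3.00, 0.90):.3f}/M")  # → $0.645/M
```

Multiply this blended rate by your monthly cacheable token volume to estimate the system-prompt line of your bill.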
This is why TokenMix.ai tracks prompt caching as the single highest-impact cost optimization available in the API market.
Streaming Responses
Python Streaming
from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain machine learning in detail."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
To inspect token usage after streaming finishes, call get_final_message() on the stream:

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

# Access the final message after the stream completes
final = stream.get_final_message()
print(f"\nTokens used: {final.usage.input_tokens} in, {final.usage.output_tokens} out")
Tool Use: Let Claude Call Your Functions
Claude's tool use is among the most reliable in the industry. It follows schemas precisely and handles multi-tool scenarios well.
Python Tool Use
from anthropic import Anthropic
import json

client = Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Use this when the user asks about weather.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g., 'London'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit"}
            },
            "required": ["city"]
        }
    }
]

# Step 1: Send the request with tools
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

# Step 2: Check whether Claude wants to use a tool
tool_use = next(block for block in response.content if block.type == "tool_use")
print(f"Tool: {tool_use.name}")
print(f"Input: {tool_use.input}")  # {"city": "Tokyo"}

# Step 3: Execute the tool in your own code
tool_result = {"temperature": 22, "condition": "Partly cloudy", "unit": "celsius"}

# Step 4: Send the tool result back to Claude
final_response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"},
        {"role": "assistant", "content": response.content},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": json.dumps(tool_result)
                }
            ]
        }
    ]
)
print(final_response.content[0].text)
# "The weather in Tokyo is currently 22C and partly cloudy."
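With more than one tool, step 3 above generalizes to a dispatch table keyed by tool name. A sketch (the registry and handler are illustrative stand-ins, not part of the Anthropic SDK):

```python
import json

# Map tool names to local Python handlers
TOOL_HANDLERS = {
    "get_weather": lambda city, unit="celsius": {
        "temperature": 22, "condition": "Partly cloudy", "unit": unit
    },
}

def execute_tool(name, tool_input):
    """Run the local handler for a tool_use block and serialize the result."""
    handler = TOOL_HANDLERS[name]
    return json.dumps(handler(**tool_input))

print(execute_tool("get_weather", {"city": "Tokyo"}))
# → {"temperature": 22, "condition": "Partly cloudy", "unit": "celsius"}
```

The serialized string goes straight into the tool_result block's "content" field, keeping the request-building code identical no matter which tool Claude picked.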
TypeScript Tool Use
const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  tools: [
    {
      name: "get_weather",
      description: "Get weather for a city",
      input_schema: {
        type: "object" as const,
        properties: {
          city: { type: "string", description: "City name" },
        },
        required: ["city"],
      },
    },
  ],
  messages: [{ role: "user", content: "Weather in Tokyo?" }],
});

for (const block of response.content) {
  if (block.type === "tool_use") {
    console.log(`Tool: ${block.name}, Input: ${JSON.stringify(block.input)}`);
  }
}
Structured Output With Claude
Using tool_choice for Guaranteed Structure
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "format_country",
            "description": "Format country information into structured data",
            "input_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "capital": {"type": "string"},
                    "population": {"type": "integer"},
                    "continent": {"type": "string"}
                },
                "required": ["name", "capital", "population", "continent"]
            }
        }
    ],
    tool_choice={"type": "tool", "name": "format_country"},
    messages=[{"role": "user", "content": "Tell me about Japan"}]
)

for block in response.content:
    if block.type == "tool_use":
        data = block.input
        print(data)
        # {"name": "Japan", "capital": "Tokyo", "population": 125000000, "continent": "Asia"}
This guarantees structured JSON output matching your schema -- more reliable than asking the model to output JSON in plain text.
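Even with tool_choice forcing the schema, it is worth validating the returned input before trusting it downstream, since values can still be malformed. A minimal stdlib-only check (the helper is mine, not part of the SDK):

```python
def validate_against_schema(data, schema):
    """Check required keys and primitive types against a JSON-Schema-like dict."""
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    for key in schema.get("required", []):
        if key not in data:
            return False, f"missing required field: {key}"
    for key, spec in schema.get("properties", {}).items():
        if key in data and not isinstance(data[key], type_map[spec["type"]]):
            return False, f"wrong type for {key}"
    return True, "ok"

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "population": {"type": "integer"}},
    "required": ["name", "population"],
}
print(validate_against_schema({"name": "Japan", "population": 125000000}, schema))
# → (True, 'ok')
```

For production use, a full JSON Schema validator such as the jsonschema package covers nested objects, enums, and ranges that this sketch ignores.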
Using Claude Through TokenMix.ai
TokenMix.ai provides access to Claude models through the OpenAI-compatible format, simplifying integration for teams already using the OpenAI SDK.
from openai import OpenAI

client = OpenAI(
    api_key="tmx-your-key",
    base_url="https://api.tokenmix.ai/v1"
)

# Access Claude through the OpenAI-format API
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello from TokenMix.ai"}
    ]
)
print(response.choices[0].message.content)
Benefits of using Claude through TokenMix.ai:
OpenAI-compatible format: no SDK swap needed
Unified billing across Claude + other providers
Automatic failover if Anthropic API is down
Cost tracking and spend management in one dashboard
For applications that need Claude-specific features like prompt caching, use the native Anthropic SDK alongside the TokenMix.ai connection.
Cost Optimization From Day One
Strategy 1: Choose the Right Model
| Task | Best Claude Model | Approx. cost per 100M tokens (mixed input/output) |
|---|---|---|
| Simple Q&A, classification | Haiku 3.5 | $208 |
| General analysis, coding, writing | Sonnet 4 | $780 |
| Complex reasoning, research | Opus 4.6 | $3,750 |
Start with Haiku 3.5. Move up to Sonnet only when Haiku fails on your task.
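To make that choice concrete, you can estimate per-request cost from the published prices (hardcoded here from this article's quick-reference table; verify against Anthropic's current pricing page before relying on them):

```python
# ($/M input, $/M output) prices per model family
PRICES = {
    "claude-haiku-3-5": (0.80, 4.00),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-opus-4": (15.00, 75.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of a single request."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A typical request: 2,000 tokens in, 500 out
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2000, 500):.4f}")
```

Running the estimate across your real traffic mix shows quickly whether an upgrade from Haiku to Sonnet is a rounding error or a budget line.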
Strategy 2: Implement Prompt Caching Immediately
Do not wait for production. Set up caching in development. The 90% savings on cached tokens is the largest single optimization available.
Strategy 3: Use Batch API for Async Workloads
Anthropic's Message Batches API processes requests asynchronously at 50% discount. Any workload that does not need real-time responses qualifies.
Strategy 4: Optimize max_tokens
Do not set max_tokens=4096 by default. Match it to your expected output length. You only pay for tokens actually generated, but a tight max_tokens caps worst-case cost and latency; note that it truncates output rather than making it more concise, so pair it with prompt instructions that ask for brevity.
Strategy 5: Track Costs With TokenMix.ai
TokenMix.ai's real-time cost dashboard shows your Claude spending alongside other providers. Compare Claude costs to alternatives in real-time and identify workloads that could be served by cheaper models.
Common Errors and Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| 401 authentication_error | Invalid API key | Check key starts with sk-ant-; regenerate if needed |
| 400 invalid_request_error: max_tokens | Missing max_tokens | Always provide the max_tokens parameter |
| 400 invalid_request_error: messages | Wrong message format | System prompt must be a separate parameter, not a message |
| 429 rate_limit_error | Too many requests | Implement backoff; check your tier limits |
| 529 overloaded_error | Anthropic servers busy | Retry after 30-60 seconds |
| 400 invalid_request_error: model | Wrong model ID | Use the full ID: claude-sonnet-4-20250514 |
| Empty content array | Content filtered | Rephrase the prompt; check Anthropic's usage policy |
Retry Pattern
import time
from anthropic import Anthropic, RateLimitError, APIStatusError

client = Anthropic()

def call_claude(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages
            )
        except RateLimitError:
            time.sleep(2 ** attempt)  # Exponential backoff
        except APIStatusError as e:
            if e.status_code == 529:  # Overloaded
                time.sleep(10)
            else:
                raise
    raise Exception("Max retries exceeded")
Claude API vs OpenAI API: Key Differences
| Feature | Claude API | OpenAI API |
|---|---|---|
| System prompt | Separate parameter | Message with "system" role |
| max_tokens | Required | Optional |
| Response format | content[0].text | choices[0].message.content |
| Prompt caching | Manual, 90% savings | Automatic, 50% savings |
| Context window | 200K (all models) | Varies (128K GPT-4.1) |
| Batch API | 50% off | 50% off |
| Auth header | x-api-key | Authorization: Bearer |
| API format | Anthropic Messages API | OpenAI Chat Completions |
| OpenAI SDK compatible | No (own SDK) | Yes (native) |
| Via TokenMix.ai | OpenAI format supported | Native |
Conclusion
The Claude API offers three genuine advantages: prompt caching that cuts costs by up to 90% on cached tokens, reliable tool use for agent applications, and a 200K context window for long-document processing. The TypeScript SDK's discriminated-union content types also make it one of the best-typed AI SDKs available.
Getting started takes under 10 minutes: sign up, add payment, generate a key, install the SDK, make your first call. Implement prompt caching on day one -- the cost savings compound immediately.
For teams using multiple providers, TokenMix.ai provides access to Claude alongside all other models through a single endpoint. Use the native Anthropic SDK for Claude-specific features like advanced caching, and TokenMix.ai for unified billing and failover.
Start with Haiku 3.5 for cost-efficient prototyping. Move to Sonnet 4 for production quality. Reserve Opus 4.6 for tasks that genuinely need frontier reasoning. This tiered approach keeps costs manageable while accessing Claude's full capabilities.
FAQ
How do I get started with the Claude API?
Sign up at console.anthropic.com, add a payment method (you get $5 free credit), generate an API key, and install the SDK (pip install anthropic for Python or npm install @anthropic-ai/sdk for Node.js). Set your key as the ANTHROPIC_API_KEY environment variable. Your first API call takes under 10 minutes from signup. See the step-by-step guide in this article.
How much does the Claude API cost?
Claude Haiku 3.5 costs $0.80/M input and $4.00/M output tokens. Claude Sonnet 4 costs $3.00/$15.00. Claude Opus 4.6 costs $15.00/$75.00. With prompt caching (90% discount on cached tokens), effective costs drop dramatically -- a Sonnet 4 application with an 80% cache hit rate pays roughly $0.99/M effective input cost (80% of tokens at the $0.30 cached rate, 20% at the $3.75 cache-write rate). Track actual costs in real-time on TokenMix.ai.
What is Claude prompt caching and how much does it save?
Prompt caching lets you mark parts of your prompt for server-side caching. On subsequent requests, cached tokens cost 90% less (e.g., Sonnet 4 cached reads cost $0.30/M instead of $3.00/M). For applications with long system prompts or repeated context, this saves 50-90% on input costs. Cache TTL is approximately 5 minutes, refreshed on each use.
Can I use the Claude API with the OpenAI SDK?
Not directly. Claude uses a different API format (the Messages API). You need the anthropic SDK for native access. However, TokenMix.ai provides an OpenAI-compatible endpoint for Claude models, letting you use the openai SDK with base_url="https://api.tokenmix.ai/v1" and model="claude-sonnet-4". This simplifies multi-provider architectures.
What is the difference between Claude Haiku, Sonnet, and Opus?
Haiku 3.5 is the fastest and cheapest -- use it for simple tasks, classification, and high-volume workloads. Sonnet 4 is the balanced option for general-purpose coding, analysis, and writing. Opus 4.6 is the most capable model for complex reasoning, research, and tasks requiring deep analysis. Start with Haiku, upgrade to Sonnet when quality demands it.
How does Claude tool use work?
Define tools with a name, description, and JSON schema for inputs. Send the tool definitions with your request. Claude responds with a tool_use content block containing the tool name and arguments. Execute the tool in your code, then send the result back as a tool_result message. Claude incorporates the result into its final response. See the complete code examples in the Tool Use section above.