TokenMix Research Lab · 2026-04-12

AI API for Python Developers: Complete Guide to OpenAI, Anthropic, and Google SDKs (2026)
Python is the default language for AI API integration. Every major provider ships a Python SDK, and the openai package alone works with five or more providers through OpenAI-compatible endpoints. This tutorial covers every Python SDK you need: openai (works with OpenAI, DeepSeek, Groq, Mistral, and TokenMix.ai), anthropic (for Claude models), and google-genai (for Gemini models). Code examples for each, feature comparison, and a clear recommendation for which SDK to use when. All examples tested and verified by TokenMix.ai as of April 2026.
Table of Contents
- [Quick SDK Comparison for Python]
- [Prerequisites and Setup]
- [The openai Python SDK: One SDK, Five+ Providers]
- [The anthropic Python SDK: Claude Models]
- [The google-genai Python SDK: Gemini Models]
- [Streaming Responses in Python]
- [Tool Calling and Function Execution]
- [Structured Output: Getting JSON From Models]
- [Using TokenMix.ai With the openai Python SDK]
- [Full SDK Feature Comparison Table]
- [Cost Comparison for Python Developers]
- [Decision Guide: Which Python AI SDK Should You Use]
- [Conclusion]
- [FAQ]
Quick SDK Comparison for Python
| Feature | openai | anthropic | google-genai |
|---|---|---|---|
| Install | pip install openai | pip install anthropic | pip install google-genai |
| Providers Supported | OpenAI, DeepSeek, Groq, Mistral, TokenMix.ai, Together, Perplexity | Anthropic only | Google only |
| Type Hints | Excellent | Excellent | Good |
| Async Support | Yes (AsyncOpenAI) | Yes (AsyncAnthropic) | Yes |
| Streaming | Async iterators | Event stream | Async iterators |
| Auto-Retries | Yes (configurable) | Yes (configurable) | Limited |
| Latest Version | 1.x | 0.49+ | 1.x |
| Python Minimum | 3.8+ | 3.8+ | 3.9+ |
Prerequisites and Setup
Before writing any code, you need:
- Python 3.9+ (3.11+ recommended for best async performance)
- An API key from at least one provider
- A virtual environment (always isolate AI dependencies)
```bash
# Create project and virtual environment
mkdir ai-api-project && cd ai-api-project
python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows

# Install all three SDKs
pip install openai anthropic google-genai

# Set API keys as environment variables
export OPENAI_API_KEY="sk-your-key"
export ANTHROPIC_API_KEY="sk-ant-your-key"
export GOOGLE_API_KEY="your-google-key"
```
Security rule: Never hardcode API keys in source files. Always use environment variables or a secrets manager.
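One way to enforce this rule in code is a small fail-fast helper. require_api_key is an illustrative name, not part of any SDK:

```python
import os

def require_api_key(var_name: str) -> str:
    """Read an API key from the environment, failing fast if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set. Export it before running this script.")
    return key
```

Use it as client = OpenAI(api_key=require_api_key("OPENAI_API_KEY")). The SDKs read these variables automatically, but an explicit check gives a clearer error at startup than a failed request later.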
The openai Python SDK: One SDK, Five+ Providers
The openai package is the most versatile Python AI SDK. It works with any provider that implements the OpenAI chat completions API format.
Basic Chat Completion
```python
from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
# Output: The capital of France is Paris.
```
Using the Same SDK With DeepSeek
```python
from openai import OpenAI

# Only the base_url and api_key change
client = OpenAI(
    api_key="dsk-your-deepseek-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek V4: https://tokenmix.ai/blog/deepseek-api-pricing
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
```
Using the Same SDK With Groq
```python
from openai import OpenAI

client = OpenAI(
    api_key="gsk-your-groq-key",
    base_url="https://api.groq.com/openai/v1"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
```
Async Usage
```python
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
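The async client pays off when you fan out many requests at once with asyncio.gather. The sketch below uses a stub coroutine in place of the live create call, so only the concurrency shape is shown:

```python
import asyncio

async def fake_completion(prompt: str) -> str:
    # Stub standing in for: await client.chat.completions.create(...)
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def run_batch(prompts: list[str]) -> list[str]:
    # All requests run concurrently; total latency ~= the slowest single request
    return await asyncio.gather(*(fake_completion(p) for p in prompts))

results = asyncio.run(run_batch(["a", "b", "c"]))
```

Swap fake_completion for a coroutine that awaits the real client call and you get N requests in roughly the time of one.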
Error Handling
```python
from openai import OpenAI, RateLimitError, AuthenticationError, APIStatusError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError:
    print("Invalid API key. Check your OPENAI_API_KEY.")
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.response.headers.get('retry-after')}s")
except APIStatusError as e:
    # APIStatusError carries the HTTP status; the base APIError does not
    print(f"API error: {e.status_code} - {e.message}")
```
The anthropic Python SDK: Claude Models
Anthropic's SDK has a different API design. It is not OpenAI-compatible, but it is clean and well-typed.
Basic Message
```python
from anthropic import Anthropic

client = Anthropic()  # Uses ANTHROPIC_API_KEY env var

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.content[0].text)
```
Key differences from the OpenAI SDK:
- max_tokens is required (the openai SDK defaults to the model maximum)
- The system prompt is a separate parameter, not a message
- The response structure uses response.content[0].text, not response.choices[0].message.content
System Prompt
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are an expert Python developer. Give concise, practical answers.",
    messages=[
        {"role": "user", "content": "How do I read a CSV file?"}
    ]
)
```
Prompt Caching (Claude's Unique Advantage)
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant with access to a large knowledge base...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Summarize the key points."}
    ]
)

# Check cache performance
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
```
Prompt caching reduces costs by up to 90% on cached tokens. For applications with long system prompts, this is the single biggest cost optimization available.
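A quick back-of-the-envelope shows the effect. Assuming cache reads bill at about one-tenth of the normal input rate (the multiplier and the $3/MTok rate below are illustrative; check Anthropic's pricing page for current numbers):

```python
def input_cost(prompt_tokens: int, cached_tokens: int,
               rate_per_mtok: float, cache_multiplier: float = 0.1) -> float:
    """Estimate input cost when cached_tokens of the prompt are served from cache."""
    per_token = rate_per_mtok / 1_000_000
    uncached = prompt_tokens - cached_tokens
    return uncached * per_token + cached_tokens * per_token * cache_multiplier

# 10,000-token system prompt at an illustrative $3.00/MTok:
no_cache = input_cost(10_000, 0, 3.0)         # $0.03 per request
with_cache = input_cost(10_000, 10_000, 3.0)  # $0.003 per request
```

Over 100,000 requests that difference is $3,000 versus $300 on the system prompt alone, which is why caching dominates every other cost lever for long-prompt applications.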
Async Usage
```python
import asyncio
from anthropic import AsyncAnthropic

async def main():
    client = AsyncAnthropic()
    response = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.content[0].text)

asyncio.run(main())
```
The google-genai Python SDK: Gemini Models
Google's SDK takes a different approach with its own API design.
Basic Generation
```python
from google import genai

client = genai.Client()  # Uses GOOGLE_API_KEY env var

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What is the capital of France?"
)
print(response.text)
```
With System Instruction
```python
from google.genai import types

response = client.models.generate_content(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction="You are an expert Python developer.",
        temperature=0.7,
        max_output_tokens=1024
    ),
    contents="How do I read a CSV file?"
)
print(response.text)
```
Multi-Turn Conversation
```python
chat = client.chats.create(model="gemini-2.0-flash")

response1 = chat.send_message("What is Python?")
print(response1.text)

response2 = chat.send_message("What are its main advantages?")
print(response2.text)  # Retains context from previous turn
```
Free Tier Usage
Google Gemini's free tier is the most generous in the industry. No credit card required. Gemini 2.0 Flash allows 15 requests per minute and 1,500 requests per day at zero cost. This is enough to build and test complete applications.
Streaming Responses in Python
Streaming is essential for chat interfaces. Here is how to stream with each SDK.
Streaming With openai SDK
```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a haiku about Python."}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Streaming With anthropic SDK
```python
from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
Streaming With google-genai SDK
```python
from google import genai

client = genai.Client()

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a haiku about Python."
)
for chunk in response:
    if chunk.text:  # some chunks carry no text
        print(chunk.text, end="", flush=True)
```
Tool Calling and Function Execution
Tool calling lets models invoke your Python functions. Each SDK handles it differently.
Tool Calling With openai SDK
```python
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {tool_call.function.name}({args})")
```
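The snippet above stops at detecting the call. In a full loop you execute the function yourself, wrap the result as a role "tool" message, append it to the conversation, and call create again. The dispatch step can be exercised without the API; get_weather here is a local stub:

```python
import json

def get_weather(city: str) -> dict:
    # Stub; a real implementation would query a weather service
    return {"city": city, "temp_c": 21}

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool_call(name: str, arguments_json: str, tool_call_id: str) -> dict:
    """Run the named tool and wrap its result as an OpenAI-format tool message."""
    result = TOOL_REGISTRY[name](**json.loads(arguments_json))
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    }

tool_message = execute_tool_call("get_weather", '{"city": "Tokyo"}', "call_123")
```

Append the assistant message and this tool message to messages, then call client.chat.completions.create again so the model can compose its final answer from the result.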
Tool Calling With anthropic SDK
```python
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    ],
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
```
Structured Output: Getting JSON From Models
JSON Mode With openai SDK
```python
import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Return JSON with keys: name, capital, population"},
        {"role": "user", "content": "Tell me about Japan"}
    ],
    response_format={"type": "json_object"}
)
data = json.loads(response.choices[0].message.content)
print(data)  # {"name": "Japan", "capital": "Tokyo", "population": 125000000}
```
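JSON mode guarantees valid JSON from OpenAI, but when the same request style is pointed at other OpenAI-compatible providers, models sometimes wrap the payload in a markdown code fence. A small defensive parser (parse_model_json is an illustrative helper, not an SDK function) is cheap insurance:

```python
import json

FENCE = "`" * 3  # a literal markdown code fence

def parse_model_json(text: str) -> dict:
    """Parse model output as JSON, tolerating a surrounding markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        cleaned = cleaned.split("\n", 1)[1]    # drop the opening fence line
        cleaned = cleaned.rsplit(FENCE, 1)[0]  # drop the closing fence
    return json.loads(cleaned)

data = parse_model_json(FENCE + 'json\n{"capital": "Tokyo"}\n' + FENCE)
```

Plain JSON passes straight through, so the helper can sit in front of every provider without special-casing.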
Structured Output With anthropic SDK
```python
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "format_country_info",
            "description": "Format country information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "capital": {"type": "string"},
                    "population": {"type": "integer"}
                },
                "required": ["name", "capital", "population"]
            }
        }
    ],
    tool_choice={"type": "tool", "name": "format_country_info"},
    messages=[{"role": "user", "content": "Tell me about Japan"}]
)

# Extract structured data from the forced tool use
for block in response.content:
    if block.type == "tool_use":
        print(block.input)  # {"name": "Japan", "capital": "Tokyo", "population": 125000000}
```
Using TokenMix.ai With the openai Python SDK
TokenMix.ai works with the standard openai Python SDK. Change the base URL and API key to access 300+ models from all providers through a single endpoint.
```python
from openai import OpenAI

# One client, all providers
client = OpenAI(
    api_key="tmx-your-key",
    base_url="https://api.tokenmix.ai/v1"
)

# Use OpenAI models
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

# Use Claude models (via OpenAI-format endpoint)
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

# Use DeepSeek models
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)
```
The advantage: one SDK installation, one API key, one billing dashboard, 300+ models. TokenMix.ai handles the provider translation behind the scenes.
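One practical benefit of a single endpoint is cheap fallback: if a model times out or errors, retry the same request against the next one. The sketch below takes the API call as an injected function, so the control flow can be shown (and tested) with a stub in place of the live client:

```python
def complete_with_fallback(prompt, models, call_model):
    """Try each model in order, returning the first successful response."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as err:  # in production, catch the SDK's specific error types
            last_error = err
    raise RuntimeError(f"all models failed: {last_error}")

# Stub standing in for client.chat.completions.create:
def flaky_call(model, prompt):
    if model == "gpt-4.1":
        raise TimeoutError("provider timeout")
    return f"{model} says hi"

winner, text = complete_with_fallback("Hello", ["gpt-4.1", "deepseek-chat"], flaky_call)
```

With a live client, call_model would wrap client.chat.completions.create and return the message content; the loop itself does not change.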
Full SDK Feature Comparison Table
| Feature | openai (Python) | anthropic (Python) | google-genai (Python) |
|---|---|---|---|
| Chat completions | Yes | Yes (messages API) | Yes (generate_content) |
| Streaming | Yes (async iterator) | Yes (event stream) | Yes (generate_content_stream) |
| Tool calling | Yes | Yes | Yes |
| JSON mode | Yes (response_format) | Via tool_choice | Yes (response_schema) |
| Vision/images | Yes | Yes | Yes |
| Embeddings | Yes | No (use Voyage) | Yes |
| Prompt caching | Automatic | Manual (powerful) | Context caching |
| Batch API | Yes (50% discount) | Yes (50% discount) | No |
| Auto-retry | Yes (2 retries default) | Yes (2 retries default) | Limited |
| Timeout config | Yes | Yes | Yes |
| Type hints | Excellent | Excellent | Good |
| Async client | AsyncOpenAI | AsyncAnthropic | Async methods |
| Min Python | 3.8 | 3.8 | 3.9 |
| Multi-provider | Yes (via base_url) | No | No |
Cost Comparison for Python Developers
Typical Python development patterns and their costs:
| Use Case | Requests/Month | Avg Tokens/Req | Best Model | Monthly Cost |
|---|---|---|---|---|
| Personal project/learning | 1,000 | 500 | Gemini Flash (free) | $0 |
| Prototype/MVP | 10,000 | 1,000 | GPT-4.1 mini | $8.80 |
| Side project in production | 50,000 | 1,500 | DeepSeek V4 | $41 |
| Small SaaS product | 200,000 | 2,000 | GPT-4.1 mini |