TokenMix Research Lab · 2026-04-12

AI API for Python Developers: Complete Guide to OpenAI, Anthropic, and Google SDKs (2026)
Last Updated: 2026-04-29
Author: TokenMix Research Lab
Python AI SDK landscape: openai works with 5+ providers (OpenAI/DeepSeek/Groq/Mistral/TokenMix.ai/Together/Perplexity) — most versatile. anthropic for Claude (90% cache savings unique). google-genai for Gemini (most generous free tier — 15 RPM/1500 RPD no card). All have async support + type hints + streaming. Best architecture: openai SDK + TokenMix.ai endpoint = one installation, 300+ models, single billing.
Python is the default language for AI API integration. Every major provider ships a Python SDK, and the openai package alone works with five or more providers through OpenAI-compatible endpoints. This tutorial covers every Python SDK you need: openai (works with OpenAI, DeepSeek, Groq, Mistral, and TokenMix.ai), anthropic (for Claude models), and google-genai (for Gemini models). Code examples for each, feature comparison, and a clear recommendation for which SDK to use when. All examples tested and verified by TokenMix.ai as of April 2026.
Table of Contents
- Quick SDK Comparison for Python
- Prerequisites and Setup
- The openai Python SDK: One SDK, Five+ Providers
- The anthropic Python SDK: Claude Models
- The google-genai Python SDK: Gemini Models
- Streaming Responses in Python
- Tool Calling and Function Execution
- Structured Output: Getting JSON From Models
- Using TokenMix.ai With the openai Python SDK
- Full SDK Feature Comparison Table
- Cost Comparison for Python Developers
- Which Python AI SDK Should You Use?
- What's the Bottom Line on Python AI SDKs?
- FAQ
Quick SDK Comparison for Python
Three official SDKs, ranked by versatility: openai (works with five or more providers via a base_url switch; the most flexible), anthropic (Claude only; best caching), and google-genai (Gemini only; best free tier). All three support async, streaming, and type hints. Auto-retries are configurable in openai and anthropic and limited in google-genai. Python minimum: 3.8+ for openai and anthropic, 3.9+ for google-genai. SDK choice matters less than architecture: openai SDK + TokenMix.ai = one SDK and 300+ models.
| Feature | openai | anthropic | google-genai |
|---|---|---|---|
| Install | pip install openai | pip install anthropic | pip install google-genai |
| Providers Supported | OpenAI, DeepSeek, Groq, Mistral, TokenMix.ai, Together, Perplexity | Anthropic only | Google only |
| Type Hints | Excellent | Excellent | Good |
| Async Support | Yes (AsyncOpenAI) | Yes (AsyncAnthropic) | Yes |
| Streaming | Async iterators | Event stream | Async iterators |
| Auto-Retries | Yes (configurable) | Yes (configurable) | Limited |
| Latest Version | 1.x | 0.49+ | 1.x |
| Python Minimum | 3.8+ | 3.8+ | 3.9+ |
Prerequisites and Setup
Three requirements: Python 3.9+ (3.11+ recommended for async perf), at least one provider API key, virtual environment to isolate AI dependencies. Install all three SDKs in one venv: pip install openai anthropic google-genai. Set API keys as env vars (NEVER hardcode in source). Setup time: 5-10 minutes from zero to first API call. Use os.getenv() for portable config across environments.
Before writing any code, you need:
- Python 3.9+ (3.11+ recommended for best async performance)
- An API key from at least one provider
- A virtual environment (always isolate AI dependencies)
# Create project and virtual environment
mkdir ai-api-project && cd ai-api-project
python -m venv venv
source venv/bin/activate # Linux/Mac
# venv\Scripts\activate # Windows
# Install all three SDKs
pip install openai anthropic google-genai
# Set API keys as environment variables
export OPENAI_API_KEY="sk-your-key"
export ANTHROPIC_API_KEY="sk-ant-your-key"
export GOOGLE_API_KEY="your-google-key"
Security rule: Never hardcode API keys in source files. Always use environment variables or a secrets manager.
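The security rule above can be enforced at startup rather than left to convention. The following is a minimal sketch, not part of any SDK: `require_api_key` and `mask` are hypothetical helper names, and the only assumption is that keys live in environment variables as shown in the setup commands.

```python
import os


def require_api_key(env_var: str) -> str:
    """Return the API key stored in env_var, or raise a clear error."""
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or load it "
            "from a secrets manager before starting the app."
        )
    return key


def mask(key: str) -> str:
    """Show only the last four characters, for safe logging."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
```

Calling `require_api_key("OPENAI_API_KEY")` once at import time makes a missing key fail fast with a readable message instead of surfacing later as an authentication error.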
The openai Python SDK: One SDK, Five+ Providers
Most versatile SDK — works with any OpenAI-compatible endpoint. Switch providers by changing base_url only: OpenAI → DeepSeek (api.deepseek.com) → Groq (api.groq.com/openai/v1) → Mistral → TokenMix.ai. Same client.chat.completions.create() call across all. AsyncOpenAI for concurrent calls. Specific error classes (RateLimitError, AuthenticationError, APIError) enable precise exception handling. Auto-retry with exponential backoff (2 retries default).
The openai package is the most versatile Python AI SDK. It works with any provider that implements the OpenAI chat completions API format.
Basic Chat Completion
from openai import OpenAI
client = OpenAI() # Uses OPENAI_API_KEY env var
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
# Output: The capital of France is Paris.
Using the Same SDK With DeepSeek
import os
from openai import OpenAI
# Only the base_url and api_key change.
# DEEPSEEK_API_KEY is an assumed variable name; the key is read from the
# environment rather than hardcoded.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com"
)
response = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek V4; pricing: https://tokenmix.ai/blog/deepseek-api-pricing
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
Using the Same SDK With Groq
import os
from openai import OpenAI
# GROQ_API_KEY is an assumed variable name; the key is read from the
# environment rather than hardcoded.
client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1"
)
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
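Because only base_url and api_key change between providers, the three snippets above can share one small lookup table. This is a sketch under stated assumptions: the `Provider` dataclass, `PROVIDERS`, and `client_kwargs` are hypothetical names, the DEEPSEEK_API_KEY and GROQ_API_KEY variable names are assumptions, and the endpoints and model names are the ones used in this guide.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    base_url: Optional[str]  # None means "use the SDK default" (OpenAI)
    key_env: str             # environment variable holding the key
    default_model: str


# Endpoints and model names come from the examples in this guide.
# The DEEPSEEK_/GROQ_ variable names are assumptions for this sketch.
PROVIDERS = {
    "openai": Provider(None, "OPENAI_API_KEY", "gpt-4.1"),
    "deepseek": Provider("https://api.deepseek.com", "DEEPSEEK_API_KEY", "deepseek-chat"),
    "groq": Provider("https://api.groq.com/openai/v1", "GROQ_API_KEY", "llama-3.3-70b-versatile"),
}


def client_kwargs(name: str) -> dict:
    """Build keyword arguments for OpenAI(**kwargs) for a named provider."""
    provider = PROVIDERS[name]
    kwargs = {"api_key": os.environ[provider.key_env]}
    if provider.base_url is not None:
        kwargs["base_url"] = provider.base_url
    return kwargs
```

With this in place, `client = OpenAI(**client_kwargs("groq"))` builds a Groq client, and the rest of the calling code stays the same apart from the model name.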
Async Usage
import asyncio
from openai import AsyncOpenAI
async def main():
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)
asyncio.run(main())
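The async client pays off when several requests run at once. The sketch below shows one common pattern, `asyncio.gather` behind a semaphore, so that a burst of prompts does not exceed a chosen concurrency. To keep it runnable without an API key, `fake_completion` is a stand-in for the real awaitable call; `run_all` and `max_concurrency` are hypothetical names.

```python
import asyncio


async def fake_completion(prompt: str) -> str:
    """Stand-in for `await client.chat.completions.create(...)`."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"echo: {prompt}"


async def run_all(prompts, call=fake_completion, max_concurrency=5):
    """Run one call per prompt, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(prompt):
        async with semaphore:
            return await call(prompt)

    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(p) for p in prompts))


results = asyncio.run(run_all(["a", "b", "c"]))
print(results)
```

In a real application, `call` would be a coroutine that awaits the AsyncOpenAI client and returns `response.choices[0].message.content`.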
Error Handling
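from openai import (
    OpenAI, APIConnectionError, APIStatusError,
    AuthenticationError, RateLimitError,
)
client = OpenAI()
try:
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError:
    print("Invalid API key. Check your OPENAI_API_KEY.")
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.response.headers.get('retry-after')}s")
except APIStatusError as e:
    # Any other non-2xx response; status_code exists only on status errors
    print(f"API error: {e.status_code} - {e.message}")
except APIConnectionError as e:
    # Network failure or timeout: there is no HTTP response, so no status code
    print(f"Connection error: {e}")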
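When the SDK's built-in retries are not enough, a custom backoff wrapper is one option. The following is a generic sketch, not the SDK's own retry logic: `with_backoff` and its parameters are hypothetical, and the 25% jitter and 30-second cap are arbitrary choices.

```python
import random
import time


def with_backoff(call, retry_on=(Exception,), max_attempts=5,
                 base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `call()` and retry with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... capped at max_delay, with up to 25% jitter
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay + random.uniform(0, delay * 0.25))
```

A typical use would be `with_backoff(lambda: client.chat.completions.create(...), retry_on=(RateLimitError,))`. Since the client already retries on its own, constructing it with `OpenAI(max_retries=0)` avoids stacking the two retry layers.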
The anthropic Python SDK: Claude Models
Three key API differences from OpenAI SDK: (1) max_tokens required (OpenAI defaults). (2) System prompt is separate parameter, not a message role. (3) Response structure: response.content[0].text (not response.choices[0].message.content). Unique advantage: prompt caching via cache_control: {"type": "ephemeral"} saves up to 90% on cached tokens — single biggest cost optimization for long system prompts. AsyncAnthropic for concurrent calls.
Anthropic's SDK has a different API design. It is not OpenAI-compatible, but it is clean and well-typed.
Basic Message
from anthropic import Anthropic
client = Anthropic() # Uses ANTHROPIC_API_KEY env var
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.content[0].text)
Key differences from OpenAI SDK:
- max_tokens is required (OpenAI defaults to model max)
- System prompt is a separate parameter, not a message
- Response structure uses response.content[0].text, not response.choices[0].message.content
System Prompt
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are an expert Python developer. Give concise, practical answers.",
    messages=[
        {"role": "user", "content": "How do I read a CSV file?"}
    ]
)
Prompt Caching (Claude's Unique Advantage)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant with access to a large knowledge base...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Summarize the key points."}
    ]
)
# Check cache performance
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
Prompt caching reduces costs by up to 90% on cached tokens. For applications with long system prompts, this is the single biggest cost optimization available.
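To see what "up to 90% on cached tokens" means for a whole bill, it helps to work through the arithmetic. The sketch below is an estimate under loud assumptions: cache reads are billed at 10% of the base input price (matching the 90% figure above), cache writes carry a 25% premium (an assumption to check against current pricing), every request after the first hits the cache, and $3 per million input tokens is a placeholder price. `input_cost` is a hypothetical helper.

```python
def input_cost(prompt_tokens: int, cached_tokens: int, requests: int,
               price_per_mtok: float, read_multiplier: float = 0.10,
               write_multiplier: float = 1.25) -> dict:
    """Compare input cost with and without caching a shared prefix.

    Assumes the first request writes the cache and every later request
    reads it (all requests arrive before the cache expires).
    """
    per_token = price_per_mtok / 1_000_000
    uncached_tokens = prompt_tokens - cached_tokens

    without_cache = requests * prompt_tokens * per_token
    first = (cached_tokens * write_multiplier + uncached_tokens) * per_token
    later = (cached_tokens * read_multiplier + uncached_tokens) * per_token
    with_cache = first + (requests - 1) * later

    return {
        "without_cache": without_cache,
        "with_cache": with_cache,
        "savings_pct": 100 * (1 - with_cache / without_cache),
    }


# 10,000-token prompt, 9,000 of them a cached system prompt, 1,000 requests,
# at a placeholder price of $3 per million input tokens.
estimate = input_cost(10_000, 9_000, 1_000, price_per_mtok=3.0)
print({k: round(v, 2) for k, v in estimate.items()})
```

Under these assumptions the input bill drops by roughly 81%, not the full 90%, because the uncached 1,000 tokens per request are still billed at the normal rate and the first request pays the write premium.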
Async Usage
import asyncio
from anthropic import AsyncAnthropic
async def main():
    client = AsyncAnthropic()
    response = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.content[0].text)
asyncio.run(main())
The google-genai Python SDK: Gemini Models
API patterns: client.models.generate_content() (single call), client.chats.create() for multi-turn (retains context automatically), client.models.generate_content_stream() for streaming. System instruction via types.GenerateContentConfig(system_instruction=...). Free tier: Gemini 2.0 Flash 15 RPM/1,500 RPD/no card — sufficient to build + test complete applications. Most generous free tier in industry — best SDK for prototyping/learning before paying.
Google's SDK takes a different approach with its own API design.
Basic Generation
from google import genai
client = genai.Client() # Uses GOOGLE_API_KEY env var
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What is the capital of France?"
)
print(response.text)
With System Instruction
from google.genai import types
response = client.models.generate_content(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction="You are an expert Python developer.",
        temperature=0.7,
        max_output_tokens=1024
    ),
    contents="How do I read a CSV file?"
)
print(response.text)
Multi-Turn Conversation
chat = client.chats.create(model="gemini-2.0-flash")
response1 = chat.send_message("What is Python?")
print(response1.text)
response2 = chat.send_message("What are its main advantages?")
print(response2.text) # Retains context from previous turn
Free Tier Usage
Google Gemini's free tier is the most generous in the industry. No credit card required. Gemini 2.0 Flash allows 15 requests per minute and 1,500 requests per day at zero cost. This is enough to build and test complete applications.
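Staying under the 15-requests-per-minute limit is easier with a client-side throttle than by reacting to rate-limit errors. The sketch below is one possible approach, a sliding-window limiter. `SlidingWindowLimiter` is a hypothetical class, not part of google-genai, and the injectable `clock` and `sleep` exist only to make it testable.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until a request fits inside max_calls per `period` seconds."""

    def __init__(self, max_calls=15, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of recent requests

    def acquire(self):
        now = self.clock()
        # Forget requests that have left the window
        while self.sent and now - self.sent[0] >= self.period:
            self.sent.popleft()
        if len(self.sent) >= self.max_calls:
            # Wait until the oldest request ages out of the window
            self.sleep(self.period - (now - self.sent[0]))
            now = self.clock()
            self.sent.popleft()
        self.sent.append(now)
```

Calling `limiter.acquire()` immediately before each `client.models.generate_content(...)` call keeps a script within the per-minute quota. The daily quota would need a separate counter.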
Streaming Responses in Python
Three streaming patterns. openai: stream=True parameter, iterate chunk.choices[0].delta.content. anthropic: context manager with client.messages.stream(...) then iterate stream.text_stream. google-genai: client.models.generate_content_stream(), iterate chunk.text. All async-compatible. Critical for chat interfaces — without streaming, users wait full generation time. With streaming, first tokens appear within TTFT (80ms-300ms) creating instant feel.
Streaming is essential for chat interfaces. Here is how to stream with each SDK.
Streaming With openai SDK
from openai import OpenAI
client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a haiku about Python."}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Streaming With anthropic SDK
from anthropic import Anthropic
client = Anthropic()
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
Streaming With google-genai SDK
from google import genai
client = genai.Client()
response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a haiku about Python."
)
for chunk in response:
    print(chunk.text, end="", flush=True)
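All three loops above reduce to "iterate over text pieces and print them", so a single helper can serve every SDK and also record time to first token. This is a sketch with a hypothetical name, `consume_stream`. It assumes the caller adapts each SDK's stream into an iterable of strings.

```python
import time
from typing import Iterable, Optional, Tuple


def consume_stream(chunks: Iterable[Optional[str]],
                   clock=time.perf_counter) -> Tuple[str, Optional[float]]:
    """Print text chunks as they arrive; return (full_text, ttft_seconds)."""
    start = clock()
    ttft = None
    parts = []
    for text in chunks:
        if not text:  # some chunks carry no text (None or "")
            continue
        if ttft is None:
            ttft = clock() - start  # time to first visible token
        parts.append(text)
        print(text, end="", flush=True)
    print()
    return "".join(parts), ttft
```

The adapters are one-liners: `consume_stream(c.choices[0].delta.content for c in stream)` for openai, `consume_stream(stream.text_stream)` for anthropic, and `consume_stream(c.text for c in response)` for google-genai.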
Tool Calling and Function Execution
Tool calling lets models invoke Python functions. openai: define tools array with JSON Schema, model returns tool_calls array (parse function.arguments as JSON). anthropic: tools array with input_schema, response contains tool_use blocks (already parsed input). Both 95%+ reliability on simple schemas. Complex multi-tool sequences: GPT 94% accuracy vs Claude similar. Test edge case inputs — argument formatting can vary between providers.
Tool calling lets models invoke your Python functions. Each SDK handles it differently.
Tool Calling With openai SDK
import json
from openai import OpenAI
client = OpenAI()
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)
# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {tool_call.function.name}({args})")
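The example above stops at printing the requested call. The next step is to run the function and hand the result back to the model. The sketch below shows that step in isolation. `get_weather` is a hypothetical local implementation with made-up return values, `TOOL_REGISTRY` and `run_tool_call` are illustrative names, and a `SimpleNamespace` stands in for the SDK's tool call object so the code runs without an API key.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> dict:
    """Hypothetical local implementation; a real one would call a weather API."""
    return {"city": city, "temp_c": 21, "conditions": "clear"}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and build the follow-up `tool` message."""
    name = tool_call.function.name
    try:
        args = json.loads(tool_call.function.arguments)
        result = TOOL_REGISTRY[name](**args)
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        # Report the failure to the model instead of crashing the loop
        result = {"error": f"{type(exc).__name__}: {exc}"}
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }


# Stand-in with the same attribute shape as an SDK tool call object
demo_call = SimpleNamespace(
    id="call_demo",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Tokyo"}'),
)
print(run_tool_call(demo_call))
```

To finish the round trip, append `response.choices[0].message` and the returned tool message to `messages`, then call `client.chat.completions.create()` again so the model can write its final answer.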
Tool Calling With anthropic SDK
from anthropic import Anthropic
client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    ],
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
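With Claude, the tool's result goes back as a user turn containing a tool_result block that references the tool_use block's id. The sketch below builds that message. `tool_result_message` is a hypothetical helper, and the `SimpleNamespace` and its values are stand-ins for a real tool_use block.

```python
import json
from types import SimpleNamespace


def tool_result_message(block, result) -> dict:
    """Wrap a tool's return value as the user turn answering a tool_use block."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": block.id,  # must match the tool_use block's id
                "content": json.dumps(result),
            }
        ],
    }


# Stand-in with the same attributes as a tool_use content block
demo_block = SimpleNamespace(id="toolu_demo", name="get_weather", input={"city": "Tokyo"})
print(tool_result_message(demo_block, {"temp_c": 21, "conditions": "clear"}))
```

The follow-up request then sends the original user message, an assistant message whose content is `response.content`, and this tool result message, with the same `tools` list.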
Structured Output: Getting JSON From Models
Three approaches. openai: response_format={"type": "json_object"} JSON mode + system prompt with desired schema. JSON validity 99.5% with structured output mode (token-level constraint). anthropic: use tool_choice with input_schema defining structure — model "calls" your tool with structured data. Both reliable for production. google-genai supports response_schema parameter. Always parse JSON output before downstream use to catch occasional invalid responses.
JSON Mode With openai SDK
import json
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Return JSON with keys: name, capital, population"},
        {"role": "user", "content": "Tell me about Japan"}
    ],
    response_format={"type": "json_object"}
)
# JSON mode returns a string; parse it before using it downstream
data = json.loads(response.choices[0].message.content)
print(data) # {"name": "Japan", "capital": "Tokyo", "population": 125000000}
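JSON mode makes the output valid JSON, but the keys and types still depend on the prompt, so it is worth validating before the data reaches other code. The sketch below is a hand-rolled check for the three keys requested above. `REQUIRED` and `parse_country` are hypothetical names, and a schema library could replace this in a larger project.

```python
import json

REQUIRED = {"name": str, "capital": str, "population": int}


def parse_country(raw: str) -> dict:
    """Parse model output and check it has the expected keys and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for key, expected_type in REQUIRED.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(data[key], bool) or not isinstance(data[key], expected_type):
            raise ValueError(f"{key} should be {expected_type.__name__}")
    return data
```

In the example above, `parse_country(response.choices[0].message.content)` would replace the bare `json.loads` call, and a `ValueError` becomes the signal to retry or fall back.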
Structured Output With anthropic SDK
from anthropic import Anthropic
client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "format_country_info",
            "description": "Format country information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "capital": {"type": "string"},
                    "population": {"type": "integer"}
                },
                "required": ["name", "capital", "population"]
            }
        }
    ],
    tool_choice={"type": "tool", "name": "format_country_info"},
    messages=[{"role": "user", "content": "Tell me about Japan"}]
)
# Extract structured data from tool use
for block in response.content:
    if block.type == "tool_use":
        print(block.input) # {"name": "Japan", "capital": "Tokyo", "population": 125000000}
Using TokenMix.ai With the openai Python SDK
Standard openai Python SDK + base_url change = access 300+ models from all providers. Single OpenAI() client with base_url="https://api.tokenmix.ai/v1". Use any model name as parameter: gpt-4.1, claude-sonnet-4, deepseek-chat, gemini-2.0-flash — TokenMix.ai handles provider translation behind the scenes. One pip install, one API key, one billing dashboard. Eliminates managing 3-5 SDK installations + 3-5 separate API keys.
TokenMix.ai works with the standard openai Python SDK. Change the base URL and API key to access 300+ models from all providers through a single endpoint.
import os
from openai import OpenAI
# One client, all providers.
# TOKENMIX_API_KEY is an assumed variable name; the key is read from the
# environment rather than hardcoded.
client = OpenAI(
    api_key=os.environ["TOKENMIX_API_KEY"],
    base_url="https://api.tokenmix.ai/v1"
)
# Use OpenAI models
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)
# Use Claude models (via OpenAI-format endpoint)
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)
# Use DeepSeek models
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)
The advantage: one SDK installation, one API key, one billing dashboard, 300+ models. TokenMix.ai handles the provider translation behind the scenes.
Full SDK Feature Comparison Table
3 SDKs × 14 dimensions. Multi-provider via base_url: openai only. Prompt caching: anthropic manual but powerful (90% off), openai automatic, google context caching. Batch API 50% off: openai + anthropic only (google none). Auto-retry: openai/anthropic 2 retries default, google limited. Embeddings: openai + google (anthropic uses Voyage). Async client: AsyncOpenAI/AsyncAnthropic/google async methods. Min Python: openai/anthropic 3.8, google 3.9.
| Feature | openai (Python) | anthropic (Python) | google-genai (Python) |
|---|---|---|---|
| Chat completions | Yes | Yes (messages API) | Yes (generate_content) |
| Streaming | Yes (async iterator) | Yes (event stream) | Yes (generate_content_stream) |
| Tool calling | Yes | Yes | Yes |
| JSON mode | Yes (response_format) | Via tool_choice | Yes (response_schema) |
| Vision/images | Yes | Yes | Yes |
| Embeddings | Yes | No (use Voyage) | Yes |
| Prompt caching | Automatic | Manual (powerful) | Context caching |
| Batch API | Yes (50% discount) | Yes (50% discount) | No |
| Auto-retry | Yes (2 retries default) | Yes (2 retries default) | Limited |
| Timeout config | Yes | Yes | Yes |
| Type hints | Excellent | Excellent | Good |
| Async client | AsyncOpenAI | AsyncAnthropic | Async methods |
| Min Python | 3.8 | 3.8 | 3.9 |
| Multi-provider | Yes (via base_url) | No | No |
Cost Comparison for Python Developers
5 use case tiers: Personal/learning (1K req/mo, 500 tokens) = Gemini Flash free $0. Prototype/MVP (10K req/mo, 1K tokens) = GPT-4.1 mini $8.80. Side project production (50K req/mo) = DeepSeek V4 $41. Small SaaS (200K req/mo) = GPT-4.1 mini $176. High-traffic (1M req/mo) = mixed via TokenMix.ai $400-800. Free tier sufficient for learning + light prototyping. Production starts at $10-200/mo for most apps.
Typical Python development patterns and their costs:
| Use Case | Requests/Month | Avg Tokens/Req | Best Model | Monthly Cost |
|---|---|---|---|---|
| Personal project/learning | 1,000 | 500 | Gemini Flash (free) | $0 |
| Prototype/MVP | 10,000 | 1,000 | GPT-4.1 mini | $8.80 |
| Side project in production | 50,000 | 1,500 | DeepSeek V4 | $41 |
| Small SaaS product | 200,000 | 2,000 | GPT-4.1 mini | $176 |
| High-traffic application | 1,000,000 | 2,000 | Mixed via TokenMix.ai | $400-$800 |
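The table gives monthly totals but not the per-token rate behind them. Dividing each bill by its token volume recovers the blended rate, which makes the rows comparable and lets you re-run the estimate for your own traffic. This is a sketch: both function names are hypothetical and the inputs are the figures from the table above.

```python
def implied_rate_per_million(monthly_cost, requests, tokens_per_request):
    """Blended $ per million tokens implied by a monthly bill."""
    total_tokens = requests * tokens_per_request
    return monthly_cost / (total_tokens / 1_000_000)


def monthly_cost(requests, tokens_per_request, rate_per_million):
    """Monthly bill for a given volume at a blended per-million rate."""
    return requests * tokens_per_request * rate_per_million / 1_000_000


# (monthly cost in $, requests per month, average tokens per request)
rows = {
    "Prototype/MVP": (8.80, 10_000, 1_000),
    "Side project in production": (41, 50_000, 1_500),
    "Small SaaS product": (176, 200_000, 2_000),
}
for name, (cost, reqs, toks) in rows.items():
    rate = implied_rate_per_million(cost, reqs, toks)
    print(f"{name}: ${rate:.2f} per million tokens")
```

The implied blended rates are about $0.88, $0.55, and $0.44 per million tokens. The two GPT-4.1 mini rows differ, presumably because they assume a different split between input and output tokens, so treat the table as a rough guide and plug in your own mix.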
Which Python AI SDK Should You Use?
Want one SDK for everything: openai (via TokenMix.ai) — works with all OpenAI-compatible providers. Need prompt caching: anthropic (90% savings on long prompts). Free tier prototyping: google-genai (most generous, no card). Building agent with tools: openai or anthropic (most mature tool calling). Fastest setup: openai (best docs, largest community). Multi-provider production: openai SDK + TokenMix.ai endpoint = one SDK + 300+ models.
| Your Situation | Recommended SDK | Why |
|---|---|---|
| Want one SDK for everything | openai (via TokenMix.ai) | Works with all OpenAI-compatible providers |
| Need prompt caching | anthropic | Best caching implementation |
| Free tier prototyping | google-genai | Most generous free tier |
| Building agent with tools | openai or anthropic | Most mature tool calling |
| Need fastest setup | openai | Best docs, largest community |
| Multi-provider production app | openai (via TokenMix.ai) | One SDK, 300+ models |
What's the Bottom Line on Python AI SDKs?
Most efficient architecture: openai SDK pointed at TokenMix.ai endpoint = one installation, one API key, one billing account, 300+ models. Add anthropic SDK only when you need prompt caching for long system prompts. Add google-genai only for free tier development. Start with one SDK, add others as needs grow. The code examples in this guide are production-ready starting points — copy them, change API keys, ship features.
For Python developers, the openai SDK is the most practical starting point. It works with OpenAI directly and with five or more additional providers through base_url configuration. The anthropic SDK is essential if you need Claude's prompt caching or prefer Claude's response quality. The google-genai SDK is your best option for free-tier development.
The most efficient architecture: use the openai SDK pointed at TokenMix.ai's endpoint. You get access to every major model through one installation, one API key, and one billing account. When you need provider-specific features like Anthropic's prompt caching, use the native SDK alongside.
Start with one SDK. Add others as your needs grow. The code examples in this guide are production-ready starting points.
FAQ
What is the best Python SDK for AI API development?
The openai Python package is the most versatile SDK because it works with OpenAI and any OpenAI-compatible provider (DeepSeek, Groq, Mistral, TokenMix.ai). For Claude-specific features like prompt caching, use the anthropic SDK. For free-tier development, use google-genai. Most production Python applications use two SDKs: openai as the primary and one provider-specific SDK for specialized features.
Can I use the openai Python SDK with non-OpenAI providers?
Yes. Any provider implementing the OpenAI chat completions API format works with the openai SDK. Set base_url to the provider's endpoint. DeepSeek, Groq, Mistral, Together AI, Perplexity, and TokenMix.ai all support this. TokenMix.ai's endpoint gives you access to 300+ models from all providers through the openai SDK.
How do I handle rate limits in Python AI SDKs?
The openai and anthropic SDKs include automatic retry with exponential backoff (2 retries by default). For custom handling, catch RateLimitError and implement your own backoff logic. Monitor the rate limit headers in responses: x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens. TokenMix.ai pools rate limits across providers, reducing the likelihood of hitting limits.
What Python version do I need for AI API development?
Python 3.9 or higher is recommended. The openai and anthropic SDKs support Python 3.8+, but google-genai requires 3.9+. Python 3.11+ offers the best async performance, which matters for streaming responses and concurrent API calls.
How do I reduce AI API costs in Python applications?
Five strategies: use prompt caching (Anthropic saves up to 90%), use batch API for non-real-time work (50% discount on OpenAI and Anthropic), route simple queries to cheaper models, implement response caching for repeated queries, and use TokenMix.ai's cost-optimized routing. The biggest savings come from model selection -- using GPT-4.1 mini instead of GPT-4.1 cuts costs by 80%.
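Of the five strategies, response caching for repeated queries is the simplest to add in application code. The sketch below is an in-memory version keyed by a hash of the model name and messages. `ResponseCache` is a hypothetical class, the `call` argument stands in for whatever function performs the real API request, and a production setup would likely add expiry and a shared store.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by a hash of the model name and messages."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, messages):
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, messages, call):
        """Return a cached reply, or run call(model, messages) and store it."""
        key = self._key(model, messages)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        reply = call(model, messages)
        self._store[key] = reply
        return reply
```

This only helps when identical requests recur and a repeated answer is acceptable, which usually means deterministic prompts such as classification or lookup-style questions.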
Is async important for Python AI API calls?
Yes, for production applications. Async allows concurrent API calls without blocking your server. A synchronous Flask app handles one API call at a time per worker. An async FastAPI app handles dozens concurrently. Use AsyncOpenAI or AsyncAnthropic for web applications that serve multiple users simultaneously.
Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: OpenAI Python SDK, Anthropic Python SDK, Google GenAI SDK + TokenMix.ai