TokenMix Research Lab · 2026-04-12


AI API for Python Developers: Complete Guide to OpenAI, Anthropic, and Google SDKs (2026)

Python is the default language for AI API integration. Every major provider ships a Python SDK, and the openai package alone works with five or more providers through OpenAI-compatible endpoints. This tutorial covers every Python SDK you need: openai (works with OpenAI, DeepSeek, Groq, Mistral, and TokenMix.ai), anthropic (for Claude models), and google-genai (for Gemini models). Code examples for each, feature comparison, and a clear recommendation for which SDK to use when. All examples tested and verified by TokenMix.ai as of April 2026.


Quick SDK Comparison for Python

| Feature | openai | anthropic | google-genai |
| --- | --- | --- | --- |
| Install | pip install openai | pip install anthropic | pip install google-genai |
| Providers Supported | OpenAI, DeepSeek, Groq, Mistral, TokenMix.ai, Together, Perplexity | Anthropic only | Google only |
| Type Hints | Excellent | Excellent | Good |
| Async Support | Yes (AsyncOpenAI) | Yes (AsyncAnthropic) | Yes |
| Streaming | Async iterators | Event stream | Async iterators |
| Auto-Retries | Yes (configurable) | Yes (configurable) | Limited |
| Latest Version | 1.x | 0.49+ | 1.x |
| Python Minimum | 3.8+ | 3.8+ | 3.9+ |

Prerequisites and Setup

Before writing any code, you need:

  1. Python 3.9+ (3.11+ recommended for best async performance)
  2. An API key from at least one provider
  3. A virtual environment (always isolate AI dependencies)
# Create project and virtual environment
mkdir ai-api-project && cd ai-api-project
python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows

# Install all three SDKs
pip install openai anthropic google-genai

# Set API keys as environment variables
export OPENAI_API_KEY="sk-your-key"
export ANTHROPIC_API_KEY="sk-ant-your-key"
export GOOGLE_API_KEY="your-google-key"

Security rule: Never hardcode API keys in source files. Always use environment variables or a secrets manager.
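One way to enforce this rule in code is a tiny fail-fast helper that refuses to start without a key. The require_key name is our own convention, not part of any SDK:

```python
import os

def require_key(name: str) -> str:
    """Read an API key from the environment; fail fast with a clear error if absent."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Missing environment variable: {name}")
    return key

# Usage: client = OpenAI(api_key=require_key("OPENAI_API_KEY"))
```

Failing at startup with a named variable beats a confusing authentication error deep inside a request.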


The openai Python SDK: One SDK, Five+ Providers

The openai package is the most versatile Python AI SDK. It works with any provider that implements the OpenAI chat completions API format.

Basic Chat Completion

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
# Output: The capital of France is Paris.

Using the Same SDK With DeepSeek

from openai import OpenAI

# Only the base_url and api_key change
client = OpenAI(
    api_key="dsk-your-deepseek-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek V4 (pricing: tokenmix.ai/blog/deepseek-api-pricing)
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)

Using the Same SDK With Groq

from openai import OpenAI

client = OpenAI(
    api_key="gsk-your-groq-key",
    base_url="https://api.groq.com/openai/v1"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
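The pattern across these examples is identical: only base_url and the API key change. A small registry makes the switch explicit. The endpoint URLs below are the ones used above; the environment-variable names for the non-OpenAI providers are our own convention, not an SDK standard:

```python
import os

# Base URLs from the examples above; key_env names for non-OpenAI
# providers are illustrative conventions, not SDK requirements.
OPENAI_COMPATIBLE = {
    "openai":   {"base_url": None, "key_env": "OPENAI_API_KEY"},
    "deepseek": {"base_url": "https://api.deepseek.com", "key_env": "DEEPSEEK_API_KEY"},
    "groq":     {"base_url": "https://api.groq.com/openai/v1", "key_env": "GROQ_API_KEY"},
    "tokenmix": {"base_url": "https://api.tokenmix.ai/v1", "key_env": "TOKENMIX_API_KEY"},
}

def client_kwargs(provider: str) -> dict:
    """Build the keyword arguments for OpenAI(...) for a given provider."""
    cfg = OPENAI_COMPATIBLE[provider]
    kwargs = {"api_key": os.environ.get(cfg["key_env"])}
    if cfg["base_url"]:  # the official OpenAI endpoint is the SDK default
        kwargs["base_url"] = cfg["base_url"]
    return kwargs

# Usage: client = OpenAI(**client_kwargs("groq"))
```

Switching providers then becomes a one-word change instead of an edit in every call site.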

Async Usage

import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI()
    
    response = await client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    
    print(response.choices[0].message.content)

asyncio.run(main())
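The payoff of async is concurrency. The sketch below shows the asyncio.gather pattern with a stand-in coroutine in place of the real client call, so it runs offline; in a real app, replace fake_completion with await client.chat.completions.create(...):

```python
import asyncio

async def fake_completion(prompt: str) -> str:
    # Stand-in for `await client.chat.completions.create(...)`;
    # sleeps briefly to mimic network latency.
    await asyncio.sleep(0.01)
    return f"answer: {prompt}"

async def ask_all(prompts: list[str]) -> list[str]:
    # gather() issues all requests concurrently; total wall time is roughly
    # one request's latency instead of the sum of all of them.
    return await asyncio.gather(*(fake_completion(p) for p in prompts))

results = asyncio.run(ask_all(["one", "two", "three"]))
```

Ten sequential one-second calls take ten seconds; gathered, they take about one.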

Error Handling

from openai import OpenAI, RateLimitError, APIStatusError, AuthenticationError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError:
    print("Invalid API key. Check your OPENAI_API_KEY.")
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.response.headers.get('retry-after')}s")
except APIStatusError as e:  # base class for all HTTP-status errors
    print(f"API error: {e.status_code} - {e.message}")
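If the SDK's two built-in retries are not enough, you can wrap calls in your own exponential backoff. This is a generic sketch, not part of the openai package:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    # Exponential backoff with jitter: base * 2^attempt, capped, plus up to 1s noise
    # so that many clients retrying at once do not stampede the API together.
    return min(cap, base * (2 ** attempt)) + random.random()

def with_retries(fn, max_attempts: int = 4, retryable=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error, wait and try again up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
```

In practice, narrow retryable to transient errors like RateLimitError; retrying an AuthenticationError only wastes time.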

The anthropic Python SDK: Claude Models

Anthropic's SDK has a different API design. It is not OpenAI-compatible, but it is clean and well-typed.

Basic Message

from anthropic import Anthropic

client = Anthropic()  # Uses ANTHROPIC_API_KEY env var

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.content[0].text)

Key differences from the OpenAI SDK:

  1. max_tokens is a required parameter on every request, not an optional one.
  2. The system prompt is a top-level system parameter, not a message with role "system".
  3. The response body is a list of content blocks: read text via response.content[0].text.

System Prompt

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are an expert Python developer. Give concise, practical answers.",
    messages=[
        {"role": "user", "content": "How do I read a CSV file?"}
    ]
)

Prompt Caching (Claude's Unique Advantage)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant with access to a large knowledge base...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Summarize the key points."}
    ]
)

# Check cache performance
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")

Prompt caching reduces costs by up to 90% on cached tokens. For applications with long system prompts, this is the single biggest cost optimization available.
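A back-of-envelope helper makes the discount concrete. The $3/MTok price in the comment is illustrative only, and cache-write surcharges are ignored to keep the sketch simple:

```python
def cached_input_cost(tokens: int, price_per_mtok: float,
                      cache_hit_rate: float, cache_discount: float = 0.9) -> float:
    """Effective input cost when a fraction of tokens is read from cache
    at a discount (cache-write surcharges ignored for simplicity)."""
    uncached = tokens * (1 - cache_hit_rate) * price_per_mtok / 1_000_000
    cached = tokens * cache_hit_rate * price_per_mtok * (1 - cache_discount) / 1_000_000
    return uncached + cached

# 1M input tokens at an illustrative $3/MTok: $3.00 with no cache hits,
# $0.30 if every token is read from cache at a 90% discount.
```

The closer your cache hit rate gets to 1.0, the closer input costs fall toward one tenth of list price.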

Async Usage

import asyncio
from anthropic import AsyncAnthropic

async def main():
    client = AsyncAnthropic()
    
    response = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello"}]
    )
    
    print(response.content[0].text)

asyncio.run(main())

The google-genai Python SDK: Gemini Models

Google's SDK takes a different approach with its own API design.

Basic Generation

from google import genai

client = genai.Client()  # Uses GOOGLE_API_KEY env var

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What is the capital of France?"
)

print(response.text)

With System Instruction

from google.genai import types

response = client.models.generate_content(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction="You are an expert Python developer.",
        temperature=0.7,
        max_output_tokens=1024
    ),
    contents="How do I read a CSV file?"
)

print(response.text)

Multi-Turn Conversation

chat = client.chats.create(model="gemini-2.0-flash")

response1 = chat.send_message("What is Python?")
print(response1.text)

response2 = chat.send_message("What are its main advantages?")
print(response2.text)  # Retains context from previous turn

Free Tier Usage

Google Gemini's free tier is the most generous in the industry. No credit card required. Gemini 2.0 Flash allows 15 requests per minute and 1,500 requests per day at zero cost. This is enough to build and test complete applications.
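To stay under the 15-requests-per-minute ceiling, a client-side throttle helps. This MinuteRateLimiter is our own sketch, not part of google-genai, and it complements rather than replaces handling 429 responses:

```python
import time

class MinuteRateLimiter:
    """Client-side throttle for per-minute quotas (e.g. 15 requests/minute).
    acquire() blocks until a request slot is free."""

    def __init__(self, max_per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self.max = max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.stamps = []  # times of requests made within the last 60 seconds

    def acquire(self) -> None:
        now = self.clock()
        self.stamps = [t for t in self.stamps if now - t < 60]
        if len(self.stamps) >= self.max:
            self.sleep(60 - (now - self.stamps[0]))  # wait for the oldest to expire
            now = self.clock()
            self.stamps = [t for t in self.stamps if now - t < 60]
        self.stamps.append(now)

# limiter = MinuteRateLimiter(15)
# limiter.acquire()  # call before each generate_content request
```

The injectable clock and sleep make the limiter testable without real waiting.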


Streaming Responses in Python

Streaming is essential for chat interfaces. Here is how to stream with each SDK.

Streaming With openai SDK

from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a haiku about Python."}],
    stream=True
)

for chunk in stream:
    # the final chunk of a stream may arrive with an empty choices list
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Streaming With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming With google-genai SDK

from google import genai

client = genai.Client()

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a haiku about Python."
)

for chunk in response:
    print(chunk.text, end="", flush=True)

Tool Calling and Function Execution

Tool calling lets models invoke your Python functions. Each SDK handles it differently.

Tool Calling With openai SDK

import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {tool_call.function.name}({args})")
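The example above stops at detecting the call. Completing the round trip means executing the function, appending a role "tool" message, and calling the API again. Here is a minimal dispatch sketch; get_weather is a stub of ours, and the helper names are not part of the SDK:

```python
import json

def get_weather(city: str) -> dict:
    # Stub implementation; swap in a real weather lookup.
    return {"city": city, "temp_c": 21, "condition": "clear"}

AVAILABLE_TOOLS = {"get_weather": get_weather}

def run_tool_call(tool_call) -> dict:
    """Execute one tool call and build the role='tool' message to send back."""
    args = json.loads(tool_call.function.arguments)
    result = AVAILABLE_TOOLS[tool_call.function.name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }

# Round trip: append the assistant message (with its tool_calls) plus this
# tool message to `messages`, then call client.chat.completions.create again
# so the model can produce its final answer from the tool result.
```

The tool_call_id link is what lets the model match each result to the call it made.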

Tool Calling With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    ],
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")

Structured Output: Getting JSON From Models

JSON Mode With openai SDK

import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Return JSON with keys: name, capital, population"},
        {"role": "user", "content": "Tell me about Japan"}
    ],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data)  # e.g. {"name": "Japan", "capital": "Tokyo", "population": 125000000}

Structured Output With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "format_country_info",
            "description": "Format country information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "capital": {"type": "string"},
                    "population": {"type": "integer"}
                },
                "required": ["name", "capital", "population"]
            }
        }
    ],
    tool_choice={"type": "tool", "name": "format_country_info"},
    messages=[{"role": "user", "content": "Tell me about Japan"}]
)

# Extract structured data from tool use
for block in response.content:
    if block.type == "tool_use":
        print(block.input)  # {"name": "Japan", "capital": "Tokyo", "population": 125000000}

Using TokenMix.ai With the openai Python SDK

TokenMix.ai works with the standard openai Python SDK. Change the base URL and API key to access 300+ models from all providers through a single endpoint.

from openai import OpenAI

# One client, all providers
client = OpenAI(
    api_key="tmx-your-key",
    base_url="https://api.tokenmix.ai/v1"
)

# Use OpenAI models
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

# Use Claude models (via OpenAI-format endpoint)
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

# Use DeepSeek models
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

The advantage: one SDK installation, one API key, one billing dashboard, 300+ models. TokenMix.ai handles the provider translation behind the scenes.
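One pattern a single multi-model endpoint unlocks is cheap-first routing: try an inexpensive model and fall back to a stronger one on failure. The wrapper below is a provider-agnostic sketch of ours, not a TokenMix.ai feature; call wraps the actual request:

```python
def complete_with_fallback(models, call):
    """Try each model name in order, falling back to the next on an error.
    `call` wraps the real request, e.g.
    lambda m: client.chat.completions.create(model=m, messages=msgs)."""
    last_error = None
    for model in models:
        try:
            return call(model)
        except Exception as err:  # narrow to RateLimitError etc. in practice
            last_error = err
    raise last_error

# Usage sketch:
# response = complete_with_fallback(["deepseek-chat", "gpt-4.1"], make_request)
```

Because every model sits behind one endpoint, the fallback list is just strings; no second client or SDK is needed.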


Full SDK Feature Comparison Table

| Feature | openai (Python) | anthropic (Python) | google-genai (Python) |
| --- | --- | --- | --- |
| Chat completions | Yes | Yes (messages API) | Yes (generate_content) |
| Streaming | Yes (async iterator) | Yes (event stream) | Yes (generate_content_stream) |
| Tool calling | Yes | Yes | Yes |
| JSON mode | Yes (response_format) | Via tool_choice | Yes (response_schema) |
| Vision/images | Yes | Yes | Yes |
| Embeddings | Yes | No (use Voyage) | Yes |
| Prompt caching | Automatic | Manual (powerful) | Context caching |
| Batch API | Yes (50% discount) | Yes (50% discount) | No |
| Auto-retry | Yes (2 retries default) | Yes (2 retries default) | Limited |
| Timeout config | Yes | Yes | Yes |
| Type hints | Excellent | Excellent | Good |
| Async client | AsyncOpenAI | AsyncAnthropic | Async methods |
| Min Python | 3.8 | 3.8 | 3.9 |
| Multi-provider | Yes (via base_url) | No | No |

Cost Comparison for Python Developers

Typical Python development patterns and their costs:

| Use Case | Requests/Month | Avg Tokens/Req | Best Model | Monthly Cost |
| --- | --- | --- | --- | --- |
| Personal project/learning | 1,000 | 500 | Gemini Flash (free) | $0 |
| Prototype/MVP | 10,000 | 1,000 | GPT-4.1 mini | $8.80 |
| Side project in production | 50,000 | 1,500 | DeepSeek V4 | $41 |
| Small SaaS product | 200,000 | 2,000 | GPT-4.1 mini | $76 |
| High-traffic application | 1,000,000 | 2,000 | Mixed via TokenMix.ai | $400-$800 |
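All of these figures come from the same arithmetic: requests times tokens times the per-million-token price. A helper for running your own numbers; the prices in the comment are hypothetical placeholders, not quotes:

```python
def monthly_cost(requests: int, avg_input_tokens: int, avg_output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimate monthly spend from traffic volume and per-million-token prices."""
    input_cost = requests * avg_input_tokens * input_price_per_mtok / 1_000_000
    output_cost = requests * avg_output_tokens * output_price_per_mtok / 1_000_000
    return input_cost + output_cost

# Hypothetical example: 10,000 requests/month, 800 input + 200 output tokens
# each, at $0.40/MTok input and $1.60/MTok output.
estimate = monthly_cost(10_000, 800, 200, 0.40, 1.60)
```

Plug in the current prices from your provider's pricing page; the token split between input and output usually matters as much as the headline rate.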

Decision Guide: Which Python AI SDK Should You Use

| Your Situation | Recommended SDK | Why |
| --- | --- | --- |
| Want one SDK for everything | openai (via TokenMix.ai) | Works with all OpenAI-compatible providers |
| Need prompt caching | anthropic | Best caching implementation |
| Free tier prototyping | google-genai | Most generous free tier |
| Building agent with tools | openai or anthropic | Most mature tool calling |
| Need fastest setup | openai | Best docs, largest community |
| Multi-provider production app | openai (via TokenMix.ai) | One SDK, 300+ models |

Conclusion

For Python developers, the openai SDK is the most practical starting point. It works with OpenAI directly and with five or more additional providers through base_url configuration. The anthropic SDK is essential if you need Claude's prompt caching or prefer Claude's response quality. The google-genai SDK is your best option for free-tier development.

The most efficient architecture: use the openai SDK pointed at TokenMix.ai's endpoint. You get access to every major model through one installation, one API key, and one billing account. When you need provider-specific features like Anthropic's prompt caching, use the native SDK alongside.

Start with one SDK. Add others as your needs grow. The code examples in this guide are production-ready starting points.


FAQ

What is the best Python SDK for AI API development?

The openai Python package is the most versatile SDK because it works with OpenAI and any OpenAI-compatible provider (DeepSeek, Groq, Mistral, TokenMix.ai). For Claude-specific features like prompt caching, use the anthropic SDK. For free-tier development, use google-genai. Most production Python applications use two SDKs: openai as the primary and one provider-specific SDK for specialized features.

Can I use the openai Python SDK with non-OpenAI providers?

Yes. Any provider implementing the OpenAI chat completions API format works with the openai SDK. Set base_url to the provider's endpoint. DeepSeek, Groq, Mistral, Together AI, Perplexity, and TokenMix.ai all support this. TokenMix.ai's endpoint gives you access to 300+ models from all providers through the openai SDK.

How do I handle rate limits in Python AI SDKs?

The openai and anthropic SDKs include automatic retry with exponential backoff (2 retries by default). For custom handling, catch RateLimitError and implement your own backoff logic. Monitor rate limits headers in responses: x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens. TokenMix.ai pools rate limits across providers, reducing the likelihood of hitting limits.
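A small helper for reading those counters from a response's headers; with the openai SDK, raw headers are reachable through the with_raw_response variant of each method:

```python
def rate_limit_status(headers) -> dict:
    """Extract the remaining-quota counters from OpenAI-style response headers.
    With the openai SDK, headers come from e.g.
    client.chat.completions.with_raw_response.create(...).headers"""
    return {
        "requests_remaining": int(headers.get("x-ratelimit-remaining-requests", "0")),
        "tokens_remaining": int(headers.get("x-ratelimit-remaining-tokens", "0")),
    }
```

Log these counters when they run low and you can slow down before the API starts returning 429s.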

What Python version do I need for AI API development?

Python 3.9 or higher is recommended. The openai and anthropic SDKs support Python 3.8+, but google-genai requires 3.9+. Python 3.11+ offers the best async performance, which matters for streaming responses and concurrent API calls.

How do I reduce AI API costs in Python applications?

Five strategies: use prompt caching (Anthropic saves up to 90%), use batch API for non-real-time work (50% discount on OpenAI and Anthropic), route simple queries to cheaper models, implement response caching for repeated queries, and use TokenMix.ai's cost-optimized routing. The biggest savings come from model selection: using GPT-4.1 mini instead of GPT-4.1 cuts costs by 80%.

Is async important for Python AI API calls?

Yes, for production applications. Async allows concurrent API calls without blocking your server. A synchronous Flask app handles one API call at a time per worker. An async FastAPI app handles dozens concurrently. Use AsyncOpenAI or AsyncAnthropic for web applications that serve multiple users simultaneously.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: OpenAI Python SDK, Anthropic Python SDK, Google GenAI SDK + TokenMix.ai