TokenMix Research Lab · 2026-04-12

AI API for Python Developers 2026: SDK Quick Start in 5 Minutes

AI API for Python Developers: Complete Guide to OpenAI, Anthropic, and Google SDKs (2026)

Last Updated: 2026-04-29
Author: TokenMix Research Lab

Python AI SDK landscape: openai is the most versatile, working with OpenAI plus any OpenAI-compatible provider (DeepSeek, Groq, Mistral, TokenMix.ai, Together, Perplexity). anthropic is for Claude and offers prompt caching that saves up to 90% on cached tokens. google-genai is for Gemini and has the most generous free tier (15 RPM / 1,500 RPD, no card required). All three support async, type hints, and streaming. Best architecture: openai SDK + TokenMix.ai endpoint = one installation, 300+ models, single billing.

Python is the default language for AI API integration. Every major provider ships a Python SDK, and the openai package alone works with five or more providers through OpenAI-compatible endpoints. This tutorial covers every Python SDK you need: openai (works with OpenAI, DeepSeek, Groq, Mistral, and TokenMix.ai), anthropic (for Claude models), and google-genai (for Gemini models). Code examples for each, feature comparison, and a clear recommendation for which SDK to use when. All examples tested and verified by TokenMix.ai as of April 2026.

Quick SDK Comparison for Python
Prerequisites and Setup
The openai Python SDK: One SDK, Five+ Providers
The anthropic Python SDK: Claude Models
The google-genai Python SDK: Gemini Models
Streaming Responses in Python
Tool Calling and Function Execution
Structured Output: Getting JSON From Models
Using TokenMix.ai With the openai Python SDK
Full SDK Feature Comparison Table
Cost Comparison for Python Developers
Which Python AI SDK Should You Use?
What's the Bottom Line on Python AI SDKs?
FAQ

Quick SDK Comparison for Python

Three official SDKs ranked by versatility: openai (works with five or more providers via a base_url switch; most flexible), anthropic (Claude only; best caching), google-genai (Gemini only; best free tier). All support async, streaming, type hints, and auto-retries (limited on Google). Python minimum: openai/anthropic 3.8+, google-genai 3.9+. SDK choice matters less than architecture: openai SDK + TokenMix.ai = one SDK + 300+ models.

Feature	openai	anthropic	google-genai
Install	`pip install openai`	`pip install anthropic`	`pip install google-genai`
Providers Supported	OpenAI, DeepSeek, Groq, Mistral, TokenMix.ai, Together, Perplexity	Anthropic only	Google only
Type Hints	Excellent	Excellent	Good
Async Support	Yes (AsyncOpenAI)	Yes (AsyncAnthropic)	Yes
Streaming	Async iterators	Event stream	Async iterators
Auto-Retries	Yes (configurable)	Yes (configurable)	Limited
Latest Version	1.x	0.49+	1.x
Python Minimum	3.8+	3.8+	3.9+

Prerequisites and Setup

Three requirements: Python 3.9+ (3.11+ recommended for async perf), at least one provider API key, virtual environment to isolate AI dependencies. Install all three SDKs in one venv: pip install openai anthropic google-genai. Set API keys as env vars (NEVER hardcode in source). Setup time: 5-10 minutes from zero to first API call. Use os.getenv() for portable config across environments.

Before writing any code, you need:

Python 3.9+ (3.11+ recommended for best async performance)
An API key from at least one provider
A virtual environment (always isolate AI dependencies)

# Create project and virtual environment
mkdir ai-api-project && cd ai-api-project
python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows

# Install all three SDKs
pip install openai anthropic google-genai

# Set API keys as environment variables
export OPENAI_API_KEY="sk-your-key"
export ANTHROPIC_API_KEY="sk-ant-your-key"
export GOOGLE_API_KEY="your-google-key"

Security rule: Never hardcode API keys in source files. Always use environment variables or a secrets manager.
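
One way to apply the rule is to check for missing keys at startup instead of failing on the first API call. The following is a minimal sketch, assuming the three variable names from the setup commands; the helper name missing_api_keys is hypothetical and not part of any SDK.

```python
import os

# Variable names match the export commands in the setup step.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling missing_api_keys() with no arguments inspects the real environment; an application could log the returned names and exit early if the list is not empty.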

The openai Python SDK: One SDK, Five+ Providers

Most versatile SDK — works with any OpenAI-compatible endpoint. Switch providers by changing base_url only: OpenAI → DeepSeek (api.deepseek.com) → Groq (api.groq.com/openai/v1) → Mistral → TokenMix.ai. Same client.chat.completions.create() call across all. AsyncOpenAI for concurrent calls. Specific error classes (RateLimitError, AuthenticationError, APIError) enable precise exception handling. Auto-retry with exponential backoff (2 retries default).

The openai package is the most versatile Python AI SDK. It works with any provider that implements the OpenAI chat completions API format.

Basic Chat Completion

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
# Output: The capital of France is Paris.

Using the Same SDK With DeepSeek

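import os
from openai import OpenAI

# Only the base_url and api_key change.
# DEEPSEEK_API_KEY is an assumed variable name; export it like the keys in the setup step.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com"
)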

response = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek V4; pricing: https://tokenmix.ai/blog/deepseek-api-pricing
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)

Using the Same SDK With Groq

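import os
from openai import OpenAI

# GROQ_API_KEY is an assumed variable name; export it like the keys in the setup step.
client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1"
)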

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
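
Because only base_url and api_key differ between providers, the per-provider settings can live in one table. This is a sketch under stated assumptions: the base URLs are the ones used in the examples above, while the environment variable names for DeepSeek and Groq and the helper name client_kwargs are this sketch's own conventions.

```python
import os

# Base URLs come from the examples above; key_env names are assumptions.
PROVIDERS = {
    "openai": {"base_url": None, "key_env": "OPENAI_API_KEY"},
    "deepseek": {"base_url": "https://api.deepseek.com", "key_env": "DEEPSEEK_API_KEY"},
    "groq": {"base_url": "https://api.groq.com/openai/v1", "key_env": "GROQ_API_KEY"},
}


def client_kwargs(provider, env=None):
    """Build keyword arguments for OpenAI(...) for the named provider."""
    env = os.environ if env is None else env
    cfg = PROVIDERS[provider]
    kwargs = {"api_key": env[cfg["key_env"]]}
    # Leave base_url out for OpenAI itself so the SDK default applies.
    if cfg["base_url"] is not None:
        kwargs["base_url"] = cfg["base_url"]
    return kwargs
```

A client for any listed provider is then OpenAI(**client_kwargs("groq")), and adding a provider means adding one dictionary entry.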

Async Usage

import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI()
    
    response = await client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    
    print(response.choices[0].message.content)

asyncio.run(main())
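
The example above awaits a single call, so it does not yet show the benefit of async. A rough sketch of running many calls at once with a concurrency cap follows; run_concurrently and its worker argument are hypothetical names, and the worker is any async function you supply (for instance one that awaits client.chat.completions.create and returns the message content).

```python
import asyncio


async def run_concurrently(prompts, worker, limit=5):
    """Run worker(prompt) for every prompt, at most `limit` at a time.

    Results come back in the same order as `prompts`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt):
        # The semaphore keeps in-flight requests under the cap.
        async with semaphore:
            return await worker(prompt)

    return await asyncio.gather(*(guarded(p) for p in prompts))
```

The cap matters because firing hundreds of requests at once is a quick way to hit a provider's rate limit; a small limit keeps throughput high without that burst.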

Error Handling

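from openai import OpenAI, APIConnectionError, APIStatusError, AuthenticationError, RateLimitError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError:
    print("Invalid API key. Check your OPENAI_API_KEY.")
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.response.headers.get('retry-after')}s")
except APIStatusError as e:
    # Any other non-2xx response; status_code exists only on status errors
    print(f"API error: {e.status_code} - {e.message}")
except APIConnectionError as e:
    # Network failure or timeout: there is no HTTP response, so no status code
    print(f"Connection error: {e}")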
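
When the SDK's built-in retries are not enough, you can wrap a call in your own backoff loop. The following is a generic sketch, not the SDK's internal algorithm: call_with_backoff is a hypothetical helper, and the exception types to retry on are passed in by the caller (for example (RateLimitError,)).

```python
import random
import time


def call_with_backoff(fn, retryable, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again.

    The wait doubles after each failure (capped at max_delay) and is
    scaled by a random factor so parallel clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

A typical use would be call_with_backoff(lambda: client.chat.completions.create(...), (RateLimitError,)). If you go this route, consider creating the client with max_retries=0 so the two retry layers do not multiply.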

The anthropic Python SDK: Claude Models

Three key API differences from OpenAI SDK: (1) max_tokens required (OpenAI defaults). (2) System prompt is separate parameter, not a message role. (3) Response structure: response.content[0].text (not response.choices[0].message.content). Unique advantage: prompt caching via cache_control: {"type": "ephemeral"} saves up to 90% on cached tokens — single biggest cost optimization for long system prompts. AsyncAnthropic for concurrent calls.

Anthropic's SDK has a different API design. It is not OpenAI-compatible, but it is clean and well-typed.

Basic Message

from anthropic import Anthropic

client = Anthropic()  # Uses ANTHROPIC_API_KEY env var

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.content[0].text)

Key differences from OpenAI SDK:

max_tokens is required (OpenAI defaults to model max)
System prompt is a separate parameter, not a message
Response structure uses response.content[0].text not response.choices[0].message.content

System Prompt

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are an expert Python developer. Give concise, practical answers.",
    messages=[
        {"role": "user", "content": "How do I read a CSV file?"}
    ]
)

Prompt Caching (Claude's Unique Advantage)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant with access to a large knowledge base...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Summarize the key points."}
    ]
)

# Check cache performance
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")

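Prompt caching reduces costs by up to 90% on cached tokens. For applications with long system prompts, this is the single biggest cost optimization available. Caching only takes effect once the marked prefix reaches a model-specific minimum length (on the order of 1,024 tokens for Sonnet-class models), so a short placeholder prompt like the one above will report zero cache tokens.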
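
To see what the 90% figure means for a whole workload, it helps to run the arithmetic. The sketch below is an estimate under stated assumptions: cache reads are billed at 10% of the base input price (the 90% saving mentioned above), cache writes at 125% of it (an assumption not taken from this guide), and every later call arrives before the cache entry expires. Function names and the example price are illustrative.

```python
def cached_input_cost(prompt_tokens, calls, price_per_mtok,
                      write_multiplier=1.25, read_multiplier=0.10):
    """Input cost of sending the same cached prefix `calls` times.

    The first call writes the cache; the remaining calls read from it.
    """
    per_token = price_per_mtok / 1_000_000
    write = prompt_tokens * per_token * write_multiplier
    reads = prompt_tokens * per_token * read_multiplier * (calls - 1)
    return write + reads


def uncached_input_cost(prompt_tokens, calls, price_per_mtok):
    """Input cost of sending the same prefix `calls` times with no caching."""
    return prompt_tokens * calls * price_per_mtok / 1_000_000
```

For a 10,000-token system prompt sent 100 times at an assumed $3 per million input tokens, the uncached cost is $3.00 and the cached cost is about $0.33. The saving approaches 90% only as the number of cache reads grows, because the first call pays the write premium.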

Async Usage

import asyncio
from anthropic import AsyncAnthropic

async def main():
    client = AsyncAnthropic()
    
    response = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello"}]
    )
    
    print(response.content[0].text)

asyncio.run(main())

The google-genai Python SDK: Gemini Models

API patterns: client.models.generate_content() (single call), client.chats.create() for multi-turn (retains context automatically), client.models.generate_content_stream() for streaming. System instruction via types.GenerateContentConfig(system_instruction=...). Free tier: Gemini 2.0 Flash 15 RPM/1,500 RPD/no card — sufficient to build + test complete applications. Most generous free tier in industry — best SDK for prototyping/learning before paying.

Google's SDK takes a different approach with its own API design.

Basic Generation

from google import genai

client = genai.Client()  # Uses GOOGLE_API_KEY env var

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What is the capital of France?"
)

print(response.text)

With System Instruction

from google.genai import types

response = client.models.generate_content(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction="You are an expert Python developer.",
        temperature=0.7,
        max_output_tokens=1024
    ),
    contents="How do I read a CSV file?"
)

print(response.text)

Multi-Turn Conversation

chat = client.chats.create(model="gemini-2.0-flash")

response1 = chat.send_message("What is Python?")
print(response1.text)

response2 = chat.send_message("What are its main advantages?")
print(response2.text)  # Retains context from previous turn

Free Tier Usage

Google Gemini's free tier is the most generous in the industry. No credit card required. Gemini 2.0 Flash allows 15 requests per minute and 1,500 requests per day at zero cost. This is enough to build and test complete applications.
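
Staying inside the 15 requests-per-minute limit is easier with a small client-side throttle. This is a minimal sliding-window sketch; RequestThrottle is a hypothetical class, not part of google-genai, and it covers only the per-minute limit, not the daily cap.

```python
import time
from collections import deque


class RequestThrottle:
    """Sliding-window limiter: at most max_calls per `period` seconds."""

    def __init__(self, max_calls=15, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # times of recent requests, oldest first

    def wait(self):
        """Block until another request is allowed, then record it."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Window is full: sleep until the oldest request ages out.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)
```

Create one throttle per API key and call throttle.wait() immediately before each generate_content call. The clock and sleep parameters exist so the class can be tested without real delays.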

Streaming Responses in Python

Three streaming patterns. openai: stream=True parameter, iterate chunk.choices[0].delta.content. anthropic: context manager with client.messages.stream(...) then iterate stream.text_stream. google-genai: client.models.generate_content_stream(), iterate chunk.text. All async-compatible. Critical for chat interfaces — without streaming, users wait full generation time. With streaming, first tokens appear within TTFT (80ms-300ms) creating instant feel.

Streaming is essential for chat interfaces. Here is how to stream with each SDK.

Streaming With openai SDK

from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a haiku about Python."}],
    stream=True
)

for chunk in stream:
    # Skip chunks that carry no choices or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Streaming With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming With google-genai SDK

from google import genai

client = genai.Client()

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a haiku about Python."
)

for chunk in response:
    if chunk.text:  # text can be empty on chunks without text parts
        print(chunk.text, end="", flush=True)
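
The three examples above are synchronous. In an async web app you usually want to forward each piece as it arrives and also keep the full text. The sketch below shows that pattern for openai-style chunks; collect_deltas is a hypothetical helper, and it assumes the stream was obtained with AsyncOpenAI via await client.chat.completions.create(..., stream=True).

```python
async def collect_deltas(stream, on_text=None):
    """Join the text deltas from an async stream of chat-completion chunks.

    `on_text`, if given, is called with each piece as it arrives
    (for example, to forward it to a websocket or print it).
    """
    parts = []
    async for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if on_text is not None:
                on_text(text)
    return "".join(parts)
```

Returning the joined text lets the caller log or store the complete reply after the user has already seen it stream in.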

Tool Calling and Function Execution

Tool calling lets models invoke Python functions. openai: define tools array with JSON Schema, model returns tool_calls array (parse function.arguments as JSON). anthropic: tools array with input_schema, response contains tool_use blocks (already parsed input). Both 95%+ reliability on simple schemas. Complex multi-tool sequences: GPT 94% accuracy vs Claude similar. Test edge case inputs — argument formatting can vary between providers.

Tool calling lets models invoke your Python functions. Each SDK handles it differently.

Tool Calling With openai SDK

import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {tool_call.function.name}({args})")
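
The example stops once the model has asked for a function. The next step is to run the function and send its result back. Here is a minimal sketch of that step: get_weather is a hypothetical stand-in with a fixed return value, and TOOL_REGISTRY and run_tool_call are illustrative names rather than SDK features.

```python
import json


def get_weather(city):
    """Hypothetical stand-in; a real version would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Maps tool names declared in `tools` to the Python callables behind them.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call, registry=TOOL_REGISTRY):
    """Execute one tool call and build the follow-up `tool` message."""
    func = registry[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)
    result = func(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,  # ties the result to the request
        "content": json.dumps(result),
    }
```

To finish the round trip, append response.choices[0].message and the returned dictionary to messages, then call client.chat.completions.create again so the model can write its final answer.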

Tool Calling With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    ],
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
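
With the anthropic SDK the result goes back in a user turn that contains a tool_result block. A minimal sketch follows; tool_result_message is a hypothetical helper name, and result_text is whatever string your own function produced.

```python
def tool_result_message(block, result_text):
    """Build the user turn that answers one `tool_use` block."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": block.id,  # must match the tool_use block
                "content": result_text,
            }
        ],
    }
```

The follow-up request then carries three messages: the original user question, an assistant message whose content is response.content, and the dictionary returned here.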

Structured Output: Getting JSON From Models

Three approaches. openai: response_format={"type": "json_object"} JSON mode + system prompt with desired schema. JSON validity 99.5% with structured output mode (token-level constraint). anthropic: use tool_choice with input_schema defining structure — model "calls" your tool with structured data. Both reliable for production. google-genai supports response_schema parameter. Always parse JSON output before downstream use to catch occasional invalid responses.

JSON Mode With openai SDK

import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Return JSON with keys: name, capital, population"},
        {"role": "user", "content": "Tell me about Japan"}
    ],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data)  # e.g. {"name": "Japan", "capital": "Tokyo", "population": 125000000}
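
JSON mode makes the output parseable, but it does not check that the keys you asked for are present. A small validation step before downstream use might look like the sketch below; parse_country is a hypothetical helper tied to the three keys requested in the prompt above.

```python
import json


def parse_country(raw):
    """Parse model output and check the fields the prompt asked for."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"name": str, "capital": str, "population": int}
    for key, kind in expected.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(data[key], kind) or isinstance(data[key], bool):
            raise ValueError(f"wrong type for {key}")
    return data
```

Catching ValueError around parse_country(response.choices[0].message.content) covers both invalid JSON and missing fields, since json.JSONDecodeError is a subclass of ValueError.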

Structured Output With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "format_country_info",
            "description": "Format country information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "capital": {"type": "string"},
                    "population": {"type": "integer"}
                },
                "required": ["name", "capital", "population"]
            }
        }
    ],
    tool_choice={"type": "tool", "name": "format_country_info"},
    messages=[{"role": "user", "content": "Tell me about Japan"}]
)

# Extract structured data from tool use
for block in response.content:
    if block.type == "tool_use":
        print(block.input)  # {"name": "Japan", "capital": "Tokyo", "population": 125000000}

Using TokenMix.ai With the openai Python SDK

Standard openai Python SDK + base_url change = access 300+ models from all providers. Single OpenAI() client with base_url="https://api.tokenmix.ai/v1". Use any model name as parameter: gpt-4.1, claude-sonnet-4, deepseek-chat, gemini-2.0-flash — TokenMix.ai handles provider translation behind the scenes. One pip install, one API key, one billing dashboard. Eliminates managing 3-5 SDK installations + 3-5 separate API keys.

TokenMix.ai works with the standard openai Python SDK. Change the base URL and API key to access 300+ models from all providers through a single endpoint.

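import os
from openai import OpenAI

# One client, all providers
# TOKENMIX_API_KEY is an assumed variable name; set it like the keys in the setup step
client = OpenAI(
    api_key=os.environ["TOKENMIX_API_KEY"],
    base_url="https://api.tokenmix.ai/v1"
)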

# Use OpenAI models
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

# Use Claude models (via OpenAI-format endpoint)
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

# Use DeepSeek models
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

The advantage: one SDK installation, one API key, one billing dashboard, 300+ models. TokenMix.ai handles the provider translation behind the scenes.

Full SDK Feature Comparison Table

3 SDKs × 14 dimensions. Multi-provider via base_url: openai only. Prompt caching: anthropic manual but powerful (90% off), openai automatic, google context caching. Batch API 50% off: openai + anthropic only (google none). Auto-retry: openai/anthropic 2 retries default, google limited. Embeddings: openai + google (anthropic uses Voyage). Async client: AsyncOpenAI/AsyncAnthropic/google async methods. Min Python: openai/anthropic 3.8, google 3.9.

Feature	openai (Python)	anthropic (Python)	google-genai (Python)
Chat completions	Yes	Yes (messages API)	Yes (generate_content)
Streaming	Yes (async iterator)	Yes (event stream)	Yes (generate_content_stream)
Tool calling	Yes	Yes	Yes
JSON mode	Yes (response_format)	Via tool_choice	Yes (response_schema)
Vision/images	Yes	Yes	Yes
Embeddings	Yes	No (use Voyage)	Yes
Prompt caching	Automatic	Manual (powerful)	Context caching
Batch API	Yes (50% discount)	Yes (50% discount)	No
Auto-retry	Yes (2 retries default)	Yes (2 retries default)	Limited
Timeout config	Yes	Yes	Yes
Type hints	Excellent	Excellent	Good
Async client	AsyncOpenAI	AsyncAnthropic	Async methods
Min Python	3.8	3.8	3.9
Multi-provider	Yes (via base_url)	No	No

Cost Comparison for Python Developers

5 use case tiers: Personal/learning (1K req/mo, 500 tokens) = Gemini Flash free $0. Prototype/MVP (10K req/mo, 1K tokens) = GPT-4.1 mini $8.80. Side project production (50K req/mo) = DeepSeek V4 $41. Small SaaS (200K req/mo) = GPT-4.1 mini $176. High-traffic (1M req/mo) = mixed via TokenMix.ai $400-800. Free tier sufficient for learning + light prototyping. Production starts at $10-200/mo for most apps.

Typical Python development patterns and their costs:

Use Case	Requests/Month	Avg Tokens/Req	Best Model	Monthly Cost
Personal project/learning	1,000	500	Gemini Flash (free)	$0
Prototype/MVP	10,000	1,000	GPT-4.1 mini	$8.80
Side project in production	50,000	1,500	DeepSeek V4	$41
Small SaaS product	200,000	2,000	GPT-4.1 mini	$176
High-traffic application	1,000,000	2,000	Mixed via TokenMix.ai	$400-$800
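
To adapt these figures to your own traffic, you can compute the estimate directly. This is a back-of-the-envelope sketch; monthly_cost is a hypothetical helper, and the prices and the input/output token split in the example are assumptions you should replace with your provider's current rates and your measured usage.

```python
def monthly_cost(requests, input_tokens, output_tokens,
                 input_price_per_mtok, output_price_per_mtok):
    """Estimate monthly spend in dollars from per-request token counts.

    Prices are dollars per million tokens; token counts are per request.
    """
    per_request = (input_tokens * input_price_per_mtok
                   + output_tokens * output_price_per_mtok) / 1_000_000
    return requests * per_request
```

For example, 10,000 requests of 600 input and 400 output tokens at an assumed $0.40 and $1.60 per million tokens come to $8.80. Output tokens usually cost several times more than input tokens, so the split matters as much as the total.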

Which Python AI SDK Should You Use?

Want one SDK for everything: openai (via TokenMix.ai) — works with all OpenAI-compatible providers. Need prompt caching: anthropic (90% savings on long prompts). Free tier prototyping: google-genai (most generous, no card). Building agent with tools: openai or anthropic (most mature tool calling). Fastest setup: openai (best docs, largest community). Multi-provider production: openai SDK + TokenMix.ai endpoint = one SDK + 300+ models.

Your Situation	Recommended SDK	Why
Want one SDK for everything	openai (via TokenMix.ai)	Works with all OpenAI-compatible providers
Need prompt caching	anthropic	Best caching implementation
Free tier prototyping	google-genai	Most generous free tier
Building agent with tools	openai or anthropic	Most mature tool calling
Need fastest setup	openai	Best docs, largest community
Multi-provider production app	openai (via TokenMix.ai)	One SDK, 300+ models

What's the Bottom Line on Python AI SDKs?

Most efficient architecture: openai SDK pointed at TokenMix.ai endpoint = one installation, one API key, one billing account, 300+ models. Add anthropic SDK only when you need prompt caching for long system prompts. Add google-genai only for free tier development. Start with one SDK, add others as needs grow. The code examples in this guide are production-ready starting points — copy them, change API keys, ship features.

For Python developers, the openai SDK is the most practical starting point. It works with OpenAI directly and with five or more additional providers through base_url configuration. The anthropic SDK is essential if you need Claude's prompt caching or prefer Claude's response quality. The google-genai SDK is your best option for free-tier development.

The most efficient architecture: use the openai SDK pointed at TokenMix.ai's endpoint. You get access to every major model through one installation, one API key, and one billing account. When you need provider-specific features like Anthropic's prompt caching, use the native SDK alongside.

Start with one SDK. Add others as your needs grow. The code examples in this guide are production-ready starting points.

FAQ

What is the best Python SDK for AI API development?

The openai Python package is the most versatile SDK because it works with OpenAI and any OpenAI-compatible provider (DeepSeek, Groq, Mistral, TokenMix.ai). For Claude-specific features like prompt caching, use the anthropic SDK. For free-tier development, use google-genai. Most production Python applications use two SDKs: openai as the primary and one provider-specific SDK for specialized features.

Can I use the openai Python SDK with non-OpenAI providers?

Yes. Any provider implementing the OpenAI chat completions API format works with the openai SDK. Set base_url to the provider's endpoint. DeepSeek, Groq, Mistral, Together AI, Perplexity, and TokenMix.ai all support this. TokenMix.ai's endpoint gives you access to 300+ models from all providers through the openai SDK.

How do I handle rate limits in Python AI SDKs?

The openai and anthropic SDKs include automatic retry with exponential backoff (2 retries by default). For custom handling, catch RateLimitError and implement your own backoff logic. Monitor the rate-limit headers in responses: x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens. These are the OpenAI header names; Anthropic prefixes its equivalents with anthropic-ratelimit-. TokenMix.ai pools rate limits across providers, reducing the likelihood of hitting limits.

What Python version do I need for AI API development?

Python 3.9 or higher is recommended. The openai and anthropic SDKs support Python 3.8+, but google-genai requires 3.9+. Python 3.11+ offers the best async performance, which matters for streaming responses and concurrent API calls.

How do I reduce AI API costs in Python applications?

Five strategies: use prompt caching (Anthropic saves up to 90%), use batch API for non-real-time work (50% discount on OpenAI and Anthropic), route simple queries to cheaper models, implement response caching for repeated queries, and use TokenMix.ai's cost-optimized routing. The biggest savings come from model selection -- using GPT-4.1 mini instead of GPT-4.1 cuts costs by 80%.
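
Response caching, the fourth strategy, needs very little code. Below is a minimal in-memory sketch; ResponseCache is a hypothetical class, and call is whatever function you already use to reach the API. A production version would likely add expiry and a shared store.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by model name and message list."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(model, messages):
        # Serialize deterministically so equal requests hash equally.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, messages, call):
        """Return a cached reply, or run call(model, messages) and store it."""
        key = self._key(model, messages)
        if key not in self._store:
            self._store[key] = call(model, messages)
        return self._store[key]
```

This only pays off for requests that repeat exactly, such as FAQ-style questions or fixed classification prompts, and it suits deterministic settings best, since a cached reply hides any sampling variation.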

Is async important for Python AI API calls?

Yes, for production applications. Async allows concurrent API calls without blocking your server. A synchronous Flask app handles one API call at a time per worker. An async FastAPI app handles dozens concurrently. Use AsyncOpenAI or AsyncAnthropic for web applications that serve multiple users simultaneously.

Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: OpenAI Python SDK, Anthropic Python SDK, Google GenAI SDK + TokenMix.ai
