TokenMix Research Lab · 2026-04-12


AI API for Python Developers: Complete Guide to OpenAI, Anthropic, and Google SDKs (2026)

Python is the default language for AI API integration. Every major provider ships a Python SDK, and the openai package alone works with five or more providers through OpenAI-compatible endpoints. This tutorial covers every Python SDK you need: openai (works with OpenAI, DeepSeek, Groq, Mistral, and TokenMix.ai), anthropic (for Claude models), and google-genai (for Gemini models). Code examples for each, feature comparison, and a clear recommendation for which SDK to use when. All examples tested and verified by TokenMix.ai as of April 2026.


Quick SDK Comparison for Python

| Feature | openai | anthropic | google-genai |
| --- | --- | --- | --- |
| Install | pip install openai | pip install anthropic | pip install google-genai |
| Providers Supported | OpenAI, DeepSeek, Groq, Mistral, TokenMix.ai, Together, Perplexity | Anthropic only | Google only |
| Type Hints | Excellent | Excellent | Good |
| Async Support | Yes (AsyncOpenAI) | Yes (AsyncAnthropic) | Yes |
| Streaming | Async iterators | Event stream | Async iterators |
| Auto-Retries | Yes (configurable) | Yes (configurable) | Limited |
| Latest Version | 1.x | 0.49+ | 1.x |
| Python Minimum | 3.8+ | 3.8+ | 3.9+ |

Prerequisites and Setup

Before writing any code, you need:

  1. Python 3.9+ (3.11+ recommended for best async performance)
  2. An API key from at least one provider
  3. A virtual environment (always isolate AI dependencies)
# Create project and virtual environment
mkdir ai-api-project && cd ai-api-project
python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows

# Install all three SDKs
pip install openai anthropic google-genai

# Set API keys as environment variables
export OPENAI_API_KEY="sk-your-key"
export ANTHROPIC_API_KEY="sk-ant-your-key"
export GOOGLE_API_KEY="your-google-key"

Security rule: Never hardcode API keys in source files. Always use environment variables or a secrets manager.
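One way to enforce this rule in code is a tiny fail-fast helper that refuses to start without a key. The require_key name is our own convention, not part of any SDK:

```python
import os

def require_key(name: str) -> str:
    """Read an API key from the environment; fail fast with a clear error if absent."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Missing environment variable: {name}")
    return key

# Usage: client = OpenAI(api_key=require_key("OPENAI_API_KEY"))
```

Failing at startup with a named variable beats a confusing authentication error deep inside a request.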


The openai Python SDK: One SDK, Five+ Providers

The openai package is the most versatile Python AI SDK. It works with any provider that implements the OpenAI chat completions API format.

Basic Chat Completion

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
# Output: The capital of France is Paris.

Using the Same SDK With DeepSeek

from openai import OpenAI

# Only the base_url and api_key change
client = OpenAI(
    api_key="dsk-your-deepseek-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek V4 (pricing: tokenmix.ai/blog/deepseek-api-pricing)
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)

Using the Same SDK With Groq

from openai import OpenAI

client = OpenAI(
    api_key="gsk-your-groq-key",
    base_url="https://api.groq.com/openai/v1"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
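The pattern across these examples is identical: only base_url and the API key change. A small registry makes the switch explicit. The endpoint URLs below are the ones used above; the environment-variable names for the non-OpenAI providers are our own convention, not an SDK standard:

```python
import os

# Base URLs from the examples above; key_env names for non-OpenAI
# providers are illustrative conventions, not SDK requirements.
OPENAI_COMPATIBLE = {
    "openai":   {"base_url": None, "key_env": "OPENAI_API_KEY"},
    "deepseek": {"base_url": "https://api.deepseek.com", "key_env": "DEEPSEEK_API_KEY"},
    "groq":     {"base_url": "https://api.groq.com/openai/v1", "key_env": "GROQ_API_KEY"},
    "tokenmix": {"base_url": "https://api.tokenmix.ai/v1", "key_env": "TOKENMIX_API_KEY"},
}

def client_kwargs(provider: str) -> dict:
    """Build the keyword arguments for OpenAI(...) for a given provider."""
    cfg = OPENAI_COMPATIBLE[provider]
    kwargs = {"api_key": os.environ.get(cfg["key_env"])}
    if cfg["base_url"]:  # the official OpenAI endpoint is the SDK default
        kwargs["base_url"] = cfg["base_url"]
    return kwargs

# Usage: client = OpenAI(**client_kwargs("groq"))
```

Switching providers then becomes a one-word change instead of an edit in every call site.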

Async Usage

import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI()
    
    response = await client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    
    print(response.choices[0].message.content)

asyncio.run(main())
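The payoff of async is concurrency. The sketch below shows the asyncio.gather pattern with a stand-in coroutine in place of the real client call, so it runs offline; in a real app, replace fake_completion with await client.chat.completions.create(...):

```python
import asyncio

async def fake_completion(prompt: str) -> str:
    # Stand-in for `await client.chat.completions.create(...)`;
    # sleeps briefly to mimic network latency.
    await asyncio.sleep(0.01)
    return f"answer: {prompt}"

async def ask_all(prompts: list[str]) -> list[str]:
    # gather() issues all requests concurrently; total wall time is roughly
    # one request's latency instead of the sum of all of them.
    return await asyncio.gather(*(fake_completion(p) for p in prompts))

results = asyncio.run(ask_all(["one", "two", "three"]))
```

Ten sequential one-second calls take ten seconds; gathered, they take about one.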

Error Handling

from openai import OpenAI, RateLimitError, APIStatusError, AuthenticationError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError:
    print("Invalid API key. Check your OPENAI_API_KEY.")
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.response.headers.get('retry-after')}s")
except APIStatusError as e:  # base class for all HTTP-status errors
    print(f"API error: {e.status_code} - {e.message}")
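If the SDK's two built-in retries are not enough, you can wrap calls in your own exponential backoff. This is a generic sketch, not part of the openai package:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    # Exponential backoff with jitter: base * 2^attempt, capped, plus up to 1s noise
    # so that many clients retrying at once do not stampede the API together.
    return min(cap, base * (2 ** attempt)) + random.random()

def with_retries(fn, max_attempts: int = 4, retryable=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error, wait and try again up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
```

In practice, narrow retryable to transient errors like RateLimitError; retrying an AuthenticationError only wastes time.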

The anthropic Python SDK: Claude Models

Anthropic's SDK has a different API design. It is not OpenAI-compatible, but it is clean and well-typed.

Basic Message

from anthropic import Anthropic

client = Anthropic()  # Uses ANTHROPIC_API_KEY env var

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.content[0].text)

Key differences from the OpenAI SDK:

  1. max_tokens is a required parameter on every request, not an optional one.
  2. The system prompt is a top-level system parameter, not a message with role "system".
  3. The response body is a list of content blocks: read text via response.content[0].text.

System Prompt

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are an expert Python developer. Give concise, practical answers.",
    messages=[
        {"role": "user", "content": "How do I read a CSV file?"}
    ]
)

Prompt Caching (Claude's Unique Advantage)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant with access to a large knowledge base...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Summarize the key points."}
    ]
)

# Check cache performance
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")

Prompt caching reduces costs by up to 90% on cached tokens. For applications with long system prompts, this is the single biggest cost optimization available.
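A back-of-envelope helper makes the discount concrete. The $3/MTok price in the comment is illustrative only, and cache-write surcharges are ignored to keep the sketch simple:

```python
def cached_input_cost(tokens: int, price_per_mtok: float,
                      cache_hit_rate: float, cache_discount: float = 0.9) -> float:
    """Effective input cost when a fraction of tokens is read from cache
    at a discount (cache-write surcharges ignored for simplicity)."""
    uncached = tokens * (1 - cache_hit_rate) * price_per_mtok / 1_000_000
    cached = tokens * cache_hit_rate * price_per_mtok * (1 - cache_discount) / 1_000_000
    return uncached + cached

# 1M input tokens at an illustrative $3/MTok: $3.00 with no cache hits,
# $0.30 if every token is read from cache at a 90% discount.
```

The closer your cache hit rate gets to 1.0, the closer input costs fall toward one tenth of list price.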

Async Usage

import asyncio
from anthropic import AsyncAnthropic

async def main():
    client = AsyncAnthropic()
    
    response = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello"}]
    )
    
    print(response.content[0].text)

asyncio.run(main())

The google-genai Python SDK: Gemini Models

Google's SDK takes a different approach with its own API design.

Basic Generation

from google import genai

client = genai.Client()  # Uses GOOGLE_API_KEY env var

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What is the capital of France?"
)

print(response.text)

With System Instruction

from google.genai import types

response = client.models.generate_content(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction="You are an expert Python developer.",
        temperature=0.7,
        max_output_tokens=1024
    ),
    contents="How do I read a CSV file?"
)

print(response.text)

Multi-Turn Conversation

chat = client.chats.create(model="gemini-2.0-flash")

response1 = chat.send_message("What is Python?")
print(response1.text)

response2 = chat.send_message("What are its main advantages?")
print(response2.text)  # Retains context from previous turn

Free Tier Usage

Google Gemini's free tier is the most generous in the industry. No credit card required. Gemini 2.0 Flash allows 15 requests per minute and 1,500 requests per day at zero cost. This is enough to build and test complete applications.
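To stay under the 15-requests-per-minute ceiling, a client-side throttle helps. This MinuteRateLimiter is our own sketch, not part of google-genai, and it complements rather than replaces handling 429 responses:

```python
import time

class MinuteRateLimiter:
    """Client-side throttle for per-minute quotas (e.g. 15 requests/minute).
    acquire() blocks until a request slot is free."""

    def __init__(self, max_per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self.max = max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.stamps = []  # times of requests made within the last 60 seconds

    def acquire(self) -> None:
        now = self.clock()
        self.stamps = [t for t in self.stamps if now - t < 60]
        if len(self.stamps) >= self.max:
            self.sleep(60 - (now - self.stamps[0]))  # wait for the oldest to expire
            now = self.clock()
            self.stamps = [t for t in self.stamps if now - t < 60]
        self.stamps.append(now)

# limiter = MinuteRateLimiter(15)
# limiter.acquire()  # call before each generate_content request
```

The injectable clock and sleep make the limiter testable without real waiting.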


Streaming Responses in Python

Streaming is essential for chat interfaces. Here is how to stream with each SDK.

Streaming With openai SDK

from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a haiku about Python."}],
    stream=True
)

for chunk in stream:
    # the final chunk of a stream may arrive with an empty choices list
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Streaming With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming With google-genai SDK

from google import genai

client = genai.Client()

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a haiku about Python."
)

for chunk in response:
    print(chunk.text, end="", flush=True)

Tool Calling and Function Execution

Tool calling lets models invoke your Python functions. Each SDK handles it differently.

Tool Calling With openai SDK

import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {tool_call.function.name}({args})")
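The example above stops at detecting the call. Completing the round trip means executing the function, appending a role "tool" message, and calling the API again. Here is a minimal dispatch sketch; get_weather is a stub of ours, and the helper names are not part of the SDK:

```python
import json

def get_weather(city: str) -> dict:
    # Stub implementation; swap in a real weather lookup.
    return {"city": city, "temp_c": 21, "condition": "clear"}

AVAILABLE_TOOLS = {"get_weather": get_weather}

def run_tool_call(tool_call) -> dict:
    """Execute one tool call and build the role='tool' message to send back."""
    args = json.loads(tool_call.function.arguments)
    result = AVAILABLE_TOOLS[tool_call.function.name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }

# Round trip: append the assistant message (with its tool_calls) plus this
# tool message to `messages`, then call client.chat.completions.create again
# so the model can produce its final answer from the tool result.
```

The tool_call_id link is what lets the model match each result to the call it made.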

Tool Calling With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    ],
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")

Structured Output: Getting JSON From Models

JSON Mode With openai SDK

import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Return JSON with keys: name, capital, population"},
        {"role": "user", "content": "Tell me about Japan"}
    ],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data)  # e.g. {"name": "Japan", "capital": "Tokyo", "population": 125000000}

Structured Output With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "format_country_info",
            "description": "Format country information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "capital": {"type": "string"},
                    "population": {"type": "integer"}
                },
                "required": ["name", "capital", "population"]
            }
        }
    ],
    tool_choice={"type": "tool", "name": "format_country_info"},
    messages=[{"role": "user", "content": "Tell me about Japan"}]
)

# Extract structured data from tool use
for block in response.content:
    if block.type == "tool_use":
        print(block.input)  # {"name": "Japan", "capital": "Tokyo", "population": 125000000}

Using TokenMix.ai With the openai Python SDK

TokenMix.ai works with the standard openai Python SDK. Change the base URL and API key to access 300+ models from all providers through a single endpoint.

from openai import OpenAI

# One client, all providers
client = OpenAI(
    api_key="tmx-your-key",
    base_url="https://api.tokenmix.ai/v1"
)

# Use OpenAI models
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

# Use Claude models (via OpenAI-format endpoint)
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

# Use DeepSeek models
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

The advantage: one SDK installation, one API key, one billing dashboard, 300+ models. TokenMix.ai handles the provider translation behind the scenes.
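One pattern a single multi-model endpoint unlocks is cheap-first routing: try an inexpensive model and fall back to a stronger one on failure. The wrapper below is a provider-agnostic sketch of ours, not a TokenMix.ai feature; call wraps the actual request:

```python
def complete_with_fallback(models, call):
    """Try each model name in order, falling back to the next on an error.
    `call` wraps the real request, e.g.
    lambda m: client.chat.completions.create(model=m, messages=msgs)."""
    last_error = None
    for model in models:
        try:
            return call(model)
        except Exception as err:  # narrow to RateLimitError etc. in practice
            last_error = err
    raise last_error

# Usage sketch:
# response = complete_with_fallback(["deepseek-chat", "gpt-4.1"], make_request)
```

Because every model sits behind one endpoint, the fallback list is just strings; no second client or SDK is needed.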


Full SDK Feature Comparison Table

| Feature | openai (Python) | anthropic (Python) | google-genai (Python) |
| --- | --- | --- | --- |
| Chat completions | Yes | Yes (messages API) | Yes (generate_content) |
| Streaming | Yes (async iterator) | Yes (event stream) | Yes (generate_content_stream) |
| Tool calling | Yes | Yes | Yes |
| JSON mode | Yes (response_format) | Via tool_choice | Yes (response_schema) |
| Vision/images | Yes | Yes | Yes |
| Embeddings | Yes | No (use Voyage) | Yes |
| Prompt caching | Automatic | Manual (powerful) | Context caching |
| Batch API | Yes (50% discount) | Yes (50% discount) | No |
| Auto-retry | Yes (2 retries default) | Yes (2 retries default) | Limited |
| Timeout config | Yes | Yes | Yes |
| Type hints | Excellent | Excellent | Good |
| Async client | AsyncOpenAI | AsyncAnthropic | Async methods |
| Min Python | 3.8 | 3.8 | 3.9 |
| Multi-provider | Yes (via base_url) | No | No |

Cost Comparison for Python Developers

Typical Python development patterns and their costs:

| Use Case | Requests/Month | Avg Tokens/Req | Best Model | Monthly Cost |
| --- | --- | --- | --- | --- |
| Personal project/learning | 1,000 | 500 | Gemini Flash (free) | $0 |
| Prototype/MVP | 10,000 | 1,000 | GPT-4.1 mini | $8.80 |
| Side project in production | 50,000 | 1,500 | DeepSeek V4 | $41 |
| Small SaaS product | 200,000 | 2,000 | GPT-4.1 mini | $76 |
| High-traffic application | 1,000,000 | 2,000 | Mixed via TokenMix.ai | $400-$800 |
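All of these figures come from the same arithmetic: requests times tokens times the per-million-token price. A helper for running your own numbers; the prices in the comment are hypothetical placeholders, not quotes:

```python
def monthly_cost(requests: int, avg_input_tokens: int, avg_output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimate monthly spend from traffic volume and per-million-token prices."""
    input_cost = requests * avg_input_tokens * input_price_per_mtok / 1_000_000
    output_cost = requests * avg_output_tokens * output_price_per_mtok / 1_000_000
    return input_cost + output_cost

# Hypothetical example: 10,000 requests/month, 800 input + 200 output tokens
# each, at $0.40/MTok input and $1.60/MTok output.
estimate = monthly_cost(10_000, 800, 200, 0.40, 1.60)
```

Plug in the current prices from your provider's pricing page; the token split between input and output usually matters as much as the headline rate.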

Decision Guide: Which Python AI SDK Should You Use

| Your Situation | Recommended SDK | Why |
| --- | --- | --- |
| Want one SDK for everything | openai (via TokenMix.ai) | Works with all OpenAI-compatible providers |
| Need prompt caching | anthropic | Best caching implementation |
| Free tier prototyping | google-genai | Most generous free tier |
| Building agent with tools | openai or anthropic | Most mature tool calling |
| Need fastest setup | openai | Best docs, largest community |
| Multi-provider production app | openai (via TokenMix.ai) | One SDK, 300+ models |

Conclusion

For Python developers, the openai SDK is the most practical starting point. It works with OpenAI directly and with five or more additional providers through base_url configuration. The anthropic SDK is essential if you need Claude's prompt caching or prefer Claude's response quality. The google-genai SDK is your best option for free-tier development.

The most efficient architecture: use the openai SDK pointed at TokenMix.ai's endpoint. You get access to every major model through one installation, one API key, and one billing account. When you need provider-specific features like Anthropic's prompt caching, use the native SDK alongside.

Start with one SDK. Add others as your needs grow. The code examples in this guide are production-ready starting points.


FAQ

What is the best Python SDK for AI API development?

The openai Python package is the most versatile SDK because it works with OpenAI and any OpenAI-compatible provider (DeepSeek, Groq, Mistral, TokenMix.ai). For Claude-specific features like prompt caching, use the anthropic SDK. For free-tier development, use google-genai. Most production Python applications use two SDKs: openai as the primary and one provider-specific SDK for specialized features.

Can I use the openai Python SDK with non-OpenAI providers?

Yes. Any provider implementing the OpenAI chat completions API format works with the openai SDK. Set base_url to the provider's endpoint. DeepSeek, Groq, Mistral, Together AI, Perplexity, and TokenMix.ai all support this. TokenMix.ai's endpoint gives you access to 300+ models from all providers through the openai SDK.

How do I handle rate limits in Python AI SDKs?

The openai and anthropic SDKs include automatic retry with exponential backoff (2 retries by default). For custom handling, catch RateLimitError and implement your own backoff logic. Monitor rate limits headers in responses: x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens. TokenMix.ai pools rate limits across providers, reducing the likelihood of hitting limits.
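A small helper for reading those counters from a response's headers; with the openai SDK, raw headers are reachable through the with_raw_response variant of each method:

```python
def rate_limit_status(headers) -> dict:
    """Extract the remaining-quota counters from OpenAI-style response headers.
    With the openai SDK, headers come from e.g.
    client.chat.completions.with_raw_response.create(...).headers"""
    return {
        "requests_remaining": int(headers.get("x-ratelimit-remaining-requests", "0")),
        "tokens_remaining": int(headers.get("x-ratelimit-remaining-tokens", "0")),
    }
```

Log these counters when they run low and you can slow down before the API starts returning 429s.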

What Python version do I need for AI API development?

Python 3.9 or higher is recommended. The openai and anthropic SDKs support Python 3.8+, but google-genai requires 3.9+. Python 3.11+ offers the best async performance, which matters for streaming responses and concurrent API calls.

How do I reduce AI API costs in Python applications?

Five strategies: use prompt caching (Anthropic saves up to 90%), use batch API for non-real-time work (50% discount on OpenAI and Anthropic), route simple queries to cheaper models, implement response caching for repeated queries, and use TokenMix.ai's cost-optimized routing. The biggest savings come from model selection: using GPT-4.1 mini instead of GPT-4.1 cuts costs by 80%.

Is async important for Python AI API calls?

Yes, for production applications. Async allows concurrent API calls without blocking your server. A synchronous Flask app handles one API call at a time per worker. An async FastAPI app handles dozens concurrently. Use AsyncOpenAI or AsyncAnthropic for web applications that serve multiple users simultaneously.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: OpenAI Python SDK, Anthropic Python SDK, Google GenAI SDK + TokenMix.ai