TokenMix Research Lab · 2026-04-12

AI API for Python Developers 2026: SDK Quick Start in 5 Minutes

AI API for Python Developers: Complete Guide to OpenAI, Anthropic, and Google SDKs (2026)

Last Updated: 2026-04-29
Author: TokenMix Research Lab

Python AI SDK landscape: openai works with any OpenAI-compatible provider (OpenAI, DeepSeek, Groq, Mistral, TokenMix.ai, Together, Perplexity), which makes it the most versatile. anthropic is for Claude (prompt caching saves up to 90% on cached tokens). google-genai is for Gemini (most generous free tier: 15 RPM, 1,500 RPD, no card). All three have async support, type hints, and streaming. Best architecture: openai SDK + TokenMix.ai endpoint = one installation, 300+ models, single billing.

Python is the default language for AI API integration. Every major provider ships a Python SDK, and the openai package alone works with five or more providers through OpenAI-compatible endpoints. This tutorial covers every Python SDK you need: openai (works with OpenAI, DeepSeek, Groq, Mistral, and TokenMix.ai), anthropic (for Claude models), and google-genai (for Gemini models). It includes code examples for each SDK, a feature comparison, and a clear recommendation for which SDK to use when. All examples tested and verified by TokenMix.ai as of April 2026.


Quick SDK Comparison for Python

Three official SDKs ranked by versatility: openai (works with any OpenAI-compatible provider via a base_url switch; most flexible), anthropic (Claude only; best caching), google-genai (Gemini only; best free tier). All support async, streaming, and type hints. Auto-retries are configurable in openai and anthropic and limited in google-genai. Python minimum: openai/anthropic 3.8+, google-genai 3.9+. SDK choice matters less than architecture: openai SDK + TokenMix.ai = one SDK + 300+ models.

| Feature | openai | anthropic | google-genai |
| --- | --- | --- | --- |
| Install | pip install openai | pip install anthropic | pip install google-genai |
| Providers Supported | OpenAI, DeepSeek, Groq, Mistral, TokenMix.ai, Together, Perplexity | Anthropic only | Google only |
| Type Hints | Excellent | Excellent | Good |
| Async Support | Yes (AsyncOpenAI) | Yes (AsyncAnthropic) | Yes |
| Streaming | Async iterators | Event stream | Async iterators |
| Auto-Retries | Yes (configurable) | Yes (configurable) | Limited |
| Latest Version | 1.x | 0.49+ | 1.x |
| Python Minimum | 3.8+ | 3.8+ | 3.9+ |

Prerequisites and Setup

Three requirements: Python 3.9+ (3.11+ recommended for async perf), at least one provider API key, virtual environment to isolate AI dependencies. Install all three SDKs in one venv: pip install openai anthropic google-genai. Set API keys as env vars (NEVER hardcode in source). Setup time: 5-10 minutes from zero to first API call. Use os.getenv() for portable config across environments.

Before writing any code, you need:

  1. Python 3.9+ (3.11+ recommended for best async performance)
  2. An API key from at least one provider
  3. A virtual environment (always isolate AI dependencies)
# Create project and virtual environment
mkdir ai-api-project && cd ai-api-project
python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows

# Install all three SDKs
pip install openai anthropic google-genai

# Set API keys as environment variables
export OPENAI_API_KEY="sk-your-key"
export ANTHROPIC_API_KEY="sk-ant-your-key"
export GOOGLE_API_KEY="your-google-key"

Security rule: Never hardcode API keys in source files. Always use environment variables or a secrets manager.
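
A small helper can make a missing key fail loudly at startup instead of surfacing later as an authentication error. The sketch below is one possible way to do that with os.getenv(); the names require_api_key, MissingAPIKeyError, and available_providers are illustrative and not part of any SDK.

```python
import os


class MissingAPIKeyError(RuntimeError):
    """Raised when a required API key is not set in the environment."""


def require_api_key(var_name: str) -> str:
    """Return the key stored in `var_name`, or fail with a clear message."""
    key = os.getenv(var_name, "").strip()
    if not key:
        raise MissingAPIKeyError(
            f"{var_name} is not set. Export it in your shell or load it "
            "from a secrets manager before starting the app."
        )
    return key


def available_providers() -> list[str]:
    """List which of the three providers have a key configured."""
    env_vars = {
        "openai": "OPENAI_API_KEY",
        "anthropic": "ANTHROPIC_API_KEY",
        "google": "GOOGLE_API_KEY",
    }
    return [name for name, var in env_vars.items() if os.getenv(var)]
```

Calling require_api_key("OPENAI_API_KEY") once at startup keeps the check in a single place; the SDK clients can still read the same variables on their own.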


The openai Python SDK: One SDK, Five+ Providers

Most versatile SDK: it works with any OpenAI-compatible endpoint. Switch providers by changing only base_url and api_key: OpenAI → DeepSeek (api.deepseek.com) → Groq (api.groq.com/openai/v1) → Mistral → TokenMix.ai. The same client.chat.completions.create() call works across all of them. AsyncOpenAI handles concurrent calls. Specific error classes (RateLimitError, AuthenticationError, APIError) enable precise exception handling. Auto-retry with exponential backoff (2 retries by default).

The openai package is the most versatile Python AI SDK. It works with any provider that implements the OpenAI chat completions API format.

Basic Chat Completion

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
# Output: The capital of France is Paris.

Using the Same SDK With DeepSeek

import os

from openai import OpenAI

# Only the base_url and api_key change
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # env var name is our choice; never hardcode keys
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek V4 (pricing: https://tokenmix.ai/blog/deepseek-api-pricing)
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)

Using the Same SDK With Groq

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],  # env var name is our choice; never hardcode keys
    base_url="https://api.groq.com/openai/v1"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
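
Since only base_url and api_key differ between providers, the switch can live in a small lookup table. The following is a minimal sketch of that idea. The base URLs are the ones used in this guide; the environment variable names (other than OPENAI_API_KEY) and the Provider and client_kwargs helpers are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    base_url: Optional[str]  # None means the SDK default (OpenAI itself)
    key_env: str             # env var holding the key (naming is our assumption)


PROVIDERS = {
    "openai": Provider(None, "OPENAI_API_KEY"),
    "deepseek": Provider("https://api.deepseek.com", "DEEPSEEK_API_KEY"),
    "groq": Provider("https://api.groq.com/openai/v1", "GROQ_API_KEY"),
    "tokenmix": Provider("https://api.tokenmix.ai/v1", "TOKENMIX_API_KEY"),
}


def client_kwargs(name: str) -> dict:
    """Build keyword arguments for openai.OpenAI(**kwargs)."""
    provider = PROVIDERS[name]
    kwargs = {"api_key": os.environ[provider.key_env]}
    if provider.base_url is not None:
        kwargs["base_url"] = provider.base_url
    return kwargs


# Usage (needs `pip install openai` and the matching key in the environment):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs("groq"))
```

Keeping the table in one module means adding a provider is a one-line change, and the rest of the code keeps calling client.chat.completions.create() unchanged.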

Async Usage

import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI()
    
    response = await client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    
    print(response.choices[0].message.content)

asyncio.run(main())
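
The async client pays off when several requests run at once. Here is a rough sketch of that pattern using asyncio.gather with a semaphore to cap concurrency. To stay runnable without an API key, fake_completion stands in for the real await client.chat.completions.create(...) call; it and run_all are illustrative names, not SDK functions.

```python
import asyncio


async def fake_completion(prompt: str) -> str:
    """Stand-in for `await client.chat.completions.create(...)`."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"echo: {prompt}"


async def run_all(prompts, call=fake_completion, max_concurrent: int = 5):
    """Run `call` for every prompt, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(prompt):
        async with semaphore:
            return await call(prompt)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(p) for p in prompts))


results = asyncio.run(run_all(["a", "b", "c"]))
print(results)
```

Capping concurrency is a design choice that helps stay under provider rate limits; pass your own coroutine as call to use a real client.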

Error Handling

from openai import OpenAI, RateLimitError, APIError, AuthenticationError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError:
    print("Invalid API key. Check your OPENAI_API_KEY.")
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.response.headers.get('retry-after')}s")
except APIError as e:
    # Connection errors are APIError subclasses that carry no status_code
    print(f"API error: {getattr(e, 'status_code', 'n/a')} - {e.message}")
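
When the built-in retries are not enough, a custom backoff loop is straightforward to write. The sketch below shows one way to do it with exponential delays and a little jitter. TransientError is a stand-in for openai.RateLimitError so the example runs without the SDK installed, and call_with_backoff is an illustrative helper, not an SDK function.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for openai.RateLimitError in this sketch."""


def call_with_backoff(fn, retry_on=(TransientError,), max_attempts=4,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt)
            # Up to 10% jitter so parallel workers do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the real SDK you would pass retry_on=(RateLimitError,) and wrap the request in a lambda, for example call_with_backoff(lambda: client.chat.completions.create(...)).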

The anthropic Python SDK: Claude Models

Three key API differences from the OpenAI SDK: (1) max_tokens is required (it is optional in the OpenAI SDK). (2) The system prompt is a separate parameter, not a message role. (3) Response structure: response.content[0].text (not response.choices[0].message.content). Key advantage: prompt caching via cache_control: {"type": "ephemeral"} saves up to 90% on cached tokens, the single biggest cost optimization for long system prompts. AsyncAnthropic for concurrent calls.

Anthropic's SDK has a different API design. It is not OpenAI-compatible, but it is clean and well-typed.

Basic Message

from anthropic import Anthropic

client = Anthropic()  # Uses ANTHROPIC_API_KEY env var

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.content[0].text)

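Key differences from OpenAI SDK:

  1. max_tokens is required on every request (it is optional in the OpenAI SDK).
  2. The system prompt is passed as a separate system parameter, not as a message with a system role.
  3. The reply text is at response.content[0].text, not response.choices[0].message.content.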

System Prompt

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are an expert Python developer. Give concise, practical answers.",
    messages=[
        {"role": "user", "content": "How do I read a CSV file?"}
    ]
)
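
If you already hold conversations in the OpenAI message format, the two request-side differences (required max_tokens, separate system parameter) can be bridged with a small adapter. This is a minimal sketch under the assumption that every message has plain string content; to_anthropic_kwargs is a hypothetical helper, not part of the anthropic SDK.

```python
def to_anthropic_kwargs(messages, model, max_tokens=1024):
    """Convert OpenAI-style chat messages into kwargs for messages.create().

    Assumes each message is {"role": ..., "content": <str>}.
    """
    # Anthropic takes the system prompt as its own parameter
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]

    kwargs = {"model": model, "max_tokens": max_tokens, "messages": chat}
    if system_parts:
        kwargs["system"] = "\n\n".join(system_parts)
    return kwargs


# Usage with the client defined earlier:
#   response = client.messages.create(**to_anthropic_kwargs(history, "claude-sonnet-4-20250514"))
```

Tool calls and image content need more handling than this sketch provides, so treat it as a starting point for text-only chats.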

Prompt Caching (Claude's Unique Advantage)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant with access to a large knowledge base...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Summarize the key points."}
    ]
)

# Check cache performance
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")

Prompt caching reduces costs by up to 90% on cached tokens. For applications with long system prompts, this is the single biggest cost optimization available.
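
To see how much of a request was actually served from cache, the three usage fields can be combined into a hit ratio. The sketch below assumes that input_tokens counts only uncached tokens, so the total input is the sum of the three fields, and it applies the "up to 90%" figure from this guide as a flat discount on cache reads. The Usage dataclass and cache_report are illustrative stand-ins, and any surcharge for cache writes is ignored.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Mirrors the three usage fields printed in the example above."""
    input_tokens: int
    cache_read_input_tokens: int = 0
    cache_creation_input_tokens: int = 0


def cache_report(usage, read_discount=0.90):
    """Summarize cache effectiveness for one response."""
    read = usage.cache_read_input_tokens or 0
    written = usage.cache_creation_input_tokens or 0
    total = usage.input_tokens + read + written
    hit_ratio = read / total if total else 0.0
    return {
        "total_input": total,
        "hit_ratio": hit_ratio,
        # Fraction of the uncached input bill saved (assumed flat discount)
        "input_savings": hit_ratio * read_discount,
    }


print(cache_report(Usage(input_tokens=100, cache_read_input_tokens=900)))
```

A hit ratio near zero on the second and later requests usually means the cached prefix is changing between calls.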

Async Usage

import asyncio
from anthropic import AsyncAnthropic

async def main():
    client = AsyncAnthropic()
    
    response = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello"}]
    )
    
    print(response.content[0].text)

asyncio.run(main())

The google-genai Python SDK: Gemini Models

API patterns: client.models.generate_content() (single call), client.chats.create() for multi-turn (retains context automatically), client.models.generate_content_stream() for streaming. System instruction via types.GenerateContentConfig(system_instruction=...). Free tier: Gemini 2.0 Flash 15 RPM/1,500 RPD/no card — sufficient to build + test complete applications. Most generous free tier in industry — best SDK for prototyping/learning before paying.

Google's SDK takes a different approach with its own API design.

Basic Generation

from google import genai

client = genai.Client()  # Uses GOOGLE_API_KEY env var

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What is the capital of France?"
)

print(response.text)

With System Instruction

from google.genai import types

response = client.models.generate_content(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction="You are an expert Python developer.",
        temperature=0.7,
        max_output_tokens=1024
    ),
    contents="How do I read a CSV file?"
)

print(response.text)

Multi-Turn Conversation

chat = client.chats.create(model="gemini-2.0-flash")

response1 = chat.send_message("What is Python?")
print(response1.text)

response2 = chat.send_message("What are its main advantages?")
print(response2.text)  # Retains context from previous turn

Free Tier Usage

Google Gemini's free tier is the most generous in the industry. No credit card required. Gemini 2.0 Flash allows 15 requests per minute and 1,500 requests per day at zero cost. This is enough to build and test complete applications.
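
Staying under 15 requests per minute is easier with a client-side limiter than with retries after the fact. Below is a rough sketch of a rolling-window limiter; MinuteRateLimiter is a hypothetical class, and it covers only the per-minute limit, not the 1,500 requests per day cap.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Allow at most `max_calls` calls in any rolling `period` seconds."""

    def __init__(self, max_calls=15, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Drop timestamps that have left the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)


# Usage: call limiter.wait() right before each generate_content() request.
limiter = MinuteRateLimiter(max_calls=15, period=60.0)
```

The clock and sleep parameters exist so the limiter can be tested without real waiting.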


Streaming Responses in Python

Three streaming patterns. openai: stream=True parameter, iterate chunk.choices[0].delta.content. anthropic: context manager with client.messages.stream(...) then iterate stream.text_stream. google-genai: client.models.generate_content_stream(), iterate chunk.text. All async-compatible. Critical for chat interfaces — without streaming, users wait full generation time. With streaming, first tokens appear within TTFT (80ms-300ms) creating instant feel.

Streaming is essential for chat interfaces. Here is how to stream with each SDK.

Streaming With openai SDK

from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a haiku about Python."}],
    stream=True
)

for chunk in stream:
    # Some chunks (for example a final usage chunk) carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Streaming With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming With google-genai SDK

from google import genai

client = genai.Client()

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a haiku about Python."
)

for chunk in response:
    # chunk.text can be None for chunks without text parts
    print(chunk.text or "", end="", flush=True)
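
All three streaming loops boil down to iterating over text fragments, so one helper can collect the full reply and measure time to first token for any of them. This is a minimal sketch; collect_stream is an illustrative name, and the adapters shown in the trailing comments assume the stream objects from the examples above.

```python
import time


def collect_stream(deltas, on_text=None, clock=time.perf_counter):
    """Consume text deltas; return (full_text, seconds_to_first_token)."""
    start = clock()
    first_token_at = None
    parts = []
    for delta in deltas:
        if not delta:  # skip None or empty fragments
            continue
        if first_token_at is None:
            first_token_at = clock() - start
        parts.append(delta)
        if on_text:
            on_text(delta)
    return "".join(parts), first_token_at


# Feeding it from each SDK (stream objects as created in the examples above):
#   openai:       collect_stream(c.choices[0].delta.content for c in stream if c.choices)
#   anthropic:    collect_stream(stream.text_stream)
#   google-genai: collect_stream(chunk.text for chunk in response)
```

Passing on_text=lambda t: print(t, end="", flush=True) reproduces the live printing while still returning the complete text for logging or storage.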

Tool Calling and Function Execution

Tool calling lets models invoke Python functions. openai: define a tools array with JSON Schema; the model returns a tool_calls array (parse function.arguments as JSON). anthropic: define a tools array with input_schema; the response contains tool_use blocks whose input is already parsed. Both show 95%+ reliability on simple schemas. On complex multi-tool sequences, GPT reaches about 94% accuracy and Claude is similar. Test edge-case inputs, because argument formatting can vary between providers.

Tool calling lets models invoke your Python functions. Each SDK handles it differently.

Tool Calling With openai SDK

import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {tool_call.function.name}({args})")

Tool Calling With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    ],
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
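
Both examples stop at printing what the model wants to call. The next step is executing the function and handing the result back, which can be sketched as a small dispatcher. The get_weather implementation, TOOL_REGISTRY, run_tool, and openai_tool_message below are hypothetical helpers written for this guide; the weather data is made up.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local implementation; a real one would call a weather API."""
    return {"city": city, "forecast": "sunny", "temp_c": 22}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool(name: str, arguments):
    """Execute a registered tool.

    `arguments` is a JSON string (openai) or an already parsed dict (anthropic).
    """
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(arguments) if isinstance(arguments, str) else dict(arguments)
        return TOOL_REGISTRY[name](**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or unexpected argument names: report back to the model
        return {"error": f"bad arguments: {exc}"}


def openai_tool_message(tool_call_id: str, result) -> dict:
    """Build the follow-up message that returns a tool result to the model."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}
```

With the openai SDK, the result goes back via run_tool(tool_call.function.name, tool_call.function.arguments) wrapped in openai_tool_message(tool_call.id, ...); with anthropic, pass block.name and block.input to run_tool.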

Structured Output: Getting JSON From Models

Three approaches. openai: JSON mode via response_format={"type": "json_object"} plus a system prompt that describes the desired keys. JSON mode guarantees syntactically valid JSON, not a particular schema, and the prompt must mention JSON. JSON validity is 99.5% with structured output mode (token-level constraint). anthropic: use tool_choice with an input_schema that defines the structure, so the model "calls" your tool with structured data. Both are reliable for production. google-genai supports a response_schema parameter. Always parse and validate JSON output before downstream use to catch occasional invalid responses.

JSON Mode With openai SDK

import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Return JSON with keys: name, capital, population"},
        {"role": "user", "content": "Tell me about Japan"}
    ],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data)  # {"name": "Japan", "capital": "Tokyo", "population": 125000000}

Structured Output With anthropic SDK

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "format_country_info",
            "description": "Format country information",
            "input_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "capital": {"type": "string"},
                    "population": {"type": "integer"}
                },
                "required": ["name", "capital", "population"]
            }
        }
    ],
    tool_choice={"type": "tool", "name": "format_country_info"},
    messages=[{"role": "user", "content": "Tell me about Japan"}]
)

# Extract structured data from tool use
for block in response.content:
    if block.type == "tool_use":
        print(block.input)  # {"name": "Japan", "capital": "Tokyo", "population": 125000000}
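
Whichever SDK produced the JSON, it is worth checking the shape before passing it downstream. The following is a minimal validation sketch for the country example, using only the standard library; REQUIRED_FIELDS and parse_country are illustrative names, and a schema library would be a reasonable alternative for larger structures.

```python
import json

REQUIRED_FIELDS = {"name": str, "capital": str, "population": int}


def parse_country(raw: str) -> dict:
    """Parse model output and check it against REQUIRED_FIELDS."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model returned invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"{field} should be {expected_type.__name__}")
    return data


print(parse_country('{"name": "Japan", "capital": "Tokyo", "population": 125000000}'))
```

Raising ValueError gives the caller one exception type to catch, at which point it can retry the request or fall back to a default.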

Using TokenMix.ai With the openai Python SDK

Standard openai Python SDK + base_url change = access 300+ models from all providers. Single OpenAI() client with base_url="https://api.tokenmix.ai/v1". Use any model name as parameter: gpt-4.1, claude-sonnet-4, deepseek-chat, gemini-2.0-flash — TokenMix.ai handles provider translation behind the scenes. One pip install, one API key, one billing dashboard. Eliminates managing 3-5 SDK installations + 3-5 separate API keys.

TokenMix.ai works with the standard openai Python SDK. Change the base URL and API key to access 300+ models from all providers through a single endpoint.

from openai import OpenAI

# One client, all providers
client = OpenAI(
    api_key="tmx-your-key",
    base_url="https://api.tokenmix.ai/v1"
)

# Use OpenAI models
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

# Use Claude models (via OpenAI-format endpoint)
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

# Use DeepSeek models
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello from TokenMix.ai"}]
)

The advantage: one SDK installation, one API key, one billing dashboard, 300+ models. TokenMix.ai handles the provider translation behind the scenes.


Full SDK Feature Comparison Table

3 SDKs × 14 dimensions. Multi-provider via base_url: openai only. Prompt caching: anthropic manual but powerful (90% off), openai automatic, google context caching. Batch API 50% off: openai + anthropic only (google none). Auto-retry: openai/anthropic 2 retries default, google limited. Embeddings: openai + google (anthropic uses Voyage). Async client: AsyncOpenAI/AsyncAnthropic/google async methods. Min Python: openai/anthropic 3.8, google 3.9.

| Feature | openai (Python) | anthropic (Python) | google-genai (Python) |
| --- | --- | --- | --- |
| Chat completions | Yes | Yes (messages API) | Yes (generate_content) |
| Streaming | Yes (async iterator) | Yes (event stream) | Yes (generate_content_stream) |
| Tool calling | Yes | Yes | Yes |
| JSON mode | Yes (response_format) | Via tool_choice | Yes (response_schema) |
| Vision/images | Yes | Yes | Yes |
| Embeddings | Yes | No (use Voyage) | Yes |
| Prompt caching | Automatic | Manual (powerful) | Context caching |
| Batch API | Yes (50% discount) | Yes (50% discount) | No |
| Auto-retry | Yes (2 retries default) | Yes (2 retries default) | Limited |
| Timeout config | Yes | Yes | Yes |
| Type hints | Excellent | Excellent | Good |
| Async client | AsyncOpenAI | AsyncAnthropic | Async methods |
| Min Python | 3.8 | 3.8 | 3.9 |
| Multi-provider | Yes (via base_url) | No | No |

Cost Comparison for Python Developers

5 use case tiers: Personal/learning (1K req/mo, 500 tokens) = Gemini Flash free $0. Prototype/MVP (10K req/mo, 1K tokens) = GPT-4.1 mini $8.80. Side project production (50K req/mo) = DeepSeek V4 $41. Small SaaS (200K req/mo) = GPT-4.1 mini $176. High-traffic (1M req/mo) = mixed via TokenMix.ai $400-800. Free tier sufficient for learning + light prototyping. Production starts at $10-200/mo for most apps.

Typical Python development patterns and their costs:

| Use Case | Requests/Month | Avg Tokens/Req | Best Model | Monthly Cost |
| --- | --- | --- | --- | --- |
| Personal project/learning | 1,000 | 500 | Gemini Flash (free) | $0 |
| Prototype/MVP | 10,000 | 1,000 | GPT-4.1 mini | $8.80 |
| Side project in production | 50,000 | 1,500 | DeepSeek V4 | $41 |
| Small SaaS product | 200,000 | 2,000 | GPT-4.1 mini | $176 |
| High-traffic application | 1,000,000 | 2,000 | Mixed via TokenMix.ai | $400-$800 |
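
The arithmetic behind such estimates is requests times tokens times a blended price per million tokens. The sketch below shows that calculation and its inverse. The $0.88 per million rate is back-calculated from the Prototype/MVP row (10,000 requests at 1,000 tokens for $8.80) and is an assumption about how that figure was produced, not a published price; monthly_cost and implied_rate are illustrative helpers.

```python
def monthly_cost(requests_per_month, tokens_per_request, usd_per_million_tokens):
    """Estimated monthly spend in USD for a given blended token price."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1_000_000 * usd_per_million_tokens


def implied_rate(monthly_usd, requests_per_month, tokens_per_request):
    """Blended USD per million tokens implied by a known monthly bill."""
    total_tokens = requests_per_month * tokens_per_request
    if total_tokens == 0:
        raise ValueError("no tokens, so no rate can be derived")
    return monthly_usd / (total_tokens / 1_000_000)


# Prototype/MVP row: 10M tokens at an assumed $0.88 per million
print(round(monthly_cost(10_000, 1_000, 0.88), 2))
```

The blended rate depends on the input/output mix and on any caching or batch discounts, so different rows need not share one rate even for the same model. Plug in your own traffic and current provider prices rather than relying on these figures.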

Which Python AI SDK Should You Use?

Want one SDK for everything: openai (via TokenMix.ai) — works with all OpenAI-compatible providers. Need prompt caching: anthropic (90% savings on long prompts). Free tier prototyping: google-genai (most generous, no card). Building agent with tools: openai or anthropic (most mature tool calling). Fastest setup: openai (best docs, largest community). Multi-provider production: openai SDK + TokenMix.ai endpoint = one SDK + 300+ models.

| Your Situation | Recommended SDK | Why |
| --- | --- | --- |
| Want one SDK for everything | openai (via TokenMix.ai) | Works with all OpenAI-compatible providers |
| Need prompt caching | anthropic | Best caching implementation |
| Free tier prototyping | google-genai | Most generous free tier |
| Building agent with tools | openai or anthropic | Most mature tool calling |
| Need fastest setup | openai | Best docs, largest community |
| Multi-provider production app | openai (via TokenMix.ai) | One SDK, 300+ models |

What's the Bottom Line on Python AI SDKs?

Most efficient architecture: openai SDK pointed at TokenMix.ai endpoint = one installation, one API key, one billing account, 300+ models. Add anthropic SDK only when you need prompt caching for long system prompts. Add google-genai only for free tier development. Start with one SDK, add others as needs grow. The code examples in this guide are production-ready starting points — copy them, change API keys, ship features.

For Python developers, the openai SDK is the most practical starting point. It works with OpenAI directly and with five or more additional providers through base_url configuration. The anthropic SDK is essential if you need Claude's prompt caching or prefer Claude's response quality. The google-genai SDK is your best option for free-tier development.

The most efficient architecture: use the openai SDK pointed at TokenMix.ai's endpoint. You get access to every major model through one installation, one API key, and one billing account. When you need provider-specific features like Anthropic's prompt caching, use the native SDK alongside.

Start with one SDK. Add others as your needs grow. The code examples in this guide are production-ready starting points.


FAQ

What is the best Python SDK for AI API development?

The openai Python package is the most versatile SDK because it works with OpenAI and any OpenAI-compatible provider (DeepSeek, Groq, Mistral, TokenMix.ai). For Claude-specific features like prompt caching, use the anthropic SDK. For free-tier development, use google-genai. Most production Python applications use two SDKs: openai as the primary and one provider-specific SDK for specialized features.

Can I use the openai Python SDK with non-OpenAI providers?

Yes. Any provider implementing the OpenAI chat completions API format works with the openai SDK. Set base_url to the provider's endpoint. DeepSeek, Groq, Mistral, Together AI, Perplexity, and TokenMix.ai all support this. TokenMix.ai's endpoint gives you access to 300+ models from all providers through the openai SDK.

How do I handle rate limits in Python AI SDKs?

The openai and anthropic SDKs include automatic retry with exponential backoff (2 retries by default). For custom handling, catch RateLimitError and implement your own backoff logic. Monitor the rate-limit headers in responses, such as x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens. TokenMix.ai pools rate limits across providers, reducing the likelihood of hitting limits.
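
Reading those two headers lets an application slow down before it gets a 429. The sketch below parses them from any mapping of header names to values; remaining_budget and should_throttle are hypothetical helpers and the thresholds are arbitrary. With the openai SDK, response headers are reachable through client.chat.completions.with_raw_response.create(...), whose result exposes .headers and .parse().

```python
def remaining_budget(headers):
    """Read remaining request/token counts from response headers, if present."""
    def as_int(name):
        value = headers.get(name)
        try:
            return int(value) if value is not None else None
        except ValueError:
            return None  # header present but not a plain integer

    return {
        "requests": as_int("x-ratelimit-remaining-requests"),
        "tokens": as_int("x-ratelimit-remaining-tokens"),
    }


def should_throttle(headers, min_requests=5, min_tokens=2_000):
    """True when either remaining budget has dropped below its threshold."""
    budget = remaining_budget(headers)
    low_requests = budget["requests"] is not None and budget["requests"] < min_requests
    low_tokens = budget["tokens"] is not None and budget["tokens"] < min_tokens
    return low_requests or low_tokens
```

Missing headers are treated as "no information" rather than as a reason to throttle, since not every provider sends them. A plain dict is case-sensitive, so use lowercase keys when testing with one.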

What Python version do I need for AI API development?

Python 3.9 or higher is recommended. The openai and anthropic SDKs support Python 3.8+, but google-genai requires 3.9+. Python 3.11+ offers the best async performance, which matters for streaming responses and concurrent API calls.

How do I reduce AI API costs in Python applications?

Five strategies: use prompt caching (Anthropic saves up to 90%), use batch API for non-real-time work (50% discount on OpenAI and Anthropic), route simple queries to cheaper models, implement response caching for repeated queries, and use TokenMix.ai's cost-optimized routing. The biggest savings come from model selection: using GPT-4.1 mini instead of GPT-4.1 cuts costs by 80%.
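
Response caching for repeated queries can be as simple as a dictionary keyed on the request. Here is a minimal in-memory sketch; ResponseCache is a hypothetical class, the call argument stands in for whatever function actually hits the API, and a production version would likely add expiry and a shared store.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed on (model, messages)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, messages):
        # sort_keys makes the key independent of dict key order
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, messages, call):
        """Return a cached answer, or run call(model, messages) and store it."""
        key = self._key(model, messages)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call(model, messages)
        self._store[key] = result
        return result
```

This only helps when identical requests recur and a repeated answer is acceptable, so it suits FAQ-style lookups better than creative generation with nonzero temperature.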

Is async important for Python AI API calls?

Yes, for production applications. Async allows concurrent API calls without blocking your server. A synchronous Flask app handles one API call at a time per worker. An async FastAPI app handles dozens concurrently. Use AsyncOpenAI or AsyncAnthropic for web applications that serve multiple users simultaneously.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: OpenAI Python SDK, Anthropic Python SDK, Google GenAI SDK + TokenMix.ai