How to Call AI API in Python: Universal Code for OpenAI, Anthropic, Google, DeepSeek, and Groq (2026)
One Python pattern works with every major AI provider. The OpenAI SDK has become the universal standard -- OpenAI, DeepSeek, Groq, and others all use the same API format. Anthropic and Google have their own SDKs but follow similar patterns. This tutorial gives you working Python code for all five providers, explains the key parameters, and shows how to call any AI API through a single TokenMix.ai endpoint. All code tested and verified as of April 2026.
Table of Contents
[Quick Reference: 5 Providers in 5 Code Snippets]
[Prerequisites: What You Need Before Starting]
[The Universal Pattern: How All AI API Calls Work]
[Provider 1: OpenAI (GPT Models)]
[Provider 2: Anthropic (Claude Models)]
[Provider 3: Google (Gemini Models)]
[Provider 4: DeepSeek]
[Provider 5: Groq (Llama and Open Models)]
[The Universal Approach: All Providers via TokenMix.ai]
[Key Parameters Explained]
[Streaming Responses in Python]
[Error Handling Best Practices]
[FAQ]
Prerequisites: What You Need Before Starting
# Install all provider SDKs at once
pip install openai anthropic google-generativeai
# Or install only what you need
pip install openai # Works for OpenAI, DeepSeek, Groq, TokenMix.ai
pip install anthropic # For Anthropic Claude
pip install google-generativeai # For Google Gemini
Set up your API keys as environment variables:
# Add to your .env file or export directly
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"
export GOOGLE_API_KEY="your-google-key"
export DEEPSEEK_API_KEY="sk-your-deepseek-key"
export GROQ_API_KEY="gsk_your-groq-key"
export TOKENMIX_API_KEY="your-tokenmix-key"
Never hardcode API keys in your source code. Use environment variables or a secrets manager.
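For local development, a common pattern is to load a .env file at startup. A minimal sketch, assuming the python-dotenv package (pip install python-dotenv) -- plain environment variables work without it:
import os
from dotenv import load_dotenv
load_dotenv()  # reads key=value pairs from .env into the environment
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set")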
The Universal Pattern: How All AI API Calls Work
Every AI API call in Python follows the same four-step pattern, regardless of provider.
# Step 1: Import and initialize the client
from openai import OpenAI
client = OpenAI(api_key="your-key")
# Step 2: Define your messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"}
]
# Step 3: Make the API call
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages,
    max_tokens=300,
    temperature=0.7
)
# Step 4: Extract the response
answer = response.choices[0].message.content
tokens_used = response.usage.total_tokens
print(answer)
print(f"Tokens used: {tokens_used}")
This pattern works for OpenAI, DeepSeek, Groq, and any OpenAI-compatible provider. Anthropic and Google use slightly different syntax but the same conceptual flow.
Provider 1: OpenAI (GPT Models)
OpenAI is the most widely used AI API provider. Their SDK sets the standard that other providers follow.
Installation: pip install openai
Complete example:
import os
from openai import OpenAI
# Initialize client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
# Simple chat completion
response = client.chat.completions.create(
    model="gpt-4.1-mini",  # Budget model, great for most tasks
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain list comprehension in Python."}
    ],
    max_tokens=300,
    temperature=0.3
)
print(response.choices[0].message.content)
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
Available models:
Model | Best For | Input Price
gpt-5.4 | Complex reasoning, best quality | $2.50/M
gpt-4.1 | Strong general purpose | $2.00/M
gpt-4.1-mini | Best value, most tasks | $0.40/M
gpt-4.1-nano | Simple tasks, lowest cost | $0.10/M
o4-mini | Reasoning-heavy tasks | $1.10/M
Provider 2: Anthropic (Claude Models)
Anthropic uses its own SDK with a different syntax. The messages structure is similar but the client initialization and response format differ.
Installation: pip install anthropic
Complete example:
import os
import anthropic
# Initialize client
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
# Chat completion
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=300,
    system="You are a concise technical assistant.",  # System prompt is separate
    messages=[
        {"role": "user", "content": "Explain list comprehension in Python."}
    ]
)
print(response.content[0].text)
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
Key differences from OpenAI (a small adapter sketch follows this list):
System prompt is a separate parameter, not in the messages array
Response text is in response.content[0].text, not response.choices[0].message.content
Usage is input_tokens and output_tokens, not prompt_tokens and completion_tokens
The method is client.messages.create(), not client.chat.completions.create()
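If you support both providers in one codebase, a thin wrapper can hide these differences. A minimal sketch -- the ask() helper and its return shape are illustrative, not part of either SDK:
import os
import anthropic
from openai import OpenAI

openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
anthropic_client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

def ask(provider, model, system, user, max_tokens=300):
    """Illustrative wrapper returning (text, input_tokens, output_tokens)."""
    if provider == "openai":
        r = openai_client.chat.completions.create(
            model=model,
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
            max_tokens=max_tokens,
        )
        return r.choices[0].message.content, r.usage.prompt_tokens, r.usage.completion_tokens
    # Anthropic: system prompt is a separate parameter, not a message
    r = anthropic_client.messages.create(
        model=model,
        max_tokens=max_tokens,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return r.content[0].text, r.usage.input_tokens, r.usage.output_tokens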
Available models:
Model | Best For | Input Price
claude-opus-4-20250514 | Best quality, complex tasks | $15.00/M
claude-sonnet-4-20250514 | Strong general purpose | $3.00/M
claude-haiku-3-5-20241022 | Fast, budget option | $0.80/M
Provider 3: Google (Gemini Models)
Google offers two SDK options: the google-generativeai package for Google AI Studio, and the Vertex AI SDK for enterprise. Here we use the simpler AI Studio approach.
Installation: pip install google-generativeai
Complete example:
import os
import google.generativeai as genai
# Initialize
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
# Create model instance
model = genai.GenerativeModel(
    model_name="gemini-2.0-flash",
    system_instruction="You are a concise technical assistant."
)
# Generate response
response = model.generate_content("Explain list comprehension in Python.")
print(response.text)
print(f"Input tokens: {response.usage_metadata.prompt_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")
Key differences from OpenAI (a chat-session sketch follows this list):
Uses genai.configure() instead of client initialization
Creates a model object first, then calls generate_content()
System instruction is set at model creation, not per-request
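For multi-turn conversations, google-generativeai tracks history in a chat session rather than a messages list. A short sketch, reusing the model object from the example above:
# Multi-turn conversation: the session object holds prior turns
chat = model.start_chat(history=[])
first = chat.send_message("Explain list comprehension in Python.")
print(first.text)
# The session carries the earlier turns automatically
followup = chat.send_message("Now rewrite that example as a for loop.")
print(followup.text)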
Provider 4: DeepSeek
DeepSeek uses an OpenAI-compatible API. You use the same openai Python package -- just change the base URL and API key.
Installation: pip install openai (same package as OpenAI)
Complete example:
import os
from openai import OpenAI
# Initialize with DeepSeek endpoint
client = OpenAI(
    api_key=os.environ.get("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)
# Same syntax as OpenAI
response = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek V3
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain list comprehension in Python."}
    ],
    max_tokens=300,
    temperature=0.3
)
print(response.choices[0].message.content)
print(f"Total tokens: {response.usage.total_tokens}")
The code is identical to OpenAI except for two lines: the API key and the base URL. This is the beauty of OpenAI-compatible APIs -- zero learning curve.
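Those two lines are easy to factor out. A sketch of a provider-switching helper -- the PROVIDERS mapping is illustrative, using the base URLs shown in this article:
import os
from openai import OpenAI

# Illustrative mapping: env var name and base URL per provider
PROVIDERS = {
    "openai":   ("OPENAI_API_KEY",   None),  # None -> SDK default base URL
    "deepseek": ("DEEPSEEK_API_KEY", "https://api.deepseek.com"),
    "groq":     ("GROQ_API_KEY",     "https://api.groq.com/openai/v1"),
}

def make_client(provider):
    env_var, base_url = PROVIDERS[provider]
    return OpenAI(api_key=os.environ.get(env_var), base_url=base_url)

client = make_client("deepseek")
Passing base_url=None makes the OpenAI client fall back to its default endpoint, so one helper covers all three providers.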
Provider 5: Groq (Llama and Open Models)
Groq hosts open-source models on custom LPU hardware for ultra-fast inference. Like DeepSeek, it uses the OpenAI-compatible format.
Installation: pip install openai (same package)
Complete example:
import os
from openai import OpenAI
# Initialize with Groq endpoint
client = OpenAI(
    api_key=os.environ.get("GROQ_API_KEY"),
    base_url="https://api.groq.com/openai/v1"
)
# Same syntax, different models
response = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain list comprehension in Python."}
    ],
    max_tokens=300,
    temperature=0.3
)
print(response.choices[0].message.content)
print(f"Total tokens: {response.usage.total_tokens}")
Available models on Groq:
Model | Best For | Speed
llama-4-scout-17b-16e-instruct | General purpose | Ultra-fast (~200ms TTFT)
llama-4-maverick-17b-128e-instruct | Complex tasks | Fast
llama-3.3-70b-versatile | Quality-focused | Fast
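To check the time-to-first-token claim yourself, stream a response and time the first content chunk. A rough sketch (network latency and your region will dominate what you measure):
import os
import time
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("GROQ_API_KEY"),
    base_url="https://api.groq.com/openai/v1"
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
# Stop at the first chunk that carries actual content
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"First token after {time.perf_counter() - start:.3f}s")
        break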
The Universal Approach: All Providers via TokenMix.ai
If you want to call any model from any provider through a single endpoint, TokenMix.ai provides a unified OpenAI-compatible API.
One client, any model:
import os
from openai import OpenAI
# Single client for all providers
client = OpenAI(
    api_key=os.environ.get("TOKENMIX_API_KEY"),
    base_url="https://api.tokenmix.ai/v1"
)
# Call OpenAI models
gpt_response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Hello from GPT!"}]
)
# Call DeepSeek models
ds_response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello from DeepSeek!"}]
)
# Call Google models
gemini_response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello from Gemini!"}]
)
# Call Anthropic models
claude_response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello from Claude!"}]
)
Why this matters:
One API key instead of five
One billing dashboard instead of five
Same code pattern for every model
Switch models by changing one string
Automatic failover if a provider goes down
TokenMix.ai supports 300+ models from all major providers. Check available models and pricing at TokenMix.ai.
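Failover happens server-side, but a client-side safety net is easy to add on top of the unified endpoint. A sketch using the client defined above -- the helper name and model order are illustrative:
# Try each model in turn until one succeeds
def complete_with_fallback(client, prompt,
                           models=("gpt-4.1-mini", "deepseek-chat", "gemini-2.0-flash")):
    last_error = None
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return model, response.choices[0].message.content
        except Exception as e:  # narrow this to specific errors in production
            last_error = e
    raise RuntimeError(f"All models failed: {last_error}")

model_used, answer = complete_with_fallback(client, "What is Python?")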
Key Parameters Explained
Every AI API call accepts these common parameters. Understanding them is essential for controlling quality and cost.
Parameter | What It Does | Recommended Values
model | Which AI model to use | See provider tables above
messages | Conversation history (system + user + assistant) | Always include a system prompt
max_tokens | Maximum output length | Set based on expected response size
temperature | Randomness (0 = deterministic, 1 = creative) | 0-0.3 for factual, 0.7-1.0 for creative
top_p | Nucleus sampling (alternative to temperature) | Usually leave at 1.0
stream | Return tokens as they generate | True for chat UIs
stop | Stop sequences (halt generation at specific text) | Useful for structured output
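Here is one call that exercises most of these parameters together, assuming any of the OpenAI-compatible clients from earlier; the parameter values are illustrative:
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "Return one JSON object per request."},
        {"role": "user", "content": "Extract the city from: 'Flight to Paris at 9am.'"}
    ],
    max_tokens=100,   # cap output length
    temperature=0,    # deterministic extraction
    top_p=1.0,        # leave nucleus sampling alone
    stop=["\n\n"],    # halt at the first blank line
)
print(response.choices[0].message.content)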
Temperature guide:
Use Case | Temperature | Why
Code generation | 0-0.2 | Deterministic, consistent output
Data extraction | 0 | Exact, reproducible results
General Q&A | 0.3-0.5 | Balanced accuracy and naturalness
Creative writing | 0.7-1.0 | Varied, creative responses
Brainstorming | 0.9-1.0 | Maximum diversity
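To see the effect directly, run the same prompt at both ends of the range (again assuming an OpenAI-compatible client from earlier):
# Same prompt at two temperatures; outputs will vary run to run
for temp in (0, 1.0):
    r = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": "Name a color."}],
        max_tokens=10,
        temperature=temp,
    )
    print(f"temperature={temp}: {r.choices[0].message.content}")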
Streaming Responses in Python
Streaming returns tokens as they are generated, instead of waiting for the full response. Essential for chat interfaces.
from openai import OpenAI
client = OpenAI()
# Streaming response
stream = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Write a short poem about Python."}],
    stream=True
)
# Print tokens as they arrive
full_response = ""
for chunk in stream:
    # Some providers emit chunks with an empty choices list, so guard for it
    if chunk.choices and chunk.choices[0].delta.content:
        token = chunk.choices[0].delta.content
        print(token, end="", flush=True)
        full_response += token
print()  # New line after stream ends
Streaming works with all OpenAI-compatible providers (OpenAI, DeepSeek, Groq, TokenMix.ai). For Anthropic, the syntax is slightly different:
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
model="claude-sonnet-4-20250514",
max_tokens=300,
messages=[{"role": "user", "content": "Write a short poem about Python."}]
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
Error Handling Best Practices
Production code needs robust error handling. Here is a complete example.
import os
import time
from openai import OpenAI, RateLimitError, APIStatusError, APIConnectionError
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
def call_ai_api(messages, model="gpt-4.1-mini", max_retries=3):
    """Make an AI API call with proper error handling."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                max_tokens=500,
                temperature=0.3
            )
            return {
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "model": response.model
            }
        except RateLimitError:
            # Wait and retry with exponential backoff
            wait = 2 ** attempt
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        except APIConnectionError:
            # Network issue -- retry
            print("Connection error. Retrying...")
            time.sleep(1)
        except APIStatusError as e:
            # Status errors carry the HTTP status code
            if e.status_code >= 500:
                # Server error -- retry
                time.sleep(2)
            else:
                # Client error (400, 401, etc.) -- do not retry
                raise
    raise Exception(f"Failed after {max_retries} retries")
# Usage
result = call_ai_api([
    {"role": "user", "content": "What is Python?"}
])
print(result["content"])
For comprehensive error handling including multi-provider failover, see our 429 error solutions guide.
Calling an AI API in Python follows the same pattern across all providers: initialize a client, define messages, call the API, extract the response. The OpenAI SDK works directly with OpenAI, DeepSeek, and Groq. Anthropic and Google have their own SDKs with minor syntax differences.
The fastest path to using all providers: install the openai package, point it at TokenMix.ai's endpoint, and switch between 300+ models by changing a single string. One API key, one bill, any model from any provider.
Get started with any provider today. Check real-time model availability and pricing at TokenMix.ai.
FAQ
What is the easiest AI API to call from Python?
OpenAI is the easiest to start with due to the largest number of tutorials and community resources. The code is 5 lines: import, initialize client, call API, print response. DeepSeek and Groq use the identical code pattern (same SDK, different base URL), so they are equally easy once you know the OpenAI pattern.
Do I need different Python packages for each AI provider?
No. The openai package works for OpenAI, DeepSeek, Groq, and any OpenAI-compatible provider (including TokenMix.ai). You only need anthropic for Claude and google-generativeai for Gemini if calling them directly. Through TokenMix.ai, the openai package accesses all providers.
How do I handle API keys securely in Python?
Store API keys as environment variables and read them with os.environ.get("KEY_NAME"). Never hardcode keys in source files. For production, use a secrets manager (AWS Secrets Manager, HashiCorp Vault, or similar). Add .env to your .gitignore to prevent accidental commits.
What is the difference between streaming and non-streaming API calls?
Non-streaming waits for the complete response before returning. Streaming returns tokens as they are generated, allowing real-time display. Use streaming for chat interfaces (better UX) and non-streaming for batch processing (simpler code). Streaming uses the same number of tokens and costs the same.
Can I call multiple AI providers from the same Python script?
Yes. Either initialize multiple clients (one per provider) or use TokenMix.ai as a unified endpoint that routes to any provider. The unified approach is simpler -- one client, one API key, switch models by changing the model name string.
How much does it cost to make 1,000 AI API calls in Python?
At 400 tokens per call (simple chat), 1,000 calls cost: $0.08 on DeepSeek V3, $0.10 on GPT-4.1 nano, $0.40 on GPT-4.1 mini, $2.00 on GPT-4.1, $3.60 on Claude Sonnet 4. For detailed cost breakdowns, check our AI API cost per request guide on TokenMix.ai.
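If those 400 tokens split as roughly 200 input and 200 output per call, the arithmetic checks out. A quick sanity-check sketch -- the $1.60/M output price for gpt-4.1-mini is an assumption, since the tables above list input prices only:
# Back-of-the-envelope cost check; prices are per million tokens
def cost(calls, in_tokens, out_tokens, in_price, out_price):
    return calls * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# GPT-4.1 mini: $0.40/M input, assumed $1.60/M output -> $0.40 total
print(f"${cost(1_000, 200, 200, 0.40, 1.60):.2f}")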