TokenMix Research Lab · 2026-04-13

How to Call AI API in Python: Universal Code for OpenAI, Anthropic, Google, DeepSeek, and Groq (2026)
Last Updated: 2026-04-29
Author: TokenMix Research Lab
One Python pattern works with every major AI provider. The OpenAI SDK has become the universal standard -- OpenAI, DeepSeek, Groq, and others all use the same API format. Anthropic and Google have their own SDKs but follow similar patterns. This tutorial gives you working Python code for all five providers, explains the key parameters, and shows how to call any AI API through a single TokenMix.ai endpoint. All code tested and verified as of April 2026.
Table of Contents
- Quick Reference: 5 Providers in 5 Code Snippets
- Prerequisites: What You Need Before Starting
- The Universal Pattern: How All AI API Calls Work
- Provider 1: OpenAI (GPT Models)
- Provider 2: Anthropic (Claude Models)
- Provider 3: Google (Gemini Models)
- Provider 4: DeepSeek
- Provider 5: Groq (Llama and Open Models)
- The Universal Approach: All Providers via TokenMix.ai
- Key Parameters Explained
- Streaming Responses in Python
- Error Handling Best Practices
- How to Choose Which Provider to Call
- Conclusion
- FAQ
Quick Reference: 5 Providers in 5 Code Snippets
| Provider | SDK Package | Base URL | Model Example |
|---|---|---|---|
| OpenAI | openai | https://api.openai.com/v1 | gpt-4.1-mini |
| Anthropic | anthropic | https://api.anthropic.com | claude-sonnet-4-20250514 |
| Google | google-generativeai | Google AI Studio | gemini-2.0-flash |
| DeepSeek | openai (compatible) | https://api.deepseek.com | deepseek-chat |
| Groq | openai (compatible) | https://api.groq.com/openai/v1 | llama-4-scout-17b-16e-instruct |
| TokenMix.ai | openai (compatible) | https://api.tokenmix.ai/v1 | Any model from any provider |
Prerequisites: What You Need Before Starting
Required:
- Python 3.8 or later. Check with python --version.
- pip package manager. Comes with Python.
- An API key from at least one provider. See our DeepSeek API key tutorial for a step-by-step guide.
Install the SDKs:
# Install all provider SDKs at once
pip install openai anthropic google-generativeai
# Or install only what you need
pip install openai # Works for OpenAI, DeepSeek, Groq, TokenMix.ai
pip install anthropic # For Anthropic Claude
pip install google-generativeai # For Google Gemini
Set up your API keys as environment variables:
# Add to your .env file or export directly
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"
export GOOGLE_API_KEY="your-google-key"
export DEEPSEEK_API_KEY="sk-your-deepseek-key"
export GROQ_API_KEY="gsk_your-groq-key"
export TOKENMIX_API_KEY="your-tokenmix-key"
Never hardcode API keys in your source code. Use environment variables or a secrets manager.
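As a minimal sketch of the environment-variable approach, the helper below reads a key and fails early with a clear message when it is missing. The name require_key is a hypothetical helper for illustration, not part of any provider SDK.

```python
import os


def require_key(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Example: look up a key exported earlier (variable name taken from the list above)
# api_key = require_key("OPENAI_API_KEY")
```

Failing at startup is usually easier to debug than an authentication error on the first API call.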
The Universal Pattern: How All AI API Calls Work
Every AI API call in Python follows the same four-step pattern, regardless of provider.
# Step 1: Import and initialize the client
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
# Step 2: Define your messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"}
]
# Step 3: Make the API call
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages,
    max_tokens=300,
    temperature=0.7
)
# Step 4: Extract the response
answer = response.choices[0].message.content
tokens_used = response.usage.total_tokens
print(answer)
print(f"Tokens used: {tokens_used}")
This pattern works for OpenAI, DeepSeek, Groq, and any OpenAI-compatible provider. Anthropic and Google use slightly different syntax but the same conceptual flow.
Provider 1: OpenAI (GPT Models)
OpenAI is the most widely used AI API provider. Their SDK sets the standard that other providers follow.
Installation: pip install openai
Complete example:
import os
from openai import OpenAI
# Initialize client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
# Simple chat completion
response = client.chat.completions.create(
    model="gpt-4.1-mini",  # Budget model, great for most tasks
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain list comprehension in Python."}
    ],
    max_tokens=300,
    temperature=0.3
)
print(response.choices[0].message.content)
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
Available models:
| Model | Best For | Input Price (per 1M tokens) |
|---|---|---|
| gpt-5.4 | Complex reasoning, best quality | $2.50/M |
| gpt-4.1 | Strong general purpose | $2.00/M |
| gpt-4.1-mini | Best value, most tasks | $0.40/M |
| gpt-4.1-nano | Simple tasks, lowest cost | $0.10/M |
| o4-mini | Reasoning-heavy tasks | $1.10/M |
Provider 2: Anthropic (Claude Models)
Anthropic uses its own SDK with a different syntax. The messages structure is similar but the client initialization and response format differ.
Installation: pip install anthropic
Complete example:
import os
import anthropic
# Initialize client
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
# Chat completion
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=300,
    system="You are a concise technical assistant.",  # System prompt is separate
    messages=[
        {"role": "user", "content": "Explain list comprehension in Python."}
    ]
)
print(response.content[0].text)
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
Key differences from OpenAI:
- System prompt is a separate parameter, not in the messages array
- Response text is in response.content[0].text, not response.choices[0].message.content
- Usage is input_tokens and output_tokens, not prompt_tokens and completion_tokens
- The method is client.messages.create(), not client.chat.completions.create()
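To make the first difference concrete, here is a small sketch that splits an OpenAI-style messages list into the separate system string and the remaining messages that Anthropic expects. The function name to_anthropic is a hypothetical helper, and joining multiple system messages with a newline is an assumption made for illustration.

```python
from typing import Dict, List, Tuple


def to_anthropic(messages: List[Dict[str, str]]) -> Tuple[str, List[Dict[str, str]]]:
    """Split OpenAI-style messages into (system_prompt, non_system_messages)."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    return "\n".join(system_parts), chat


system, chat = to_anthropic([
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Explain list comprehension in Python."},
])
# The two values could then be passed as system=system and messages=chat
# to client.messages.create(); omit system when it is an empty string.
```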
Available models:
| Model | Best For | Input Price (per 1M tokens) |
|---|---|---|
| claude-opus-4-20250514 | Best quality, complex tasks | $15.00/M |
| claude-sonnet-4-20250514 | Strong general purpose | $3.00/M |
| claude-3-5-haiku-20241022 | Fast, budget option | $0.80/M |
Provider 3: Google (Gemini Models)
Google offers two SDK options: the google-generativeai package for Google AI Studio, and the Vertex AI SDK for enterprise. Here we use the simpler AI Studio approach.
Installation: pip install google-generativeai
Complete example:
import os
import google.generativeai as genai
# Initialize
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
# Create model instance
model = genai.GenerativeModel(
    model_name="gemini-2.0-flash",
    system_instruction="You are a concise technical assistant."
)
# Generate response
response = model.generate_content("Explain list comprehension in Python.")
print(response.text)
print(f"Input tokens: {response.usage_metadata.prompt_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")
Key differences from OpenAI:
- Uses genai.configure() instead of client initialization
- Creates a model object first, then calls generate_content()
- System instruction is set at model creation, not per-request
- Response text is response.text directly
- Token counts are in response.usage_metadata
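Because the three SDKs name their token counters differently, a thin adapter can keep logging code uniform. The sketch below uses only the attribute names shown in the examples above; normalize_usage is a hypothetical helper, and the SimpleNamespace object is a stand-in for a real response.

```python
from types import SimpleNamespace


def normalize_usage(provider: str, response) -> dict:
    """Return {'input': ..., 'output': ...} token counts from a response object."""
    if provider == "anthropic":
        u = response.usage
        return {"input": u.input_tokens, "output": u.output_tokens}
    if provider == "google":
        u = response.usage_metadata
        return {"input": u.prompt_token_count, "output": u.candidates_token_count}
    # Default: OpenAI-compatible providers (OpenAI, DeepSeek, Groq)
    u = response.usage
    return {"input": u.prompt_tokens, "output": u.completion_tokens}


# Stand-in object shaped like an Anthropic response, for illustration only
fake = SimpleNamespace(usage=SimpleNamespace(input_tokens=12, output_tokens=34))
print(normalize_usage("anthropic", fake))  # {'input': 12, 'output': 34}
```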
Available models:
| Model | Best For | Input Price (per 1M tokens) |
|---|---|---|
| gemini-3.1-pro | Complex tasks, long context | $1.25/M |
| gemini-2.0-flash | Fast, budget, 1M context | $0.075/M |
For a detailed Google vs OpenAI comparison, see our OpenAI vs Google AI API guide.
Provider 4: DeepSeek
DeepSeek uses an OpenAI-compatible API. You use the same openai Python package -- just change the base URL and API key.
Installation: pip install openai (same package as OpenAI)
Complete example:
import os
from openai import OpenAI
# Initialize with DeepSeek endpoint
client = OpenAI(
    api_key=os.environ.get("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)
# Same syntax as OpenAI
response = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek V3
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain list comprehension in Python."}
    ],
    max_tokens=300,
    temperature=0.3
)
print(response.choices[0].message.content)
print(f"Total tokens: {response.usage.total_tokens}")
The code is identical to OpenAI except for two lines: the API key and the base URL. This is the beauty of OpenAI-compatible APIs -- zero learning curve.
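Since only the key and the base URL change, those two values can live in a small lookup table. The sketch below builds the keyword arguments for the OpenAI client from the base URLs and environment variable names used in this article; PROVIDERS and client_kwargs are hypothetical names for illustration.

```python
import os

# Base URLs and variable names copied from the quick-reference and setup sections.
PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
    "deepseek": {"base_url": "https://api.deepseek.com", "key_env": "DEEPSEEK_API_KEY"},
    "groq": {"base_url": "https://api.groq.com/openai/v1", "key_env": "GROQ_API_KEY"},
    "tokenmix": {"base_url": "https://api.tokenmix.ai/v1", "key_env": "TOKENMIX_API_KEY"},
}


def client_kwargs(provider: str) -> dict:
    """Build the api_key/base_url arguments for one OpenAI-compatible provider."""
    try:
        cfg = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider}") from None
    return {"api_key": os.environ.get(cfg["key_env"]), "base_url": cfg["base_url"]}


# Usage, assuming the openai package is installed:
# from openai import OpenAI
# client = OpenAI(**client_kwargs("deepseek"))
```

Keeping the table in one place means adding another OpenAI-compatible provider is a one-line change.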
Available models:
| Model | Best For | Input Price (per 1M tokens) |
|---|---|---|
| deepseek-chat | General purpose (V3) | $0.14/M |
| deepseek-reasoner | Complex reasoning (R1) | $0.55/M |
For a complete setup guide, see our DeepSeek API key tutorial.
Provider 5: Groq (Llama and Open Models)
Groq hosts open-source models on custom LPU hardware for ultra-fast inference. Like DeepSeek, it uses the OpenAI-compatible format.
Installation: pip install openai (same package)
Complete example:
import os
from openai import OpenAI
# Initialize with Groq endpoint
client = OpenAI(
    api_key=os.environ.get("GROQ_API_KEY"),
    base_url="https://api.groq.com/openai/v1"
)
# Same syntax, different models
response = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain list comprehension in Python."}
    ],
    max_tokens=300,
    temperature=0.3
)
print(response.choices[0].message.content)
print(f"Total tokens: {response.usage.total_tokens}")
Available models on Groq:
| Model | Best For | Speed |
|---|---|---|
| llama-4-scout-17b-16e-instruct | General purpose | Ultra-fast (~200ms TTFT) |
| llama-4-maverick-17b-128e-instruct | Complex tasks | Fast |
| llama-3.3-70b-versatile | Quality-focused | Fast |
The Universal Approach: All Providers via TokenMix.ai
If you want to call any model from any provider through a single endpoint, TokenMix.ai provides a unified OpenAI-compatible API.
One client, any model:
import os
from openai import OpenAI
# Single client for all providers
client = OpenAI(
    api_key=os.environ.get("TOKENMIX_API_KEY"),
    base_url="https://api.tokenmix.ai/v1"
)
# Call OpenAI models
gpt_response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Hello from GPT!"}]
)
# Call DeepSeek models
ds_response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello from DeepSeek!"}]
)
# Call Google models
gemini_response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello from Gemini!"}]
)
# Call Anthropic models
claude_response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello from Claude!"}]
)
Why this matters:
- One API key instead of five
- One billing dashboard instead of five
- Same code pattern for every model
- Switch models by changing one string
- Automatic failover if a provider goes down
TokenMix.ai supports 300+ models from all major providers. Check available models and pricing at TokenMix.ai.
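Because switching models is a one-string change, a client-side fallback over a list of models is easy to sketch. The helper below is an illustration only and says nothing about how TokenMix.ai implements its own failover; first_successful and the ask callback are hypothetical names.

```python
from typing import Callable, Iterable, Tuple


def first_successful(models: Iterable[str], ask: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model, answer) from the first call that succeeds."""
    errors = []
    for model in models:
        try:
            return model, ask(model)
        except Exception as exc:  # broad on purpose: this is only a sketch
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


# Usage, assuming `client` is the OpenAI-compatible client created above:
# def ask(model):
#     r = client.chat.completions.create(
#         model=model, messages=[{"role": "user", "content": "Hello"}]
#     )
#     return r.choices[0].message.content
# model, answer = first_successful(["gpt-4.1-mini", "deepseek-chat"], ask)
```

In production code you would likely catch only the SDK's retryable error types instead of every exception.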
Key Parameters Explained
Every AI API call accepts these common parameters. Understanding them is essential for controlling quality and cost.
| Parameter | What It Does | Recommended Values |
|---|---|---|
| model | Which AI model to use | See provider tables above |
| messages | Conversation history (system + user + assistant) | Always include system prompt |
| max_tokens | Maximum output length in tokens | Set based on expected response size |
| temperature | Randomness (0 = most deterministic, 1 = more creative) | 0-0.3 for factual, 0.7-1.0 for creative |
| top_p | Nucleus sampling (alternative to temperature) | Usually leave at 1.0 |
| stream | Return tokens as they generate | True for chat UIs |
| stop | Stop sequences (halt generation at specific text) | Useful for structured output |
Temperature guide:
| Use Case | Temperature | Why |
|---|---|---|
| Code generation | 0-0.2 | Deterministic, consistent output |
| Data extraction | 0 | Exact, reproducible results |
| General Q&A | 0.3-0.5 | Balanced accuracy and naturalness |
| Creative writing | 0.7-1.0 | Varied, creative responses |
| Brainstorming | 0.9-1.0 | Maximum diversity |
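If you want these recommendations available in code, one option is a simple lookup keyed by use case. The sketch below copies the ranges from the table above; TEMPERATURE_GUIDE and temperature_range are hypothetical names, and how you pick a value inside the range is left to the caller.

```python
from typing import Dict, Tuple

# (low, high) ranges copied from the temperature guide table
TEMPERATURE_GUIDE: Dict[str, Tuple[float, float]] = {
    "code generation": (0.0, 0.2),
    "data extraction": (0.0, 0.0),
    "general q&a": (0.3, 0.5),
    "creative writing": (0.7, 1.0),
    "brainstorming": (0.9, 1.0),
}


def temperature_range(use_case: str) -> Tuple[float, float]:
    """Look up the suggested temperature range for a use case (case-insensitive)."""
    key = use_case.strip().lower()
    if key not in TEMPERATURE_GUIDE:
        raise KeyError(f"No guidance for use case: {use_case}")
    return TEMPERATURE_GUIDE[key]


low, high = temperature_range("Code generation")
print(low, high)  # 0.0 0.2
```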
Streaming Responses in Python
Streaming returns tokens as they are generated, instead of waiting for the full response. Essential for chat interfaces.
from openai import OpenAI
client = OpenAI()  # Reads OPENAI_API_KEY from the environment
# Streaming response
stream = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Write a short poem about Python."}],
    stream=True
)
# Print tokens as they arrive
full_response = ""
for chunk in stream:
    # Skip chunks that carry no choices or no text
    if chunk.choices and chunk.choices[0].delta.content:
        token = chunk.choices[0].delta.content
        print(token, end="", flush=True)
        full_response += token
print()  # New line after stream ends
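The chunk-handling loop can also be factored into a small generator, which makes it testable without a network call. This is a sketch under the assumption that chunks expose choices[0].delta.content as in the example above; text_deltas and fake_chunk are hypothetical helpers, and the demo list stands in for a real stream.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def text_deltas(stream: Iterable) -> Iterator[str]:
    """Yield non-empty text pieces from OpenAI-style streaming chunks."""
    for chunk in stream:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # content can be None or empty
            yield piece


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute path as the SDK objects."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("Hel"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("lo")]
print("".join(text_deltas(demo)))  # Hello
```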
Streaming works with all OpenAI-compatible providers (OpenAI, DeepSeek, Groq, TokenMix.ai). For Anthropic, the syntax is slightly different:
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=300,
    messages=[{"role": "user", "content": "Write a short poem about Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
Error Handling Best Practices
Production code needs robust error handling. Here is a complete example.
import os
import time
from openai import OpenAI, RateLimitError, APIConnectionError, APIStatusError
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
def call_ai_api(messages, model="gpt-4.1-mini", max_retries=3):
    """Make an AI API call, retrying transient failures."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                max_tokens=500,
                temperature=0.3
            )
            return {
                "content": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "model": response.model
            }
        except RateLimitError:
            # Wait and retry with exponential backoff (1s, 2s, 4s, ...)
            wait = 2 ** attempt
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        except APIConnectionError:
            # Network issue -- retry
            print("Connection error. Retrying...")
            time.sleep(1)
        except APIStatusError as e:
            # APIStatusError carries the HTTP status code of the failed request
            if e.status_code >= 500:
                # Server error -- retry
                time.sleep(2)
            else:
                # Client error (400, 401, etc.) -- do not retry
                raise
    raise RuntimeError(f"Failed after {max_retries} attempts")
# Usage
result = call_ai_api([
    {"role": "user", "content": "What is Python?"}
])
print(result["content"])
For comprehensive error handling including multi-provider failover, see our 429 error solutions guide.
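The retry logic above waits 2 ** attempt seconds after a rate limit. A common refinement, sketched here as an assumption and not something the example above does, is to cap the delay and add random jitter so that many clients do not retry at the same instant. The name backoff_delays and its parameters are hypothetical.

```python
import random
from typing import List, Optional


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return one wait time per retry: base * 2**attempt, capped, with optional jitter."""
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * 2 ** attempt)
        if rng is not None:
            # "Full jitter": pick a random wait between 0 and the computed delay
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays


print(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
```

Passing rng=random.Random() enables jitter; leaving it as None keeps the schedule deterministic, which is convenient in tests.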
How to Choose Which Provider to Call
| Your Need | Best Provider | Best Model | Why |
|---|---|---|---|
| Cheapest possible | DeepSeek | deepseek-chat | $0.14/M input |
| Best free tier | Google | gemini-2.0-flash | No credit card needed |
| Best coding | Anthropic | claude-sonnet-4 | Top SWE-bench scores |
| Fastest inference | Groq | llama-4-scout | 200ms first token |
| Largest ecosystem | OpenAI | gpt-4.1-mini | Most tools and tutorials |
| One API for everything | TokenMix.ai | Any model | 300+ models, one key |
For a detailed provider comparison, see our guide on choosing the right LLM API.
Conclusion
Calling an AI API in Python follows the same pattern across all providers: initialize a client, define messages, call the API, extract the response. The OpenAI SDK works directly with OpenAI, DeepSeek, and Groq. Anthropic and Google have their own SDKs with minor syntax differences.
The fastest path to using all providers: install the openai package, point it at TokenMix.ai's endpoint, and switch between 300+ models by changing a single string. One API key, one bill, any model from any provider.
Get started with any provider today. Check real-time model availability and pricing at TokenMix.ai.
FAQ
What is the easiest AI API to call from Python?
OpenAI is the easiest to start with because it has the largest number of tutorials and community resources. The core code is only a few lines: import the SDK, initialize the client, call the API, and print the response. DeepSeek and Groq use the identical code pattern (same SDK, different base URL), so they are equally easy once you know the OpenAI pattern.
Do I need different Python packages for each AI provider?
No. The openai package works for OpenAI, DeepSeek, Groq, and any OpenAI-compatible provider (including TokenMix.ai). You only need anthropic for Claude and google-generativeai for Gemini if calling them directly. Through TokenMix.ai, the openai package accesses all providers.
How do I handle API keys securely in Python?
Store API keys as environment variables and read them with os.environ.get("KEY_NAME"). Never hardcode keys in source files. For production, use a secrets manager (AWS Secrets Manager, HashiCorp Vault, or similar). Add .env to your .gitignore to prevent accidental commits.
What is the difference between streaming and non-streaming API calls?
Non-streaming waits for the complete response before returning. Streaming returns tokens as they are generated, allowing real-time display. Use streaming for chat interfaces (better UX) and non-streaming for batch processing (simpler code). Streaming uses the same number of tokens and costs the same.
Can I call multiple AI providers from the same Python script?
Yes. Either initialize multiple clients (one per provider) or use TokenMix.ai as a unified endpoint that routes to any provider. The unified approach is simpler -- one client, one API key, switch models by changing the model name string.
How much does it cost to make 1,000 AI API calls in Python?
At 400 tokens per call (simple chat), 1,000 calls cost: $0.08 on DeepSeek V3, $0.10 on GPT-4.1 nano, $0.40 on GPT-4.1 mini, $2.00 on GPT-4.1, $3.60 on Claude Sonnet 4. For detailed cost breakdowns, check our AI API cost per request guide on TokenMix.ai.
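The arithmetic behind these figures can be sketched as follows. The input prices are the ones listed in this article; the output prices and the even split of 200 input and 200 output tokens per call are assumptions, chosen because they reproduce the numbers above. Check current provider pricing before relying on them.

```python
# (input, output) prices in USD per 1M tokens.
# Input prices come from the tables in this article; output prices are assumed.
PRICES = {
    "deepseek-chat": (0.14, 0.28),
    "gpt-4.1-nano": (0.10, 0.40),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1": (2.00, 8.00),
    "claude-sonnet-4": (3.00, 15.00),
}


def cost_usd(model: str, calls: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate total cost in USD for a number of identical calls."""
    in_price, out_price = PRICES[model]
    per_call = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return calls * per_call


# 1,000 calls at an assumed 200 input + 200 output tokens each
for model in PRICES:
    print(f"{model}: ${cost_usd(model, 1000, 200, 200):.2f}")
```

Output tokens are typically priced higher than input tokens, so the split matters as much as the total.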
Data Source: OpenAI Python SDK, Anthropic Python SDK, Google AI Python SDK, TokenMix.ai