TokenMix Research Lab · 2026-04-13

How to Use DeepSeek API in Python: Step-by-Step Tutorial with Code Examples (2026)
Last Updated: 2026-04-29
Author: TokenMix Research Lab
Per DeepSeek's official API documentation, DeepSeek implements the OpenAI-compatible API — pip install openai, set base_url="https://api.deepseek.com", and a working call runs in 3 lines.
DeepSeek's API docs confirm full OpenAI SDK compatibility, meaning every feature in this tutorial (streaming, system prompts, function calling, JSON mode) works with the unmodified openai package. New accounts get 5M free tokens at $0.27/$1.10 per million paid rates — roughly 2,100 chat calls per $1 spent. Compared to OpenAI's GPT-5.4 Mini at $0.40/$1.60, DeepSeek runs ~32% cheaper with comparable Python integration. Code examples below were tested against DeepSeek API as of 2026-04-28.
TokenMix.ai maintains compatibility data for DeepSeek and 300+ other API providers.
Table of Contents
- Prerequisites: What You Need Before Starting
- Step 1: Install the OpenAI Python Package
- Step 2: Get Your DeepSeek API Key
- Step 3: Make Your First DeepSeek API Call
- Step 4: Add Streaming for Real-Time Responses
- Step 5: Use System Prompts and Multi-Turn Conversations
- Step 6: Enable Prompt Caching to Save Money
- Step 7: Handle Errors and Rate Limits
- Complete Working Example: DeepSeek Chat Application
- DeepSeek API Python Quick Reference Table
- FAQ
Prerequisites: What You Need Before Starting
You need only Python 3.8+, the openai pip package, and a free DeepSeek API key. Per DeepSeek's signup page, no DeepSeek-specific SDK exists; the standard OpenAI client handles everything.
Before writing code, make sure you have these ready.
| Requirement | Details |
|---|---|
| Python version | 3.8 or higher |
| Package manager | pip (comes with Python) |
| DeepSeek account | Free signup at platform.deepseek.com |
| API key | Generated from DeepSeek console |
| Free credits | 5M tokens included with new accounts |
No DeepSeek-specific SDK is needed. The standard OpenAI Python package works because DeepSeek implements the OpenAI-compatible API format. This also means any code you write for DeepSeek can be switched to OpenAI, Groq, or other compatible providers by changing two lines.
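The "two lines" are the base URL and the model name. A minimal sketch of keeping them in one place follows; the `PROVIDERS` table and `client_settings` helper are hypothetical names for illustration, the OpenAI model ID is the one used in this tutorial's FAQ, and `OPENAI_API_KEY` is assumed as the variable name for the OpenAI key.

```python
# Hypothetical provider table: only base_url and model differ between providers.
PROVIDERS = {
    "deepseek": {
        "base_url": "https://api.deepseek.com",
        "model": "deepseek-chat",
        "key_env": "DEEPSEEK_API_KEY",
    },
    "openai": {
        "base_url": None,  # None means "use the SDK's default endpoint"
        "model": "gpt-5.4-mini",
        "key_env": "OPENAI_API_KEY",  # assumed variable name
    },
}


def client_settings(provider, env):
    """Return (client keyword arguments, model name) for one provider."""
    cfg = PROVIDERS[provider]
    kwargs = {"api_key": env.get(cfg["key_env"])}
    if cfg["base_url"]:
        kwargs["base_url"] = cfg["base_url"]
    return kwargs, cfg["model"]


# Offline demonstration with a stand-in environment mapping.
kwargs, model = client_settings("deepseek", {"DEEPSEEK_API_KEY": "sk-example"})
print(kwargs["base_url"], model)
```

In real code you would pass `os.environ` as `env`, build the client with `OpenAI(**kwargs)`, and pass `model=model` to each request.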
Step 1: Install the OpenAI Python Package
Single command: pip install openai. Per OpenAI's Python SDK release notes, version 1.x is required for the chat completions interface used here.
Open your terminal and install the package.
pip install openai
Verify the installation:
python -c "import openai; print(openai.__version__)"
You should see version 1.x or higher. If you have an older version, upgrade:
pip install --upgrade openai
That is the only dependency. No other packages are required for basic DeepSeek API usage.
Step 2: Get Your DeepSeek API Key
Sign up at platform.deepseek.com, generate an API key from the console, and store it in the DEEPSEEK_API_KEY environment variable. Never hardcode the key in source or commit .env files to git.
2a. Go to platform.deepseek.com and sign up or log in.
2b. Navigate to "API Keys" in the left sidebar.
2c. Click "Create new API key." Give it a descriptive name like "python-tutorial."
2d. Copy the key immediately. It will not be shown again.
2e. Store the key securely. The recommended approach is an environment variable:
# Linux/Mac
export DEEPSEEK_API_KEY="your-key-here"
# Windows (Command Prompt)
set DEEPSEEK_API_KEY=your-key-here
# Windows (PowerShell)
$env:DEEPSEEK_API_KEY="your-key-here"
Alternatively, use a .env file with python-dotenv:
pip install python-dotenv
# .env file
DEEPSEEK_API_KEY=your-key-here
Important: Never hardcode API keys in your source code. Never commit .env files to version control. Add .env to your .gitignore.
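To fail fast when the key is missing, a startup check along these lines can help. This is a sketch, not an official pattern: `load_api_key` is a hypothetical helper, and python-dotenv's `load_dotenv()` is called only if that package is installed.

```python
import os


def load_api_key(env=None):
    """Return the DeepSeek API key, or raise a clear error if it is missing.

    Hypothetical helper for this tutorial; it is not part of any SDK.
    """
    if env is None:
        try:
            # Optional: python-dotenv copies values from a local .env file
            # into os.environ. Skipped if the package is not installed.
            from dotenv import load_dotenv
            load_dotenv()
        except ImportError:
            pass
        env = os.environ

    key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DEEPSEEK_API_KEY is not set; export it or add it to .env")
    return key


# Offline demonstration with an explicit mapping instead of the real environment.
print(load_api_key({"DEEPSEEK_API_KEY": "sk-example"}))
```

Calling `load_api_key()` with no arguments reads the real environment, so a missing key surfaces as one readable error instead of an authentication failure later.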
Step 3: Make Your First DeepSeek API Call
Per DeepSeek's API docs, model="deepseek-chat" maps to V4 (general purpose) and model="deepseek-reasoner" maps to R1 (reasoning, uses 3-10x more tokens for chain-of-thought).
Here is the minimal working example. Copy this, make sure DEEPSEEK_API_KEY is set in your environment, and run it.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",  # This is DeepSeek V4
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)
Expected output:
The capital of France is Paris.
What each parameter does:
| Parameter | Value | Purpose |
|---|---|---|
| api_key | Your DeepSeek key | Authentication |
| base_url | https://api.deepseek.com | Points to DeepSeek instead of OpenAI |
| model | deepseek-chat | DeepSeek V4 (general purpose) |
| messages | List of message dicts | Conversation history |
| max_tokens | 100 | Maximum response length |
Available DeepSeek models:
| Model ID | Model Name | Best For |
|---|---|---|
| deepseek-chat | DeepSeek V4 | General chat, content, code |
| deepseek-reasoner | DeepSeek R1 | Complex reasoning, math |
Use deepseek-chat for most tasks. Switch to deepseek-reasoner only when you need explicit chain-of-thought reasoning. R1 uses more tokens due to internal thinking, which consumes your free credits faster. For more details on DeepSeek pricing, see our DeepSeek API pricing guide.
Step 4: Add Streaming for Real-Time Responses
Set stream=True on the create call and iterate chunk.choices[0].delta.content — token billing is identical to non-streaming per DeepSeek's pricing docs; the perceived latency improvement is purely UX.
Streaming shows the response token by token as it generates, instead of waiting for the complete response. Essential for chat interfaces.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Explain what an API is in 3 sentences."}
    ],
    max_tokens=200,
    stream=True  # Enable streaming
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # New line at the end
Key points about streaming:
- Set stream=True in the create call
- Response comes as chunks, not a complete message
- Each chunk's content is in chunk.choices[0].delta.content
- Content can be None for the final chunk, so always check
- Token usage is still counted the same way (no cost difference)
Streaming is especially important for side projects with chat interfaces. Users perceive streamed responses as faster, even though total generation time is the same. See our AI API for side projects guide for more implementation patterns.
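If you also need the full reply after streaming it (for example, to append it to the conversation history), the chunks can be accumulated as they arrive. The sketch below uses stand-in objects shaped like SDK chunks so it runs offline; `collect_stream` and `fake_chunk` are hypothetical names, and the empty-choices guard is a defensive assumption rather than documented DeepSeek behavior.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas from an iterable of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # defensively skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skips None and empty strings
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like an SDK chunk (offline testing only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [
    fake_chunk("An API "),
    fake_chunk("is a contract."),
    fake_chunk(None),              # final chunk without content
    SimpleNamespace(choices=[]),   # chunk without choices
]
print(collect_stream(chunks))
```

With a real client, the same function should accept the object returned by `client.chat.completions.create(..., stream=True)` directly, since it only relies on the attributes used in the loop above.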
Step 5: Use System Prompts and Multi-Turn Conversations
DeepSeek does NOT maintain conversation state server-side — per DeepSeek's API docs, you must send full conversation history with every request, and every previous turn counts as input tokens billed at $0.27/M.
Real applications need system prompts (to set behavior) and multi-turn conversations (to maintain context).
System Prompt Example
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "system",
            "content": "You are a concise technical assistant. Answer in 2-3 sentences maximum. Use specific numbers when possible."
        },
        {
            "role": "user",
            "content": "How much does GPT-5.4 Mini cost?"
        }
    ],
    max_tokens=150
)

print(response.choices[0].message.content)
Multi-Turn Conversation
conversation = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
]

# First turn
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=conversation,
    max_tokens=300
)
assistant_reply = response.choices[0].message.content
print("Assistant:", assistant_reply)

# Add assistant's reply to conversation history
conversation.append({"role": "assistant", "content": assistant_reply})

# Second turn
conversation.append({"role": "user", "content": "Now make it handle Unicode characters correctly."})
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=conversation,
    max_tokens=300
)
print("Assistant:", response.choices[0].message.content)
Important: DeepSeek does not maintain conversation state server-side. You must send the full conversation history with every request. Each message in the history counts as input tokens and is billed accordingly.
Token management tip: For long conversations, trim older messages to stay within context limits and budget. Keep the system prompt and the last 5-10 messages for most use cases.
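The trimming rule in the tip can be sketched as a small pure function. `trim_history` is a hypothetical helper; it counts messages rather than tokens, so it is only a rough budget control.

```python
def trim_history(messages, keep_last=10):
    """Keep the leading system prompt (if any) plus the last `keep_last` messages."""
    if not messages:
        return []
    # Preserve the system prompt so the assistant's behavior does not drift.
    head = [messages[0]] if messages[0]["role"] == "system" else []
    tail = messages[len(head):]
    if keep_last <= 0:
        return head
    return head + tail[-keep_last:]


# Build a 31-message history: one system prompt plus 15 user/assistant pairs.
history = [{"role": "system", "content": "You are a helpful coding assistant."}]
for i in range(15):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, keep_last=10)
print(len(history), "->", len(trimmed))
```

Passing `trim_history(conversation)` as `messages` on each request keeps input size bounded. Dropping old turns does mean the model loses that context, so choose `keep_last` with your use case in mind.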
Step 6: Enable Prompt Caching to Save Money
Per DeepSeek's pricing documentation, cached input tokens drop from $0.27/M to $0.07/M (74% savings) — caching is automatic for matching 128+ token prefixes, no API parameter required.
DeepSeek supports automatic prompt caching for repeated prefixes. When the beginning of your messages (system prompt + shared context) is identical across requests, DeepSeek caches those tokens and charges a reduced rate.
# The system prompt below will be cached after the first call
system_prompt = """You are a customer support agent for an e-commerce platform.
You have access to the following policies:
1. Returns accepted within 30 days
2. Free shipping on orders over $50
3. Price matching available for 14 days after purchase
4. Loyalty points: 1 point per $1 spent
Respond professionally and concisely."""

# First call: full price for all input tokens
response1 = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Can I return an item I bought 3 weeks ago?"}
    ],
    max_tokens=200
)

# Second call: system prompt tokens are cached (cheaper)
response2 = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "How do loyalty points work?"}
    ],
    max_tokens=200
)
How caching saves money:
| Token Type | Standard Rate (V4) | Cached Rate | Savings |
|---|---|---|---|
| Input (uncached) | $0.27/M | $0.27/M | 0% |
| Input (cached) | $0.27/M | $0.07/M | 74% |
| Output | $1.10/M | $1.10/M | 0% |
Caching is automatic. If the first 128+ tokens of your message sequence match a recent request, DeepSeek applies the cached rate. No configuration needed.
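To see what the cached rate means for a workload, here is a rough estimate using the rates from the table above. The token counts are illustrative assumptions (a 2,000-token system prompt, 50-token questions, 1,000 requests), and the sketch idealizes caching by assuming every request after the first hits the cache for the whole system prompt.

```python
INPUT_RATE = 0.27         # USD per million uncached input tokens
CACHED_INPUT_RATE = 0.07  # USD per million cached input tokens


def input_cost(cache_hit_tokens, cache_miss_tokens):
    """Input cost in USD for a mix of cached and uncached tokens."""
    total = cache_hit_tokens * CACHED_INPUT_RATE + cache_miss_tokens * INPUT_RATE
    return total / 1_000_000


# Illustrative workload (assumed numbers, not measurements).
system_tokens, question_tokens, requests = 2_000, 50, 1_000

# No caching: every input token is billed at the standard rate.
uncached = input_cost(0, (system_tokens + question_tokens) * requests)

# Idealized caching: the first request misses everything; later requests
# hit the cache for the system prompt and miss only the question.
hits = system_tokens * (requests - 1)
misses = (system_tokens + question_tokens) + question_tokens * (requests - 1)
cached = input_cost(hits, misses)

print(f"without caching: ${uncached:.4f}")
print(f"with caching:    ${cached:.4f}")
```

Under these assumptions the input bill drops from about $0.55 to about $0.15. Real savings depend on how often the cache is actually hit.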
For a complete guide to prompt caching across providers, see our prompt caching guide.
Step 7: Handle Errors and Rate Limits
Catch RateLimitError (429), APIConnectionError (network), and APIError (5xx) with exponential backoff (1s/2s/4s) — DeepSeek returns 402 for insufficient balance, 401 for bad keys, per DeepSeek's error code reference.
Production code needs error handling. Here are the most common errors and how to handle them.
from openai import OpenAI, APIError, RateLimitError, APIConnectionError
import time
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

def call_deepseek(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="deepseek-chat",
                messages=messages,
                max_tokens=500
            )
            return response.choices[0].message.content
        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except APIConnectionError:
            print("Connection error. Retrying...")
            time.sleep(1)
        except APIError as e:
            # Not every APIError carries a status code, so read it defensively
            status = getattr(e, "status_code", None)
            print(f"API error: {status} - {e.message}")
            if status is not None and status >= 500:
                time.sleep(2)  # Server error, retry
            else:
                raise  # Client error, don't retry
    raise Exception("Max retries exceeded")

# Usage
result = call_deepseek([
    {"role": "user", "content": "Hello!"}
])
print(result)
Common DeepSeek API error codes:
| Error Code | Meaning | Action |
|---|---|---|
| 400 | Bad request (invalid params) | Fix your request format |
| 401 | Invalid API key | Check your DEEPSEEK_API_KEY |
| 402 | Insufficient balance | Add credits or payment method |
| 429 | Rate limit exceeded | Wait and retry with backoff |
| 500 | Server error | Retry after short delay |
| 503 | Service unavailable | Retry after longer delay |
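The table's "Action" column boils down to a retry policy: retry 429 and 5xx responses, and fail immediately on other 4xx responses. A minimal sketch of that decision logic, separated from the API call so it can be tested offline, is below; `is_retryable` and `backoff_delays` are hypothetical helper names.

```python
# Status codes from the table above whose suggested action is "retry".
RETRYABLE_STATUS = {429, 500, 503}


def is_retryable(status_code):
    """Return True if a request that failed with this status should be retried."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(max_retries=3, base=1.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ..."""
    return [base * 2 ** attempt for attempt in range(max_retries)]


print(backoff_delays(3))
print([code for code in (400, 401, 402, 429, 500, 503) if is_retryable(code)])
```

Keeping the policy in plain functions like these makes it easy to adjust, for example by lengthening the schedule for 503 responses, without touching the request code.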
Complete Working Example: DeepSeek Chat Application
Production-ready 50-line terminal chat with streaming, conversation history, rate limit handling, and memory trimming — runs entirely on free credits at near-zero cost per DeepSeek's pricing.
Here is a complete, ready-to-run terminal chat application using DeepSeek.
"""
DeepSeek Chat - Terminal Chat Application
Requires: pip install openai
Set DEEPSEEK_API_KEY environment variable before running.
"""
from openai import OpenAI, RateLimitError
import os
import time


def create_client():
    api_key = os.getenv("DEEPSEEK_API_KEY")
    if not api_key:
        raise ValueError("Set DEEPSEEK_API_KEY environment variable")
    return OpenAI(api_key=api_key, base_url="https://api.deepseek.com")


def chat(client, messages, retries=3):
    try:
        stream = client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,
            max_tokens=1000,
            stream=True
        )
        full_response = ""
        for chunk in stream:
            if not chunk.choices:  # Skip chunks that carry no choices
                continue
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
                full_response += content
        print()
        return full_response
    except RateLimitError:
        if retries <= 0:
            raise  # Give up instead of retrying forever
        print("\nRate limited. Waiting 5 seconds...")
        time.sleep(5)
        return chat(client, messages, retries - 1)


def main():
    client = create_client()
    conversation = [
        {"role": "system", "content": "You are a helpful assistant. Be concise."}
    ]
    print("DeepSeek Chat (type 'quit' to exit, 'clear' to reset)")
    print("-" * 50)
    while True:
        user_input = input("\nYou: ").strip()
        if user_input.lower() == "quit":
            break
        if user_input.lower() == "clear":
            conversation = [conversation[0]]  # Keep system prompt
            print("Conversation cleared.")
            continue
        if not user_input:
            continue
        conversation.append({"role": "user", "content": user_input})
        print("\nDeepSeek: ", end="")
        reply = chat(client, conversation)
        conversation.append({"role": "assistant", "content": reply})
        # Trim conversation if it gets too long (keep last 10 exchanges)
        if len(conversation) > 21:  # system + 10 pairs
            conversation = [conversation[0]] + conversation[-20:]


if __name__ == "__main__":
    main()
To run:
export DEEPSEEK_API_KEY="your-key-here"
python deepseek_chat.py
This example includes streaming, conversation history management, rate limit handling, and memory trimming. It works within free credits and costs almost nothing on paid usage.
DeepSeek API Python Quick Reference Table
All commands here use the openai Python SDK exactly as documented in OpenAI's official Python guide — only base_url and model differ between DeepSeek and OpenAI calls.
| Task | Code Snippet | Notes |
|---|---|---|
| Initialize client | OpenAI(api_key=key, base_url="https://api.deepseek.com") | Same as OpenAI SDK |
| Basic chat | client.chat.completions.create(model="deepseek-chat", messages=[...]) | Returns full response |
| Streaming | Add stream=True, iterate over chunks | Real-time output |
| System prompt | Add {"role": "system", "content": "..."} as first message | Sets behavior |
| Multi-turn | Append all messages to conversation list | Full history each call |
| Set response length | Add max_tokens=N | Prevents runaway responses |
| Temperature | Add temperature=0.7 (0.0 to 2.0) | Lower = more deterministic |
| JSON mode | Add response_format={"type": "json_object"} | Structured output |
| Reasoning model | Use model="deepseek-reasoner" | More tokens, deeper thinking |
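JSON mode is the one row above that needs some care on the receiving end, because the reply still arrives as a string that must be parsed. The sketch below builds the request arguments and parses a reply defensively. The helper names and the `answer`/`confidence` keys are hypothetical, and the instruction to reply in JSON is placed in the prompt on the assumption that JSON mode expects it, as it does with OpenAI's API.

```python
import json


def json_request_kwargs(question):
    """Build keyword arguments for a JSON-mode chat completion request."""
    return {
        "model": "deepseek-chat",
        "messages": [
            # The prompt itself asks for JSON and names the expected keys.
            {
                "role": "system",
                "content": "Reply with a JSON object with keys 'answer' and 'confidence'.",
            },
            {"role": "user", "content": question},
        ],
        "response_format": {"type": "json_object"},
        "max_tokens": 200,
    }


def parse_json_reply(content):
    """Parse the model's reply; return None if it is not a valid JSON object."""
    try:
        data = json.loads(content)
    except (TypeError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None


# Offline demonstration with a hand-written reply string.
print(parse_json_reply('{"answer": "Paris", "confidence": 0.9}'))
print(parse_json_reply("not json"))
```

With a real client, the call would look like `client.chat.completions.create(**json_request_kwargs("What is the capital of France?"))`, followed by `parse_json_reply(response.choices[0].message.content)`.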
For function calling implementation with DeepSeek, see our function calling guide.
TokenMix.ai provides a unified API that lets you switch between DeepSeek, OpenAI, Claude, and other providers without changing your code beyond the base URL. Useful when you want to compare model outputs or add fallback providers.
FAQ
Does DeepSeek API work with the OpenAI Python SDK?
Yes. DeepSeek implements the OpenAI-compatible API format. Install the openai package with pip install openai, then create a client with base_url="https://api.deepseek.com" and your DeepSeek API key. All standard OpenAI SDK features (chat completions, streaming, function calling) work with DeepSeek.
What Python version do I need for the DeepSeek API?
Python 3.8 or higher is required. The OpenAI SDK version 1.0+ requires Python 3.8+. For the best experience, use Python 3.10 or newer, which provides better type hinting support and performance.
How much does it cost to use the DeepSeek API in Python?
DeepSeek V4 costs $0.27 per million input tokens and $1.10 per million output tokens. New accounts get 5 million free tokens. A typical chat interaction (500 input + 300 output tokens) costs about $0.00047 per call, meaning $1 buys roughly 2,100 API calls.
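The arithmetic behind those figures, as a short sketch using the rates quoted above (`call_cost` is a hypothetical helper):

```python
INPUT_RATE = 0.27   # USD per million input tokens
OUTPUT_RATE = 1.10  # USD per million output tokens


def call_cost(input_tokens, output_tokens):
    """Cost in USD of one call at the uncached rates above."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000


cost = call_cost(500, 300)
print(f"${cost:.6f} per call, about {int(1 / cost):,} calls per dollar")
```

For 500 input and 300 output tokens this works out to $0.000465 per call, or about 2,150 calls per dollar, consistent with the rounded figures above.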
How do I switch between DeepSeek and OpenAI in my Python code?
Change two values: base_url and model. For DeepSeek, use base_url="https://api.deepseek.com" and model="deepseek-chat". For OpenAI, remove the base_url parameter and use model="gpt-5.4-mini". The rest of your code stays identical.
Does DeepSeek support function calling in Python?
Yes. DeepSeek supports the OpenAI function calling format (tools parameter). Define your functions as tool schemas, pass them in the tools parameter, and handle tool calls in the response. The implementation is identical to OpenAI's function calling.
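A minimal sketch of the pieces involved, runnable without network access: a tool schema in the OpenAI `tools` format and a dispatcher that executes a tool call. The `get_weather` function, its canned result, and the `run_tool_call` helper are hypothetical examples.

```python
import json


def get_weather(city):
    """Hypothetical local function; returns canned data for illustration."""
    return {"city": city, "forecast": "sunny"}


# Tool schema in the OpenAI function-calling format (passed as tools=TOOLS).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python functions.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(name, arguments_json):
    """Execute one requested tool call and return its result as a JSON string."""
    func = REGISTRY[name]
    args = json.loads(arguments_json)  # the model sends arguments as a JSON string
    return json.dumps(func(**args))


# Offline demonstration with arguments shaped like a model's tool call.
print(run_tool_call("get_weather", '{"city": "Paris"}'))
```

In a real exchange, the name and arguments come from `response.choices[0].message.tool_calls[i].function`. The result goes back to the model as a message with role `tool` and the matching `tool_call_id`, after which you call the API again to get the final answer.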
How do I handle DeepSeek API rate limits in Python?
Implement exponential backoff: catch RateLimitError exceptions and retry after increasing delays (1s, 2s, 4s). The DeepSeek API returns a Retry-After header with rate limit errors. For production applications, use a queue-based system to throttle requests below the rate limit threshold.
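As a simpler stand-in for a full queue-based system, a client-side throttle that enforces a minimum interval between requests might look like the sketch below. `MinIntervalThrottle` is a hypothetical class, the two-requests-per-second limit is an arbitrary example rather than a DeepSeek figure, and the clock and sleep functions are injectable so the demonstration runs instantly.

```python
import time


class MinIntervalThrottle:
    """Space out calls so that at most `max_per_second` start each second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Offline demonstration: a frozen clock and a sleep function that only records.
slept = []
throttle = MinIntervalThrottle(2, clock=lambda: 100.0, sleep=slept.append)
for _ in range(3):
    throttle.wait()  # in real code, make the API request right after this
print(slept)
```

In production you would construct it with the defaults, `MinIntervalThrottle(max_per_second=2)`, and call `throttle.wait()` before each request. This version is not thread-safe, so multi-threaded callers would need a lock around `wait()`.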
Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: DeepSeek API Documentation, DeepSeek Platform, TokenMix.ai