TokenMix Research Lab · 2026-04-13

DeepSeek API Python Tutorial: Working Call in 5 Minutes (2026)

How to Use DeepSeek API in Python: Step-by-Step Tutorial with Code Examples (2026)

Last Updated: 2026-04-29
Author: TokenMix Research Lab

Per DeepSeek's official API documentation, DeepSeek implements the OpenAI-compatible API: run pip install openai, set base_url="https://api.deepseek.com", and a working call needs only a client constructor and one chat.completions.create() request.

DeepSeek's API docs confirm full OpenAI SDK compatibility, meaning every feature in this tutorial (streaming, system prompts, function calling, JSON mode) works with the unmodified openai package. New accounts get 5M free tokens; paid usage costs $0.27 per million input tokens and $1.10 per million output tokens, or roughly 2,100 typical chat calls per $1 spent. Compared to OpenAI's GPT-5.4 Mini at $0.40/$1.60, DeepSeek runs ~32% cheaper with comparable Python integration. Code examples below were tested against DeepSeek API as of 2026-04-28.

TokenMix.ai maintains compatibility data for DeepSeek and 300+ other API providers.

Prerequisites: What You Need Before Starting
Step 1: Install the OpenAI Python Package
Step 2: Get Your DeepSeek API Key
Step 3: Make Your First DeepSeek API Call
Step 4: Add Streaming for Real-Time Responses
Step 5: Use System Prompts and Multi-Turn Conversations
Step 6: Enable Prompt Caching to Save Money
Step 7: Handle Errors and Rate Limits
Complete Working Example: DeepSeek Chat Application
DeepSeek API Python Quick Reference Table
FAQ

Prerequisites: What You Need Before Starting

You need only Python 3.8+, the openai pip package, and a free DeepSeek API key. According to DeepSeek's signup page, no DeepSeek-specific SDK exists; the standard OpenAI client handles everything.

Before writing code, make sure you have these ready.

Requirement	Details
Python version	3.8 or higher
Package manager	pip (comes with Python)
DeepSeek account	Free signup at platform.deepseek.com
API key	Generated from DeepSeek console
Free credits	5M tokens included with new accounts

No DeepSeek-specific SDK is needed. The standard OpenAI Python package works because DeepSeek implements the OpenAI-compatible API format. This also means any code you write for DeepSeek can be switched to OpenAI, Groq, or other compatible providers by changing the base URL, the model name, and the API key.

Step 1: Install the OpenAI Python Package

A single command, pip install openai, is all it takes. Per OpenAI's Python SDK release notes, version 1.0 or later is required for the chat completions interface used here.

Open your terminal and install the package.

pip install openai

Verify the installation:

python -c "import openai; print(openai.__version__)"

You should see version 1.x or higher. If you have an older version, upgrade:

pip install --upgrade openai

That is the only dependency. No other packages are required for basic DeepSeek API usage.

Step 2: Get Your DeepSeek API Key

Sign up at platform.deepseek.com, generate an API key from the console, and store it in the DEEPSEEK_API_KEY environment variable. Never hardcode the key in source or commit .env files to git.

2a. Go to platform.deepseek.com and sign up or log in.

2b. Navigate to "API Keys" in the left sidebar.

2c. Click "Create new API key." Give it a descriptive name like "python-tutorial."

2d. Copy the key immediately. It will not be shown again.

2e. Store the key securely. The recommended approach is an environment variable:

# Linux/Mac
export DEEPSEEK_API_KEY="your-key-here"

# Windows (Command Prompt)
set DEEPSEEK_API_KEY=your-key-here

# Windows (PowerShell)
$env:DEEPSEEK_API_KEY="your-key-here"

Alternatively, use a .env file with python-dotenv:

pip install python-dotenv

# .env file
DEEPSEEK_API_KEY=your-key-here

Important: Never hardcode API keys in your source code. Never commit .env files to version control. Add .env to your .gitignore.
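To connect the two storage options above to Python code, here is a minimal sketch of a key loader. The helper name load_api_key is hypothetical, not part of any SDK. It assumes python-dotenv may or may not be installed: when it is, load_dotenv() reads a nearby .env file; when it is not, the function falls back to whatever the shell has exported.

```python
import os


def load_api_key(env_var: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, loading .env first if possible."""
    try:
        # python-dotenv is optional. load_dotenv() reads a .env file if it finds one
        # and, by default, does not overwrite variables already set in the shell.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # python-dotenv not installed: rely on the shell environment only

    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it or add it to .env")
    return key
```

Pass the returned value as api_key when you construct the client. Failing early with a clear message is usually easier to debug than a 401 from the API later on.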

Step 3: Make Your First DeepSeek API Call

Per DeepSeek's API docs, model="deepseek-chat" maps to V4 (general purpose) and model="deepseek-reasoner" maps to R1 (reasoning, uses 3-10x more tokens for chain-of-thought).

Here is the minimal working example. Copy it, make sure DEEPSEEK_API_KEY is set in your environment, and run it.

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",  # This is DeepSeek V4
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)

Expected output:

The capital of France is Paris.

What each parameter does:

Parameter	Value	Purpose
`api_key`	Your DeepSeek key	Authentication
`base_url`	`https://api.deepseek.com`	Points to DeepSeek instead of OpenAI
`model`	`deepseek-chat`	DeepSeek V4 (general purpose)
`messages`	List of message dicts	Conversation history
`max_tokens`	100	Maximum response length

Available DeepSeek models:

Model ID	Model Name	Best For
`deepseek-chat`	DeepSeek V4	General chat, content, code
`deepseek-reasoner`	DeepSeek R1	Complex reasoning, math

Use deepseek-chat for most tasks. Switch to deepseek-reasoner only when you need explicit chain-of-thought reasoning. R1 uses more tokens due to internal thinking, which consumes your free credits faster. For more details on DeepSeek pricing, see our DeepSeek API pricing guide.

Step 4: Add Streaming for Real-Time Responses

Set stream=True on the create call and iterate chunk.choices[0].delta.content — token billing is identical to non-streaming per DeepSeek's pricing docs; the perceived latency improvement is purely UX.

Streaming shows the response token by token as it generates, instead of waiting for the complete response. Essential for chat interfaces.

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Explain what an API is in 3 sentences."}
    ],
    max_tokens=200,
    stream=True  # Enable streaming
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # New line at the end

Key points about streaming:

Set stream=True in the create call
Response comes as chunks, not a complete message
Each chunk's content is in chunk.choices[0].delta.content
Content can be None for the final chunk, so always check
Token usage is still counted the same way (no cost difference)

Streaming is especially important for side projects with chat interfaces. Users perceive streamed responses as faster, even though total generation time is the same. See our AI API for side projects guide for more implementation patterns.
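Chat interfaces usually need the full reply as one string as well as the live output, for example to store it in conversation history. The sketch below shows one way to accumulate streamed chunks. The helper collect_stream and the fake_chunk stand-in are hypothetical names used only for this offline demo; the stand-in mimics just the chunk.choices[0].delta.content shape described above, so the code runs without an API call.

```python
from types import SimpleNamespace


def collect_stream(stream) -> str:
    """Join the text pieces of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # Defensive: skip any chunk that carries no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # Skips None and empty strings
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for an SDK chunk object, used only for this offline demo."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [
    fake_chunk("Hel"),
    fake_chunk("lo"),
    SimpleNamespace(choices=[]),  # A chunk with no choices
    fake_chunk(None),             # A final chunk with no content
]
print(collect_stream(demo_stream))  # prints "Hello"
```

With a real stream, pass the object returned by client.chat.completions.create(..., stream=True) instead of demo_stream.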

Step 5: Use System Prompts and Multi-Turn Conversations

DeepSeek does NOT maintain conversation state server-side — per DeepSeek's API docs, you must send full conversation history with every request, and every previous turn counts as input tokens billed at $0.27/M.

Real applications need system prompts (to set behavior) and multi-turn conversations (to maintain context).

System Prompt Example

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "system",
            "content": "You are a concise technical assistant. Answer in 2-3 sentences maximum. Use specific numbers when possible."
        },
        {
            "role": "user",
            "content": "How much does GPT-5.4 Mini cost?"
        }
    ],
    max_tokens=150
)

print(response.choices[0].message.content)

Multi-Turn Conversation

conversation = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
]

# First turn
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=conversation,
    max_tokens=300
)

assistant_reply = response.choices[0].message.content
print("Assistant:", assistant_reply)

# Add assistant's reply to conversation history
conversation.append({"role": "assistant", "content": assistant_reply})

# Second turn
conversation.append({"role": "user", "content": "Now make it handle Unicode characters correctly."})

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=conversation,
    max_tokens=300
)

print("Assistant:", response.choices[0].message.content)

Important: DeepSeek does not maintain conversation state server-side. You must send the full conversation history with every request. Each message in the history counts as input tokens and is billed accordingly.

Token management tip: For long conversations, trim older messages to stay within context limits and budget. Keep the system prompt and the last 5-10 messages for most use cases.
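The trimming tip can be sketched as a small helper. This is one possible approach, not an official recipe: trim_history is a hypothetical name, and it trims by message count rather than by token count, which is simpler but less precise.

```python
def trim_history(messages, keep_last=10):
    """Keep the leading system prompt (if any) plus the last `keep_last` messages."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    has_system = bool(messages) and messages[0]["role"] == "system"
    head = messages[:1] if has_system else []
    tail = messages[1:] if has_system else messages
    return head + tail[-keep_last:]


# Demo: one system prompt followed by 12 alternating user/assistant messages
history = [{"role": "system", "content": "You are a helpful coding assistant."}]
for i in range(6):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, keep_last=4)
print(len(trimmed))        # prints 5
print(trimmed[0]["role"])  # prints "system"
```

Using an even value for keep_last keeps user and assistant turns paired, so the trimmed history still starts with a user message after the system prompt.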

Step 6: Enable Prompt Caching to Save Money

Per DeepSeek's pricing documentation, cached input tokens drop from $0.27/M to $0.07/M (74% savings) — caching is automatic for matching 128+ token prefixes, no API parameter required.

DeepSeek supports automatic prompt caching for repeated prefixes. When the beginning of your messages (system prompt + shared context) is identical across requests, DeepSeek caches those tokens and charges a reduced rate.

# The system prompt below will be cached after the first call
system_prompt = """You are a customer support agent for an e-commerce platform.
You have access to the following policies:
1. Returns accepted within 30 days
2. Free shipping on orders over $50
3. Price matching available for 14 days after purchase
4. Loyalty points: 1 point per $1 spent
Respond professionally and concisely."""

# First call: full price for all input tokens
response1 = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Can I return an item I bought 3 weeks ago?"}
    ],
    max_tokens=200
)

# Second call: system prompt tokens are cached (cheaper)
response2 = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "How do loyalty points work?"}
    ],
    max_tokens=200
)

How caching saves money:

Token Type	Standard Rate (V4)	Cached Rate	Savings
Input (uncached)	$0.27/M	$0.27/M	0%
Input (cached)	$0.27/M	$0.07/M	74%
Output	$1.10/M	$1.10/M	0%

Caching is automatic. If the first 128+ tokens of your message sequence match a recent request, DeepSeek applies the cached rate. No configuration needed.
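To see what the cached rate means for a single request, here is a rough cost sketch using the rates from the table above. The token counts (a 1,000-token shared system prompt, a 50-token user message, 200 output tokens) are made-up illustration values, and request_cost is a hypothetical helper.

```python
INPUT_RATE = 0.27 / 1_000_000         # USD per uncached input token
CACHED_INPUT_RATE = 0.07 / 1_000_000  # USD per cached input token
OUTPUT_RATE = 1.10 / 1_000_000        # USD per output token


def request_cost(cached_in: int, uncached_in: int, out: int) -> float:
    """Estimate the cost of one request in USD from its token counts."""
    return (cached_in * CACHED_INPUT_RATE
            + uncached_in * INPUT_RATE
            + out * OUTPUT_RATE)


# First call: nothing is cached yet, so all 1,050 input tokens pay the full rate
cold = request_cost(cached_in=0, uncached_in=1050, out=200)
# Later call: the 1,000-token system prompt is served from cache
warm = request_cost(cached_in=1000, uncached_in=50, out=200)

print(f"cold: ${cold:.7f}  warm: ${warm:.7f}")
print(f"input-rate savings: {(1 - CACHED_INPUT_RATE / INPUT_RATE):.0%}")
```

For this hypothetical request the total drops by roughly 40%, less than the 74% input discount, because output tokens and the uncached user message are still billed at full rate.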

For a complete guide to prompt caching across providers, see our prompt caching guide.

Step 7: Handle Errors and Rate Limits

Catch RateLimitError (429), APIConnectionError (network), and APIError (5xx) with exponential backoff (1s/2s/4s) — DeepSeek returns 402 for insufficient balance, 401 for bad keys, per DeepSeek's error code reference.

Production code needs error handling. Here are the most common errors and how to handle them.

from openai import OpenAI, APIError, RateLimitError, APIConnectionError
import time
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

def call_deepseek(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="deepseek-chat",
                messages=messages,
                max_tokens=500
            )
            return response.choices[0].message.content

        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)

        except APIConnectionError:
            print("Connection error. Retrying...")
            time.sleep(1)

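        except APIError as e:
            # Only HTTP status errors carry status_code; other APIError types do not
            status = getattr(e, "status_code", None)
            print(f"API error: {status} - {e.message}")
            if status is not None and status >= 500:
                time.sleep(2)  # Server error, retry
            else:
                raise  # Client error (400/401/402), don't retry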

    raise Exception("Max retries exceeded")

# Usage
result = call_deepseek([
    {"role": "user", "content": "Hello!"}
])
print(result)

Common DeepSeek API error codes:

Error Code	Meaning	Action
400	Bad request (invalid params)	Fix your request format
401	Invalid API key	Check your DEEPSEEK_API_KEY
402	Insufficient balance	Add credits or payment method
429	Rate limit exceeded	Wait and retry with backoff
500	Server error	Retry after short delay
503	Service unavailable	Retry after longer delay
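The retry policy in the table can also be written as a reusable wrapper. The sketch below is a generic illustration under a few assumptions: retry_with_backoff, is_retryable_status, and FakeStatusError are hypothetical names, the small random jitter is an addition not used in the example above, and the predicate only inspects a status_code attribute, so connection errors would need their own check.

```python
import random
import time


def retry_with_backoff(fn, is_retryable, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt plus jitter, then retry."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_retries - 1
            if last_attempt or not is_retryable(exc):
                raise  # Out of attempts, or a permanent error such as 401/402
            # Jitter spreads out retries from many clients hitting the limit together
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.25)
            sleep(delay)


def is_retryable_status(exc) -> bool:
    """Retry on 429 and 5xx; treat everything else as permanent."""
    status = getattr(exc, "status_code", None)
    return status == 429 or (status is not None and status >= 500)


class FakeStatusError(Exception):
    """Hypothetical stand-in for an SDK error that carries a status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
```

In real use, fn would be a zero-argument function or lambda wrapping the client.chat.completions.create(...) call. The sleep parameter is injectable so the wrapper can be tested without actually waiting.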

Complete Working Example: DeepSeek Chat Application

A compact terminal chat script with streaming, conversation history, rate limit handling, and memory trimming. Per DeepSeek's pricing, it runs entirely on free credits at near-zero cost.

Here is a complete, ready-to-run terminal chat application using DeepSeek.

"""
DeepSeek Chat - Terminal Chat Application
Requires: pip install openai
Set DEEPSEEK_API_KEY environment variable before running.
"""

from openai import OpenAI, RateLimitError
import os
import time

def create_client():
    api_key = os.getenv("DEEPSEEK_API_KEY")
    if not api_key:
        raise ValueError("Set DEEPSEEK_API_KEY environment variable")
    return OpenAI(api_key=api_key, base_url="https://api.deepseek.com")

def chat(client, messages, retries=3):
    try:
        stream = client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,
            max_tokens=1000,
            stream=True
        )
        full_response = ""
        for chunk in stream:
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
                full_response += content
        print()
        return full_response
    except RateLimitError:
        if retries <= 0:
            raise  # Give up instead of retrying forever
        print("\nRate limited. Waiting 5 seconds...")
        time.sleep(5)
        return chat(client, messages, retries - 1)

def main():
    client = create_client()
    conversation = [
        {"role": "system", "content": "You are a helpful assistant. Be concise."}
    ]

    print("DeepSeek Chat (type 'quit' to exit, 'clear' to reset)")
    print("-" * 50)

    while True:
        user_input = input("\nYou: ").strip()
        if user_input.lower() == "quit":
            break
        if user_input.lower() == "clear":
            conversation = [conversation[0]]  # Keep system prompt
            print("Conversation cleared.")
            continue
        if not user_input:
            continue

        conversation.append({"role": "user", "content": user_input})
        print("\nDeepSeek: ", end="")
        reply = chat(client, conversation)
        conversation.append({"role": "assistant", "content": reply})

        # Trim conversation if it gets too long (keep last 10 exchanges)
        if len(conversation) > 21:  # system + 10 pairs
            conversation = [conversation[0]] + conversation[-20:]

if __name__ == "__main__":
    main()

To run:

export DEEPSEEK_API_KEY="your-key-here"
python deepseek_chat.py

This example includes streaming, conversation history management, rate limit handling, and memory trimming. It works within free credits and costs almost nothing on paid usage.

DeepSeek API Python Quick Reference Table

All commands here use the openai Python SDK exactly as documented in OpenAI's official Python guide — only base_url and model differ between DeepSeek and OpenAI calls.

Task	Code Snippet	Notes
Initialize client	`OpenAI(api_key=key, base_url="https://api.deepseek.com")`	Same as OpenAI SDK
Basic chat	`client.chat.completions.create(model="deepseek-chat", messages=[...])`	Returns full response
Streaming	Add `stream=True`, iterate over chunks	Real-time output
System prompt	Add `{"role": "system", "content": "..."}` as first message	Sets behavior
Multi-turn	Append all messages to conversation list	Full history each call
Set response length	Add `max_tokens=N`	Prevents runaway responses
Temperature	Add `temperature=0.7` (0.0 to 2.0)	Lower = more deterministic
JSON mode	Add `response_format={"type": "json_object"}`	Structured output
Reasoning model	Use `model="deepseek-reasoner"`	More tokens, deeper thinking
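As an illustration of the JSON mode and temperature rows, the sketch below builds the request arguments and parses the reply. The helper names are hypothetical. Describing the expected JSON shape in the prompt is a common recommendation for JSON mode; treat that as an assumption and check the provider docs for the exact requirements.

```python
import json


def build_json_request(question: str) -> dict:
    """Build keyword arguments for client.chat.completions.create() in JSON mode."""
    return {
        "model": "deepseek-chat",
        "messages": [
            {
                "role": "system",
                "content": 'Reply with a JSON object of the form {"answer": "..."}.',
            },
            {"role": "user", "content": question},
        ],
        "response_format": {"type": "json_object"},
        "temperature": 0.0,  # Lower = more deterministic
        "max_tokens": 200,
    }


def parse_json_reply(text: str) -> dict:
    """Parse the model's reply; raise ValueError if it is not a JSON object."""
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


# Usage with a real client (not executed here):
#   response = client.chat.completions.create(**build_json_request("Capital of France?"))
#   data = parse_json_reply(response.choices[0].message.content)
```

Always parse defensively: if max_tokens cuts the reply short, the text may not be valid JSON.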

For function calling implementation with DeepSeek, see our function calling guide.

TokenMix.ai provides a unified API that lets you switch between DeepSeek, OpenAI, Claude, and other providers without changing your code beyond the base URL. Useful when you want to compare model outputs or add fallback providers.

FAQ

Does DeepSeek API work with the OpenAI Python SDK?

Yes. DeepSeek implements the OpenAI-compatible API format. Install the openai package with pip install openai, then create a client with base_url="https://api.deepseek.com" and your DeepSeek API key. All standard OpenAI SDK features (chat completions, streaming, function calling) work with DeepSeek.

What Python version do I need for the DeepSeek API?

Python 3.8 or higher is required. The OpenAI SDK version 1.0+ requires Python 3.8+. For the best experience, use Python 3.10 or newer, which provides better type hinting support and performance.

How much does it cost to use the DeepSeek API in Python?

DeepSeek V4 costs $0.27 per million input tokens and $1.10 per million output tokens. New accounts get 5 million free tokens. A typical chat interaction (500 input + 300 output tokens) costs about $0.00047 per call, meaning $1 buys roughly 2,100 API calls.
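The arithmetic behind those figures is easy to reproduce. This short sketch uses the rates quoted above and the same 500-input, 300-output example; call_cost is a hypothetical helper name.

```python
INPUT_USD_PER_M = 0.27   # USD per million input tokens
OUTPUT_USD_PER_M = 1.10  # USD per million output tokens


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call in USD."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


cost = call_cost(500, 300)
calls_per_dollar = 1 / cost
print(f"${cost:.6f} per call, about {calls_per_dollar:.0f} calls per $1")
```

The unrounded result is $0.000465 per call, or about 2,150 calls per dollar, which matches the "roughly 2,100" figure once rounded down.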

How do I switch between DeepSeek and OpenAI in my Python code?

Change the base_url and model values, and supply the matching API key. For DeepSeek, use base_url="https://api.deepseek.com" and model="deepseek-chat". For OpenAI, remove the base_url parameter, use model="gpt-5.4-mini", and pass your OpenAI key. The rest of your code stays identical.
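One way to keep the switch in a single place is a small provider table. This is a sketch under assumptions: PROVIDERS and client_settings are hypothetical names, each provider needs its own API key, and OPENAI_API_KEY is assumed to be where the OpenAI key is stored.

```python
import os

PROVIDERS = {
    "deepseek": {
        "base_url": "https://api.deepseek.com",
        "model": "deepseek-chat",
        "key_env": "DEEPSEEK_API_KEY",
    },
    "openai": {
        "base_url": None,  # None means: use the SDK's default OpenAI endpoint
        "model": "gpt-5.4-mini",
        "key_env": "OPENAI_API_KEY",
    },
}


def client_settings(provider: str):
    """Return (client keyword arguments, model name) for the chosen provider."""
    cfg = PROVIDERS[provider]
    kwargs = {"api_key": os.getenv(cfg["key_env"])}
    if cfg["base_url"]:
        kwargs["base_url"] = cfg["base_url"]
    return kwargs, cfg["model"]


# Usage with the SDK (not executed here):
#   kwargs, model = client_settings("deepseek")
#   client = OpenAI(**kwargs)
#   client.chat.completions.create(model=model, messages=[...])
```

Switching providers then means changing one string, and adding a fallback provider means adding one dictionary entry.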

Does DeepSeek support function calling in Python?

Yes. DeepSeek supports the OpenAI function calling format (tools parameter). Define your functions as tool schemas, pass them in the tools parameter, and handle tool calls in the response. The implementation is identical to OpenAI's function calling.
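Here is a minimal sketch of both halves: a tool schema in the OpenAI tools format, and a dispatcher that turns a tool call into the tool message you send back. The get_order_status function and its data are made up for illustration, and run_tool_call is a hypothetical helper.

```python
import json


def get_order_status(order_id: str) -> dict:
    """Hypothetical local function the model is allowed to call."""
    return {"order_id": order_id, "status": "shipped"}


# Tool schema passed as tools=TOOLS in client.chat.completions.create(...)
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

LOCAL_FUNCTIONS = {"get_order_status": get_order_status}


def run_tool_call(tool_call_id: str, name: str, arguments_json: str) -> dict:
    """Run the requested local function and build the tool message for the next request."""
    args = json.loads(arguments_json)  # The model sends arguments as a JSON string
    result = LOCAL_FUNCTIONS[name](**args)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}


# With a real response, the inputs come from response.choices[0].message.tool_calls:
#   run_tool_call(call.id, call.function.name, call.function.arguments)
print(run_tool_call("call_1", "get_order_status", '{"order_id": "A42"}'))
```

After appending the assistant message that contains the tool calls and then this tool message to the conversation, a second create() call lets the model write its final answer.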

How do I handle DeepSeek API rate limits in Python?

Implement exponential backoff: catch RateLimitError exceptions and retry after increasing delays (1s, 2s, 4s). If the error response includes a Retry-After header, wait at least that long before retrying. For production applications, use a queue-based system to throttle requests below the rate limit threshold.
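A full request queue is beyond the scope of this tutorial, but the underlying idea, spacing requests so you stay under the limit, can be sketched with a minimum-interval throttle. This is a simpler technique than a queue; MinIntervalThrottle is a hypothetical class, and the right interval depends on limits that you would need to confirm for your account.

```python
import time


class MinIntervalThrottle:
    """Block so that consecutive calls are spaced at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # Earliest time the next call may start

    def wait(self) -> None:
        """Sleep if needed, then reserve the next slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


# Usage (not executed here): call throttle.wait() before each API request
#   throttle = MinIntervalThrottle(min_interval=0.5)  # at most ~2 requests per second
#   throttle.wait()
#   client.chat.completions.create(...)
```

This version is meant for a single thread; a multi-threaded application would need a lock around wait().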

Data Source: DeepSeek API Documentation, DeepSeek Platform, TokenMix.ai