How to Use DeepSeek API in Python: Step-by-Step Tutorial with Code Examples

TokenMix Research Lab ยท 2026-04-13


This tutorial gets you from zero to a working DeepSeek API call in under 5 minutes. DeepSeek uses an OpenAI-compatible API, which means you use the `openai` Python package with a different base URL. This guide covers setup, first call, [streaming](https://tokenmix.ai/blog/ai-api-streaming-guide), prompt caching, and real code examples you can copy directly.

All code examples are tested against the DeepSeek API as of April 2026. TokenMix.ai maintains compatibility data for DeepSeek and 300+ other API providers.


---

Prerequisites: What You Need Before Starting

Before writing code, make sure you have these ready.

| Requirement | Details |
|-------------|---------|
| Python version | 3.8 or higher |
| Package manager | pip (comes with Python) |
| DeepSeek account | Free signup at platform.deepseek.com |
| API key | Generated from DeepSeek console |
| Free credits | 5M tokens included with new accounts |

No DeepSeek-specific SDK is needed. The standard OpenAI Python package works because DeepSeek implements the OpenAI-compatible API format. This also means any code you write for DeepSeek can be switched to OpenAI, [Groq](https://tokenmix.ai/blog/groq-api-pricing), or other compatible providers by changing two lines.

Step 1: Install the OpenAI Python Package

Open your terminal and install the package.
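For example, with pip (use `pip3` on systems where `pip` points at an older Python):

```bash
pip install openai
```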

Verify the installation:
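One way to check the installed version:

```bash
python -c "import openai; print(openai.__version__)"
```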

You should see version 1.x or higher. If you have an older version, upgrade:
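```bash
pip install --upgrade openai
```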

That is the only dependency. No other packages are required for basic DeepSeek API usage.

Step 2: Get Your DeepSeek API Key

**2a.** Go to [platform.deepseek.com](https://platform.deepseek.com) and sign up or log in.

**2b.** Navigate to "API Keys" in the left sidebar.

**2c.** Click "Create new API key." Give it a descriptive name like "python-tutorial."

**2d.** Copy the key immediately. It will not be shown again.

**2e.** Store the key securely. The recommended approach is an environment variable:
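On macOS or Linux (bash/zsh), for example:

```bash
export DEEPSEEK_API_KEY="your-key-here"
```

Add this line to `~/.bashrc` or `~/.zshrc` to make it persistent.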

Windows (Command Prompt)
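```bash
:: Current session only:
set DEEPSEEK_API_KEY=your-key-here
:: Persist for future sessions (takes effect in new terminals):
setx DEEPSEEK_API_KEY "your-key-here"
```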

Windows (PowerShell)
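```bash
$env:DEEPSEEK_API_KEY = "your-key-here"
```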

Alternatively, use a `.env` file with `python-dotenv`:
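A minimal sketch of the `.env` approach (assumes `python-dotenv` is installed):

```python
# pip install python-dotenv
# .env file (one line, no quotes needed):
#   DEEPSEEK_API_KEY=your-key-here

from dotenv import load_dotenv
import os

load_dotenv()  # reads variables from .env in the current directory
api_key = os.getenv("DEEPSEEK_API_KEY")
```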

**Important:** Never hardcode API keys in your source code. Never commit `.env` files to version control. Add `.env` to your `.gitignore`.

Step 3: Make Your First DeepSeek API Call

Here is the minimal working example. Copy it, set your `DEEPSEEK_API_KEY` environment variable, and run it.

```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",  # This is DeepSeek V4
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)
```

**Expected output:**

```
The capital of France is Paris.
```

**What each parameter does:**

| Parameter | Value | Purpose |
|-----------|-------|---------|
| `api_key` | Your DeepSeek key | Authentication |
| `base_url` | `https://api.deepseek.com` | Points to DeepSeek instead of OpenAI |
| `model` | `deepseek-chat` | DeepSeek V4 (general purpose) |
| `messages` | List of message dicts | Conversation history |
| `max_tokens` | 100 | Maximum response length |

**Available DeepSeek models:**

| Model ID | Model Name | Best For |
|----------|-----------|---------|
| `deepseek-chat` | DeepSeek V4 | General chat, content, code |
| `deepseek-reasoner` | DeepSeek R1 | Complex reasoning, math |

Use `deepseek-chat` for most tasks. Switch to `deepseek-reasoner` only when you need explicit [chain-of-thought](https://tokenmix.ai/blog/chain-of-thought-prompting) reasoning. R1 uses more tokens due to internal thinking, which consumes your free credits faster. For more details on DeepSeek pricing, see our [DeepSeek API pricing guide](https://tokenmix.ai/blog/deepseek-api-pricing).

Step 4: Add Streaming for Real-Time Responses

Streaming shows the response token by token as it generates, instead of waiting for the complete response. Essential for chat interfaces.

```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Explain what an API is in 3 sentences."}
    ],
    max_tokens=200,
    stream=True  # Enable streaming
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # New line at the end
```

**Key points about streaming:**

- Set `stream=True` in the create call
- Response comes as chunks, not a complete message
- Each chunk's content is in `chunk.choices[0].delta.content`
- Content can be `None` for the final chunk, so always check
- Token usage is still counted the same way (no cost difference)

Streaming is especially important for side projects with chat interfaces. Users perceive streamed responses as faster, even though total generation time is the same. See our [AI API for side projects guide](https://tokenmix.ai/blog/ai-api-for-side-projects) for more implementation patterns.

Step 5: Use System Prompts and Multi-Turn Conversations

Real applications need system prompts (to set behavior) and multi-turn conversations (to maintain context).

System Prompt Example

```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        # The system message sets the assistant's behavior for the whole conversation
        {"role": "system", "content": "You are a helpful assistant. Be concise."},
        {"role": "user", "content": "What is an API?"}
    ],
    max_tokens=200
)

print(response.choices[0].message.content)
```

Multi-Turn Conversation

```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

# First turn
conversation = [
    {"role": "user", "content": "Give me a one-line definition of an API."}
]
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=conversation,
    max_tokens=300
)
assistant_reply = response.choices[0].message.content
print("Assistant:", assistant_reply)

# Add assistant's reply to conversation history
conversation.append({"role": "assistant", "content": assistant_reply})

# Second turn: the model sees the full history, so follow-up questions resolve correctly
conversation.append({"role": "user", "content": "Can you give an example of one?"})
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=conversation,
    max_tokens=300
)

print("Assistant:", response.choices[0].message.content)
```

**Important:** DeepSeek does not maintain conversation state server-side. You must send the full conversation history with every request. Each message in the history counts as input tokens and is billed accordingly.

**Token management tip:** For long conversations, trim older messages to stay within context limits and budget. Keep the system prompt and the last 5-10 messages for most use cases.
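One way to implement that trimming (a plain-Python sketch; `trim_history` is a name made up for this example):

```python
def trim_history(conversation, keep_last=10):
    """Keep the system prompt (first message) plus the `keep_last` most recent messages."""
    if len(conversation) <= keep_last + 1:
        return conversation
    return [conversation[0]] + conversation[-keep_last:]

# Example: a system prompt followed by 30 conversation messages
history = [{"role": "system", "content": "Be concise."}]
history += [{"role": "user", "content": f"message {i}"} for i in range(30)]

trimmed = trim_history(history)
print(len(trimmed))  # 11: system prompt + last 10 messages
```

Call this before each request once the history grows past your budget.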

Step 6: Enable Prompt Caching to Save Money

DeepSeek supports automatic prompt caching for repeated prefixes. When the beginning of your messages (system prompt + shared context) is identical across requests, DeepSeek caches those tokens and charges a reduced rate.

- First call: full price for all input tokens
- Second call: the repeated system-prompt tokens hit the cache and are billed at the reduced rate

**How caching saves money:**

| Token Type | Standard Rate (V4) | Cached Rate | Savings |
|-----------|:---:|:---:|:---:|
| Input (uncached) | $0.27/M | $0.27/M | 0% |
| Input (cached) | $0.27/M | $0.07/M | 74% |
| Output | $1.10/M | $1.10/M | 0% |

Caching is automatic. If the first 128+ tokens of your message sequence match a recent request, DeepSeek applies the cached rate. No configuration needed.
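To put numbers on it, here is a back-of-the-envelope estimate (pure arithmetic, using the rates from the table above) for a 2,000-token system prompt reused across 100 requests:

```python
# Rates from the pricing table above ($ per input token)
UNCACHED = 0.27 / 1_000_000
CACHED = 0.07 / 1_000_000

prompt_tokens = 2_000  # shared system prompt
calls = 100

# Without caching: every call pays the full input rate
cost_without = calls * prompt_tokens * UNCACHED

# With caching: the first call pays full rate, the rest hit the cache
cost_with = prompt_tokens * UNCACHED + (calls - 1) * prompt_tokens * CACHED

savings = 1 - cost_with / cost_without
print(f"${cost_without:.4f} vs ${cost_with:.4f} ({savings:.0%} saved)")
```

The per-scenario savings land just under the 74% per-token discount because the first call is always uncached.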

For a complete guide to prompt caching across providers, see our [prompt caching guide](https://tokenmix.ai/blog/prompt-caching-guide).

Step 7: Handle Errors and Rate Limits

Production code needs error handling. Here are the most common errors and how to handle them.

```python
from openai import OpenAI, RateLimitError, APIConnectionError, APIError
import os
import time

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com"
)

def call_deepseek(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="deepseek-chat",
                messages=messages,
                max_tokens=500
            )
            return response.choices[0].message.content

        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)

        except APIConnectionError:
            print("Connection error. Retrying...")
            time.sleep(1)

        except APIError as e:
            print(f"API error: {e.status_code} - {e.message}")
            if e.status_code >= 500:
                time.sleep(2)  # Server error, retry
            else:
                raise  # Client error, don't retry

    raise Exception("Max retries exceeded")

# Usage
reply = call_deepseek([{"role": "user", "content": "Hello"}])
print(reply)
```

**Common DeepSeek API error codes:**

| Error Code | Meaning | Action |
|:---:|:---:|:---:|
| 400 | Bad request (invalid params) | Fix your request format |
| 401 | Invalid API key | Check your DEEPSEEK_API_KEY |
| 402 | Insufficient balance | Add credits or payment method |
| 429 | Rate limit exceeded | Wait and retry with backoff |
| 500 | Server error | Retry after short delay |
| 503 | Service unavailable | Retry after longer delay |

Complete Working Example: DeepSeek Chat Application

Here is a complete, ready-to-run terminal chat application using DeepSeek.

```python
from openai import OpenAI, RateLimitError
import os
import time

def create_client():
    api_key = os.getenv("DEEPSEEK_API_KEY")
    if not api_key:
        raise ValueError("Set DEEPSEEK_API_KEY environment variable")
    return OpenAI(api_key=api_key, base_url="https://api.deepseek.com")

def chat(client, messages):
    try:
        stream = client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,
            max_tokens=1000,
            stream=True
        )
        full_response = ""
        for chunk in stream:
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
                full_response += content
        print()
        return full_response
    except RateLimitError:
        print("\nRate limited. Waiting 5 seconds...")
        time.sleep(5)
        return chat(client, messages)

def main():
    client = create_client()
    conversation = [
        {"role": "system", "content": "You are a helpful assistant. Be concise."}
    ]

    print("DeepSeek Chat (type 'quit' to exit, 'clear' to reset)")
    print("-" * 50)

    while True:
        user_input = input("\nYou: ").strip()
        if user_input.lower() == "quit":
            break
        if user_input.lower() == "clear":
            conversation = [conversation[0]]  # Keep system prompt
            print("Conversation cleared.")
            continue
        if not user_input:
            continue

        conversation.append({"role": "user", "content": user_input})
        print("\nDeepSeek: ", end="")
        reply = chat(client, conversation)
        conversation.append({"role": "assistant", "content": reply})

        # Trim conversation if it gets too long (keep last 10 exchanges)
        if len(conversation) > 21:  # system prompt + 10 user/assistant pairs
            conversation = [conversation[0]] + conversation[-20:]

if __name__ == "__main__":
    main()
```

**To run:**

```bash
export DEEPSEEK_API_KEY="your-key-here"
python deepseek_chat.py
```

This example includes streaming, conversation history management, rate limit handling, and memory trimming. It works within free credits and costs almost nothing on paid usage.

DeepSeek API Python Quick Reference Table

| Task | Code Snippet | Notes |
|------|-------------|-------|
| Initialize client | `OpenAI(api_key=key, base_url="https://api.deepseek.com")` | Same as OpenAI SDK |
| Basic chat | `client.chat.completions.create(model="deepseek-chat", messages=[...])` | Returns full response |
| Streaming | Add `stream=True`, iterate over chunks | Real-time output |
| System prompt | Add `{"role": "system", "content": "..."}` as first message | Sets behavior |
| Multi-turn | Append all messages to conversation list | Full history each call |
| Set response length | Add `max_tokens=N` | Prevents runaway responses |
| Temperature | Add `temperature=0.7` (0.0 to 2.0) | Lower = more deterministic |
| JSON mode | Add `response_format={"type": "json_object"}` | Structured output |
| Reasoning model | Use `model="deepseek-reasoner"` | More tokens, deeper thinking |

For function calling implementation with DeepSeek, see our [function calling guide](https://tokenmix.ai/blog/function-calling-guide).

TokenMix.ai provides a unified API that lets you switch between DeepSeek, OpenAI, Claude, and other providers without changing your code beyond the base URL. Useful when you want to compare model outputs or add fallback providers.

FAQ

Does DeepSeek API work with the OpenAI Python SDK?

Yes. DeepSeek implements the OpenAI-compatible API format. Install the `openai` package with `pip install openai`, then create a client with `base_url="https://api.deepseek.com"` and your DeepSeek API key. All standard OpenAI SDK features (chat completions, streaming, function calling) work with DeepSeek.

What Python version do I need for the DeepSeek API?

Python 3.8 or higher is required. The OpenAI SDK version 1.0+ requires Python 3.8+. For the best experience, use Python 3.10 or newer, which provides better type hinting support and performance.

How much does it cost to use the DeepSeek API in Python?

DeepSeek V4 costs $0.27 per million input tokens and $1.10 per million output tokens. New accounts get 5 million free tokens. A typical chat interaction (500 input + 300 output tokens) costs about $0.00047 per call, meaning $1 buys roughly 2,100 API calls.
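The arithmetic behind those numbers, as a quick sanity check:

```python
INPUT_RATE = 0.27 / 1_000_000   # $ per input token (DeepSeek V4)
OUTPUT_RATE = 1.10 / 1_000_000  # $ per output token

cost_per_call = 500 * INPUT_RATE + 300 * OUTPUT_RATE
calls_per_dollar = 1 / cost_per_call

print(f"${cost_per_call:.6f} per call")  # $0.000465
print(f"~{calls_per_dollar:.0f} calls per $1")
```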

How do I switch between DeepSeek and OpenAI in my Python code?

Change two values: `base_url` and `model`. For DeepSeek, use `base_url="https://api.deepseek.com"` and `model="deepseek-chat"`. For OpenAI, remove the base_url parameter and use `model="gpt-5.4-mini"`. The rest of your code stays identical.

Does DeepSeek support function calling in Python?

Yes. DeepSeek supports the OpenAI function calling format (tools parameter). Define your functions as tool schemas, pass them in the `tools` parameter, and handle tool calls in the response. The implementation is identical to OpenAI's function calling.

How do I handle DeepSeek API rate limits in Python?

Implement exponential backoff: catch `RateLimitError` exceptions and retry after increasing delays (1s, 2s, 4s). The DeepSeek API returns a `Retry-After` header with rate limit errors. For production applications, use a queue-based system to throttle requests below the rate limit threshold.

---

*Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: [DeepSeek API Documentation](https://api-docs.deepseek.com), [DeepSeek Platform](https://platform.deepseek.com), [TokenMix.ai](https://tokenmix.ai)*