
Dify OpenAI-Compatible API 2026: Workflow Model Routing
Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30
Dify can use OpenAI-compatible APIs through its OpenAI-API-compatible model provider plugin. Use this when you want Dify workflows to route through TokenMix.ai, OpenRouter, Ollama, LM Studio, or another gateway.
Dify's model provider docs describe two options: workspaces can rely on system providers or add custom providers, and custom providers let teams use their own API keys for direct access, control, billing, and often higher rate limits. The OpenAI-API-compatible plugin page says the plugin supports OpenAI-compatible providers for LLMs, reranking, embeddings, speech-to-text, and text-to-speech, and lets developers configure model name, API key, URL, completion settings, context and token limits, streaming, and vision. The short version: Dify is a good workflow layer, but it still needs a reliable model access layer.
Table of Contents
- Quick Answer
- Confirmed vs Caveat
- When This Setup Makes Sense
- Dify Configuration Fields
- TokenMix.ai Setup Example
- OpenRouter And Local Model Examples
- LLM, Embeddings, Rerank, And Speech
- Dify vs TokenMix.ai vs LiteLLM
- Cost And Routing Math
- Troubleshooting
- Production Checklist
- Final Recommendation
- FAQ
- Related Articles
- Sources
Quick Answer
Install Dify's OpenAI-API-compatible provider plugin, then configure:
| Field | Example |
|---|---|
| Provider | OpenAI-API-compatible |
| API Key | Your gateway key |
| API URL | https://api.tokenmix.ai/v1 or another OpenAI-compatible base URL |
| Model name | Gateway model ID |
| Model type | LLM, embedding, rerank, STT, or TTS |
| Streaming | Enable only if the endpoint supports streaming |
| Vision | Enable only for vision-capable models |
Use TokenMix.ai when you want Dify to call many hosted models through one OpenAI-compatible API instead of managing separate provider keys inside every workflow stack.
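Before pasting these values into Dify, it is worth confirming the base URL and key work on their own. Below is a minimal sketch, assuming the gateway exposes the standard /v1/models listing endpoint and the key lives in a TOKENMIX_API_KEY environment variable; adjust both for your setup.

```python
import os
import requests

BASE_URL = "https://api.tokenmix.ai/v1"   # swap in your gateway's base URL
API_KEY = os.environ["TOKENMIX_API_KEY"]  # assumed environment variable

# Most OpenAI-compatible gateways expose GET /models; some do not, so a 404 here
# does not necessarily mean chat completions will fail.
resp = requests.get(f"{BASE_URL}/models", headers={"Authorization": f"Bearer {API_KEY}"}, timeout=30)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))
```

If the model ID you plan to use appears in the output, the same URL and key should work in the Dify provider form.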
Confirmed vs Caveat
| Claim | Status | Source / note |
|---|---|---|
| Dify supports custom model providers | Confirmed | Dify model provider docs |
| Dify has an OpenAI-API-compatible plugin | Confirmed | Dify Marketplace |
| The plugin supports LLMs | Confirmed | Plugin page |
| The plugin supports embeddings and reranking | Confirmed | Plugin page |
| The plugin supports STT and TTS | Confirmed | Plugin page |
| You can configure API key and URL | Confirmed | Plugin page |
| Every OpenAI-compatible provider supports every endpoint | False | Compatibility varies by provider |
| Dify replaces an API gateway | No | Dify builds workflows; gateway handles model access and routing |
When This Setup Makes Sense
| Use case | Good fit? | Why |
|---|---|---|
| Dify chatbot with GPT/Claude/Gemini fallback | Yes | Gateway can route models behind one provider config |
| Internal RAG workflow | Yes | Dify handles app logic; gateway handles models |
| Local model prototype with Ollama or LM Studio | Yes | OpenAI-compatible URL can point local |
| Production app with strict provider budgets | Yes, with gateway policy | Dify alone is not enough cost control |
| Native provider-only feature testing | Maybe | Direct provider integration may expose more options |
| High-volume low-latency serving | Depends | Measure Dify workflow overhead plus gateway latency |
The clean architecture is:
Dify workflow -> OpenAI-API-compatible provider plugin -> TokenMix.ai / gateway -> upstream model provider
Dify Configuration Fields
| Field | What to enter | Common mistake |
|---|---|---|
| Type | LLM, embedding, rerank, STT, TTS | Picking LLM for embedding models |
| Name | Human-readable model alias | Using different names across workflows |
| API Key | Gateway or provider key | Pasting direct provider key into wrong gateway |
| URL | Base URL, usually ending in /v1 | Adding /chat/completions instead of base URL |
| Completion mode | Chat/completion behavior | Using completion-only mode for chat models |
| Context length | Model context limit | Setting larger limit than provider supports |
| Max tokens | Output token cap | Letting outputs run into expensive defaults |
| Streaming | On/off | Enabling streaming on unsupported endpoint |
| Vision | On/off | Enabling vision for text-only model |
For gateway use, the URL should usually be the base path, not the full endpoint.
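The reason is that OpenAI-compatible clients, including provider plugins like this one, normally append the endpoint path themselves. A short illustration of that assumption, using the TokenMix base URL as the example value:

```python
# Illustration only: OpenAI-compatible clients typically append the endpoint path.
base_url = "https://api.tokenmix.ai/v1"    # what belongs in Dify's URL field
chat_url = f"{base_url}/chat/completions"  # what the plugin effectively calls

# Pasting the full endpoint as the base URL tends to double the path, e.g.
# .../v1/chat/completions/chat/completions, which shows up as a 404.
print(chat_url)
```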
TokenMix.ai Setup Example
Use this pattern when Dify should call TokenMix.ai as its model gateway.
| Dify field | Example value |
|---|---|
| Provider plugin | OpenAI-API-compatible |
| API URL | https://api.tokenmix.ai/v1 |
| API Key | TOKENMIX_API_KEY |
| Model name | A TokenMix-supported model ID |
| Model type | LLM |
| Streaming | Enable after testing |
| Vision | Enable only for multimodal models |
Example chat test payload shape:
{
  "model": "your-model-id",
  "messages": [
    {
      "role": "user",
      "content": "Summarize this customer ticket and propose a reply."
    }
  ],
  "stream": true
}
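To run that payload against the gateway before wiring it into Dify, here is a minimal sketch using the standard OpenAI Python SDK pointed at the gateway; the base URL, environment variable, and model ID are assumptions to replace with your own values.

```python
import os
from openai import OpenAI  # standard OpenAI SDK pointed at the gateway

# Assumed values: swap in your gateway URL, key, and an actual model ID.
client = OpenAI(
    base_url="https://api.tokenmix.ai/v1",
    api_key=os.environ["TOKENMIX_API_KEY"],
)

stream = client.chat.completions.create(
    model="your-model-id",  # replace with a TokenMix-supported model ID
    messages=[
        {"role": "user", "content": "Summarize this customer ticket and propose a reply."}
    ],
    stream=True,  # only if the gateway and model support streaming; otherwise set False
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

If streaming fails here, leave Streaming disabled in the Dify provider config until the route supports it.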
Why TokenMix.ai fits Dify:
| Need | Why TokenMix.ai helps |
|---|---|
| One model access layer | Dify connects once, gateway routes many models |
| OpenAI-compatible endpoint | Less custom plugin work |
| Multi-provider model choice | Use GPT, Claude, Gemini, DeepSeek, and open models |
| Payment flexibility | Useful when direct provider billing is a blocker |
| Internal workflow scale | Keep model routing out of individual Dify apps |
OpenRouter And Local Model Examples
| Provider route | Base URL example | Best for |
|---|---|---|
| TokenMix.ai | https://api.tokenmix.ai/v1 | Hosted multi-model API access |
| OpenRouter | https://openrouter.ai/api/v1 | Broad model catalog exploration |
| Ollama | http://localhost:11434/v1 | Local models and private testing |
| LM Studio | http://localhost:1234/v1 | Desktop local model testing |
| SGLang | http://localhost:30000/v1 | Self-hosted high-throughput serving |
| TGI | Hugging Face endpoint URL ending in /v1 | Hugging Face model serving |
For local or self-hosted backends, use the Ollama OpenAI-compatible API, SGLang OpenAI-compatible API, and Text Generation Inference OpenAI-compatible API guides (see Related Articles) as the setup references.
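A quick way to check which of these routes are reachable from the machine running Dify is a small smoke test. The sketch below assumes hypothetical environment variables and the base URLs from the table; local servers often accept any key, and some do not implement GET /models at all.

```python
import os
import requests

# Hypothetical route table mirroring the one above; adjust URLs and keys to your setup.
ROUTES = {
    "tokenmix": ("https://api.tokenmix.ai/v1", os.environ.get("TOKENMIX_API_KEY", "")),
    "openrouter": ("https://openrouter.ai/api/v1", os.environ.get("OPENROUTER_API_KEY", "")),
    "ollama": ("http://localhost:11434/v1", "ollama"),      # local servers often ignore the key
    "lm_studio": ("http://localhost:1234/v1", "lm-studio"),
}

for name, (base_url, key) in ROUTES.items():
    try:
        # Not every server implements GET /models, so treat a 404 as "reachable but unlisted".
        resp = requests.get(f"{base_url}/models", headers={"Authorization": f"Bearer {key}"}, timeout=10)
        print(f"{name}: HTTP {resp.status_code}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc.__class__.__name__})")
```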
LLM, Embeddings, Rerank, And Speech
Dify workflows often need more than chat completion.
| Model type | Dify need | Gateway caveat |
|---|---|---|
| LLM | Chat, agents, workflow nodes | Tool calling and streaming vary |
| Embedding | Knowledge base indexing | Endpoint must support /embeddings |
| Rerank | Retrieval quality improvement | Not every OpenAI-compatible gateway supports rerank |
| STT | Voice input workflows | Audio endpoint compatibility varies |
| TTS | Voice output | Voice list and audio format vary |
| Vision | Image input workflows | Enable only on multimodal models |
Do not assume that "OpenAI-compatible" means "all OpenAI endpoints are implemented." Confirm each endpoint type.
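One practical way to confirm is to call each endpoint type once before building the workflow that depends on it. A minimal sketch for chat and embeddings, assuming the TokenMix base URL, a TOKENMIX_API_KEY environment variable, and hypothetical model IDs:

```python
import os
import requests

BASE_URL = "https://api.tokenmix.ai/v1"  # assumed gateway base URL
HEADERS = {"Authorization": f"Bearer {os.environ['TOKENMIX_API_KEY']}"}

# Hypothetical model IDs: replace with IDs your gateway actually lists.
checks = {
    "chat": ("/chat/completions", {
        "model": "your-chat-model",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 5,
    }),
    "embedding": ("/embeddings", {
        "model": "your-embedding-model",
        "input": "ping",
    }),
}

for kind, (path, payload) in checks.items():
    resp = requests.post(f"{BASE_URL}{path}", headers=HEADERS, json=payload, timeout=30)
    # A 404 or 405 usually means the gateway does not implement that endpoint type.
    print(f"{kind}: HTTP {resp.status_code}")
```

Rerank, STT, and TTS often use provider-specific paths and formats, so test those separately against your gateway's documentation.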
Dify vs TokenMix.ai vs LiteLLM
| Layer | Dify | TokenMix.ai | LiteLLM |
|---|---|---|---|
| Main role | Workflow/app builder | Hosted model API gateway | Self-hosted proxy/gateway |
| Best for | Chatbots, agents, RAG workflows | Multi-model hosted access | Internal platform control |
| OpenAI-compatible input | Through plugin/provider | Native API surface | Native proxy surface |
| Routing | App/workflow logic | Gateway routing | Self-managed routing |
| Provider keys | Stored in Dify provider config | Stored in gateway account | Stored in proxy config |
| Operations burden | Dify app ops | Low for model access | Higher |
Dify and TokenMix.ai are complementary. Dify runs the workflow. TokenMix.ai supplies the model access layer.
Cost And Routing Math
Cost calculation 1: bad default model
Assume a Dify workflow has 10,000 runs/month, each run uses 5,000 input tokens plus 1,000 output tokens, and the premium model costs roughly 8x as much per token as the cheap baseline model.
| Model policy | Relative model cost | Monthly relative cost |
|---|---|---|
| All premium model | 8x | 8.0x |
| Cheap-first, premium fallback 20% | Mixed | 2.4x |
| Cheap-first, premium fallback 10% | Mixed | 1.7x |
| Manual per-workflow model choice | Varies | Depends on discipline |
The gateway policy matters because Dify workflows are easy to duplicate. One expensive default can spread across many apps.
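The fallback rows in the table come from a simple blend against the cheap baseline at 1.0x. A worked version of the same arithmetic, keeping the assumed 8x premium-to-cheap price ratio:

```python
# Relative cost blend: cheap model = 1.0x, premium model = 8.0x (assumed ratio from above).
CHEAP, PREMIUM = 1.0, 8.0

def blended_cost(premium_share: float) -> float:
    """Average relative cost when a share of runs falls back to the premium model."""
    return premium_share * PREMIUM + (1 - premium_share) * CHEAP

for share, label in [(1.0, "all premium"), (0.2, "20% fallback"), (0.1, "10% fallback")]:
    print(f"{label}: {blended_cost(share):.1f}x")  # 8.0x, 2.4x, 1.7x
```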
Cost calculation 2: embedding spend
| Knowledge base size | Embedding tokens | Risk |
|---|---|---|
| Small docs | 1M | Easy to re-index |
| Medium docs | 100M | Re-indexing mistakes are visible |
| Large support corpus | 1B+ | Embedding model choice becomes material |
Keep embedding models separate from chat models. Do not route embeddings through a chat-only endpoint.
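To put rough numbers on the larger rows, here is a sketch of the indexing math with hypothetical per-million-token embedding prices; substitute your gateway's actual rates.

```python
# Hypothetical embedding prices per million tokens; check your gateway's actual rates.
PRICE_PER_M_TOKENS = {"cheap-embedding": 0.02, "premium-embedding": 0.13}

def index_cost(tokens: int, model: str) -> float:
    """Cost of one full indexing pass; a re-index pays this amount again."""
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]

for tokens in (1_000_000, 100_000_000, 1_000_000_000):
    row = ", ".join(f"{m}: ${index_cost(tokens, m):,.2f}" for m in PRICE_PER_M_TOKENS)
    print(f"{tokens:>13,} tokens -> {row}")
```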
Cost calculation 3: local vs hosted
| Option | Direct cost | Hidden cost |
|---|---|---|
| Local Ollama | Low token bill | Hardware, uptime, latency |
| Self-hosted SGLang/TGI | GPU cost | DevOps and scaling |
| OpenRouter | Token + platform policy | Provider variation |
| TokenMix.ai | Gateway model pricing | External dependency |
For production Dify workflows, total cost includes tokens, retries, workflow failures, and maintenance time.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Authentication failed | Wrong API key or provider | Recheck gateway key and Dify provider config |
| 404 model not found | Model name mismatch | Use exact gateway model ID |
| 404 endpoint not found | URL includes endpoint path | Use base URL ending in /v1 |
| Streaming fails | Gateway or model does not support streaming | Disable streaming or switch model |
| Vision fails | Text-only model selected | Use multimodal model and enable vision |
| Embedding fails | Chat endpoint used for embedding model | Add embedding model type separately |
| High cost | Premium model set as default | Add cheap-first routing policy |
| Slow workflow | Gateway plus model latency | Test p95 latency per model route |
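For the last row in the table, a quick way to compare routes is to time a fixed prompt repeatedly and read the p95 rather than the average. A minimal sketch, assuming the same gateway settings as earlier and a hypothetical model ID:

```python
import os
import time
import statistics
import requests

BASE_URL = "https://api.tokenmix.ai/v1"  # assumed gateway base URL
HEADERS = {"Authorization": f"Bearer {os.environ['TOKENMIX_API_KEY']}"}
PAYLOAD = {
    "model": "your-model-id",  # hypothetical model ID for the route under test
    "messages": [{"role": "user", "content": "Reply with OK."}],
    "max_tokens": 5,
}

latencies = []
for _ in range(20):  # small sample; use more runs for a stable p95
    start = time.perf_counter()
    requests.post(f"{BASE_URL}/chat/completions", headers=HEADERS, json=PAYLOAD, timeout=60)
    latencies.append(time.perf_counter() - start)

latencies.sort()
p95 = latencies[int(len(latencies) * 0.95) - 1]  # simple nearest-rank p95
print(f"median: {statistics.median(latencies):.2f}s  p95: {p95:.2f}s")
```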
Production Checklist
| Check | Why |
|---|---|
| Use separate providers for chat and embeddings | Prevent endpoint mismatch |
| Test streaming before enabling it in user-facing apps | Avoid broken UI streams |
| Pin model IDs for critical workflows | Prevent silent behavior changes |
| Add fallback only after measuring output quality | Fallback can change answer style |
| Track workflow-level cost | Model-level cost is not enough |
| Keep API keys out of shared screenshots and exports | Dify provider configs can leak operational secrets |
| Document which workflows use which gateway models | Prevent uncontrolled model drift |
| Link Dify workflows to an API gateway policy | Better cost and reliability control |
Final Recommendation
Use Dify with an OpenAI-compatible API when Dify is your workflow builder and you want model access to stay flexible. Use TokenMix.ai when you want one hosted gateway for GPT, Claude, Gemini, DeepSeek, and open models.
Do not let every Dify app owner pick random model providers. Centralize model access first. Then let Dify handle workflow logic.
FAQ
Does Dify support OpenAI-compatible APIs?
Yes. Dify has an OpenAI-API-compatible provider plugin that can connect to OpenAI-compatible model providers and gateways.
What URL should I put in Dify for an OpenAI-compatible API?
Use the provider's base URL, usually ending in /v1, such as https://api.tokenmix.ai/v1. Do not paste the full /chat/completions endpoint.
Can Dify use TokenMix.ai?
Yes. Configure TokenMix.ai as an OpenAI-compatible provider in Dify, using the TokenMix API URL, API key, and supported model ID.
Can Dify use OpenRouter?
Yes. Use the OpenAI-API-compatible plugin with https://openrouter.ai/api/v1, an OpenRouter API key, and the exact OpenRouter model ID.
Can Dify use local models?
Yes, if the local server exposes an OpenAI-compatible API. Ollama, LM Studio, SGLang, and TGI can all be used when configured correctly.
Why does my Dify model return 404?
The most common causes are a wrong base URL, a model ID mismatch, or using a provider that does not implement the endpoint Dify is calling.
Should I use Dify or LiteLLM?
Dify is a workflow/app builder. LiteLLM is a self-hosted model proxy. Use Dify for app logic, and use LiteLLM or TokenMix.ai for model access depending on whether you want self-hosting or hosted gateway access.
Is OpenAI-compatible enough for embeddings and speech?
Not always. Many providers support chat but not embeddings, rerank, STT, or TTS. Configure and test each model type separately.
Related Articles
- OpenAI-Compatible API Guide 2026: SDK, Providers, Pricing
- LLM API Gateway Guide: Routing, Fallbacks, Cost Control
- Unified AI API Gateway Comparison 2026
- Ollama OpenAI-Compatible API: Local Setup Guide
- SGLang OpenAI-Compatible API 2026: Server Setup Guide
- Text Generation Inference OpenAI-Compatible API 2026 Guide
- LiteLLM Alternatives 2026: AI Gateway Options Compared
Sources
- Dify model providers: https://docs.dify.ai/en/guides/model-configuration/customizable-model
- Dify OpenAI-API-compatible plugin: https://marketplace.dify.ai/plugins/langgenius/openai_api_compatible
- Dify model provider plugin development: https://docs.dify.ai/plugins/quick-start/develop-plugins/model-plugin/create-model-providers
- Dify model configuration docs: https://docs.dify.ai/versions/3-0-x/en/user-guide/model-configuration/readme