# Dify OpenAI-Compatible API 2026: Workflow Model Routing

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

Dify can use OpenAI-compatible APIs through its OpenAI-API-compatible model provider plugin. Use this when you want Dify workflows to route through TokenMix.ai, OpenRouter, Ollama, LM Studio, or another gateway.

Dify's model provider docs say workspaces can use system providers or custom providers, and custom providers let teams use their own API keys for direct access, control, billing, and often higher rate limits. The OpenAI-API-compatible plugin page says the plugin supports OpenAI-compatible providers for LLMs, reranking, embeddings, speech-to-text, and text-to-speech, and lets developers configure model name, API key, URL, completion settings, context, token limits, streaming, and vision. The short version: Dify is a good workflow layer. It still needs a reliable model access layer.

## Quick Answer

Install Dify's OpenAI-API-compatible provider plugin, then configure:

| Field | Example |
|---|---|
| Provider | OpenAI-API-compatible |
| API Key | Your gateway key |
| API URL | https://api.tokenmix.ai/v1 or another OpenAI-compatible base URL |
| Model name | Gateway model ID |
| Model type | LLM, embedding, rerank, STT, or TTS |
| Streaming | Enable only if the endpoint supports streaming |
| Vision | Enable only for vision-capable models |

Use TokenMix.ai when you want Dify to call many hosted models through one OpenAI-compatible API instead of managing separate provider keys inside every workflow stack.
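
Before wiring these values into Dify, it helps to confirm the gateway answers at the base URL. Here is a minimal smoke test, a sketch using Python's `requests` library; the base URL matches the table above, while the environment variable name and model ID are placeholders for your own values:

```python
# Smoke test for an OpenAI-compatible base URL before configuring Dify.
# Assumes `requests` is installed; key and model ID are placeholders.
import os
import requests

BASE_URL = "https://api.tokenmix.ai/v1"   # base path only, no endpoint suffix
API_KEY = os.environ["TOKENMIX_API_KEY"]  # keep keys out of source
MODEL = "your-model-id"                   # exact gateway model ID

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": MODEL, "messages": [{"role": "user", "content": "ping"}]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```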

## Confirmed vs Caveat

| Claim | Status | Source / note |
|---|---|---|
| Dify supports custom model providers | Confirmed | Dify model provider docs |
| Dify has an OpenAI-API-compatible plugin | Confirmed | Dify Marketplace |
| The plugin supports LLMs | Confirmed | Plugin page |
| The plugin supports embeddings and reranking | Confirmed | Plugin page |
| The plugin supports STT and TTS | Confirmed | Plugin page |
| You can configure API key and URL | Confirmed | Plugin page |
| Every OpenAI-compatible provider supports every endpoint | False | Compatibility varies by provider |
| Dify replaces an API gateway | No | Dify builds workflows; the gateway handles model access and routing |

## When This Setup Makes Sense

| Use case | Good fit? | Why |
|---|---|---|
| Dify chatbot with GPT/Claude/Gemini fallback | Yes | Gateway can route models behind one provider config |
| Internal RAG workflow | Yes | Dify handles app logic; gateway handles models |
| Local model prototype with Ollama or LM Studio | Yes | OpenAI-compatible URL can point to a local server |
| Production app with strict provider budgets | Yes, with gateway policy | Dify alone is not enough cost control |
| Native provider-only feature testing | Maybe | Direct provider integration may expose more options |
| High-volume, low-latency serving | Depends | Measure Dify workflow overhead plus gateway latency |

The clean architecture is:

Dify workflow -> OpenAI-compatible model provider -> TokenMix.ai / gateway -> model provider

## Dify Configuration Fields

| Field | What to enter | Common mistake |
|---|---|---|
| Type | LLM, embedding, rerank, STT, TTS | Picking LLM for embedding models |
| Name | Human-readable model alias | Using different names across workflows |
| API Key | Gateway or provider key | Pasting a direct provider key into the wrong gateway config |
| URL | Base URL, usually ending in /v1 | Adding /chat/completions instead of the base URL |
| Completion mode | Chat/completion behavior | Using completion-only mode for chat models |
| Context length | Model context limit | Setting a larger limit than the provider supports |
| Max tokens | Output token cap | Letting outputs run into expensive defaults |
| Streaming | On/off | Enabling streaming on an unsupported endpoint |
| Vision | On/off | Enabling vision for a text-only model |

For gateway use, the URL should usually be the base path, not the full endpoint.
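
A quick way to validate both the URL field and the model name at once is to list the models the gateway exposes. This sketch assumes the gateway implements the OpenAI-style /models endpoint, which most but not all compatible providers do:

```python
# List the model IDs a gateway exposes. A successful call confirms the
# base URL is right; the printed IDs are what Dify's model name field
# expects. Assumes an OpenAI-style /models endpoint.
import os
import requests

BASE_URL = "https://api.tokenmix.ai/v1"
API_KEY = os.environ["TOKENMIX_API_KEY"]

resp = requests.get(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```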

## TokenMix.ai Setup Example

Use this pattern when Dify should call TokenMix.ai as its model gateway.

| Dify field | Example value |
|---|---|
| Provider plugin | OpenAI-API-compatible |
| API URL | https://api.tokenmix.ai/v1 |
| API Key | TOKENMIX_API_KEY |
| Model name | A TokenMix-supported model ID |
| Model type | LLM |
| Streaming | Enable after testing |
| Vision | Enable only for multimodal models |

Example chat test payload shape:

```json
{
  "model": "your-model-id",
  "messages": [
    {
      "role": "user",
      "content": "Summarize this customer ticket and propose a reply."
    }
  ],
  "stream": true
}
```
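
Because the payload sets "stream": true, test the stream outside Dify first. The sketch below parses the server-sent-events format that OpenAI-compatible streams typically use; the key and model ID are placeholders:

```python
# Streaming test for the payload above, run outside Dify first.
# Parses the SSE ("data: ...") lines used by OpenAI-compatible streams.
import json
import os
import requests

BASE_URL = "https://api.tokenmix.ai/v1"
API_KEY = os.environ["TOKENMIX_API_KEY"]

with requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "your-model-id",
        "messages": [{"role": "user",
                      "content": "Summarize this customer ticket and propose a reply."}],
        "stream": True,
    },
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip blank keep-alive lines
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        delta = json.loads(chunk)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)
```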

Why TokenMix.ai fits Dify:

| Need | Why TokenMix.ai helps |
|---|---|
| One model access layer | Dify connects once; the gateway routes many models |
| OpenAI-compatible endpoint | Less custom plugin work |
| Multi-provider model choice | Use GPT, Claude, Gemini, DeepSeek, and open models |
| Payment flexibility | Useful when direct provider billing is a blocker |
| Internal workflow scale | Keep model routing out of individual Dify apps |

## OpenRouter And Local Model Examples

| Provider route | Base URL example | Best for |
|---|---|---|
| TokenMix.ai | https://api.tokenmix.ai/v1 | Hosted multi-model API access |
| OpenRouter | https://openrouter.ai/api/v1 | Broad model catalog exploration |
| Ollama | http://localhost:11434/v1 | Local models and private testing |
| LM Studio | http://localhost:1234/v1 | Desktop local model testing |
| SGLang | http://localhost:30000/v1 | Self-hosted high-throughput serving |
| TGI | Hugging Face endpoint URL ending in /v1 | Hugging Face model serving |

Use the Ollama, SGLang, and Text Generation Inference OpenAI-compatible API docs as setup references for local or self-hosted backends.
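
Since every route in the table speaks the same protocol, switching backends is mostly a base-URL change. A sketch using the table's URLs; keys and model IDs are placeholders, and local servers typically ignore the key:

```python
# Same client code, different backends: only the base URL (and key)
# change across OpenAI-compatible routes. Model IDs are placeholders.
import os
import requests

ROUTES = {
    "tokenmix": ("https://api.tokenmix.ai/v1", os.getenv("TOKENMIX_API_KEY", "")),
    "openrouter": ("https://openrouter.ai/api/v1", os.getenv("OPENROUTER_API_KEY", "")),
    "ollama": ("http://localhost:11434/v1", "ollama"),      # key ignored locally
    "lmstudio": ("http://localhost:1234/v1", "lm-studio"),  # key ignored locally
}

def chat(route: str, model: str, prompt: str) -> str:
    base_url, key = ROUTES[route]
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {key}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Example: print(chat("ollama", "llama3.1", "Say hello."))
```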

## LLM, Embeddings, Rerank, And Speech

Dify workflows often need more than chat completion.

| Model type | Dify need | Gateway caveat |
|---|---|---|
| LLM | Chat, agents, workflow nodes | Tool calling and streaming support vary |
| Embedding | Knowledge base indexing | Endpoint must support /embeddings |
| Rerank | Retrieval quality improvement | Not every OpenAI-compatible gateway supports rerank |
| STT | Voice input workflows | Audio endpoint compatibility varies |
| TTS | Voice output | Voice lists and audio formats vary |
| Vision | Image input workflows | Enable only on multimodal models |

Do not assume that "OpenAI-compatible" means "all OpenAI endpoints are implemented." Confirm each endpoint type.
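
A simple way to confirm coverage is to probe each path directly. The sketch below checks chat and embeddings, which follow standard OpenAI paths; rerank has no standard OpenAI endpoint, so gateways that offer it use their own paths, which is exactly why it needs a per-provider check. Model IDs are placeholders:

```python
# Probe which endpoint types a gateway actually implements. A 404 or
# similar error usually means that model type is not served.
import os
import requests

BASE_URL = "https://api.tokenmix.ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TOKENMIX_API_KEY']}"}

PROBES = {
    "chat": ("/chat/completions", {
        "model": "your-chat-model",
        "messages": [{"role": "user", "content": "ping"}],
    }),
    "embeddings": ("/embeddings", {
        "model": "your-embedding-model",
        "input": "ping",
    }),
}

for name, (path, payload) in PROBES.items():
    resp = requests.post(f"{BASE_URL}{path}", headers=HEADERS, json=payload, timeout=30)
    print(f"{name}: HTTP {resp.status_code}")
```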

## Dify vs TokenMix.ai vs LiteLLM

| Layer | Dify | TokenMix.ai | LiteLLM |
|---|---|---|---|
| Main role | Workflow/app builder | Hosted model API gateway | Self-hosted proxy/gateway |
| Best for | Chatbots, agents, RAG workflows | Multi-model hosted access | Internal platform control |
| OpenAI-compatible input | Through plugin/provider | Native API surface | Native proxy surface |
| Routing | App/workflow logic | Gateway routing | Self-managed routing |
| Provider keys | Stored in Dify provider config | Stored in gateway account | Stored in proxy config |
| Operations burden | Dify app ops | Low for model access | Higher |

Dify and TokenMix.ai are complementary. Dify runs the workflow. TokenMix.ai supplies the model access layer.

## Cost And Routing Math

### Cost calculation 1: bad default model

Assume a Dify workflow has 10,000 runs/month and each run uses 5,000 input tokens plus 1,000 output tokens.

| Model policy | Relative model cost | Monthly relative cost |
|---|---|---|
| All premium model | 8x | 8.0x |
| Cheap-first, premium fallback 20% | Mixed | 2.4x |
| Cheap-first, premium fallback 10% | Mixed | 1.7x |
| Manual per-workflow model choice | Varies | Depends on discipline |

The gateway policy matters because Dify workflows are easy to duplicate. One expensive default can spread across many apps.
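
The table's numbers fall out of a plain weighted average: route a share of traffic to the premium model and the rest to a 1x baseline. The 8x premium multiple is the table's illustrative figure, not a real price sheet:

```python
# Blended cost = cheap share x 1x + premium share x premium multiple.
# The 8x multiple is illustrative, matching the table above.
def blended_cost(premium_share: float, premium_multiple: float = 8.0) -> float:
    return (1.0 - premium_share) * 1.0 + premium_share * premium_multiple

print(f"{blended_cost(1.0):.1f}x")  # 8.0x: all premium
print(f"{blended_cost(0.2):.1f}x")  # 2.4x: 20% premium fallback
print(f"{blended_cost(0.1):.1f}x")  # 1.7x: 10% premium fallback
```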

### Cost calculation 2: embedding spend

| Knowledge base size | Embedding tokens | Risk |
|---|---|---|
| Small docs | 1M | Easy to re-index |
| Medium docs | 100M | Re-indexing mistakes are visible |
| Large support corpus | 1B+ | Embedding model choice becomes material |

Keep embedding models separate from chat models. Do not route embeddings through a chat-only endpoint.
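
To see why the large-corpus row is a risk, multiply tokens by price and by how many times the corpus gets re-embedded during tuning. Both prices below are illustrative placeholders, not quoted rates:

```python
# Worked example: embedding spend = tokens x price x re-index count.
# Prices are illustrative placeholders, not quoted rates.
def embed_spend(tokens: float, usd_per_million: float, reindexes: int = 1) -> float:
    return tokens / 1e6 * usd_per_million * reindexes

# 1B-token corpus, re-indexed 3 times during tuning:
print(f"cheap model:   ${embed_spend(1e9, 0.02, 3):,.0f}")   # $60
print(f"premium model: ${embed_spend(1e9, 0.13, 3):,.0f}")   # $390
```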

### Cost calculation 3: local vs hosted

| Option | Direct cost | Hidden cost |
|---|---|---|
| Local Ollama | Low token bill | Hardware, uptime, latency |
| Self-hosted SGLang/TGI | GPU cost | DevOps and scaling |
| OpenRouter | Token + platform policy | Provider variation |
| TokenMix.ai | Gateway model pricing | External dependency |

For production Dify workflows, total cost includes tokens, retries, workflow failures, and maintenance time.
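
Retries are the easiest of those hidden costs to quantify: every failed attempt still burns tokens, so expected cost scales with expected attempts per success. The failure rates below are illustrative:

```python
# Retries inflate effective token cost because failed attempts still
# consume tokens. Failure rates are illustrative.
def effective_cost(base: float, failure_rate: float) -> float:
    # expected attempts per success = 1 / (1 - failure_rate)
    return base / (1.0 - failure_rate)

print(f"{effective_cost(1.0, 0.02):.2f}x")  # 1.02x at 2% failures
print(f"{effective_cost(1.0, 0.15):.2f}x")  # 1.18x at 15% failures
```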

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| Authentication failed | Wrong API key or provider | Recheck gateway key and Dify provider config |
| 404 model not found | Model name mismatch | Use exact gateway model ID |
| 404 endpoint not found | URL includes endpoint path | Use base URL ending in /v1 |
| Streaming fails | Gateway or model does not support streaming | Disable streaming or switch model |
| Vision fails | Text-only model selected | Use multimodal model and enable vision |
| Embedding fails | Chat endpoint used for embedding model | Add embedding model type separately |
| High cost | Premium model set as default | Add cheap-first routing policy |
| Slow workflow | Gateway plus model latency | Test p95 latency per model route |
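
The first three rows can be triaged with one scripted call: the status code says whether to look at the key or at the model/URL. A sketch with placeholder values:

```python
# Triage helper for the most common failures above: a 401 points at
# the key, a 404 at the model ID or base URL. Values are placeholders.
import os
import requests

def triage(base_url: str, api_key: str, model: str) -> None:
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": model, "messages": [{"role": "user", "content": "ping"}]},
        timeout=30,
    )
    if resp.status_code == 401:
        print("401: check the API key and that it matches this gateway")
    elif resp.status_code == 404:
        print("404: check the model ID and that the URL is the /v1 base path")
    else:
        print(f"HTTP {resp.status_code}: {resp.text[:200]}")

triage("https://api.tokenmix.ai/v1", os.environ["TOKENMIX_API_KEY"], "your-model-id")
```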

## Production Checklist

| Check | Why |
|---|---|
| Use separate providers for chat and embeddings | Prevents endpoint mismatch |
| Test streaming before enabling it in user-facing apps | Avoids broken UI streams |
| Pin model IDs for critical workflows | Prevents silent behavior changes |
| Add fallback only after measuring output quality | Fallback can change answer style |
| Track workflow-level cost | Model-level cost is not enough |
| Keep API keys out of shared screenshots and exports | Dify provider configs can leak operational secrets |
| Document which workflows use which gateway models | Prevents uncontrolled model drift |
| Link Dify workflows to an API gateway policy | Better cost and reliability control |

## Final Recommendation

Use Dify with an OpenAI-compatible API when Dify is your workflow builder and you want model access to stay flexible. Use TokenMix.ai when you want one hosted gateway for GPT, Claude, Gemini, DeepSeek, and open models.

Do not let every Dify app owner pick random model providers. Centralize model access first. Then let Dify handle workflow logic.

## FAQ

### Does Dify support OpenAI-compatible APIs?

Yes. Dify has an OpenAI-API-compatible provider plugin that can connect to OpenAI-compatible model providers and gateways.

### What URL should I put in Dify for an OpenAI-compatible API?

Use the provider's base URL, usually ending in /v1, such as https://api.tokenmix.ai/v1. Do not paste the full /chat/completions endpoint.

### Can Dify use TokenMix.ai?

Yes. Configure TokenMix.ai as an OpenAI-compatible provider in Dify, using the TokenMix API URL, API key, and supported model ID.

### Can Dify use OpenRouter?

Yes. Use the OpenAI-API-compatible plugin with https://openrouter.ai/api/v1, an OpenRouter API key, and the exact OpenRouter model ID.

### Can Dify use local models?

Yes, if the local server exposes an OpenAI-compatible API. Ollama, LM Studio, SGLang, and TGI can all be used when configured correctly.

### Why does my Dify model return 404?

The most common causes are a wrong base URL, a model ID mismatch, or using a provider that does not implement the endpoint Dify is calling.

### Should I use Dify or LiteLLM?

Dify is a workflow/app builder. LiteLLM is a self-hosted model proxy. Use Dify for app logic, and use LiteLLM or TokenMix.ai for model access depending on whether you want self-hosting or hosted gateway access.

### Is OpenAI-compatible enough for embeddings and speech?

Not always. Many providers support chat but not embeddings, rerank, STT, or TTS. Configure and test each model type separately.
