TokenMix Research Lab · 2026-04-24
LiteLLM + Gemini 3 Setup: Complete Integration Guide
LiteLLM is the most popular OpenAI-compatible proxy for routing across 100+ LLM providers, and integrating Gemini 3.1 Pro through LiteLLM takes three lines of config. This guide covers the complete setup (native Gemini Python SDK vs LiteLLM proxy vs TokenMix.ai gateway), Gemini 3 tool use / function calling through LiteLLM, streaming support, image input, and five common integration errors with their fixes. Verified against LiteLLM 1.50.0+ and google-genai 0.10.0 as of April 24, 2026. By the end, you'll have Gemini 3.1 Pro responding through the OpenAI SDK with zero code changes to your application.
Table of Contents
- Confirmed vs Speculation
- Three Integration Paths
- LiteLLM Setup in 5 Minutes
- Tool Use / Function Calling
- Streaming Support
- Image Input
- Common Errors & Fixes
- FAQ
Confirmed vs Speculation
| Claim | Status | Source |
|---|---|---|
| LiteLLM supports Gemini natively | Confirmed | LiteLLM docs |
| Gemini 3.1 Pro model ID is gemini-3.1-pro | Confirmed | Google docs |
| Function calling works through LiteLLM | Confirmed | v1.45+ |
| Vision (image input) supported | Confirmed | |
| Streaming works in LiteLLM proxy mode | Confirmed | HTTP chunked |
| Gemini 3 native tool use is OpenAI-shape | Yes, with translation | |
| Rate limit headers preserved | Partial (LiteLLM rewrites some) | |
Snapshot note (2026-04-24): LiteLLM 1.50.0 and google-genai 0.10.0 are the versions verified at snapshot. Both projects release frequently; check the latest release notes for breaking changes before pinning a dependency. Gemini 3.1 Pro is the current flagship; newer Gemini releases (e.g., a 3.2 line) should slot into the same gemini/<model-id> pattern without config changes.
Three Integration Paths
Path 1 — Google native SDK (direct):
- pip install google-genai
- Call genai.Client() directly
- Pros: best latency, official support
- Cons: not OpenAI-compatible, code ties to Gemini
Path 2 — LiteLLM (local proxy):
- Run LiteLLM as a local proxy
- Call via OpenAI SDK pointing at LiteLLM
- Pros: code stays OpenAI-compatible, swap providers via config
- Cons: additional process, slight latency
Path 3 — TokenMix.ai gateway:
- OpenAI-compatible endpoint, no local proxy
- Multi-provider routing built-in
- Pros: simplest, production-ready
- Cons: external dependency
For exploration and local dev, LiteLLM. For production, TokenMix.ai or similar gateway.
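The practical difference between Paths 2 and 3 comes down to where the OpenAI SDK points. A minimal sketch of the client construction for each path follows; the TokenMix.ai base URL and the "tm-" key format are illustrative assumptions, not documented values (Path 1 uses the google-genai SDK directly and is not shown here):

```python
def client_kwargs(path: str) -> dict:
    """Return OpenAI-SDK constructor kwargs for an integration path.

    Pass the result to openai.OpenAI(**client_kwargs("litellm")).
    """
    if path == "litellm":
        # Path 2: OpenAI SDK pointed at the local LiteLLM proxy
        return {"base_url": "http://localhost:4000", "api_key": "sk-your-master-key"}
    if path == "tokenmix":
        # Path 3: hosted gateway (endpoint and key format are assumptions)
        return {"base_url": "https://api.tokenmix.ai/v1", "api_key": "tm-your-key"}
    raise ValueError(f"unknown integration path: {path}")
```

Because both paths stay OpenAI-compatible, switching between them is one kwargs swap with no changes to the call sites.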
LiteLLM Setup in 5 Minutes
Step 1 — Install:
pip install 'litellm[proxy]'
Step 2 — Get Gemini API key: From ai.google.dev → Get API key → save as environment variable
export GEMINI_API_KEY="your_key"
Step 3 — Config file (litellm_config.yaml):
model_list:
- model_name: gemini-3.1-pro
litellm_params:
model: gemini/gemini-3.1-pro
api_key: os.environ/GEMINI_API_KEY
- model_name: gemini-2.5-flash
litellm_params:
model: gemini/gemini-2.5-flash
api_key: os.environ/GEMINI_API_KEY
general_settings:
master_key: sk-your-master-key # for client auth to LiteLLM
Step 4 — Start proxy:
litellm --config litellm_config.yaml --port 4000
Step 5 — Call via OpenAI SDK:
from openai import OpenAI
client = OpenAI(
api_key="sk-your-master-key", # LiteLLM master key
base_url="http://localhost:4000"
)
response = client.chat.completions.create(
model="gemini-3.1-pro",
messages=[{"role": "user", "content": "Hello Gemini 3."}]
)
Tool Use / Function Calling
LiteLLM translates OpenAI tool format to Gemini's native schema:
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather",
"parameters": {
"type": "object",
"properties": {"location": {"type": "string"}},
"required": ["location"]
}
}
}]
response = client.chat.completions.create(
model="gemini-3.1-pro",
tools=tools,
messages=[{"role": "user", "content": "Weather in Tokyo?"}]
)
# Response contains tool_calls in OpenAI format
if response.choices[0].message.tool_calls:
call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
Streaming Support
stream = client.chat.completions.create(
model="gemini-3.1-pro",
messages=[...],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
LiteLLM forwards Gemini's native streaming as OpenAI-shape chunks. Latency overhead ~5-15ms from the proxy translation.
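If you need the complete reply as well as the live printout, collect the deltas as they arrive. A small accumulator sketch (helper name is ours; the chunk shape is the standard OpenAI streaming format shown above):

```python
def collect_stream(stream) -> str:
    """Concatenate the delta fragments of an OpenAI-shape stream into
    the full reply text. Works unchanged against LiteLLM's translated
    Gemini chunks."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta.content is typically None
            parts.append(delta)
    return "".join(parts)
```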
Image Input
Pass image URL or base64:
response = client.chat.completions.create(
model="gemini-3.1-pro",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image."},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
]
}]
)
Gemini 3.1 Pro's vision quality is competitive with Claude Opus 4.7 and GPT-5.4 for most scenarios.
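For local files with no public URL, the base64 option means embedding the image as a data URL in the same `image_url` field. A sketch (the helper name and default MIME type are ours):

```python
import base64

def image_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Build a base64 data URL suitable for the image_url content part,
    for local images that have no public URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

Usage: replace the URL in the example above with `{"url": image_data_url(open("photo.jpg", "rb").read())}`.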
Common Errors & Fixes
Error 1: InvalidArgument: model 'gemini-3.1-pro' not found
→ Wrong model prefix. Use gemini/gemini-3.1-pro in LiteLLM config (with gemini/ prefix).
Error 2: 500 API key not valid
→ $GEMINI_API_KEY not exported or wrong. Check os.environ.get('GEMINI_API_KEY') resolves.
Error 3: 429 Resource exhausted
→ Google's rate limit hit. Free tier has low limits. Enable billing in Google AI Studio.
Error 4: Tool call format mismatch
→ LiteLLM < 1.45 had translation bugs. Upgrade: pip install -U litellm.
Error 5: ContentFilterException
→ Gemini's safety filters triggered. Adjust the safety thresholds, rephrase the request, or switch to a model with more permissive defaults.
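Error 3 (429) is usually transient, so production callers typically wrap requests in a retry loop. A minimal exponential-backoff sketch, assuming the rate-limit error surfaces with "429" or "Resource exhausted" in its message; adjust the check to your SDK's actual exception types:

```python
import time

def with_backoff(fn, retries: int = 3, base_delay: float = 1.0):
    """Retry a zero-argument callable on rate-limit errors with
    exponential backoff (1s, 2s, 4s, ... by default). Non-retryable
    errors and the final failed attempt are re-raised."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            msg = str(exc)
            retryable = "429" in msg or "Resource exhausted" in msg
            if not retryable or attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Usage: `with_backoff(lambda: client.chat.completions.create(...))`.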
FAQ
LiteLLM vs LangChain for Gemini integration?
LangChain has a Gemini connector but adds framework complexity. LiteLLM is a thin proxy — just keeps OpenAI SDK syntax working. For pure API routing, LiteLLM. For full agent framework with memory/retrievers, LangChain. Often used together (LangChain → LiteLLM → Gemini).
Can I run LiteLLM as a service (not local)?
Yes. Deploy LiteLLM container to Docker/Kubernetes. Use as team-wide LLM proxy. Or use TokenMix.ai for hosted alternative without managing infrastructure.
Does LiteLLM support Gemini's new gemini-3-flash-preview?
Yes. Any Gemini model ID Google exposes works through LiteLLM. New models are usually supported within days of Google's release via LiteLLM version updates.
Is there latency overhead?
LiteLLM adds 5-15ms per request for the translation layer. Negligible for most workloads. For latency-critical real-time apps, consider direct Google SDK or Gemini Live API.
Can I route between Gemini and other providers in LiteLLM?
Yes. Configure multiple model_list entries with different model names, then apply routing strategies (round-robin, failover, latency-based). This is similar to TokenMix.ai's built-in multi-provider routing.
Does LiteLLM expose Gemini's task_type parameter for embeddings?
Yes via LiteLLM 1.48+. Pass task_type in extra_body. See LiteLLM embedding docs.
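Concretely, the task_type rides along in extra_body rather than as a top-level parameter. A sketch of the request kwargs, assuming an embedding model exposed in your config (the model ID and helper name here are illustrative):

```python
def embedding_request(texts, model: str = "gemini-embedding-001",
                      task_type: str = "RETRIEVAL_DOCUMENT") -> dict:
    """Build kwargs for client.embeddings.create(...), passing Gemini's
    task_type through LiteLLM via extra_body (LiteLLM 1.48+)."""
    return {
        "model": model,
        "input": texts,
        "extra_body": {"task_type": task_type},
    }
```

Usage: `client.embeddings.create(**embedding_request(["hello world"]))`.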
How does this compare to setting up OpenRouter for Gemini?
OpenRouter is similar concept but hosted (not local proxy). Similar functionality, different provider. Use OpenRouter if you want hosted aggregator, LiteLLM if you want local control + explicit multi-provider config.
Sources
- LiteLLM Documentation
- Google Gemini API Docs
- Gemini 3.1 Pro Review — TokenMix
- Gemini 2.5 Pro Review — TokenMix
- Gemini Embedding — TokenMix
- LangChain Tutorial — TokenMix
By TokenMix Research Lab · Updated 2026-04-24