TokenMix Research Lab · 2026-04-24
LiteLLM + Gemini 3 Setup: Complete Integration Guide
LiteLLM is the most popular OpenAI-compatible proxy for routing across 100+ LLM providers, and integrating Gemini 3.1 Pro through LiteLLM takes three lines of config. This guide covers the complete setup (native Gemini Python SDK vs LiteLLM proxy vs TokenMix.ai gateway), Gemini 3 tool use / function calling through LiteLLM, streaming support, image input, and five common integration errors with their fixes. Verified against LiteLLM 1.50.0+ and google-genai 0.10.0 as of April 24, 2026. By the end, you'll have Gemini 3.1 Pro responding through the OpenAI SDK with zero code changes to your application.
Table of Contents
- Confirmed vs Speculation
- Three Integration Paths
- LiteLLM Setup in 5 Minutes
- Tool Use / Function Calling
- Streaming Support
- Image Input
- Common Errors & Fixes
- FAQ
Confirmed vs Speculation
| Claim | Status | Source |
|---|---|---|
| LiteLLM supports Gemini natively | Confirmed | LiteLLM docs |
| Gemini 3.1 Pro model ID is gemini-3.1-pro | Confirmed | Google docs |
| Function calling works through LiteLLM | Confirmed | v1.45+ |
| Vision (image input) supported | Confirmed | |
| Streaming works in LiteLLM proxy mode | Confirmed | HTTP chunked |
| Gemini 3 native tool use is OpenAI-shape | Yes, with translation | |
| Rate limit headers preserved | Partial (LiteLLM rewrites some) | |
Snapshot note (2026-04-24): LiteLLM 1.50.0 and google-genai 0.10.0 are the versions verified at snapshot. Both projects release frequently; check the latest release notes for breaking changes before pinning a dependency. Gemini 3.1 Pro is the current flagship; newer Gemini releases (e.g., a 3.2 line) should slot into the same gemini/<model-id> pattern without config changes.
Three Integration Paths
Path 1 — Google native SDK (direct):
- pip install google-genai
- Call genai.Client() directly
- Pros: best latency, official support
- Cons: not OpenAI-compatible, code ties to Gemini
Path 2 — LiteLLM (local proxy):
- Run LiteLLM as a local proxy
- Call via OpenAI SDK pointing at LiteLLM
- Pros: code stays OpenAI-compatible, swap providers via config
- Cons: additional process, slight latency
Path 3 — TokenMix.ai gateway:
- OpenAI-compatible endpoint, no local proxy
- Multi-provider routing built-in
- Pros: simplest, production-ready
- Cons: external dependency
For exploration and local dev, LiteLLM. For production, TokenMix.ai or similar gateway.
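The practical difference between Paths 2 and 3 comes down to where the OpenAI SDK points. A minimal sketch of the client construction for each path follows; the TokenMix.ai base URL and the "tm-" key format are illustrative assumptions, not documented values (Path 1 uses the google-genai SDK directly and is not shown here):

```python
def client_kwargs(path: str) -> dict:
    """Return OpenAI-SDK constructor kwargs for an integration path.

    Pass the result to openai.OpenAI(**client_kwargs("litellm")).
    """
    if path == "litellm":
        # Path 2: OpenAI SDK pointed at the local LiteLLM proxy
        return {"base_url": "http://localhost:4000", "api_key": "sk-your-master-key"}
    if path == "tokenmix":
        # Path 3: hosted gateway (endpoint and key format are assumptions)
        return {"base_url": "https://api.tokenmix.ai/v1", "api_key": "tm-your-key"}
    raise ValueError(f"unknown integration path: {path}")
```

Because both paths stay OpenAI-compatible, switching between them is one kwargs swap with no changes to the call sites.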
LiteLLM Setup in 5 Minutes
Step 1 — Install:
pip install 'litellm[proxy]'
Step 2 — Get Gemini API key: From ai.google.dev → Get API key → save as environment variable
export GEMINI_API_KEY="your_key"
Step 3 — Config file (litellm_config.yaml):
model_list:
- model_name: gemini-3.1-pro
litellm_params:
model: gemini/gemini-3.1-pro
api_key: os.environ/GEMINI_API_KEY
- model_name: gemini-2.5-flash
litellm_params:
model: gemini/gemini-2.5-flash
api_key: os.environ/GEMINI_API_KEY
general_settings:
master_key: sk-your-master-key # for client auth to LiteLLM
Step 4 — Start proxy:
litellm --config litellm_config.yaml --port 4000
Step 5 — Call via OpenAI SDK:
from openai import OpenAI
client = OpenAI(
api_key="sk-your-master-key", # LiteLLM master key
base_url="http://localhost:4000"
)
response = client.chat.completions.create(
model="gemini-3.1-pro",
messages=[{"role": "user", "content": "Hello Gemini 3."}]
)
Tool Use / Function Calling
LiteLLM translates OpenAI tool format to Gemini's native schema:
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather",
"parameters": {
"type": "object",
"properties": {"location": {"type": "string"}},
"required": ["location"]
}
}
}]
response = client.chat.completions.create(
model="gemini-3.1-pro",
tools=tools,
messages=[{"role": "user", "content": "Weather in Tokyo?"}]
)
# Response contains tool_calls in OpenAI format
if response.choices[0].message.tool_calls:
call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
Streaming Support
stream = client.chat.completions.create(
model="gemini-3.1-pro",
messages=[...],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
LiteLLM forwards Gemini's native streaming as OpenAI-shape chunks. Latency overhead ~5-15ms from the proxy translation.
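If you need the complete reply as well as the live printout, collect the deltas as they arrive. A small accumulator sketch (helper name is ours; the chunk shape is the standard OpenAI streaming format shown above):

```python
def collect_stream(stream) -> str:
    """Concatenate the delta fragments of an OpenAI-shape stream into
    the full reply text. Works unchanged against LiteLLM's translated
    Gemini chunks."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta.content is typically None
            parts.append(delta)
    return "".join(parts)
```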
Image Input
Pass image URL or base64:
response = client.chat.completions.create(
model="gemini-3.1-pro",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image."},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
]
}]
)
Gemini 3.1 Pro's vision quality is competitive with Claude Opus 4.7 and GPT-5.4 for most scenarios.
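For local files with no public URL, the base64 option means embedding the image as a data URL in the same `image_url` field. A sketch (the helper name and default MIME type are ours):

```python
import base64

def image_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Build a base64 data URL suitable for the image_url content part,
    for local images that have no public URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

Usage: replace the URL in the example above with `{"url": image_data_url(open("photo.jpg", "rb").read())}`.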
Common Errors & Fixes
Error 1: InvalidArgument: model 'gemini-3.1-pro' not found
→ Wrong model prefix. Use gemini/gemini-3.1-pro in LiteLLM config (with gemini/ prefix).
Error 2: 500 API key not valid
→ $GEMINI_API_KEY not exported or wrong. Check os.environ.get('GEMINI_API_KEY') resolves.
Error 3: 429 Resource exhausted
→ Google's rate limit hit. Free tier has low limits. Enable billing in Google AI Studio.
Error 4: Tool call format mismatch
→ LiteLLM < 1.45 had translation bugs. Upgrade: pip install -U litellm.
Error 5: ContentFilterException
→ Gemini's safety filters triggered. Adjust the safety thresholds, rephrase the request, or switch to a model with more permissive defaults.
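Error 3 (429) is usually transient, so production callers typically wrap requests in a retry loop. A minimal exponential-backoff sketch, assuming the rate-limit error surfaces with "429" or "Resource exhausted" in its message; adjust the check to your SDK's actual exception types:

```python
import time

def with_backoff(fn, retries: int = 3, base_delay: float = 1.0):
    """Retry a zero-argument callable on rate-limit errors with
    exponential backoff (1s, 2s, 4s, ... by default). Non-retryable
    errors and the final failed attempt are re-raised."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            msg = str(exc)
            retryable = "429" in msg or "Resource exhausted" in msg
            if not retryable or attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Usage: `with_backoff(lambda: client.chat.completions.create(...))`.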
FAQ
LiteLLM vs LangChain for Gemini integration?
LangChain has a Gemini connector but adds framework complexity. LiteLLM is a thin proxy — just keeps OpenAI SDK syntax working. For pure API routing, LiteLLM. For full agent framework with memory/retrievers, LangChain. Often used together (LangChain → LiteLLM → Gemini).
Can I run LiteLLM as a service (not local)?
Yes. Deploy LiteLLM container to Docker/Kubernetes. Use as team-wide LLM proxy. Or use TokenMix.ai for hosted alternative without managing infrastructure.
Does LiteLLM support Gemini's new gemini-3-flash-preview?
Yes. Any Gemini model ID Google exposes works through LiteLLM. New models are usually supported within days of Google's release via LiteLLM version updates.
Is there latency overhead?
LiteLLM adds 5-15ms per request for the translation layer. Negligible for most workloads. For latency-critical real-time apps, consider direct Google SDK or Gemini Live API.
Can I route between Gemini and other providers in LiteLLM?
Yes. Configure multiple model_list entries with different model names, then apply routing strategies (round-robin, failover, latency-based). This is similar to TokenMix.ai's built-in multi-provider routing.
Does LiteLLM expose Gemini's task_type parameter for embeddings?
Yes via LiteLLM 1.48+. Pass task_type in extra_body. See LiteLLM embedding docs.
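Concretely, the task_type rides along in extra_body rather than as a top-level parameter. A sketch of the request kwargs, assuming an embedding model exposed in your config (the model ID and helper name here are illustrative):

```python
def embedding_request(texts, model: str = "gemini-embedding-001",
                      task_type: str = "RETRIEVAL_DOCUMENT") -> dict:
    """Build kwargs for client.embeddings.create(...), passing Gemini's
    task_type through LiteLLM via extra_body (LiteLLM 1.48+)."""
    return {
        "model": model,
        "input": texts,
        "extra_body": {"task_type": task_type},
    }
```

Usage: `client.embeddings.create(**embedding_request(["hello world"]))`.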
How does this compare to setting up OpenRouter for Gemini?
OpenRouter is similar concept but hosted (not local proxy). Similar functionality, different provider. Use OpenRouter if you want hosted aggregator, LiteLLM if you want local control + explicit multi-provider config.
Sources
- LiteLLM Documentation
- Google Gemini API Docs
- Gemini 3.1 Pro Review — TokenMix
- Gemini 2.5 Pro Review — TokenMix
- Gemini Embedding — TokenMix
- LangChain Tutorial — TokenMix
By TokenMix Research Lab · Updated 2026-04-24