TokenMix Research Lab · 2026-04-24

LiteLLM + Gemini 3 Setup: Complete Integration Guide

LiteLLM is the most popular OpenAI-compatible proxy for routing across 100+ LLM providers — and integrating Gemini 3.1 Pro through LiteLLM takes 3 config lines. This guide covers the complete setup (native Gemini Python SDK vs LiteLLM proxy vs TokenMix.ai gateway), Gemini 3 tool use / function calling through LiteLLM, streaming support, image input, and the 5 common integration errors and their fixes. Verified against LiteLLM 1.50.0+ and google-genai 0.10.0 as of April 24, 2026. By the end, you'll have Gemini 3.1 Pro responding through the OpenAI SDK with zero code changes to your application.

Confirmed vs Speculation

Claim · Status · Source
LiteLLM supports Gemini natively · Confirmed · LiteLLM docs
Gemini 3.1 Pro model ID is gemini-3.1-pro · Confirmed · Google docs
Function calling works through LiteLLM · Confirmed · v1.45+
Vision (image input) supported · Confirmed
Streaming works in LiteLLM proxy mode · Confirmed · HTTP chunked
Gemini 3 native tool use is OpenAI-shape · Yes, with translation
Rate limit headers preserved · Partial (LiteLLM rewrites some)

Three Integration Paths

Path 1 — Google native SDK (direct): call Gemini with the google-genai Python SDK. No proxy in the path, but your code is Gemini-specific.

Path 2 — LiteLLM (local proxy): run LiteLLM as a local OpenAI-compatible proxy so your existing OpenAI SDK code works unchanged.

Path 3 — TokenMix.ai gateway: a hosted OpenAI-compatible gateway with the same interface, plus managed keys and routing.

For exploration and local dev, use LiteLLM. For production, use TokenMix.ai or a similar managed gateway.

LiteLLM Setup in 5 Minutes

Step 1 — Install:

pip install 'litellm[proxy]'

Step 2 — Get a Gemini API key: from ai.google.dev → Get API key, then save it as an environment variable:

export GEMINI_API_KEY="your_key"
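A quick sanity check before starting the proxy can save a debugging round-trip later (this maps to Error 2 below). This helper is a sketch, not part of LiteLLM; it just confirms the variable is visible to the process:

```python
import os

def check_gemini_key(env=os.environ):
    """Fail fast if GEMINI_API_KEY is missing from the environment."""
    key = env.get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; run `export GEMINI_API_KEY=...` "
            "in the shell that starts the LiteLLM proxy."
        )
    return key
```

Run it in the same shell session that will launch LiteLLM, since the proxy inherits that environment.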

Step 3 — Config file (litellm_config.yaml):

model_list:
  - model_name: gemini-3.1-pro
    litellm_params:
      model: gemini/gemini-3.1-pro
      api_key: os.environ/GEMINI_API_KEY

  - model_name: gemini-2.5-flash
    litellm_params:
      model: gemini/gemini-2.5-flash
      api_key: os.environ/GEMINI_API_KEY

general_settings:
  master_key: sk-your-master-key  # for client auth to LiteLLM

Step 4 — Start proxy:

litellm --config litellm_config.yaml --port 4000

Step 5 — Call via OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-master-key",  # LiteLLM master key
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gemini-3.1-pro",
    messages=[{"role": "user", "content": "Hello Gemini 3."}]
)

Tool Use / Function Calling

LiteLLM translates OpenAI tool format to Gemini's native schema:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="gemini-3.1-pro",
    tools=tools,
    messages=[{"role": "user", "content": "Weather in Tokyo?"}]
)

# Response contains tool_calls in OpenAI format
if response.choices[0].message.tool_calls:
    call = response.choices[0].message.tool_calls[0]
    print(call.function.name, call.function.arguments)

Streaming Support

stream = client.chat.completions.create(
    model="gemini-3.1-pro",
    messages=[...],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

LiteLLM forwards Gemini's native streaming as OpenAI-shaped chunks; the proxy translation adds roughly 5-15 ms of latency.

Image Input

Pass image URL or base64:

response = client.chat.completions.create(
    model="gemini-3.1-pro",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)
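For local files, base64 data URLs use the same content-part shape. This helper is an assumption for illustration, not a LiteLLM function:

```python
import base64

def image_part_from_bytes(data, mime="image/jpeg"):
    """Wrap raw image bytes as an OpenAI-format image_url content part
    using a base64 data URL, which LiteLLM forwards to Gemini."""
    b64 = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{b64}"},
    }
```

Use it in place of the `image_url` part above, e.g. `image_part_from_bytes(open("photo.jpg", "rb").read())`.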

Gemini 3.1 Pro's vision quality is competitive with Claude Opus 4.7 and GPT-5.4 for most scenarios.

Common Errors & Fixes

Error 1: InvalidArgument: model 'gemini-3.1-pro' not found → Missing provider prefix. In the LiteLLM config, set model: gemini/gemini-3.1-pro — the gemini/ prefix tells LiteLLM which provider to route to.

Error 2: 500 API key not valid → $GEMINI_API_KEY not exported or incorrect. Check that os.environ.get('GEMINI_API_KEY') resolves in the proxy's environment.

Error 3: 429 Resource exhausted → Google's rate limit hit. Free tier has low limits. Enable billing in Google AI Studio.

Error 4: Tool call format mismatch → LiteLLM < 1.45 had translation bugs. Upgrade: pip install -U litellm.

Error 5: ContentFilterException → Gemini's safety filters triggered. Adjust the safety thresholds (e.g., via Gemini's safety_settings) or switch to a model with laxer defaults.
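For Error 3, the usual production fix alongside enabling billing is retrying with exponential backoff. A sketch of a generic wrapper (not a LiteLLM feature; it matches on the "429" text in the exception message as a simplifying assumption):

```python
import random
import time

def with_backoff(call, max_retries=4, base_delay=1.0, retryable=("429",)):
    """Retry a zero-arg request function on rate-limit errors with
    exponential backoff plus jitter; re-raise anything non-retryable."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            out_of_retries = attempt == max_retries
            if out_of_retries or not any(code in str(exc) for code in retryable):
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
```

Usage: `with_backoff(lambda: client.chat.completions.create(model="gemini-3.1-pro", messages=msgs))`.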

FAQ

LiteLLM vs LangChain for Gemini integration?

LangChain has a Gemini connector but adds framework complexity. LiteLLM is a thin proxy — just keeps OpenAI SDK syntax working. For pure API routing, LiteLLM. For full agent framework with memory/retrievers, LangChain. Often used together (LangChain → LiteLLM → Gemini).

Can I run LiteLLM as a service (not local)?

Yes. Deploy the LiteLLM container to Docker or Kubernetes and use it as a team-wide LLM proxy, or use TokenMix.ai as a hosted alternative without managing infrastructure.
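A minimal docker-compose sketch for that setup — the image tag and container paths are assumptions, so check LiteLLM's deployment docs for your version:

```yaml
# docker-compose.yml — mounts the same litellm_config.yaml used above
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    environment:
      - GEMINI_API_KEY=${GEMINI_API_KEY}
    command: ["--config", "/app/config.yaml", "--port", "4000"]
```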

Does LiteLLM support Gemini's new gemini-3-flash-preview?

Yes. Any Gemini model ID Google exposes works through LiteLLM; new models are usually supported within days of Google's release via LiteLLM version updates.

Is there latency overhead?

LiteLLM adds 5-15ms per request for the translation layer. Negligible for most workloads. For latency-critical real-time apps, consider direct Google SDK or Gemini Live API.

Can I route between Gemini and other providers in LiteLLM?

Yes — configure multiple model_list entries with different model names and pick a routing strategy (round-robin, failover, latency-based). This is similar to the multi-provider routing built into TokenMix.ai.
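A sketch of a config that load-balances one alias across two providers — field names follow LiteLLM's router settings, but verify them against your installed version, and note the OpenAI model ID here is illustrative:

```yaml
model_list:
  - model_name: smart-model            # one alias, multiple backends
    litellm_params:
      model: gemini/gemini-3.1-pro
      api_key: os.environ/GEMINI_API_KEY
  - model_name: smart-model
    litellm_params:
      model: openai/gpt-5.4            # illustrative second backend
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  routing_strategy: latency-based-routing
  num_retries: 2
```

Clients then request model="smart-model" and LiteLLM picks a backend per the strategy.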

Does LiteLLM expose Gemini's task_type parameter for embeddings?

Yes via LiteLLM 1.48+. Pass task_type in extra_body. See LiteLLM embedding docs.

How does this compare to setting up OpenRouter for Gemini?

OpenRouter is a similar concept, but hosted rather than a local proxy. Functionality is comparable; the provider differs. Use OpenRouter if you want a hosted aggregator, LiteLLM if you want local control and explicit multi-provider config.


By TokenMix Research Lab · Updated 2026-04-24