Gemini API Pricing in 2026: Every Model from Flash-Lite to 3.1 Pro, Real Costs and Free Tier

TokenMix Research Lab · 2026-04-07

Google's Gemini API offers the widest pricing range of any major provider in 2026 — from Flash-Lite at $0.10/$0.40 per million tokens to 3.1 Pro at $2.00/$12.00. The free tier gives you 1,500 requests/day on Flash models with no credit card. And Gemini Embedding at $0.15/M is the second cheapest hosted embedding model after Google's own text-embedding-005 at $0.00625/M. This guide covers every Gemini model's real cost, compares them against [GPT-5.4](https://tokenmix.ai/blog/gpt-5-api-pricing), Claude, and DeepSeek, and shows when Google's pricing beats everyone else. All pricing from [Google AI's official pricing page](https://ai.google.dev/pricing) and tracked by [TokenMix.ai](https://tokenmix.ai), April 2026.

---

Quick Gemini API Pricing Overview

All prices per 1M tokens, [Google AI official pricing](https://ai.google.dev/pricing), April 2026:

| Model | Input | Cached Input | Output | Context | Best For |
| ------------------------ | ------- | ------------ | ------- | ------- | ------------------------- |
| **Gemini 3.1 Pro** | $2.00 | $0.20 | $12.00 | 1M* | Flagship reasoning |
| **Gemini 3 Flash** | $0.50 | $0.05 | $3.00 | 1M | Balanced production |
| **Gemini 2.5 Pro** | $1.25 | $0.3125 | $10.00 | 1M* | Previous-gen flagship |
| **Gemini 2.5 Flash** | $0.30 | $0.075 | $2.50 | 1M | Cost-effective production |
| **Gemini 2.5 Flash-Lite** | $0.10 | $0.025 | $0.40 | 1M | Ultra-budget, high volume |
| **Gemini 3.1 Flash-Lite** | $0.25 | $0.025 | $1.50 | 1M | New-gen budget |
| Gemini Embedding | $0.15 | — | — | 8K | Text embedding |

*3.1 Pro and 2.5 Pro charge 2x for requests >200K input tokens.

**The headline:** Flash-Lite at $0.10/$0.40 is the cheapest production API from any major provider — only [Groq](https://tokenmix.ai/blog/groq-api-pricing) Llama 8B ($0.05/$0.08) is cheaper. And every Gemini model has a 1M token context window at base price.
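To make the table concrete, here is a minimal per-request cost calculator. The rates are hard-coded snapshots from this article, not a live price feed, and the model keys and function are illustrative rather than part of any official SDK:

```python
# Illustrative cost estimator using the per-1M-token rates from the table above.
# Rates are this article's April 2026 snapshot, not a live feed.

RATES = {  # model: (input $/M, cached input $/M, output $/M)
    "gemini-3.1-pro":        (2.00, 0.20,  12.00),
    "gemini-3-flash":        (0.50, 0.05,  3.00),
    "gemini-2.5-flash":      (0.30, 0.075, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.025, 0.40),
}

def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Dollar cost of one request; cached tokens bill at the cache-hit rate."""
    in_rate, cached_rate, out_rate = RATES[model]
    fresh = input_tokens - cached_tokens
    return (fresh * in_rate
            + cached_tokens * cached_rate
            + output_tokens * out_rate) / 1_000_000

# Example: 10K-token prompt (8K served from cache) + 1K-token reply on Flash-Lite.
print(request_cost("gemini-2.5-flash-lite", 10_000, 1_000, cached_tokens=8_000))
```

Note that 3.1 Pro's >200K-token surcharge is not modeled here; see the Pro section below.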

---

Gemini Free Tier: What You Actually Get

Google offers the most generous free tier among major providers:

| Limit | Free Tier Value |
| ------------------- | ---------------------------- |
| Requests per day | 1,500 (Flash models) |
| Tokens per minute | 1,000,000 (Flash) |
| Models available | Flash, Flash-Lite, Embedding |
| Credit card needed | No |
| Rate limits | Lower than paid tier |

**1,500 requests/day for free** is enough for a small production chatbot. Combined with the 1M token context window, Gemini's free tier is the strongest available for prototyping and small-scale deployment.
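A quick back-of-envelope check against the request quota quoted above (a sketch; the quota figure comes from this article's snapshot and quotas can change):

```python
# Will a projected workload fit the free tier's 1,500 requests/day (Flash models)?
# The quota value is this article's snapshot, not a guaranteed limit.

FREE_TIER_REQUESTS_PER_DAY = 1_500

def fits_free_tier(daily_users, messages_per_user):
    """True if projected daily request volume stays within the free quota."""
    return daily_users * messages_per_user <= FREE_TIER_REQUESTS_PER_DAY

print(fits_free_tier(100, 10))  # 1,000 requests/day: fits
print(fits_free_tier(300, 10))  # 3,000 requests/day: needs the paid tier
```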

---

Gemini 3.1 Pro: Flagship with Long-Context Surcharge

Gemini 3.1 Pro is Google's most capable model at $2.00/$12.00. Like GPT-5.4 and Claude Sonnet, it has a long-context surcharge:

| Gemini 3.1 Pro | ≤200K Input | >200K Input |
| -------------- | ----------- | ----------- |
| Input | $2.00/M | $4.00/M |
| Output | $12.00/M | $18.00/M |

**Gemini Pro undercuts GPT-5.4 on both sides**: $12 vs $15 on output and $2.00 vs $2.50 on input. Cache hits at $0.20/M are the cheapest among the frontier models compared here, below GPT's $0.25/M and Claude's $0.30/M.

**Best for:** Teams that want frontier quality at roughly 20-33% below OpenAI/Anthropic list prices, especially for output-heavy workloads.
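The surcharge arithmetic can be sketched as follows. One assumption to flag loudly: this models the whole request billing at the higher tier once input crosses 200K tokens, which is how the table reads, but check Google's pricing docs for whether the surcharge is whole-request or marginal:

```python
# Sketch of Gemini 3.1 Pro's long-context surcharge per the tiered table above.
# ASSUMPTION: the entire request bills at the higher tier once input > 200K
# tokens (verify against Google's official docs before relying on this).

def pro_request_cost(input_tokens, output_tokens):
    """Gemini 3.1 Pro dollar cost, applying the >200K-input surcharge."""
    if input_tokens > 200_000:
        in_rate, out_rate = 4.00, 18.00  # surcharge tier
    else:
        in_rate, out_rate = 2.00, 12.00  # base tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 150K-token prompt stays on the base tier; 250K crosses into the surcharge.
print(pro_request_cost(150_000, 2_000))
print(pro_request_cost(250_000, 2_000))
```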

---

Gemini Flash Models: The Budget Sweet Spot

Flash models are Google's efficiency tier — and they're where Gemini's pricing truly shines:

| Model | Input | Output | Cache Hit | Value Proposition |
| ----------------- | ------ | ------ | --------- | ------------------------------ |
| 2.5 Flash-Lite | $0.10 | $0.40 | $0.025 | Cheapest major-provider model |
| 2.5 Flash | $0.30 | $2.50 | $0.075 | Strong quality at budget price |
| 3 Flash | $0.50 | $3.00 | $0.05 | Latest gen, best Flash quality |
| 3.1 Flash-Lite | $0.25 | $1.50 | $0.025 | New-gen budget option |

**2.5 Flash-Lite at $0.10/$0.40 undercuts DeepSeek V3.2** ($0.27/$1.10) on both input and output. For high-volume simple tasks such as classification, extraction, and simple Q&A, it's the cheapest option from a major provider.

**2.5 Flash at $0.30/$2.50** offers surprising quality for the price. It handles multi-step reasoning, coding, and analysis at a fraction of Pro pricing. Many teams find Flash sufficient for 80% of their production traffic.

All Flash models include the full 1M token context window at no surcharge — unlike Pro which doubles past 200K.
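The cache-hit rates in the Flash table translate into large savings for prompts that are mostly a reused prefix (a long system prompt, few-shot examples, shared documents). A minimal calculation using the 2.5 Flash rates quoted above:

```python
# Illustrative input-cost savings from context caching on Gemini 2.5 Flash,
# using this article's snapshot rates ($0.30/M fresh vs $0.075/M cache hits).

FLASH_INPUT_RATE = 0.30    # $/M, fresh input tokens
FLASH_CACHED_RATE = 0.075  # $/M, cache-hit tokens

def input_cost(total_tokens, cached_tokens):
    """Dollar cost of one request's input, splitting fresh vs cached tokens."""
    fresh = total_tokens - cached_tokens
    return (fresh * FLASH_INPUT_RATE + cached_tokens * FLASH_CACHED_RATE) / 1_000_000

uncached = input_cost(50_000, 0)
cached = input_cost(50_000, 45_000)  # 90% of the prompt is a reused prefix
print(f"savings: {1 - cached / uncached:.0%}")
```

With 90% of a 50K-token prompt cached, input cost drops by roughly two-thirds; heavier reuse approaches the 75% ceiling implied by the rate ratio.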

---

Gemini Embedding Pricing: Cheapest Hosted Option

| Model | Price/M tokens | Dimensions |
| ----------------------- | -------------- | ---------- |
| Gemini Embedding | $0.15 | 768 |
| Gemini Embedding 2 | $0.20 | 768+ |
| text-embedding-005 | $0.00625 | 768 |
| OpenAI 3-small | $0.02 | 1536 |

Google's text-embedding-005 at $0.00625/M is **more than 3x cheaper than OpenAI 3-small** ($0.02/M) and the cheapest hosted embedding option available. If you're building RAG and cost matters, Google embeddings are hard to beat.
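For RAG planning, indexing cost is a one-line multiplication. A sketch with the prices from the table; the corpus size and tokens-per-chunk are hypothetical workload assumptions:

```python
# Rough RAG indexing cost from the embedding prices quoted above.
# Corpus size and chunk length are hypothetical assumptions for illustration.

def embed_cost(num_chunks, tokens_per_chunk, price_per_million):
    """Dollar cost to embed an entire corpus once."""
    return num_chunks * tokens_per_chunk * price_per_million / 1_000_000

# Embedding 1M chunks of ~500 tokens each (500M tokens total):
print(embed_cost(1_000_000, 500, 0.00625))  # text-embedding-005
print(embed_cost(1_000_000, 500, 0.02))     # OpenAI 3-small
```

Even a 500M-token corpus stays in single-digit dollars on text-embedding-005.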

---

Gemini API Pricing vs GPT-5.4 vs Claude vs DeepSeek

| Model | Input/M | Output/M | Cache Hit/M | Context |
| --------------------- | ------- | -------- | ----------- | ------- |
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.20 | 1M* |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.075 | 1M |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.025 | 1M |
| GPT-5.4 | $2.50 | $15.00 | $0.25 | 1.1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 1M |
| DeepSeek V4 | $0.30 | $0.50 | $0.03 | 1M |
| Grok 4.20 | $2.00 | $6.00 | $0.20 | 2M |

**Key insights from [TokenMix.ai](https://tokenmix.ai/pricing):**

1. **Gemini Pro has cheaper output than GPT or Claude** ($12 vs $15); among flagships only Grok 4.20 ($6) is lower. Its $0.20/M cache hits match Grok's and undercut GPT's $0.25 and Claude's $0.30.

2. **Flash-Lite at $0.10/$0.40 is the ultra-budget champion** from a major provider. Only DeepSeek ($0.30/$0.50) and Groq Llama 8B ($0.05/$0.08) compete at this tier.

3. **Every Gemini model has 1M context** — no surcharge on Flash/Flash-Lite models. Pro charges 2x past 200K (same pattern as GPT and Claude).

Through [TokenMix.ai](https://tokenmix.ai), access Gemini alongside 155+ other models with a single API and automatic failover.
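Which model is cheapest depends on your input/output mix, so it's worth computing rather than eyeballing. A sketch that ranks the comparison table's models by blended cost (rates are this article's snapshot; the function names are illustrative):

```python
# Rank the comparison table's models by blended cost for a given token mix.
# Rates are this article's April 2026 snapshot figures, not live prices.

PRICES = {  # model: (input $/M, output $/M)
    "gemini-3.1-pro":        (2.00, 12.00),
    "gemini-2.5-flash":      (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gpt-5.4":               (2.50, 15.00),
    "claude-sonnet-4.6":     (3.00, 15.00),
    "deepseek-v4":           (0.30, 0.50),
    "grok-4.20":             (2.00, 6.00),
}

def blended(model, in_tokens, out_tokens):
    """Dollar cost of one request at list rates (no caching)."""
    in_rate, out_rate = PRICES[model]
    return (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

def cheapest(in_tokens, out_tokens):
    """Model with the lowest blended cost for this input/output mix."""
    return min(PRICES, key=lambda m: blended(m, in_tokens, out_tokens))

print(cheapest(10_000, 2_000))  # a typical input-heavy chat turn
```

For this mix Flash-Lite wins outright; as output share grows, DeepSeek V4's $0.50 output rate starts to close the gap.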

---

Real-World Gemini Cost Scenarios

Scenario 1: High-volume chatbot — 10,000 conversations/day

| Model | Monthly (Cached) |
| ------------------ | ---------------- |
| Gemini Flash-Lite | $43.50 |
| DeepSeek V4 | $58.50 |
| GPT-5.4 Nano | $80.10 |
| Claude Haiku 4.5 | $69.00 |

**Flash-Lite wins at high volume** — cheaper than DeepSeek for input-heavy cached workloads.

Scenario 2: Production SaaS — 5,000 calls/day

| Model | Cached |
| ----------------- | ------ |
| Gemini 2.5 Flash | $664 |
| Gemini 3.1 Pro | $2,790 |
| GPT-5.4 Mini | $1,069 |
| Claude Sonnet 4.6 | $1,713 |

**Gemini Flash at $664/month** — 38% cheaper than GPT Mini, 61% cheaper than Sonnet.
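The arithmetic behind scenarios like these is straightforward. Note the per-call token counts below are hypothetical; the tables above use their own (unstated) workload assumptions, so the totals will not match exactly:

```python
# Monthly cost arithmetic for scenarios like the ones above. The per-call
# token counts are hypothetical assumptions; the article's tables use their
# own workload profile, so totals won't line up exactly.

def monthly_cost(calls_per_day, in_tokens, out_tokens,
                 in_rate, out_rate, days=30):
    """Monthly dollar cost from a daily call volume and per-call token counts."""
    per_call = (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
    return calls_per_day * days * per_call

# 5,000 calls/day on Gemini 2.5 Flash ($0.30/$2.50), 3K in / 500 out per call:
print(round(monthly_cost(5_000, 3_000, 500, 0.30, 2.50), 2))
```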

---

How to Choose the Right Gemini Model

| Your Situation | Recommended Model | Why |
| ---------------------------- | ----------------------------- | ------------------------------- |
| Ultra-budget, high volume | Flash-Lite 2.5 ($0.10/$0.40) | Cheapest major-provider model |
| Balanced production | Flash 2.5 ($0.30/$2.50) | Strong quality at budget pricing |
| Flagship quality | 3.1 Pro ($2.00/$12.00) | Cheaper output than GPT/Claude |
| Free prototyping | Flash (free tier) | 1,500 req/day, no credit card |
| RAG embeddings | text-embedding-005 ($0.006) | Cheapest hosted embedding |
| Multi-provider with failover | Gemini via TokenMix.ai | Unified API, auto-failover |

---

**Related:** [Compare all model pricing in our complete LLM API pricing comparison](https://tokenmix.ai/blog/llm-api-pricing-comparison)

Conclusion

Gemini's pricing in 2026 covers every tier: Flash-Lite at $0.10/$0.40 for budget work, Flash at $0.30/$2.50 for balanced production, and Pro at $2.00/$12.00 for flagship quality with cheaper output than GPT-5.4 or Claude Sonnet. The free tier (1,500 req/day, no credit card) is the strongest in the industry.

The 1M context window on every model — with no surcharge on Flash — is a genuine differentiator. And Google's embedding models at $0.006-$0.15/M are the cheapest hosted option available.

Compare Gemini pricing in real time at [tokenmix.ai/pricing](https://tokenmix.ai/pricing).

---

FAQ

How much does the Gemini API cost?

Ranges from Flash-Lite at $0.10/$0.40 to Pro at $2.00/$12.00 per million tokens. Cache hits save 90%. Free tier: 1,500 requests/day on Flash models, no credit card.

Is Gemini cheaper than GPT-5.4?

Yes at every tier. Gemini Pro ($2/$12) is 20% cheaper on input and 20% cheaper on output vs GPT-5.4 ($2.50/$15). Flash-Lite ($0.10/$0.40) is 50% cheaper than GPT Nano ($0.20/$1.25) on input.

Does Gemini have a free tier?

Yes — the most generous among major providers. 1,500 requests/day on Flash models, 1M tokens/minute, no credit card. Sufficient for prototyping and small-scale production.

What is Gemini's context window?

1M tokens on every model. Pro charges 2x past 200K input tokens. Flash and Flash-Lite have flat pricing across the full 1M window.

How does Gemini compare to DeepSeek?

Flash-Lite ($0.10/$0.40) is cheaper than [DeepSeek V4](https://tokenmix.ai/blog/deepseek-api-pricing) ($0.30/$0.50) on input and slightly cheaper on output. DeepSeek has higher benchmark scores, so for pure budget Flash-Lite wins, while DeepSeek wins on quality.

What are the cheapest Gemini embedding models?

text-embedding-005 at $0.00625/M — 3x cheaper than OpenAI's $0.02/M. Gemini Embedding at $0.15/M is also competitive.

---

*Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: [Google AI Pricing](https://ai.google.dev/pricing), [TokenMix.ai](https://tokenmix.ai), and [Artificial Analysis](https://artificialanalysis.ai)*