TokenMix Research Lab · 2026-04-07

Gemini API Pricing in 2026: Every Model from Flash-Lite to 3.1 Pro, Real Costs and Free Tier
Last Updated: 2026-04-29
Author: TokenMix Research Lab
Gemini pricing spans Flash-Lite at $0.10/$0.40 (cheapest major-provider model) to 3.1 Pro at $2/$12 per 1M tokens. Free tier: 1,500 req/day on Flash, no credit card. Every model ships with 1M context.
Google's Gemini API offers the widest pricing range of any major provider in 2026 — from Flash-Lite at $0.10/$0.40 per million tokens to 3.1 Pro at $2.00/$12.00. The free tier gives you 1,500 requests/day on Flash models with no credit card. And Gemini Embedding at $0.15/M is the second cheapest hosted embedding model after Google's own text-embedding-005 at $0.00625/M. This guide covers every Gemini model's real cost, compares them against GPT-5.4, Claude, and DeepSeek, and shows when Google's pricing beats everyone else. All pricing from Google AI's official pricing page and tracked by TokenMix.ai, April 2026.
Table of Contents
- Quick Gemini API Pricing Overview
- Gemini Free Tier: What You Actually Get
- Gemini 3.1 Pro: Flagship with Long-Context Surcharge
- Gemini Flash Models: The Budget Sweet Spot
- Gemini Embedding Pricing: Cheapest Hosted Option
- Gemini API Pricing vs GPT-5.4 vs Claude vs DeepSeek
- Real-World Gemini Cost Scenarios
- How to Choose the Right Gemini Model
- Conclusion
- FAQ
Quick Gemini API Pricing Overview
Six tiers from $0.10/$0.40 (Flash-Lite, cheapest major-provider model) to $2/$12 (3.1 Pro). Every model has a 1M token context — only Pro charges 2× past 200K input.
All prices per 1M tokens, Google AI official pricing, April 2026:
| Model | Input | Cached Input | Output | Context | Best For |
|---|---|---|---|---|---|
| Gemini 3.1 Pro | $2.00 | $0.20 | $12.00 | 1M* | Flagship reasoning |
| Gemini 3 Flash | $0.50 | $0.05 | $3.00 | 1M | Balanced production |
| Gemini 2.5 Pro | $1.25 | $0.3125 | $10.00 | 1M* | Previous gen flagship |
| Gemini 2.5 Flash | $0.30 | $0.075 | $2.50 | 1M | Cost-effective production |
| Gemini 2.5 Flash-Lite | $0.10 | $0.025 | $0.40 | 1M | Ultra-budget, high volume |
| Gemini 3.1 Flash-Lite | $0.25 | $0.025 | $1.50 | 1M | New gen budget |
| Gemini Embedding | $0.15 | — | — | 8K | Text embedding |
*3.1 Pro and 2.5 Pro charge 2x for requests >200K input tokens.
The headline: Flash-Lite at $0.10/$0.40 is the cheapest production API from any major provider — only Groq Llama 8B ($0.05/$0.08) is cheaper. And every Gemini model has a 1M token context window at base price.
Gemini Free Tier: What You Actually Get
Free tier gives 1,500 requests/day on Flash models with 1M tokens/minute and no credit card — the strongest free tier among major providers, sufficient for small production deployments.
| Limit | Free Tier Value |
|---|---|
| Requests per day | 1,500 (Flash models) |
| Tokens per minute | 1,000,000 (Flash) |
| Models available | Flash, Flash-Lite, Embedding |
| Credit card needed | No |
| Rate limits | Lower than paid tier |
1,500 requests/day for free is enough for a small production chatbot. Combined with the 1M token context window, Gemini's free tier is the strongest available for prototyping and small-scale deployment.
Gemini 3.1 Pro: Flagship with Long-Context Surcharge
3.1 Pro at $2/$12 has the cheapest output among premium models ($12 vs $15 GPT/Claude). Cache hits at $0.20/M lead the frontier tier. Past 200K input, pricing doubles to $4/$18. Gemini 3.1 Pro is Google's most capable model at $2.00/$12.00. Like GPT-5.4 and Claude Sonnet, it has a long-context surcharge:
| Gemini 3.1 Pro | ≤200K Input | >200K Input |
|---|---|---|
| Input | $2.00/M | $4.00/M |
| Output | $12.00/M | $18.00/M |
Gemini Pro is cheaper than GPT-5.4 on output ($12 vs $15) and matches on input ($2.00 vs $2.50). Cache hits at $0.20/M are the cheapest among frontier models — cheaper than GPT's $0.25/M and Claude's $0.30/M.
Best for: Teams that want frontier quality at 20-40% below OpenAI/Anthropic pricing, especially for output-heavy workloads.
Gemini Flash Models: The Budget Sweet Spot
Flash-Lite 2.5 at $0.10/$0.40 is the cheapest major-provider model. Flash 2.5 at $0.30/$2.50 handles 80% of production workloads at budget pricing. All Flash models include the full 1M context with no surcharge.
| Model | Input | Output | Cache Hit | Value Proposition |
|---|---|---|---|---|
| 2.5 Flash-Lite | $0.10 | $0.40 | $0.025 | Cheapest major-provider model |
| 2.5 Flash | $0.30 | $2.50 | $0.075 | Strong quality at budget price |
| 3 Flash | $0.50 | $3.00 | $0.05 | Latest gen, best Flash quality |
| 3.1 Flash-Lite | $0.25 | $1.50 | $0.025 | New gen budget option |
2.5 Flash-Lite at $0.10/$0.40 competes with DeepSeek V3.2 ($0.27/$1.10) on input and beats it on output. For high-volume simple tasks — classification, extraction, simple Q&A — it's the cheapest option from a major provider.
2.5 Flash at $0.30/$2.50 offers surprising quality for the price. It handles multi-step reasoning, coding, and analysis at a fraction of Pro pricing. Many teams find Flash sufficient for 80% of their production traffic.
All Flash models include the full 1M token context window at no surcharge — unlike Pro which doubles past 200K.
Gemini Embedding Pricing: Cheapest Hosted Option
Google text-embedding-005 at $0.00625/M is 3× cheaper than OpenAI 3-small ($0.02/M) — the cheapest hosted embedding option available, ideal for cost-sensitive RAG pipelines.
| Model | Price/M tokens | Dimensions |
|---|---|---|
| Gemini Embedding | $0.15 | 768 |
| Gemini Embedding 2 | $0.20 | 768+ |
| text-embedding-005 | $0.00625 | 768 |
| OpenAI 3-small | $0.02 | 1536 |
Google's text-embedding-005 at $0.00625/M is 3x cheaper than OpenAI 3-small — the cheapest hosted embedding option available. If you're building RAG and cost matters, Google embeddings are hard to beat.
Gemini API Pricing vs GPT-5.4 vs Claude vs DeepSeek
Gemini Pro wins flagship cost: 20% cheaper input than GPT-5.4, 20% cheaper output, 20% cheaper cache. Flash-Lite at $0.10/$0.40 wins ultra-budget vs DeepSeek and Claude Haiku.
| Model | Input/M | Output/M | Cache Hit/M | Context |
|---|---|---|---|---|
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.20 | 1M* |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.075 | 1M |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.025 | 1M |
| GPT-5.4 | $2.50 | $15.00 | $0.25 | 1.1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 1M |
| DeepSeek V4 | $0.30 | $0.50 | $0.03 | 1M |
| Grok 4.20 | $2.00 | $6.00 | $0.20 | 2M |
Key insights from TokenMix.ai:
Gemini Pro is the cheapest flagship on output ($12 vs $15 GPT/Claude, $6 Grok). Cache hits at $0.20/M also lead.
Flash-Lite at $0.10/$0.40 is the ultra-budget champion from a major provider. Only DeepSeek ($0.30/$0.50) and Groq Llama 8B ($0.05/$0.08) compete at this tier.
Every Gemini model has 1M context — no surcharge on Flash/Flash-Lite models. Pro charges 2x past 200K (same pattern as GPT and Claude).
Through TokenMix.ai, access Gemini alongside 155+ other models with a single API and automatic failover.
Real-World Gemini Cost Scenarios
High-volume chatbot (10K conv/day) costs $43.50 on Flash-Lite — cheapest among major providers, beats DeepSeek by 26%. SaaS production (5K calls/day) costs $664 on Flash, 38% under GPT Mini and 61% under Sonnet.
Scenario 1: High-volume chatbot — 10,000 conversations/day
- Average: 500 input + 300 output tokens, 70% cache hit rate
- Monthly: ~150M input, ~90M output tokens
| Model | Monthly (Cached) |
|---|---|
| Gemini Flash-Lite | $43.50 |
| DeepSeek V4 | $58.50 |
| GPT-5.4 Nano | $80.10 |
| Claude Haiku 4.5 | $69.00 |
Flash-Lite wins at high volume — cheaper than DeepSeek for input-heavy cached workloads.
Scenario 2: Production SaaS — 5,000 calls/day
- Average: 3,000 input + 1,500 output, 75% cache hit rate
- Monthly: ~450M input, ~225M output
| Model | Cached |
|---|---|
| Gemini 2.5 Flash | $664 |
| Gemini 3.1 Pro | $2,790 |
| GPT-5.4 Mini | $1,069 |
| Claude Sonnet 4.6 | $1,713 |
Gemini Flash at $664/month — 38% cheaper than GPT Mini, 61% cheaper than Sonnet.
Which Gemini Model Should You Pick?
Default to Flash 2.5 ($0.30/$2.50) for production — handles 80% of workloads. Drop to Flash-Lite for ultra-budget volume work; step up to 3.1 Pro for flagship reasoning. Free tier covers prototyping.
| Your Situation | Recommended Model | Why |
|---|---|---|
| Ultra-budget, high volume | Flash-Lite 2.5 ($0.10/$0.40) | Cheapest major-provider model |
| Balanced production | Flash 2.5 ($0.30/$2.50) | Strong quality at budget pricing |
| Flagship quality | 3.1 Pro ($2.00/$12.00) | Cheapest flagship output |
| Free prototyping | Flash (free tier) | 1,500 req/day, no credit card |
| RAG embeddings | text-embedding-005 ($0.006) | Cheapest hosted embedding |
| Multi-provider with failover | Gemini via TokenMix.ai | Unified API, auto-failover |
Related: Compare all model pricing in our complete LLM API pricing comparison
What's the Bottom Line on Gemini Pricing?
Gemini wins ultra-budget (Flash-Lite cheapest from any major provider), wins flagship output ($12/M, lowest in tier), wins free tier (1,500 req/day no credit card), wins embedding ($0.00625/M, 3× under OpenAI). Strongest pricing portfolio in 2026. Gemini's pricing in 2026 covers every tier: Flash-Lite at $0.10/$0.40 for budget work, Flash at $0.30/$2.50 for balanced production, and Pro at $2.00/$12.00 for flagship quality with the cheapest output among premium models. The free tier (1,500 req/day, no credit card) is the strongest in the industry.
The 1M context window on every model — with no surcharge on Flash — is a genuine differentiator. And Google's embedding models at $0.006-$0.15/M are the cheapest hosted option available.
Compare Gemini pricing in real time at tokenmix.ai/pricing.
FAQ
How much does the Gemini API cost?
Ranges from Flash-Lite at $0.10/$0.40 to Pro at $2.00/$12.00 per million tokens. Cache hits save 90%. Free tier: 1,500 requests/day on Flash models, no credit card.
Is Gemini cheaper than GPT-5.4?
Yes at every tier. Gemini Pro ($2/$12) is 20% cheaper on input and 20% cheaper on output vs GPT-5.4 ($2.50/$15). Flash-Lite ($0.10/$0.40) is 50% cheaper than GPT Nano ($0.20/$1.25) on input.
Does Gemini have a free tier?
Yes — the most generous among major providers. 1,500 requests/day on Flash models, 1M tokens/minute, no credit card. Sufficient for prototyping and small-scale production.
What is Gemini's context window?
1M tokens on every model. Pro charges 2x past 200K input tokens. Flash and Flash-Lite have flat pricing across the full 1M window.
How does Gemini compare to DeepSeek?
Flash-Lite ($0.10/$0.40) is cheaper than DeepSeek V4 ($0.30/$0.50) on input but slightly cheaper on output too. DeepSeek has higher benchmark scores. For pure budget, Flash-Lite wins on input; DeepSeek wins on quality.
What are the cheapest Gemini embedding models?
text-embedding-005 at $0.00625/M — 3x cheaper than OpenAI's $0.02/M. Gemini Embedding at $0.15/M is also competitive.
Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: Google AI Pricing, TokenMix.ai, and Artificial Analysis