TokenMix Research Lab · 2026-04-07

Gemini API Pricing 2026: Flash-Lite $0.10 to Pro $2, Free Tier

Gemini API Pricing in 2026: Every Model from Flash-Lite to 3.1 Pro, Real Costs and Free Tier

Last Updated: 2026-04-29
Author: TokenMix Research Lab

Gemini pricing spans Flash-Lite at $0.10/$0.40 (cheapest major-provider model) to 3.1 Pro at $2/$12 per 1M tokens. Free tier: 1,500 req/day on Flash, no credit card. Every text-generation model ships with 1M context.

Google's Gemini API offers the widest pricing range of any major provider in 2026, from Flash-Lite at $0.10/$0.40 per million tokens to 3.1 Pro at $2.00/$12.00. The free tier gives you 1,500 requests/day on Flash models with no credit card. On embeddings, Google's text-embedding-005 at $0.00625/M is the cheapest hosted option in this comparison, while Gemini Embedding costs $0.15/M. This guide covers every Gemini model's real cost, compares them against GPT-5.4, Claude, and DeepSeek, and shows when Google's pricing beats everyone else. All pricing is from Google AI's official pricing page as tracked by TokenMix.ai, April 2026.

Quick Gemini API Pricing Overview

Six tiers from $0.10/$0.40 (Flash-Lite, cheapest major-provider model) to $2/$12 (3.1 Pro). Every model has a 1M token context — only Pro charges 2× past 200K input.

All prices per 1M tokens, Google AI official pricing, April 2026:

| Model | Input | Cached Input | Output | Context | Best For |
|---|---|---|---|---|---|
| Gemini 3.1 Pro | $2.00 | $0.20 | $12.00 | 1M* | Flagship reasoning |
| Gemini 3 Flash | $0.50 | $0.05 | $3.00 | 1M | Balanced production |
| Gemini 2.5 Pro | $1.25 | $0.3125 | $10.00 | 1M* | Previous gen flagship |
| Gemini 2.5 Flash | $0.30 | $0.075 | $2.50 | 1M | Cost-effective production |
| Gemini 2.5 Flash-Lite | $0.10 | $0.025 | $0.40 | 1M | Ultra-budget, high volume |
| Gemini 3.1 Flash-Lite | $0.25 | $0.025 | $1.50 | 1M | New gen budget |
| Gemini Embedding | $0.15 | | | 8K | Text embedding |

*3.1 Pro and 2.5 Pro charge 2x on input for requests over 200K input tokens; 3.1 Pro output also rises from $12 to $18.
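To make the table concrete, here is a minimal Python sketch of per-request cost. The rates are copied from the table above; the function name `request_cost`, the dictionary keys (labels for this example, not official API model IDs), and the sample token counts are assumptions for illustration. The sketch ignores the surcharge that Pro models apply past 200K input tokens.

```python
# USD per 1M tokens, copied from the pricing table above (April 2026).
# Keys are illustrative labels, not official API model identifiers.
PRICES = {
    "gemini-3.1-pro":        {"input": 2.00, "cached": 0.20,   "output": 12.00},
    "gemini-3-flash":        {"input": 0.50, "cached": 0.05,   "output": 3.00},
    "gemini-2.5-pro":        {"input": 1.25, "cached": 0.3125, "output": 10.00},
    "gemini-2.5-flash":      {"input": 0.30, "cached": 0.075,  "output": 2.50},
    "gemini-2.5-flash-lite": {"input": 0.10, "cached": 0.025,  "output": 0.40},
    "gemini-3.1-flash-lite": {"input": 0.25, "cached": 0.025,  "output": 1.50},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Return the USD cost of one request.

    cached_tokens is the portion of input_tokens billed at the cached rate.
    """
    p = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * p["input"]
        + cached_tokens * p["cached"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


# Hypothetical request: 10,000 input tokens, 1,000 output tokens, no cache.
print(f"${request_cost('gemini-2.5-flash', 10_000, 1_000):.4f}")
```

Multiplying the per-request figure by expected daily volume gives a first-order budget estimate before any caching or batching assumptions.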

The headline: Flash-Lite at $0.10/$0.40 is the cheapest production API from any major provider; only Groq Llama 8B ($0.05/$0.08) is cheaper. And every Gemini text model has a 1M token context window, with no surcharge on Flash and Flash-Lite.


Gemini Free Tier: What You Actually Get

Free tier gives 1,500 requests/day on Flash models with 1M tokens/minute and no credit card — the strongest free tier among major providers, sufficient for small production deployments.

| Limit | Free Tier Value |
|---|---|
| Requests per day | 1,500 (Flash models) |
| Tokens per minute | 1,000,000 (Flash) |
| Models available | Flash, Flash-Lite, Embedding |
| Credit card needed | No |
| Rate limits | Lower than paid tier |

1,500 requests/day for free is enough for a small production chatbot. Combined with the 1M token context window, Gemini's free tier is the strongest available for prototyping and small-scale deployment.
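As a rough way to check whether a workload stays inside these limits, the sketch below compares expected traffic against the two numeric caps from the table. The function name `fits_free_tier` and the sample chatbot traffic are hypothetical, and other limits not listed here may also apply.

```python
# Free-tier caps for Flash models, taken from the table above.
FREE_REQUESTS_PER_DAY = 1_500
FREE_TOKENS_PER_MINUTE = 1_000_000


def fits_free_tier(requests_per_day, peak_tokens_per_minute):
    """Return True if both the daily request cap and the per-minute token cap hold."""
    return (
        requests_per_day <= FREE_REQUESTS_PER_DAY
        and peak_tokens_per_minute <= FREE_TOKENS_PER_MINUTE
    )


# Hypothetical chatbot: 50 users sending 20 messages each per day,
# with an assumed peak of 30,000 tokens in the busiest minute.
requests = 50 * 20
print(fits_free_tier(requests, peak_tokens_per_minute=30_000))
```

The daily request cap is usually the binding limit for chat-style traffic; the per-minute token cap matters more for long-context or bursty jobs.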


Gemini 3.1 Pro: Flagship with Long-Context Surcharge

3.1 Pro at $2/$12 undercuts GPT-5.4 and Claude Sonnet on output ($12 vs $15); among the models compared here, only Grok 4.20 ($6) is lower. Cache hits at $0.20/M tie Grok for the lowest rate in the frontier tier. Past 200K input, input doubles to $4 and output rises to $18.

Gemini 3.1 Pro is Google's most capable model at $2.00/$12.00. Like GPT-5.4 and Claude Sonnet, it has a long-context surcharge:

| Gemini 3.1 Pro | ≤200K Input | >200K Input |
|---|---|---|
| Input | $2.00/M | $4.00/M |
| Output | $12.00/M | $18.00/M |
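The two-tier pricing can be sketched as a small function. This assumes the higher rates apply to the whole request once input exceeds 200K tokens, which is how the table reads; the function name `pro_request_cost` and the sample token counts are illustrative only.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens, from the table above


def pro_request_cost(input_tokens, output_tokens):
    """USD cost of one Gemini 3.1 Pro request under the two-tier rates above.

    Assumption: the whole request is billed at the higher tier once the
    input exceeds the threshold.
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        input_rate, output_rate = 4.00, 18.00
    else:
        input_rate, output_rate = 2.00, 12.00
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical requests with 2,000 output tokens each.
print(f"100K input: ${pro_request_cost(100_000, 2_000):.3f}")
print(f"300K input: ${pro_request_cost(300_000, 2_000):.3f}")
```

Under that assumption, a prompt just over the threshold costs roughly twice as much as one just under it, so trimming context to stay below 200K can matter more than the token count alone suggests.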

Gemini Pro is cheaper than GPT-5.4 on both output ($12 vs $15) and input ($2.00 vs $2.50). Cache hits at $0.20/M beat GPT's $0.25/M and Claude's $0.30/M, and match Grok 4.20's $0.20/M.

Best for: Teams that want frontier quality at roughly 20-33% below OpenAI/Anthropic list prices, especially for output-heavy workloads.


Gemini Flash Models: The Budget Sweet Spot

Flash-Lite 2.5 at $0.10/$0.40 is the cheapest major-provider model. Flash 2.5 at $0.30/$2.50 handles 80% of production workloads at budget pricing. All Flash models include the full 1M context with no surcharge.

| Model | Input | Output | Cache Hit | Value Proposition |
|---|---|---|---|---|
| 2.5 Flash-Lite | $0.10 | $0.40 | $0.025 | Cheapest major-provider model |
| 2.5 Flash | $0.30 | $2.50 | $0.075 | Strong quality at budget price |
| 3 Flash | $0.50 | $3.00 | $0.05 | Latest gen, best Flash quality |
| 3.1 Flash-Lite | $0.25 | $1.50 | $0.025 | New gen budget option |

2.5 Flash-Lite at $0.10/$0.40 undercuts DeepSeek V3.2 ($0.27/$1.10) on both input and output. For high-volume simple tasks such as classification, extraction, and simple Q&A, it's the cheapest option from a major provider.

2.5 Flash at $0.30/$2.50 offers surprising quality for the price. It handles multi-step reasoning, coding, and analysis at a fraction of Pro pricing. Many teams find Flash sufficient for 80% of their production traffic.

All Flash models include the full 1M token context window at no surcharge — unlike Pro which doubles past 200K.
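The cache-hit column is easier to compare as a discount. The sketch below derives the discount for each Flash model from the table values and computes an effective input price for a given cache hit rate. The function names and the 50% hit rate are assumptions for illustration.

```python
# (input, cache hit) prices in USD per 1M tokens, copied from the table above.
FLASH_PRICES = {
    "2.5 Flash-Lite": (0.10, 0.025),
    "2.5 Flash": (0.30, 0.075),
    "3 Flash": (0.50, 0.05),
    "3.1 Flash-Lite": (0.25, 0.025),
}


def cache_discount_pct(input_price, cached_price):
    """Percentage saved on each input token billed at the cache-hit rate."""
    return round((1 - cached_price / input_price) * 100, 1)


def blended_input_price(input_price, cached_price, hit_rate):
    """Effective input price per 1M tokens when hit_rate (0 to 1) of input is cached."""
    return hit_rate * cached_price + (1 - hit_rate) * input_price


for name, (input_price, cached_price) in FLASH_PRICES.items():
    discount = cache_discount_pct(input_price, cached_price)
    blended = blended_input_price(input_price, cached_price, hit_rate=0.5)
    print(f"{name}: {discount}% cache discount, ${blended:.4f}/M at a 50% hit rate")
```

With the table's numbers, the discount works out to 75% on the 2.5 models and 90% on the 3.x models, so caching narrows the input-price gap between generations.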


Gemini Embedding Pricing: Cheapest Hosted Option

Google text-embedding-005 at $0.00625/M is 3× cheaper than OpenAI 3-small ($0.02/M) — the cheapest hosted embedding option available, ideal for cost-sensitive RAG pipelines.

| Model | Price/M tokens | Dimensions |
|---|---|---|
| Gemini Embedding | $0.15 | 768 |
| Gemini Embedding 2 | $0.20 | 768+ |
| text-embedding-005 | $0.00625 | 768 |
| OpenAI 3-small | $0.02 | 1536 |

Google's text-embedding-005 at $0.00625/M is 3x cheaper than OpenAI 3-small — the cheapest hosted embedding option available. If you're building RAG and cost matters, Google embeddings are hard to beat.
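For a RAG pipeline, the per-million price translates into a one-time indexing cost. The sketch below applies the table's prices to a hypothetical corpus; the corpus size (100,000 documents of about 500 tokens each) and the function name `embedding_cost` are assumptions, not figures from the source.

```python
# USD per 1M tokens, copied from the embedding table above.
EMBEDDING_PRICES = {
    "Gemini Embedding": 0.15,
    "Gemini Embedding 2": 0.20,
    "text-embedding-005": 0.00625,
    "OpenAI 3-small": 0.02,
}


def embedding_cost(total_tokens, price_per_million):
    """USD cost to embed total_tokens at the given per-1M-token price."""
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical corpus: 100,000 documents at roughly 500 tokens each.
corpus_tokens = 100_000 * 500

for model, price in EMBEDDING_PRICES.items():
    print(f"{model}: ${embedding_cost(corpus_tokens, price):.4f}")
```

Re-embedding after a model change repeats this cost, so the price gap compounds for corpora that are re-indexed often.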


Gemini API Pricing vs GPT-5.4 vs Claude vs DeepSeek

Gemini Pro wins flagship cost: 20% cheaper input than GPT-5.4, 20% cheaper output, 20% cheaper cache. Flash-Lite at $0.10/$0.40 wins ultra-budget vs DeepSeek and Claude Haiku.

| Model | Input/M | Output/M | Cache Hit/M | Context |
|---|---|---|---|---|
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.20 | 1M* |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.075 | 1M |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.025 | 1M |
| GPT-5.4 | $2.50 | $15.00 | $0.25 | 1.1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 1M |
| DeepSeek V4 | $0.30 | $0.50 | $0.03 | 1M |
| Grok 4.20 | $2.00 | $6.00 | $0.20 | 2M |

Key insights from TokenMix.ai:

  1. Gemini Pro is cheaper on output than GPT-5.4 and Claude Sonnet ($12 vs $15), though Grok 4.20 is lower at $6. Cache hits at $0.20/M tie Grok for the lowest rate.

  2. Flash-Lite at $0.10/$0.40 is the ultra-budget champion from a major provider. Only DeepSeek ($0.30/$0.50) and Groq Llama 8B ($0.05/$0.08) compete at this tier.

  3. Every Gemini model has 1M context — no surcharge on Flash/Flash-Lite models. Pro charges 2x past 200K (same pattern as GPT and Claude).
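The percentage gaps behind these insights can be checked directly from the comparison table. This is a minimal sketch; the function name `pct_cheaper` and the dictionary layout are illustrative choices, and a negative result means the first price is higher than the baseline.

```python
def pct_cheaper(price, baseline):
    """Percent by which price undercuts baseline (negative if price is higher)."""
    return (baseline - price) / baseline * 100


# USD per 1M tokens, copied from the comparison table above.
GEMINI_PRO = {"input": 2.00, "output": 12.00, "cache": 0.20}
RIVALS = {
    "GPT-5.4": {"input": 2.50, "output": 15.00, "cache": 0.25},
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00, "cache": 0.30},
    "Grok 4.20": {"input": 2.00, "output": 6.00, "cache": 0.20},
}

for rival, prices in RIVALS.items():
    gaps = ", ".join(
        f"{kind} {pct_cheaper(GEMINI_PRO[kind], prices[kind]):+.0f}%"
        for kind in ("input", "output", "cache")
    )
    print(f"Gemini 3.1 Pro vs {rival}: {gaps}")
```

Running the comparison per price component, rather than quoting a single headline figure, shows where the advantage holds and where a rival is level or cheaper.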

Through TokenMix.ai, access Gemini alongside 155+ other models with a single API and automatic failover.


Real-World Gemini Cost Scenarios

High-volume chatbot (10K conv/day) costs $43.50/month on Flash-Lite, the cheapest among major providers and 26% under DeepSeek. SaaS production (5K calls/day) costs $664/month on Flash, 38% under GPT Mini and 61% under Sonnet.

Scenario 1: High-volume chatbot — 10,000 conversations/day

| Model | Monthly (Cached) |
|---|---|
| Gemini Flash-Lite | $43.50 |
| DeepSeek V4 | $58.50 |
| GPT-5.4 Nano | $80.10 |
| Claude Haiku 4.5 | $69.00 |

Flash-Lite wins at high volume — cheaper than DeepSeek for input-heavy cached workloads.

Scenario 2: Production SaaS — 5,000 calls/day

| Model | Monthly (Cached) |
|---|---|
| Gemini 2.5 Flash | $664 |
| Gemini 3.1 Pro | $2,790 |
| GPT-5.4 Mini | $1,069 |
| Claude Sonnet 4.6 | $1,713 |

Gemini Flash at $664/month — 38% cheaper than GPT Mini, 61% cheaper than Sonnet.
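To run the same kind of estimate for your own traffic, a monthly cost function is enough. The scenarios above do not state their token counts or cache hit rates, so the workload below is purely hypothetical and is not meant to reproduce the $664 figure; the function name `monthly_cost` and the 30-day month are also assumptions.

```python
def monthly_cost(calls_per_day, input_tokens, output_tokens, cache_hit_rate,
                 input_price, cached_price, output_price, days=30):
    """Estimated USD cost per month; prices are per 1M tokens.

    cache_hit_rate (0 to 1) is the share of input tokens billed at cached_price.
    """
    cached_tokens = input_tokens * cache_hit_rate
    fresh_tokens = input_tokens - cached_tokens
    per_call = (
        fresh_tokens * input_price
        + cached_tokens * cached_price
        + output_tokens * output_price
    ) / 1_000_000
    return calls_per_day * days * per_call


# Hypothetical workload: 5,000 calls/day, 4,000 input and 1,000 output tokens
# per call, half of the input served from cache, at Gemini 2.5 Flash rates.
estimate = monthly_cost(5_000, 4_000, 1_000, 0.5,
                        input_price=0.30, cached_price=0.075, output_price=2.50)
print(f"${estimate:,.2f}/month")
```

Swapping in another model's three prices gives a like-for-like comparison on the same assumed workload.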


Which Gemini Model Should You Pick?

Default to Flash 2.5 ($0.30/$2.50) for production — handles 80% of workloads. Drop to Flash-Lite for ultra-budget volume work; step up to 3.1 Pro for flagship reasoning. Free tier covers prototyping.

| Your Situation | Recommended Model | Why |
|---|---|---|
| Ultra-budget, high volume | Flash-Lite 2.5 ($0.10/$0.40) | Cheapest major-provider model |
| Balanced production | Flash 2.5 ($0.30/$2.50) | Strong quality at budget pricing |
| Flagship quality | 3.1 Pro ($2.00/$12.00) | Output cheaper than GPT-5.4 and Claude |
| Free prototyping | Flash (free tier) | 1,500 req/day, no credit card |
| RAG embeddings | text-embedding-005 ($0.00625) | Cheapest hosted embedding |
| Multi-provider with failover | Gemini via TokenMix.ai | Unified API, auto-failover |

Related: Compare all model pricing in our complete LLM API pricing comparison

What's the Bottom Line on Gemini Pricing?

Gemini wins ultra-budget (Flash-Lite is the cheapest from any major provider), undercuts GPT-5.4 and Claude on flagship output ($12/M vs $15/M), wins free tier (1,500 req/day, no credit card), and wins embedding ($0.00625/M, 3× under OpenAI). Strongest pricing portfolio in 2026.

Gemini's pricing in 2026 covers every tier: Flash-Lite at $0.10/$0.40 for budget work, Flash at $0.30/$2.50 for balanced production, and Pro at $2.00/$12.00 for flagship quality with lower output pricing than GPT-5.4 and Claude Sonnet. The free tier (1,500 req/day, no credit card) is the strongest in the industry.

The 1M context window on every text model, with no surcharge on Flash, is a genuine differentiator. And Google's text-embedding-005 at $0.00625/M is the cheapest hosted embedding option in this comparison.

Compare Gemini pricing in real time at tokenmix.ai/pricing.


FAQ

How much does the Gemini API cost?

Ranges from Flash-Lite at $0.10/$0.40 to Pro at $2.00/$12.00 per million tokens. Cache hits save 75-90% on input, depending on the model. Free tier: 1,500 requests/day on Flash models, no credit card.

Is Gemini cheaper than GPT-5.4?

Yes at every tier. Gemini Pro ($2/$12) is 20% cheaper on input and 20% cheaper on output vs GPT-5.4 ($2.50/$15). Flash-Lite ($0.10/$0.40) is 50% cheaper than GPT Nano ($0.20/$1.25) on input.

Does Gemini have a free tier?

Yes — the most generous among major providers. 1,500 requests/day on Flash models, 1M tokens/minute, no credit card. Sufficient for prototyping and small-scale production.

What is Gemini's context window?

1M tokens on every model. Pro charges 2x past 200K input tokens. Flash and Flash-Lite have flat pricing across the full 1M window.

How does Gemini compare to DeepSeek?

Flash-Lite ($0.10/$0.40) is cheaper than DeepSeek V4 ($0.30/$0.50) on input and slightly cheaper on output too. DeepSeek has higher benchmark scores. For pure budget, Flash-Lite wins on price; DeepSeek wins on quality.

What are the cheapest Gemini embedding models?

text-embedding-005 at $0.00625/M, about 3x cheaper than OpenAI's $0.02/M. Gemini Embedding costs more at $0.15/M.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: Google AI Pricing, TokenMix.ai, and Artificial Analysis