TokenMix Research Lab · 2026-04-06

Google Gemini API Pricing in 2026: Every Model, Free Tier, Vertex AI Costs, and How to Save

Google Gemini API pricing starts at free — 1,500 requests per day on Google AI Studio at no cost. Paid pricing ranges from $0.01/$0.02 per million tokens (Gemini Flash Lite) to $2.50/$20 per million tokens (Gemini 2.5 Pro over 200K context). This guide covers every Gemini model's pricing, the free tier limits, Google AI Studio vs Vertex AI cost differences, embedding pricing, and real-world cost comparisons against OpenAI and Anthropic. Whether you are a beginner exploring the API for the first time or an enterprise evaluating Google Cloud costs, this guide gives you the numbers. All pricing verified against Google's official pricing page and tracked by TokenMix.ai as of April 2026.

Gemini API Pricing Quick Reference

All prices per 1 million tokens, Google AI Studio, April 2026:

| Model | Input | Cached Input | Output | Context | Free Tier |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $0.315 | $10.00 | 1M | 1,500 req/day |
| Gemini 2.5 Flash | $0.15 | $0.0375 | $0.60 | 1M | 1,500 req/day |
| Gemini 2.0 Flash | $0.10 | $0.025 | $0.40 | 1M | 1,500 req/day |
| Gemini 2.0 Flash Lite | $0.01 | -- | $0.02 | 1M | 1,500 req/day |

All models incur a 2x surcharge on requests exceeding 200K input tokens (except Flash Lite).


Google Gemini Free Tier: What You Get for $0

Google offers the most generous free tier among major AI API providers. This is often the first reason developers choose Gemini.

Free Tier Specs

| Feature | Free Tier Limit |
|---|---|
| Requests per day | 1,500 |
| Requests per minute | 15 |
| Tokens per minute | 32,000 |
| Models available | All Gemini models |
| Quality restrictions | None — same model as paid |
| Credit card required | No |

What 1,500 Requests Per Day Actually Means

At 1,500 requests per day, the free tier comfortably covers development, testing, and small personal projects.

For comparison: OpenAI has no ongoing free tier. Anthropic offers limited free access through claude.ai but not API credits. DeepSeek offers a limited free tier with lower quotas.

Free Tier Limitations

The free tier uses lower rate limits (15 RPM vs higher paid limits). For development and testing, this is rarely a constraint. For production traffic, you will need to upgrade.
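
On the free tier, staying under the 15 RPM cap is usually just a matter of pacing requests client-side. As a minimal sketch (the `RpmThrottle` class below is our own illustration, not part of any Google SDK), a throttle that enforces a minimum interval between calls looks like this:

```python
import time

class RpmThrottle:
    """Pace outgoing calls to stay under a requests-per-minute cap,
    e.g. the Gemini free tier's 15 RPM. Clock and sleep are injectable
    so the behavior can be tested without real waiting."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # seconds between requests
        self.clock = clock
        self.sleep = sleep
        self._last = None

    def wait(self):
        """Block until it is safe to issue the next request."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval + self._last - now
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()

# throttle = RpmThrottle(rpm=15)
# throttle.wait()  # call before each free-tier API request
```

At 15 RPM this enforces one request every 4 seconds, which also keeps bursty scripts from tripping the per-minute token limit in most cases.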

Important: free tier requests may be used by Google for product improvement. If data privacy is a concern, use the paid tier or Vertex AI, which explicitly does not use your data for training.

How to Access the Free Tier

  1. Go to Google AI Studio
  2. Sign in with a Google account
  3. Get an API key — no credit card, no waitlist
  4. Start making API calls immediately

This is the fastest path from zero to working AI API in the industry.


Gemini 2.5 Pro Pricing

Gemini 2.5 Pro is Google's flagship model — the direct competitor to GPT-5.4 and Claude Sonnet 4.6.

Base Pricing

| Component | Up to 200K tokens | Over 200K tokens |
|---|---|---|
| Input | $1.25/M | $2.50/M |
| Cached Input | $0.315/M | $0.63/M |
| Output | $10.00/M | $20.00/M |
| Thinking Tokens | Billed as output | Billed as output |

Key Pricing Details

Thinking mode tokens are billed as output. When you enable Gemini 2.5 Pro's thinking mode, the model generates internal reasoning tokens. These are billed at the output rate ($10/M, or $20/M past 200K). A request with 2K visible output might generate 5-10K thinking tokens, increasing the effective output cost 3-5x.
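
The billing arithmetic above is easy to check. The sketch below is our own illustration (the `request_cost` helper is hypothetical, not a Google SDK function), using the $1.25/$10.00 rates for requests under 200K context:

```python
def request_cost(input_tokens, visible_output, thinking_tokens,
                 input_rate, output_rate, thinking_rate=None):
    """Per-request cost in dollars; rates are $ per million tokens.
    thinking_rate defaults to output_rate, matching Gemini 2.5 Pro
    (Gemini 2.5 Flash bills thinking at a separate, higher rate)."""
    if thinking_rate is None:
        thinking_rate = output_rate
    return (input_tokens * input_rate
            + visible_output * output_rate
            + thinking_tokens * thinking_rate) / 1_000_000

# Gemini 2.5 Pro under 200K context: $1.25/M input, $10/M output
base = request_cost(3_000, 2_000, 0, 1.25, 10.00)               # $0.02375
with_thinking = request_cost(3_000, 2_000, 7_000, 1.25, 10.00)  # $0.09375
```

With 7K thinking tokens the output side of the bill grows from $0.020 to $0.090, a 4.5x jump, which is why thinking mode dominates the cost of reasoning-heavy requests.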

Context caching storage costs apply. Cached tokens cost $4.50/M tokens per hour to store. This means caching only saves money at moderate-to-high request volumes.

Gemini 2.5 Pro vs GPT-5.4 Pricing

| Dimension | Gemini 2.5 Pro | GPT-5.4 |
|---|---|---|
| Input | $1.25/M | $2.50/M |
| Output | $10.00/M | $15.00/M |
| Cached Input | $0.315/M | $0.25/M |
| Total (3K in, 1K out) | $0.01375 | $0.0225 |

Gemini 2.5 Pro is 39% cheaper than GPT-5.4 on a typical request. The input advantage (50% cheaper) and output advantage (33% cheaper) both contribute.

GPT-5.4 has slightly better benchmark scores (SWE-bench 80% vs 78%, MMLU 91% vs 90%). Whether the quality gap justifies the price gap depends on your workload.


Gemini 2.5 Flash Pricing

Gemini 2.5 Flash is the balanced mid-tier: better than Flash 2.0, much cheaper than Pro.

Base Pricing

| Component | Up to 200K tokens | Over 200K tokens |
|---|---|---|
| Input | $0.15/M | $0.30/M |
| Cached Input | $0.0375/M | $0.075/M |
| Output | $0.60/M | $1.20/M |
| Thinking Output | $3.50/M | $7.00/M |

Notable Detail: Thinking Tokens Cost More

Unlike Gemini 2.5 Pro where thinking tokens are billed at the standard output rate, Gemini 2.5 Flash charges $3.50/M for thinking tokens — nearly 6x its standard output rate of $0.60/M. This makes thinking mode significantly more expensive per token on Flash than on Pro relative to base output pricing.

For simple tasks without reasoning, Flash at $0.15/$0.60 is extremely cost-effective. For reasoning-heavy tasks, the thinking token premium changes the economics.

When to Use Flash vs Pro

| Task | Flash Cost | Pro Cost | Recommendation |
|---|---|---|---|
| Simple Q&A (2K in, 500 out) | $0.0006 | $0.0075 | Flash (12x cheaper) |
| Summarization (10K in, 2K out) | $0.0027 | $0.0325 | Flash (12x cheaper) |
| Complex reasoning (5K in, 10K thinking, 2K out) | $0.0370 | $0.1263 | Flash (3.4x cheaper) |
| Code generation (8K in, 3K out) | $0.003 | $0.04 | Flash for budget, Pro for quality |

Flash delivers strong quality for most production tasks at 10-12x lower cost than Pro. Reserve Pro for tasks where the quality difference is business-critical.


Gemini 2.0 Flash and Flash Lite Pricing

Gemini 2.0 Flash

| Component | Up to 200K tokens | Over 200K tokens |
|---|---|---|
| Input | $0.10/M | $0.20/M |
| Cached Input | $0.025/M | $0.05/M |
| Output | $0.40/M | $0.80/M |

Gemini 2.0 Flash is the previous generation Flash model. It remains available and is slightly cheaper than 2.5 Flash ($0.10/$0.40 vs $0.15/$0.60). Quality is lower, but for simple tasks the difference is minimal.

Gemini 2.0 Flash Lite

| Component | Price |
|---|---|
| Input | $0.01/M |
| Output | $0.02/M |
| Caching | Not supported |

Flash Lite is the cheapest model in Google's lineup — and among the cheapest in the industry at $0.01/$0.02 per million tokens.

Quality is limited. Flash Lite is designed for the simplest tasks: classification, routing, basic extraction, and keyword detection. Do not expect it to handle complex reasoning or nuanced generation.


The 200K Context Surcharge Explained

All Gemini models (except Flash Lite) apply a 2x pricing surcharge on requests with more than 200K input tokens.

How It Works

| Model | Input (up to 200K) | Input (over 200K) | Output (up to 200K) | Output (over 200K) |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25/M | $2.50/M | $10.00/M | $20.00/M |
| Gemini 2.5 Flash | $0.15/M | $0.30/M | $0.60/M | $1.20/M |
| Gemini 2.0 Flash | $0.10/M | $0.20/M | $0.40/M | $0.80/M |

The surcharge applies to the entire request, not just the tokens above 200K. A request with 250K input tokens pays the higher rate on all 250K tokens.
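
This whole-request rule is worth encoding explicitly when estimating costs. A minimal sketch (the `input_cost` helper is illustrative, not an official API):

```python
def input_cost(tokens, base_rate, surcharge_rate, threshold=200_000):
    """Input cost in dollars. Once input exceeds the threshold, the
    higher rate applies to the ENTIRE request, not just the excess.
    Rates are $ per million tokens."""
    rate = surcharge_rate if tokens > threshold else base_rate
    return tokens * rate / 1_000_000

# Gemini 2.5 Pro input: $1.25/M up to 200K, $2.50/M beyond
input_cost(150_000, 1.25, 2.50)  # $0.1875 (all tokens at the base rate)
input_cost(250_000, 1.25, 2.50)  # $0.6250 (all tokens at the surcharge rate)
```

A 250K-token request therefore costs more than double a 200K one, so splitting work to stay just under the threshold can be a meaningful optimization when the task allows it.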

Cost Impact Example

Processing a 400K token document with Gemini 2.5 Pro: 400K input tokens at the $2.50/M surcharge rate cost $1.00, versus $0.50 if the base $1.25/M rate applied.

For long-context workloads, this doubles the effective cost. Compare against DeepSeek V4, which charges a flat $0.30/M at any context length.

Surcharge Comparison

| Provider | Threshold | Post-Surcharge Input |
|---|---|---|
| Google Gemini 2.5 Pro | 200K | $2.50/M |
| OpenAI GPT-5.4 | 272K | $5.00/M |
| Anthropic Claude Sonnet 4.6 | 200K | $6.00/M |
| DeepSeek V4 | None | $0.30/M (flat) |

Even with the surcharge, Gemini 2.5 Pro ($2.50/M) is the cheapest Western provider for long-context work — half the cost of GPT-5.4 ($5.00/M) and less than half of Claude ($6.00/M).


Google Gemini Embedding Pricing

| Model | Price/M tokens | Dimensions |
|---|---|---|
| text-embedding-005 (free tier) | Free (up to limits) | 768 |
| text-embedding-005 (paid tier) | $0.00 (included) | 768 |

Google's text-embedding-005 is effectively free. Within the standard API rate limits, embedding generation incurs no per-token charge. This is a significant advantage over OpenAI's text-embedding-3 models ($0.02-$0.13/M tokens).

Embedding Cost Comparison

Embedding 1 million documents (500 tokens each, 500M total tokens):

| Provider | Model | Cost |
|---|---|---|
| Google | text-embedding-005 | $0 (within limits) |
| OpenAI | text-embedding-3-small | $10 |
| OpenAI | text-embedding-3-large | $65 |
| Voyage AI | voyage-3 | $30 |

For RAG (Retrieval-Augmented Generation) pipelines, free embeddings from Google significantly reduce the total cost of the system. Pair free Google embeddings with Gemini 2.5 Flash for generation, and the entire RAG pipeline costs a fraction of an equivalent OpenAI setup.


Google AI Studio vs Vertex AI Pricing

Google offers two access points for Gemini models. The pricing for tokens is identical, but the surrounding costs differ.

Google AI Studio

Vertex AI (Google Cloud)

Vertex AI Additional Costs

While token pricing matches AI Studio, Vertex AI users may incur:

| Cost Component | Detail |
|---|---|
| Google Cloud egress | Network egress charges apply |
| Provisioned throughput | Reserved capacity at premium rates |
| Cloud logging | Storage for request/response logs |
| VPC networking | Private Service Connect costs |
| Support plan | Google Cloud support tiers ($0-$150K+/year) |

For most teams under $10K/month in API spend, Google AI Studio is the better starting point. Vertex AI becomes worthwhile when you need enterprise SLAs, data governance, or Google Cloud integration.

Enterprise Commitments

Google offers committed use discounts through Vertex AI for large-volume customers. These are negotiated directly and can reduce per-token pricing by 15-30% depending on commitment size and term length.


Context Caching: How to Cut Gemini API Costs

Context caching stores input tokens for reuse across multiple requests. Cached tokens are billed at 75% off the standard input rate.

Caching Pricing

| Model | Standard Input | Cached Input | Discount | Storage Cost |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25/M | $0.315/M | 75% | $4.50/M/hour |
| Gemini 2.5 Flash | $0.15/M | $0.0375/M | 75% | $1.00/M/hour |
| Gemini 2.0 Flash | $0.10/M | $0.025/M | 75% | $0.25/M/hour |

When Caching Saves Money

Caching adds a per-hour storage cost. It only saves money when the cache hit volume is high enough to offset storage.

Break-even calculation for Gemini 2.5 Pro: each cache hit saves $1.25 - $0.315 = $0.935 per million cached tokens, while storage costs $4.50 per million tokens per hour. Break-even is therefore $4.50 / $0.935, or about 4.8 requests per hour.

If you send more than 5 requests per hour against the same cached content, caching saves money. Below 5 requests per hour, the storage cost exceeds the per-token savings.

For Gemini 2.5 Flash, storage costs $1.00/M per hour and each cache hit saves $0.15 - $0.0375 = $0.1125 per million cached tokens, so break-even is roughly 9 requests per hour: the lower storage cost is offset by smaller per-token savings.

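The break-even arithmetic above generalizes to any model. This is an illustrative helper of our own, assuming the rates in the caching table:

```python
def cache_breakeven_rph(input_rate, cached_rate, storage_rate_per_hour):
    """Requests per hour at which context caching pays for itself.
    All rates are $ per million tokens (storage is per hour)."""
    savings_per_request = input_rate - cached_rate
    return storage_rate_per_hour / savings_per_request

cache_breakeven_rph(1.25, 0.315, 4.50)   # Gemini 2.5 Pro:   ~4.8 req/hour
cache_breakeven_rph(0.15, 0.0375, 1.00)  # Gemini 2.5 Flash: ~8.9 req/hour
```

Note the counterintuitive result: Flash's lower storage rate does not mean a lower break-even, because its per-token savings are smaller too.
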
Caching Best Practices

  1. Cache system prompts and reference documents that are shared across many requests
  2. Monitor cache hit rates — remove caches with low utilization
  3. Keep caches as small as practical — storage is billed by token count
  4. Use Gemini 2.0 Flash for caching-heavy workloads (lowest storage cost at $0.25/M/hour)

Gemini API Pricing vs OpenAI vs Anthropic

Flagship Model Comparison

| Dimension | Gemini 2.5 Pro | GPT-5.4 | Claude Sonnet 4.6 |
|---|---|---|---|
| Input/M | $1.25 | $2.50 | $3.00 |
| Output/M | $10.00 | $15.00 | $15.00 |
| Cached Input/M | $0.315 | $0.25 | $0.30 |
| Batch Discount | No | 50% | 50% |
| Context | 1M | 1.1M | 1M |
| Surcharge Start | 200K | 272K | 200K |
| Free Tier | 1,500 req/day | No | Limited |
| SWE-bench | ~78% | ~80% | ~73% |
| MMLU | ~90% | ~91% | ~88% |

Budget Model Comparison

| Dimension | Gemini 2.5 Flash | GPT-5.4 Nano | GPT-4o-mini |
|---|---|---|---|
| Input/M | $0.15 | $0.20 | $0.15 |
| Output/M | $0.60 | $1.25 | $0.60 |
| Context | 1M | 400K | 128K |
| Free Tier | 1,500 req/day | No | No |

Where Google Gemini Wins on Price

  1. Cheapest Western flagship. Gemini 2.5 Pro is 50% cheaper on input than GPT-5.4 and 58% cheaper than Claude.
  2. Free tier. 1,500 requests/day at no cost. No other major provider matches this.
  3. Free embeddings. Zero-cost embeddings vs $0.02-$0.13/M at OpenAI.
  4. Cheapest ultra-budget option. Gemini 2.0 Flash Lite at $0.01/$0.02 undercuts everything.
  5. Long-context pricing. $2.50/M past 200K vs $5.00/M (GPT) and $6.00/M (Claude).

Where Google Gemini Loses on Price

  1. No batch discount. OpenAI and Anthropic offer 50% batch discounts. Google does not. For batch-heavy workloads, GPT-5.4 batch ($1.25/$7.50) is cheaper than Gemini 2.5 Pro standard ($1.25/$10.00).
  2. Cache storage costs. Google charges per-hour storage for cached tokens. OpenAI's caching is automatic with no separate storage fee.
  3. Thinking token economics on Flash. Gemini 2.5 Flash thinking tokens cost $3.50/M — nearly 6x the standard output rate. This makes reasoning tasks disproportionately expensive on Flash.

TokenMix.ai provides unified access to Google Gemini, OpenAI, Anthropic, and DeepSeek through a single API endpoint with automatic cost optimization — see tokenmix.ai for real-time pricing comparisons.


Real-World Cost Scenarios

Scenario 1: First AI Project (Hobbyist/Student)

5,000 requests/month, simple Q&A. Model: Gemini 2.5 Flash.

| Approach | Monthly Cost |
|---|---|
| Gemini Free Tier | $0 |
| OpenAI GPT-4o-mini | $2.25 |
| Anthropic Claude Haiku | ~$1.50 |

For beginners and learning projects, Gemini's free tier is unbeatable. Zero cost, full model quality, no credit card.

Scenario 2: Startup MVP (10K users, 50K requests/month)

Average: 3K input, 1K output. Model: Gemini 2.5 Flash.

| Provider | Monthly Cost |
|---|---|
| Gemini 2.5 Flash | $52.50 |
| GPT-5.4 Nano | $92.50 |
| GPT-5.4 Mini | $262.50 |

Gemini 2.5 Flash saves 43% vs GPT-5.4 Nano and 80% vs GPT-5.4 Mini, with comparable quality for general tasks.

Scenario 3: Enterprise RAG Pipeline (200K queries/month)

Average: 20K input (retrieved context + query), 2K output. Model: Gemini 2.5 Pro.

| Provider | Monthly Cost (with caching) |
|---|---|
| Gemini 2.5 Pro | $5,260 |
| GPT-5.4 | $7,100 |
| Claude Sonnet 4.6 | $7,200 |

Gemini 2.5 Pro saves 26% vs GPT-5.4 and 27% vs Claude on this workload. Adding free Google embeddings for the retrieval layer increases the savings further.

Scenario 4: High-Volume Classification (1M requests/month)

Average: 500 input, 50 output tokens. Model: Gemini 2.0 Flash Lite.

| Provider | Monthly Cost |
|---|---|
| Gemini 2.0 Flash Lite | $6 |
| GPT-4o-mini | $105 |
| Gemini 2.5 Flash | $105 |

Flash Lite is 17x cheaper than GPT-4o-mini for simple classification. If accuracy requirements are modest, this is the most economical option in the market.
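
The scenario tables above all reduce to the same arithmetic, which the following sketch reproduces (the `monthly_cost` helper is our own, using the rates quoted in this guide):

```python
def monthly_cost(requests, in_tokens, out_tokens, in_rate, out_rate):
    """Monthly API spend in dollars; rates are $ per million tokens."""
    return requests * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

monthly_cost(50_000, 3_000, 1_000, 0.15, 0.60)  # Scenario 2, Gemini 2.5 Flash: $52.50
monthly_cost(1_000_000, 500, 50, 0.01, 0.02)    # Scenario 4, Flash Lite: $6.00
```

Plugging in any provider's rates makes apples-to-apples comparisons straightforward before committing to a model.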


How to Get Started with the Gemini API

Step 1: Get Your API Key (2 minutes)

  1. Visit Google AI Studio
  2. Sign in with any Google account
  3. Click "Get API Key"
  4. Copy the key — no credit card required

Step 2: Make Your First API Call

The Gemini API uses a REST interface compatible with standard HTTP libraries. Google also provides official SDKs for Python, Node.js, Go, and Dart.
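
As a minimal sketch using only the Python standard library, the call below follows the Gemini REST shape (`models/{model}:generateContent` with a JSON `contents` array). The model ID `gemini-2.5-flash` is an assumption for illustration; check Google AI Studio for the exact identifiers available to your key:

```python
import json
import urllib.request

API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"

def build_request(api_key, model, prompt):
    """Build the URL and JSON body for a generateContent call."""
    url = f"{API_BASE}/{model}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, body

def generate(api_key, model, prompt):
    """Send the request and return the model's text reply."""
    url, body = build_request(api_key, model, prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]

# print(generate("YOUR_API_KEY", "gemini-2.5-flash", "Hello, Gemini!"))
```

The official SDKs wrap this same endpoint, so switching to them later requires no change to your prompts.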

Step 3: Choose Your Model

Step 4: Scale to Paid

When you exceed free tier limits (1,500 requests/day or 15 RPM), enable billing in Google AI Studio or migrate to Vertex AI for enterprise features.


Decision Guide: Choosing the Right Gemini Model

| Your Need | Best Gemini Model | Cost per 1K Requests (3K in, 1K out) |
|---|---|---|
| Free development/testing | Any model (free tier) | $0 |
| Ultra-cheap classification | Gemini 2.0 Flash Lite | $0.05 |
| Budget production | Gemini 2.0 Flash | $0.70 |
| Balanced quality/cost | Gemini 2.5 Flash | $1.05 |
| Maximum quality | Gemini 2.5 Pro | $13.75 |
| Reasoning-heavy tasks | Gemini 2.5 Pro (thinking) | $13.75+ |

When Not to Use Gemini


Related: Compare all model pricing in our complete LLM API pricing comparison

Conclusion

Google Gemini API pricing in 2026 is designed to win developers at every scale. The free tier eliminates the barrier to entry. Flash models provide competitive quality at budget prices. Gemini 2.5 Pro undercuts GPT-5.4 and Claude Sonnet 4.6 on both input and output while delivering comparable benchmark performance.

The gaps: no batch discount, cache storage costs, and slightly lower coding benchmarks than GPT-5.4. For most workloads — especially those starting from zero or operating at moderate scale — Gemini offers the best value among Western AI API providers.

For teams using multiple providers, TokenMix.ai provides a single API that routes requests to Gemini, OpenAI, Anthropic, or DeepSeek based on cost and quality requirements — with unified billing and automatic failover across all providers.


FAQ

How much does the Google Gemini API cost?

Gemini API pricing ranges from free (1,500 requests/day) to $1.25/$10.00 per million tokens (Gemini 2.5 Pro). The budget Gemini 2.5 Flash model costs $0.15/$0.60 per million tokens. Gemini 2.0 Flash Lite is the cheapest at $0.01/$0.02 per million tokens. All models are available on the free tier.

Is Google Gemini API free?

Yes, partially. Google AI Studio offers 1,500 free requests per day with no credit card required. All Gemini models are available on the free tier at the same quality as paid access. Rate limits are lower on the free tier (15 requests per minute). For production workloads exceeding these limits, paid access is required.

How does Gemini API pricing compare to OpenAI?

Gemini 2.5 Pro ($1.25/$10.00) is 50% cheaper on input and 33% cheaper on output than GPT-5.4 ($2.50/$15.00). Gemini 2.5 Flash ($0.15/$0.60) is comparable to GPT-5.4 Nano ($0.20/$1.25) with a much larger context window (1M vs 400K). Google also offers a free tier that OpenAI does not. However, OpenAI offers batch discounts (50%) and automatic caching that Google lacks.

What is the cheapest Gemini model?

Gemini 2.0 Flash Lite at $0.01 per million input tokens and $0.02 per million output tokens. This is among the cheapest model pricing from any major provider. Flash Lite is designed for simple tasks like classification, routing, and basic extraction. For more capable budget options, Gemini 2.5 Flash at $0.15/$0.60 offers significantly better quality.

Is Vertex AI more expensive than Google AI Studio for Gemini?

Per-token pricing is identical between Vertex AI and Google AI Studio. However, Vertex AI may incur additional costs for Google Cloud infrastructure: network egress, logging, VPC networking, and support plans. For API costs alone, both platforms charge the same. Vertex AI adds value through SLAs, data governance, and enterprise features.

Can I use Gemini for free in production?

The free tier allows 1,500 requests per day, which supports small-scale production applications. However, free tier data may be used by Google for product improvement. For production use requiring data privacy, use the paid tier or Vertex AI. Rate limits on the free tier (15 RPM) may also be insufficient for production traffic. Evaluate whether your application fits within these constraints before relying on the free tier. TokenMix.ai tracks real-time availability across all providers.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: Google AI Pricing, Google Cloud Vertex AI, TokenMix.ai