Google Gemini API Pricing in 2026: Every Model, Free Tier, Vertex AI Costs, and How to Save
Google Gemini API pricing starts at $0: Google AI Studio allows 1,500 requests per day at no cost. Paid pricing ranges from $0.01/$0.02 per million tokens (Gemini Flash Lite) to $2.50/$20 per million tokens (Gemini 2.5 Pro over 200K context). This guide covers every Gemini model's pricing, the free tier limits, Google AI Studio vs Vertex AI cost differences, embedding pricing, and real-world cost comparisons against OpenAI and Anthropic. Whether you are a beginner exploring the API for the first time or an enterprise evaluating Google Cloud costs, this guide gives you the numbers. All pricing verified against Google's official pricing page and tracked by TokenMix.ai as of April 2026.
Table of Contents
[Gemini API Pricing Quick Reference]
[Google Gemini Free Tier: What You Get for $0]
[Gemini 2.5 Pro Pricing]
[Gemini 2.5 Flash Pricing]
[Gemini 2.0 Flash and Flash Lite Pricing]
[The 200K Context Surcharge Explained]
[Google Gemini Embedding Pricing]
[Google AI Studio vs Vertex AI Pricing]
[Context Caching: How to Cut Gemini API Costs]
[Gemini API Pricing vs OpenAI vs Anthropic]
[Real-World Cost Scenarios]
[How to Get Started with the Gemini API]
[Decision Guide: Choosing the Right Gemini Model]
[Conclusion]
[FAQ]
Gemini API Pricing Quick Reference
All prices per 1 million tokens, Google AI Studio, April 2026:
| Model | Input | Cached Input | Output | Context | Free Tier |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $0.315 | $10.00 | 1M | 1,500 req/day |
| Gemini 2.5 Flash | $0.15 | $0.0375 | $0.60 | 1M | 1,500 req/day |
| Gemini 2.0 Flash | $0.10 | $0.025 | $0.40 | 1M | 1,500 req/day |
| Gemini 2.0 Flash Lite | $0.01 | -- | $0.02 | 1M | 1,500 req/day |
All models incur a 2x surcharge on requests exceeding 200K input tokens (except Flash Lite).
Google Gemini Free Tier: What You Get for $0
Google offers the most generous free tier among major AI API providers. This is often the first reason developers choose Gemini.
Free Tier Specs
| Feature | Free Tier Limit |
|---|---|
| Requests per day | 1,500 |
| Requests per minute | 15 |
| Tokens per minute | 32,000 |
| Models available | All Gemini models |
| Quality restrictions | None (same model as paid) |
| Credit card required | No |
What 1,500 Requests Per Day Actually Means
At 1,500 requests per day, you can:
- Run a small chatbot serving ~50 active users daily
- Process ~45,000 API calls per month
- Build and test an entire application before spending any money
- Run A/B tests across multiple Gemini model variants
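The capacity figures above are easy to verify. A back-of-the-envelope sketch (the ~30 requests per active user per day is an illustrative assumption, not a Google number):

```python
# Back-of-the-envelope capacity for the Gemini free tier (1,500 requests/day).
FREE_REQUESTS_PER_DAY = 1_500
REQUESTS_PER_USER_PER_DAY = 30  # assumption: a light chat user sends ~30 requests/day

daily_active_users = FREE_REQUESTS_PER_DAY // REQUESTS_PER_USER_PER_DAY
monthly_calls = FREE_REQUESTS_PER_DAY * 30  # approximating a 30-day month

print(daily_active_users, monthly_calls)  # 50 users, 45000 calls
```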
For comparison: OpenAI has no ongoing free tier. Anthropic offers limited free access through claude.ai but not API credits. DeepSeek offers a limited free tier with lower quotas.
Free Tier Limitations
The free tier uses lower rate limits (15 RPM vs higher paid limits). For development and testing, this is rarely a constraint. For production traffic, you will need to upgrade.
Important: free tier requests may be used by Google for product improvement. If data privacy is a concern, use the paid tier or Vertex AI, which explicitly does not use your data for training.
This is the fastest path from zero to working AI API in the industry.
Gemini 2.5 Pro Pricing
Gemini 2.5 Pro is Google's flagship model — the direct competitor to GPT-5.4 and Claude Sonnet 4.6.
Base Pricing
| Component | Up to 200K tokens | Over 200K tokens |
|---|---|---|
| Input | $1.25/M | $2.50/M |
| Cached Input | $0.315/M | $0.63/M |
| Output | $10.00/M | $20.00/M |
| Thinking Tokens | Billed as output | Billed as output |
Key Pricing Details
Thinking mode tokens are billed as output. When you enable Gemini 2.5 Pro's thinking mode, the model generates internal reasoning tokens. These are billed at the output rate ($10/M, or $20/M past 200K). A request with 2K visible output might generate 5-10K thinking tokens, increasing the effective output cost 3-5x.
Context caching storage costs apply. Cached tokens cost $4.50/M tokens per hour to store. This means caching only saves money at moderate-to-high request volumes.
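The thinking-token effect on cost can be sketched in a few lines, using the rates from the table above (the token counts are illustrative):

```python
# Gemini 2.5 Pro rates per 1M tokens (requests <= 200K input), from the table above.
PRO_INPUT_RATE = 1.25
PRO_OUTPUT_RATE = 10.00  # thinking tokens bill at this same output rate

def pro_request_cost(input_tokens, visible_output_tokens, thinking_tokens=0):
    """Estimate one request's cost; thinking tokens are billed as output."""
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * PRO_INPUT_RATE
            + billed_output * PRO_OUTPUT_RATE) / 1_000_000

# 2K visible output plus 8K hidden thinking tokens: the output side bills
# 10K tokens, i.e. 5x the visible-only output cost.
cost_no_thinking = pro_request_cost(3_000, 2_000)            # $0.02375
cost_with_thinking = pro_request_cost(3_000, 2_000, 8_000)   # $0.10375
```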
Gemini 2.5 Pro vs GPT-5.4 Pricing
| Dimension | Gemini 2.5 Pro | GPT-5.4 |
|---|---|---|
| Input | $1.25/M | $2.50/M |
| Output | $10.00/M | $15.00/M |
| Cached Input | $0.315/M | $0.25/M |
| Total (3K in, 1K out) | $0.01375 | $0.0225 |
Gemini 2.5 Pro is 39% cheaper than GPT-5.4 on a typical request. The input advantage (50% cheaper) and output advantage (33% cheaper) both contribute.
GPT-5.4 has slightly better benchmark scores (SWE-bench 80% vs 78%, MMLU 91% vs 90%). Whether the quality gap justifies the price gap depends on your workload.
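The per-request totals in the comparison can be reproduced directly from the listed rates. A minimal sketch (no caching, no thinking tokens):

```python
def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost of one request; rates are USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

gemini_pro = request_cost(3_000, 1_000, 1.25, 10.00)  # $0.01375
gpt_54 = request_cost(3_000, 1_000, 2.50, 15.00)      # $0.0225
savings_pct = round((1 - gemini_pro / gpt_54) * 100)  # 39
```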
Gemini 2.5 Flash Pricing
Gemini 2.5 Flash is the balanced mid-tier: better than Flash 2.0, much cheaper than Pro.
Base Pricing
| Component | Up to 200K tokens | Over 200K tokens |
|---|---|---|
| Input | $0.15/M | $0.30/M |
| Cached Input | $0.0375/M | $0.075/M |
| Output | $0.60/M | $1.20/M |
| Thinking Output | $3.50/M | $7.00/M |
Notable Detail: Thinking Tokens Cost More
Unlike Gemini 2.5 Pro where thinking tokens are billed at the standard output rate, Gemini 2.5 Flash charges $3.50/M for thinking tokens — nearly 6x its standard output rate of $0.60/M. This makes thinking mode significantly more expensive per token on Flash than on Pro relative to base output pricing.
For simple tasks without reasoning, Flash at $0.15/$0.60 is extremely cost-effective. For reasoning-heavy tasks, the thinking token premium changes the economics.
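To see how much the $3.50/M thinking rate dominates a reasoning-heavy request, here is a small sketch with illustrative token counts:

```python
# Gemini 2.5 Flash rates per 1M tokens (<= 200K context), from the table above.
FLASH_INPUT, FLASH_OUTPUT, FLASH_THINKING = 0.15, 0.60, 3.50

def flash_cost(input_tokens, output_tokens, thinking_tokens=0):
    return (input_tokens * FLASH_INPUT
            + output_tokens * FLASH_OUTPUT
            + thinking_tokens * FLASH_THINKING) / 1_000_000

simple = flash_cost(2_000, 500)               # $0.0006, no reasoning
reasoning = flash_cost(5_000, 2_000, 10_000)  # ~$0.037, thinking dominates
thinking_share = (10_000 * FLASH_THINKING / 1e6) / reasoning  # ~95% of the bill
```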
When to Use Flash vs Pro
| Task | Flash Cost | Pro Cost | Recommendation |
|---|---|---|---|
| Simple Q&A (2K in, 500 out) | $0.0006 | $0.0075 | Flash (12x cheaper) |
| Summarization (10K in, 2K out) | $0.0027 | $0.0325 | Flash (12x cheaper) |
| Complex reasoning (5K in, 10K thinking, 2K out) | $0.0370 | $0.1263 | Flash (3.4x cheaper) |
| Code generation (8K in, 3K out) | $0.003 | $0.04 | Flash for budget, Pro for quality |
Flash delivers strong quality for most production tasks at 10-12x lower cost than Pro. Reserve Pro for tasks where the quality difference is business-critical.
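The table rows above all follow from two rate cards. A sketch for reproducing any row (Pro thinking is billed at its output rate, per the earlier section):

```python
# USD per 1M tokens, <= 200K context, from the pricing tables above.
RATES = {
    "flash": {"in": 0.15, "out": 0.60, "think": 3.50},
    "pro":   {"in": 1.25, "out": 10.00, "think": 10.00},  # Pro: thinking = output rate
}

def cost(model, input_tokens, output_tokens, thinking_tokens=0):
    r = RATES[model]
    return (input_tokens * r["in"] + output_tokens * r["out"]
            + thinking_tokens * r["think"]) / 1_000_000

# Simple Q&A row: 2K in, 500 out -> Pro costs 12.5x the Flash price.
qa_ratio = cost("pro", 2_000, 500) / cost("flash", 2_000, 500)
```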
Gemini 2.0 Flash and Flash Lite Pricing
Gemini 2.0 Flash
| Component | Up to 200K tokens | Over 200K tokens |
|---|---|---|
| Input | $0.10/M | $0.20/M |
| Cached Input | $0.025/M | $0.05/M |
| Output | $0.40/M | $0.80/M |
Gemini 2.0 Flash is the previous generation Flash model. It remains available and is slightly cheaper than 2.5 Flash ($0.10/$0.40 vs $0.15/$0.60). Quality is lower, but for simple tasks the difference is minimal.
Gemini 2.0 Flash Lite
| Component | Price |
|---|---|
| Input | $0.01/M |
| Output | $0.02/M |
| Caching | Not supported |
Flash Lite is the cheapest model in Google's lineup, and among the cheapest in the industry. At $0.01/$0.02 per million tokens, it costs 15x less than Gemini 2.5 Flash on input and 30x less on output.
Quality is limited. Flash Lite is designed for the simplest tasks: classification, routing, basic extraction, and keyword detection. Do not expect it to handle complex reasoning or nuanced generation.
The 200K Context Surcharge Explained
All Gemini models (except Flash Lite) apply a 2x pricing surcharge on requests with more than 200K input tokens.
How It Works
| Model | Input (up to 200K) | Input (over 200K) | Output (up to 200K) | Output (over 200K) |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25/M | $2.50/M | $10.00/M | $20.00/M |
| Gemini 2.5 Flash | $0.15/M | $0.30/M | $0.60/M | $1.20/M |
| Gemini 2.0 Flash | $0.10/M | $0.20/M | $0.40/M | $0.80/M |
The surcharge applies to the entire request, not just the tokens above 200K. A request with 250K input tokens pays the higher rate on all 250K tokens.
Cost Impact Example
Processing a 400K token document with Gemini 2.5 Pro:
- Under-200K rate (hypothetical): 400K x $1.25/M = $0.50
- Over-200K rate (actual): 400K x $2.50/M = $1.00
- Difference: 2x higher
For long-context workloads, this doubles the effective cost. Compare against DeepSeek V4, which charges a flat $0.30/M at any context length.
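Because the surcharge is all-or-nothing, a cost function needs only a threshold check, not tiered math. A sketch using Gemini 2.5 Pro's $1.25/M base input rate:

```python
# Gemini's long-context surcharge: once input exceeds 200K tokens, the WHOLE
# request bills at 2x the base rate (not just the tokens past the threshold).
SURCHARGE_THRESHOLD = 200_000

def gemini_input_cost(input_tokens, base_rate_per_m):
    multiplier = 2 if input_tokens > SURCHARGE_THRESHOLD else 1
    return input_tokens * base_rate_per_m * multiplier / 1_000_000

at_200k = gemini_input_cost(200_000, 1.25)  # $0.25 -- base rate still applies
at_400k = gemini_input_cost(400_000, 1.25)  # $1.00 -- all 400K billed at $2.50/M
```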
Surcharge Comparison
| Provider | Threshold | Post-Surcharge Input |
|---|---|---|
| Google Gemini 2.5 Pro | 200K | $2.50/M |
| OpenAI GPT-5.4 | 272K | $5.00/M |
| Anthropic Claude Sonnet 4.6 | 200K | $6.00/M |
| DeepSeek V4 | None | $0.30/M (flat) |
Even with the surcharge, Gemini 2.5 Pro ($2.50/M) is the cheapest Western provider for long-context work — half the cost of GPT-5.4 ($5.00/M) and less than half of Claude ($6.00/M).
Google Gemini Embedding Pricing
| Model | Price/M tokens | Dimensions |
|---|---|---|
| text-embedding-005 | Free (up to limits) | 768 |
| text-embedding-005 (paid) | $0.00 (included) | 768 |
Google's text-embedding-005 is effectively free. Within the standard API rate limits, embedding generation incurs no per-token charge. This is a significant advantage over OpenAI's text-embedding-3 models ($0.02-$0.13/M tokens).
Embedding Cost Comparison
Embedding 1 million documents (500 tokens each, 500M total tokens):
| Provider | Model | Cost |
|---|---|---|
| Google | text-embedding-005 | $0 (within limits) |
| OpenAI | text-embedding-3-small | $10 |
| OpenAI | text-embedding-3-large | $65 |
| Voyage AI | voyage-3 | $30 |
For RAG (Retrieval-Augmented Generation) pipelines, free embeddings from Google significantly reduce the total cost of the system. Pair free Google embeddings with Gemini 2.5 Flash for generation, and the entire RAG pipeline costs a fraction of an equivalent OpenAI setup.
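The embedding comparison above is straightforward to recompute. A sketch (the voyage-3 per-token rate of $0.06/M is inferred from the $30 figure in the table, not taken from a Voyage price list):

```python
# 1M documents x 500 tokens each = 500M tokens to embed.
TOTAL_TOKENS = 1_000_000 * 500

EMBED_RATES_PER_M = {                 # USD per 1M tokens
    "text-embedding-005": 0.00,       # Google: free within rate limits
    "text-embedding-3-small": 0.02,   # OpenAI
    "text-embedding-3-large": 0.13,   # OpenAI
    "voyage-3": 0.06,                 # Voyage AI (rate inferred from the table)
}

costs = {model: TOTAL_TOKENS * rate / 1_000_000
         for model, rate in EMBED_RATES_PER_M.items()}
```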
Google AI Studio vs Vertex AI Pricing
Google offers two access points for Gemini models. The pricing for tokens is identical, but the surrounding costs differ.
Google AI Studio
- Token pricing: Same as listed above
- Free tier: 1,500 requests/day
- Credit card: Not required for free tier
- SLA: No formal SLA
- Data usage: Free tier data may be used for improvement
- Best for: Developers, prototyping, small-scale production
Vertex AI (Google Cloud)
- Token pricing: Same per-token rates as AI Studio
- Free tier: Limited free credits for new Google Cloud accounts
- Credit card: Required (Google Cloud billing)
- SLA: Formal SLA with uptime guarantees
- Data usage: Not used for training
- Additional costs: Google Cloud infrastructure overhead
Vertex AI Additional Costs
While token pricing matches AI Studio, Vertex AI users may incur:
| Cost Component | Detail |
|---|---|
| Google Cloud egress | Network egress charges apply |
| Provisioned throughput | Reserved capacity at premium rates |
| Cloud logging | Storage for request/response logs |
| VPC networking | Private service connect costs |
| Support plan | Google Cloud support tiers ($0-$150K+/year) |
For most teams under $10K/month in API spend, Google AI Studio is the better starting point. Vertex AI becomes worthwhile when you need enterprise SLAs, data governance, or Google Cloud integration.
Enterprise Commitments
Google offers committed use discounts through Vertex AI for large-volume customers. These are negotiated directly and can reduce per-token pricing by 15-30% depending on commitment size and term length.
Context Caching: How to Cut Gemini API Costs
Context caching stores input tokens for reuse across multiple requests. Cached tokens are billed at 75% off the standard input rate.
Caching Pricing
| Model | Standard Input | Cached Input | Discount | Storage Cost |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25/M | $0.315/M | 75% | $4.50/M/hour |
| Gemini 2.5 Flash | $0.15/M | $0.0375/M | 75% | $1.00/M/hour |
| Gemini 2.0 Flash | $0.10/M | $0.025/M | 75% | $0.25/M/hour |
When Caching Saves Money
Caching adds a per-hour storage cost. It only saves money when the cache hit volume is high enough to offset storage.
Break-even calculation for Gemini 2.5 Pro:
- Cache size: 50K tokens
- Storage: 50K x $4.50/M/hour = $0.225/hour
- Savings per cache hit: 50K x ($1.25 - $0.315)/M = $0.047 per request
- Break-even: $0.225 / $0.047 = ~5 requests per hour
If you send more than 5 requests per hour against the same cached content, caching saves money. Below 5 requests per hour, the storage cost exceeds the per-token savings.
For Gemini 2.5 Flash (lower storage cost):
- Storage: 50K x $1.00/M/hour = $0.05/hour
- Savings per hit: 50K x ($0.15 - $0.0375)/M = $0.0056 per request
- Break-even: ~9 requests per hour
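Both break-even figures fall out of one formula. Notably, the cache size cancels out of the ratio, so the break-even depends only on the rates, not on how many tokens you cache. A sketch:

```python
def cache_break_even_hits_per_hour(input_rate, cached_rate, storage_rate_per_m_hour):
    """Requests/hour at which caching pays for itself.
    Rates are USD per 1M tokens; the cache token count cancels out,
    so it does not appear in the formula."""
    savings_per_m_tokens = input_rate - cached_rate
    return storage_rate_per_m_hour / savings_per_m_tokens

pro_hits = cache_break_even_hits_per_hour(1.25, 0.315, 4.50)     # ~4.8 -> ~5/hour
flash_hits = cache_break_even_hits_per_hour(0.15, 0.0375, 1.00)  # ~8.9 -> ~9/hour
```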
Caching Best Practices
- Cache system prompts and reference documents that are shared across many requests
- Monitor cache hit rates, and remove caches with low utilization
- Keep caches as small as practical, since storage is billed by token count
- Use Gemini 2.0 Flash for caching-heavy workloads (lowest storage cost at $0.25/M/hour)
Gemini API Pricing vs OpenAI vs Anthropic
Flagship Model Comparison
| Dimension | Gemini 2.5 Pro | GPT-5.4 | Claude Sonnet 4.6 |
|---|---|---|---|
| Input/M | $1.25 | $2.50 | $3.00 |
| Output/M | $10.00 | $15.00 | $15.00 |
| Cached Input/M | $0.315 | $0.25 | $0.30 |
| Batch Discount | No | 50% | 50% |
| Context | 1M | 1.1M | 1M |
| Surcharge Start | 200K | 272K | 200K |
| Free Tier | 1,500 req/day | No | Limited |
| SWE-bench | ~78% | ~80% | ~73% |
| MMLU | ~90% | ~91% | ~88% |
Budget Model Comparison
| Dimension | Gemini 2.5 Flash | GPT-5.4 Nano | GPT-4o-mini |
|---|---|---|---|
| Input/M | $0.15 | $0.20 | $0.15 |
| Output/M | $0.60 | $1.25 | $0.60 |
| Context | 1M | 400K | 128K |
| Free Tier | 1,500 req/day | No | No |
Where Google Gemini Wins on Price
Cheapest Western flagship. Gemini 2.5 Pro is 50% cheaper on input than GPT-5.4 and 58% cheaper than Claude.
Free tier. 1,500 requests/day at no cost. No other major provider matches this.
Free embeddings. Zero-cost embeddings vs $0.02-$0.13/M at OpenAI.
Cheapest ultra-budget option. Gemini 2.0 Flash Lite at $0.01/$0.02 undercuts everything.
Long-context pricing. $2.50/M past 200K vs $5.00/M (GPT) and $6.00/M (Claude).
Where Google Gemini Loses on Price
No batch discount. OpenAI and Anthropic offer 50% batch discounts. Google does not. For batch-heavy workloads, GPT-5.4 batch ($1.25/$7.50) is cheaper than Gemini 2.5 Pro standard ($1.25/$10).
Cache storage costs. Google charges per-hour storage for cached tokens. OpenAI's caching is automatic with no separate storage fee.
Thinking token economics on Flash. Gemini 2.5 Flash thinking tokens cost $3.50/M — nearly 6x the standard output rate. This makes reasoning tasks disproportionately expensive on Flash.
TokenMix.ai provides unified access to Google Gemini, OpenAI, Anthropic, and DeepSeek through a single API endpoint with automatic cost optimization — see tokenmix.ai for real-time pricing comparisons.
Gemini 2.5 Pro saves 26% vs GPT-5.4 and 27% vs Claude on this workload. Adding free Google embeddings for the retrieval layer increases the savings further.
Flash Lite is 17x cheaper than GPT-4o-mini for simple classification. If accuracy requirements are modest, this is the most economical option in the market.
Google Gemini API pricing in 2026 is designed to win developers at every scale. The free tier eliminates the barrier to entry. Flash models provide competitive quality at budget prices. Gemini 2.5 Pro undercuts GPT-5.4 and Claude Sonnet 4.6 on both input and output while delivering comparable benchmark performance.
The gaps: no batch discount, cache storage costs, and slightly lower coding benchmarks than GPT-5.4. For most workloads — especially those starting from zero or operating at moderate scale — Gemini offers the best value among Western AI API providers.
For teams using multiple providers, TokenMix.ai provides a single API that routes requests to Gemini, OpenAI, Anthropic, or DeepSeek based on cost and quality requirements — with unified billing and automatic failover across all providers.
FAQ
How much does the Google Gemini API cost?
Gemini API pricing ranges from free (1,500 requests/day) to $1.25/$10 per million tokens (Gemini 2.5 Pro). The budget Gemini 2.5 Flash model costs $0.15/$0.60 per million tokens. Gemini 2.0 Flash Lite is the cheapest at $0.01/$0.02 per million tokens. All models are available on the free tier.
Is Google Gemini API free?
Yes, partially. Google AI Studio offers 1,500 free requests per day with no credit card required. All Gemini models are available on the free tier at the same quality as paid access. Rate limits are lower on the free tier (15 requests per minute). For production workloads exceeding these limits, paid access is required.
How does Gemini API pricing compare to OpenAI?
Gemini 2.5 Pro ($1.25/$10) is 50% cheaper on input and 33% cheaper on output than GPT-5.4 ($2.50/$15). Gemini 2.5 Flash ($0.15/$0.60) is comparable to GPT-5.4 Nano ($0.20/$1.25) with a much larger context window (1M vs 400K). Google also offers a free tier that OpenAI does not. However, OpenAI offers batch discounts (50%) and automatic caching that Google lacks.
What is the cheapest Gemini model?
Gemini 2.0 Flash Lite at $0.01 per million input tokens and $0.02 per million output tokens. This is among the cheapest model pricing from any major provider. Flash Lite is designed for simple tasks like classification, routing, and basic extraction. For more capable budget options, Gemini 2.5 Flash at $0.15/$0.60 offers significantly better quality.
Is Vertex AI more expensive than Google AI Studio for Gemini?
Per-token pricing is identical between Vertex AI and Google AI Studio. However, Vertex AI may incur additional costs for Google Cloud infrastructure: network egress, logging, VPC networking, and support plans. For API costs alone, both platforms charge the same. Vertex AI adds value through SLAs, data governance, and enterprise features.
Can I use Gemini for free in production?
The free tier allows 1,500 requests per day, which supports small-scale production applications. However, free tier data may be used by Google for product improvement. For production use requiring data privacy, use the paid tier or Vertex AI. Rate limits on the free tier (15 RPM) may also be insufficient for production traffic. Evaluate whether your application fits within these constraints before relying on the free tier at TokenMix.ai, which tracks real-time availability across all providers.