TokenMix Research Lab · 2026-04-06

Google Gemini API Pricing in 2026: Every Model, Free Tier, Vertex AI Costs, and How to Save

Google Gemini API pricing starts at free — 1,500 requests per day on Google AI Studio at no cost. Paid pricing ranges from $0.01/$0.02 per million tokens (Gemini Flash Lite) to $2.50/$20 per million tokens (Gemini 2.5 Pro over 200K context). This guide covers every Gemini model's pricing, the free tier limits, Google AI Studio vs Vertex AI cost differences, embedding pricing, and real-world cost comparisons against OpenAI and Anthropic. Whether you are a beginner exploring the API for the first time or an enterprise evaluating Google Cloud costs, this guide gives you the numbers. All pricing verified against Google's official pricing page and tracked by TokenMix.ai as of April 2026.

Gemini API Pricing Quick Reference

All prices per 1 million tokens, Google AI Studio, April 2026:

| Model | Input | Cached Input | Output | Context | Free Tier |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $0.315 | $10.00 | 1M | 1,500 req/day |
| Gemini 2.5 Flash | $0.15 | $0.0375 | $0.60 | 1M | 1,500 req/day |
| Gemini 2.0 Flash | $0.10 | $0.025 | $0.40 | 1M | 1,500 req/day |
| Gemini 2.0 Flash Lite | $0.01 | -- | $0.02 | 1M | 1,500 req/day |

All models incur a 2x surcharge on requests exceeding 200K input tokens (except Flash Lite).


Google Gemini Free Tier: What You Get for $0

Google offers the most generous free tier among major AI API providers. This is often the first reason developers choose Gemini.

Free Tier Specs

| Feature | Free Tier Limit |
|---|---|
| Requests per day | 1,500 |
| Requests per minute | 15 |
| Tokens per minute | 32,000 |
| Models available | All Gemini models |
| Quality restrictions | None — same model as paid |
| Credit card required | No |

What 1,500 Requests Per Day Actually Means

At 1,500 requests per day, the free tier comfortably covers development, testing, and small personal projects.

For comparison: OpenAI has no ongoing free tier. Anthropic offers limited free access through claude.ai but not API credits. DeepSeek offers a limited free tier with lower quotas.

Free Tier Limitations

The free tier uses lower rate limits (15 RPM vs higher paid limits). For development and testing, this is rarely a constraint. For production traffic, you will need to upgrade.
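
On the free tier, staying under the 15 RPM cap is usually just a matter of pacing requests client-side. As a minimal sketch (the `RpmThrottle` class below is our own illustration, not part of any Google SDK), a throttle that enforces a minimum interval between calls looks like this:

```python
import time

class RpmThrottle:
    """Pace outgoing calls to stay under a requests-per-minute cap,
    e.g. the Gemini free tier's 15 RPM. Clock and sleep are injectable
    so the behavior can be tested without real waiting."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # seconds between requests
        self.clock = clock
        self.sleep = sleep
        self._last = None

    def wait(self):
        """Block until it is safe to issue the next request."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval + self._last - now
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()

# throttle = RpmThrottle(rpm=15)
# throttle.wait()  # call before each free-tier API request
```

At 15 RPM this enforces one request every 4 seconds, which also keeps bursty scripts from tripping the per-minute token limit in most cases.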

Important: free tier requests may be used by Google for product improvement. If data privacy is a concern, use the paid tier or Vertex AI, which explicitly does not use your data for training.

How to Access the Free Tier

  1. Go to Google AI Studio
  2. Sign in with a Google account
  3. Get an API key — no credit card, no waitlist
  4. Start making API calls immediately

This is the fastest path from zero to working AI API in the industry.


Gemini 2.5 Pro Pricing

Gemini 2.5 Pro is Google's flagship model — the direct competitor to GPT-5.4 and Claude Sonnet 4.6.

Base Pricing

| Component | Up to 200K tokens | Over 200K tokens |
|---|---|---|
| Input | $1.25/M | $2.50/M |
| Cached Input | $0.315/M | $0.63/M |
| Output | $10.00/M | $20.00/M |
| Thinking Tokens | Billed as output | Billed as output |

Key Pricing Details

Thinking mode tokens are billed as output. When you enable Gemini 2.5 Pro's thinking mode, the model generates internal reasoning tokens. These are billed at the output rate ($10/M, or $20/M past 200K). A request with 2K visible output might generate 5-10K thinking tokens, increasing the effective output cost 3-5x.
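
The billing arithmetic above is easy to check. The sketch below is our own illustration (the `request_cost` helper is hypothetical, not a Google SDK function), using the $1.25/$10.00 rates for requests under 200K context:

```python
def request_cost(input_tokens, visible_output, thinking_tokens,
                 input_rate, output_rate, thinking_rate=None):
    """Per-request cost in dollars; rates are $ per million tokens.
    thinking_rate defaults to output_rate, matching Gemini 2.5 Pro
    (Gemini 2.5 Flash bills thinking at a separate, higher rate)."""
    if thinking_rate is None:
        thinking_rate = output_rate
    return (input_tokens * input_rate
            + visible_output * output_rate
            + thinking_tokens * thinking_rate) / 1_000_000

# Gemini 2.5 Pro under 200K context: $1.25/M input, $10/M output
base = request_cost(3_000, 2_000, 0, 1.25, 10.00)               # $0.02375
with_thinking = request_cost(3_000, 2_000, 7_000, 1.25, 10.00)  # $0.09375
```

With 7K thinking tokens the output side of the bill grows from $0.020 to $0.090, a 4.5x jump, which is why thinking mode dominates the cost of reasoning-heavy requests.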

Context caching storage costs apply. Cached tokens cost $4.50/M tokens per hour to store. This means caching only saves money at moderate-to-high request volumes.

Gemini 2.5 Pro vs GPT-5.4 Pricing

| Dimension | Gemini 2.5 Pro | GPT-5.4 |
|---|---|---|
| Input | $1.25/M | $2.50/M |
| Output | $10.00/M | $15.00/M |
| Cached Input | $0.315/M | $0.25/M |
| Total (3K in, 1K out) | $0.01375 | $0.0225 |

Gemini 2.5 Pro is 39% cheaper than GPT-5.4 on a typical request. The input advantage (50% cheaper) and output advantage (33% cheaper) both contribute.

GPT-5.4 has slightly better benchmark scores (SWE-bench 80% vs 78%, MMLU 91% vs 90%). Whether the quality gap justifies the price gap depends on your workload.


Gemini 2.5 Flash Pricing

Gemini 2.5 Flash is the balanced mid-tier: better than Flash 2.0, much cheaper than Pro.

Base Pricing

| Component | Up to 200K tokens | Over 200K tokens |
|---|---|---|
| Input | $0.15/M | $0.30/M |
| Cached Input | $0.0375/M | $0.075/M |
| Output | $0.60/M | $1.20/M |
| Thinking Output | $3.50/M | $7.00/M |

Notable Detail: Thinking Tokens Cost More

Unlike Gemini 2.5 Pro where thinking tokens are billed at the standard output rate, Gemini 2.5 Flash charges $3.50/M for thinking tokens — nearly 6x its standard output rate of $0.60/M. This makes thinking mode significantly more expensive per token on Flash than on Pro relative to base output pricing.

For simple tasks without reasoning, Flash at $0.15/$0.60 is extremely cost-effective. For reasoning-heavy tasks, the thinking token premium changes the economics.

When to Use Flash vs Pro

| Task | Flash Cost | Pro Cost | Recommendation |
|---|---|---|---|
| Simple Q&A (2K in, 500 out) | $0.0006 | $0.0075 | Flash (12x cheaper) |
| Summarization (10K in, 2K out) | $0.0027 | $0.0325 | Flash (12x cheaper) |
| Complex reasoning (5K in, 10K thinking, 2K out) | $0.0370 | $0.1263 | Flash (3.4x cheaper) |
| Code generation (8K in, 3K out) | $0.003 | $0.04 | Flash for budget, Pro for quality |

Flash delivers strong quality for most production tasks at 10-12x lower cost than Pro. Reserve Pro for tasks where the quality difference is business-critical.


Gemini 2.0 Flash and Flash Lite Pricing

Gemini 2.0 Flash

| Component | Up to 200K tokens | Over 200K tokens |
|---|---|---|
| Input | $0.10/M | $0.20/M |
| Cached Input | $0.025/M | $0.05/M |
| Output | $0.40/M | $0.80/M |

Gemini 2.0 Flash is the previous generation Flash model. It remains available and is slightly cheaper than 2.5 Flash ($0.10/$0.40 vs $0.15/$0.60). Quality is lower, but for simple tasks the difference is minimal.

Gemini 2.0 Flash Lite

| Component | Price |
|---|---|
| Input | $0.01/M |
| Output | $0.02/M |
| Caching | Not supported |

Flash Lite is the cheapest model in Google's lineup — and among the cheapest in the industry at $0.01/$0.02 per million tokens.

Quality is limited. Flash Lite is designed for the simplest tasks: classification, routing, basic extraction, and keyword detection. Do not expect it to handle complex reasoning or nuanced generation.


The 200K Context Surcharge Explained

All Gemini models (except Flash Lite) apply a 2x pricing surcharge on requests with more than 200K input tokens.

How It Works

| Model | Input (up to 200K) | Input (over 200K) | Output (up to 200K) | Output (over 200K) |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25/M | $2.50/M | $10.00/M | $20.00/M |
| Gemini 2.5 Flash | $0.15/M | $0.30/M | $0.60/M | $1.20/M |
| Gemini 2.0 Flash | $0.10/M | $0.20/M | $0.40/M | $0.80/M |

The surcharge applies to the entire request, not just the tokens above 200K. A request with 250K input tokens pays the higher rate on all 250K tokens.
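
This whole-request rule is worth encoding explicitly when estimating costs. A minimal sketch (the `input_cost` helper is illustrative, not an official API):

```python
def input_cost(tokens, base_rate, surcharge_rate, threshold=200_000):
    """Input cost in dollars. Once input exceeds the threshold, the
    higher rate applies to the ENTIRE request, not just the excess.
    Rates are $ per million tokens."""
    rate = surcharge_rate if tokens > threshold else base_rate
    return tokens * rate / 1_000_000

# Gemini 2.5 Pro input: $1.25/M up to 200K, $2.50/M beyond
input_cost(150_000, 1.25, 2.50)  # $0.1875 (all tokens at the base rate)
input_cost(250_000, 1.25, 2.50)  # $0.6250 (all tokens at the surcharge rate)
```

A 250K-token request therefore costs more than double a 200K one, so splitting work to stay just under the threshold can be a meaningful optimization when the task allows it.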

Cost Impact Example

Processing a 400K token document with Gemini 2.5 Pro: 400K input tokens at the $2.50/M surcharge rate cost $1.00, versus $0.50 if the base $1.25/M rate applied.

For long-context workloads, this doubles the effective cost. Compare against DeepSeek V4, which charges a flat $0.30/M at any context length.

Surcharge Comparison

| Provider | Threshold | Post-Surcharge Input |
|---|---|---|
| Google Gemini 2.5 Pro | 200K | $2.50/M |
| OpenAI GPT-5.4 | 272K | $5.00/M |
| Anthropic Claude Sonnet 4.6 | 200K | $6.00/M |
| DeepSeek V4 | None | $0.30/M (flat) |

Even with the surcharge, Gemini 2.5 Pro ($2.50/M) is the cheapest Western provider for long-context work — half the cost of GPT-5.4 ($5.00/M) and less than half of Claude ($6.00/M).


Google Gemini Embedding Pricing

| Model | Price/M tokens | Dimensions |
|---|---|---|
| text-embedding-005 (free tier) | Free (up to limits) | 768 |
| text-embedding-005 (paid tier) | $0.00 (included) | 768 |

Google's text-embedding-005 is effectively free. Within the standard API rate limits, embedding generation incurs no per-token charge. This is a significant advantage over OpenAI's text-embedding-3 models ($0.02-$0.13/M tokens).

Embedding Cost Comparison

Embedding 1 million documents (500 tokens each, 500M total tokens):

| Provider | Model | Cost |
|---|---|---|
| Google | text-embedding-005 | $0 (within limits) |
| OpenAI | text-embedding-3-small | $10 |
| OpenAI | text-embedding-3-large | $65 |
| Voyage AI | voyage-3 | $30 |

For RAG (Retrieval-Augmented Generation) pipelines, free embeddings from Google significantly reduce the total cost of the system. Pair free Google embeddings with Gemini 2.5 Flash for generation, and the entire RAG pipeline costs a fraction of an equivalent OpenAI setup.


Google AI Studio vs Vertex AI Pricing

Google offers two access points for Gemini models. The pricing for tokens is identical, but the surrounding costs differ.

Google AI Studio

Vertex AI (Google Cloud)

Vertex AI Additional Costs

While token pricing matches AI Studio, Vertex AI users may incur:

| Cost Component | Detail |
|---|---|
| Google Cloud egress | Network egress charges apply |
| Provisioned throughput | Reserved capacity at premium rates |
| Cloud logging | Storage for request/response logs |
| VPC networking | Private Service Connect costs |
| Support plan | Google Cloud support tiers ($0-$150K+/year) |

For most teams under $10K/month in API spend, Google AI Studio is the better starting point. Vertex AI becomes worthwhile when you need enterprise SLAs, data governance, or Google Cloud integration.

Enterprise Commitments

Google offers committed use discounts through Vertex AI for large-volume customers. These are negotiated directly and can reduce per-token pricing by 15-30% depending on commitment size and term length.


Context Caching: How to Cut Gemini API Costs

Context caching stores input tokens for reuse across multiple requests. Cached tokens are billed at 75% off the standard input rate.

Caching Pricing

| Model | Standard Input | Cached Input | Discount | Storage Cost |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25/M | $0.315/M | 75% | $4.50/M/hour |
| Gemini 2.5 Flash | $0.15/M | $0.0375/M | 75% | $1.00/M/hour |
| Gemini 2.0 Flash | $0.10/M | $0.025/M | 75% | $0.25/M/hour |

When Caching Saves Money

Caching adds a per-hour storage cost. It only saves money when the cache hit volume is high enough to offset storage.

Break-even calculation for Gemini 2.5 Pro: each cache hit saves $1.25 - $0.315 = $0.935 per million cached tokens, while storage costs $4.50 per million tokens per hour. Break-even is therefore $4.50 / $0.935, or about 4.8 requests per hour.

If you send more than 5 requests per hour against the same cached content, caching saves money. Below 5 requests per hour, the storage cost exceeds the per-token savings.

For Gemini 2.5 Flash, storage costs $1.00/M per hour and each cache hit saves $0.15 - $0.0375 = $0.1125 per million cached tokens, so break-even is roughly 9 requests per hour: the lower storage cost is offset by smaller per-token savings.

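The break-even arithmetic above generalizes to any model. This is an illustrative helper of our own, assuming the rates in the caching table:

```python
def cache_breakeven_rph(input_rate, cached_rate, storage_rate_per_hour):
    """Requests per hour at which context caching pays for itself.
    All rates are $ per million tokens (storage is per hour)."""
    savings_per_request = input_rate - cached_rate
    return storage_rate_per_hour / savings_per_request

cache_breakeven_rph(1.25, 0.315, 4.50)   # Gemini 2.5 Pro:   ~4.8 req/hour
cache_breakeven_rph(0.15, 0.0375, 1.00)  # Gemini 2.5 Flash: ~8.9 req/hour
```

Note the counterintuitive result: Flash's lower storage rate does not mean a lower break-even, because its per-token savings are smaller too.
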
Caching Best Practices

  1. Cache system prompts and reference documents that are shared across many requests
  2. Monitor cache hit rates — remove caches with low utilization
  3. Keep caches as small as practical — storage is billed by token count
  4. Use Gemini 2.0 Flash for caching-heavy workloads (lowest storage cost at $0.25/M/hour)

Gemini API Pricing vs OpenAI vs Anthropic

Flagship Model Comparison

| Dimension | Gemini 2.5 Pro | GPT-5.4 | Claude Sonnet 4.6 |
|---|---|---|---|
| Input/M | $1.25 | $2.50 | $3.00 |
| Output/M | $10.00 | $15.00 | $15.00 |
| Cached Input/M | $0.315 | $0.25 | $0.30 |
| Batch Discount | No | 50% | 50% |
| Context | 1M | 1.1M | 1M |
| Surcharge Start | 200K | 272K | 200K |
| Free Tier | 1,500 req/day | No | Limited |
| SWE-bench | ~78% | ~80% | ~73% |
| MMLU | ~90% | ~91% | ~88% |

Budget Model Comparison

| Dimension | Gemini 2.5 Flash | GPT-5.4 Nano | GPT-4o-mini |
|---|---|---|---|
| Input/M | $0.15 | $0.20 | $0.15 |
| Output/M | $0.60 | $1.25 | $0.60 |
| Context | 1M | 400K | 128K |
| Free Tier | 1,500 req/day | No | No |

Where Google Gemini Wins on Price

  1. Cheapest Western flagship. Gemini 2.5 Pro is 50% cheaper on input than GPT-5.4 and 58% cheaper than Claude.
  2. Free tier. 1,500 requests/day at no cost. No other major provider matches this.
  3. Free embeddings. Zero-cost embeddings vs $0.02-$0.13/M at OpenAI.
  4. Cheapest ultra-budget option. Gemini 2.0 Flash Lite at $0.01/$0.02 undercuts everything.
  5. Long-context pricing. $2.50/M past 200K vs $5.00/M (GPT) and $6.00/M (Claude).

Where Google Gemini Loses on Price

  1. No batch discount. OpenAI and Anthropic offer 50% batch discounts. Google does not. For batch-heavy workloads, GPT-5.4 batch ($1.25/$7.50) is cheaper than Gemini 2.5 Pro standard ($1.25/$10.00).
  2. Cache storage costs. Google charges per-hour storage for cached tokens. OpenAI's caching is automatic with no separate storage fee.
  3. Thinking token economics on Flash. Gemini 2.5 Flash thinking tokens cost $3.50/M — nearly 6x the standard output rate. This makes reasoning tasks disproportionately expensive on Flash.

TokenMix.ai provides unified access to Google Gemini, OpenAI, Anthropic, and DeepSeek through a single API endpoint with automatic cost optimization — see tokenmix.ai for real-time pricing comparisons.


Real-World Cost Scenarios

Scenario 1: First AI Project (Hobbyist/Student)

5,000 requests/month, simple Q&A. Model: Gemini 2.5 Flash.

| Approach | Monthly Cost |
|---|---|
| Gemini Free Tier | $0 |
| OpenAI GPT-4o-mini | $2.25 |
| Anthropic Claude Haiku | ~$1.50 |

For beginners and learning projects, Gemini's free tier is unbeatable. Zero cost, full model quality, no credit card.

Scenario 2: Startup MVP (10K users, 50K requests/month)

Average: 3K input, 1K output. Model: Gemini 2.5 Flash.

| Provider | Monthly Cost |
|---|---|
| Gemini 2.5 Flash | $52.50 |
| GPT-5.4 Nano | $92.50 |
| GPT-5.4 Mini | $262.50 |

Gemini 2.5 Flash saves 43% vs GPT-5.4 Nano and 80% vs GPT-5.4 Mini, with comparable quality for general tasks.

Scenario 3: Enterprise RAG Pipeline (200K queries/month)

Average: 20K input (retrieved context + query), 2K output. Model: Gemini 2.5 Pro.

| Provider | Monthly Cost (with caching) |
|---|---|
| Gemini 2.5 Pro | $5,260 |
| GPT-5.4 | $7,100 |
| Claude Sonnet 4.6 | $7,200 |

Gemini 2.5 Pro saves 26% vs GPT-5.4 and 27% vs Claude on this workload. Adding free Google embeddings for the retrieval layer increases the savings further.

Scenario 4: High-Volume Classification (1M requests/month)

Average: 500 input, 50 output tokens. Model: Gemini 2.0 Flash Lite.

| Provider | Monthly Cost |
|---|---|
| Gemini 2.0 Flash Lite | $6 |
| GPT-4o-mini | $105 |
| Gemini 2.5 Flash | $105 |

Flash Lite is 17x cheaper than GPT-4o-mini for simple classification. If accuracy requirements are modest, this is the most economical option in the market.
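
The scenario tables above all reduce to the same arithmetic, which the following sketch reproduces (the `monthly_cost` helper is our own, using the rates quoted in this guide):

```python
def monthly_cost(requests, in_tokens, out_tokens, in_rate, out_rate):
    """Monthly API spend in dollars; rates are $ per million tokens."""
    return requests * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

monthly_cost(50_000, 3_000, 1_000, 0.15, 0.60)  # Scenario 2, Gemini 2.5 Flash: $52.50
monthly_cost(1_000_000, 500, 50, 0.01, 0.02)    # Scenario 4, Flash Lite: $6.00
```

Plugging in any provider's rates makes apples-to-apples comparisons straightforward before committing to a model.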


How to Get Started with the Gemini API

Step 1: Get Your API Key (2 minutes)

  1. Visit Google AI Studio
  2. Sign in with any Google account
  3. Click "Get API Key"
  4. Copy the key — no credit card required

Step 2: Make Your First API Call

The Gemini API uses a REST interface compatible with standard HTTP libraries. Google also provides official SDKs for Python, Node.js, Go, and Dart.
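
As a minimal sketch using only the Python standard library, the call below follows the Gemini REST shape (`models/{model}:generateContent` with a JSON `contents` array). The model ID `gemini-2.5-flash` is an assumption for illustration; check Google AI Studio for the exact identifiers available to your key:

```python
import json
import urllib.request

API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"

def build_request(api_key, model, prompt):
    """Build the URL and JSON body for a generateContent call."""
    url = f"{API_BASE}/{model}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, body

def generate(api_key, model, prompt):
    """Send the request and return the model's text reply."""
    url, body = build_request(api_key, model, prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]

# print(generate("YOUR_API_KEY", "gemini-2.5-flash", "Hello, Gemini!"))
```

The official SDKs wrap this same endpoint, so switching to them later requires no change to your prompts.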

Step 3: Choose Your Model

Step 4: Scale to Paid

When you exceed free tier limits (1,500 requests/day or 15 RPM), enable billing in Google AI Studio or migrate to Vertex AI for enterprise features.


Decision Guide: Choosing the Right Gemini Model

| Your Need | Best Gemini Model | Cost per 1K Requests (3K in, 1K out) |
|---|---|---|
| Free development/testing | Any model (free tier) | $0 |
| Ultra-cheap classification | Gemini 2.0 Flash Lite | $0.05 |
| Budget production | Gemini 2.0 Flash | $0.70 |
| Balanced quality/cost | Gemini 2.5 Flash | $1.05 |
| Maximum quality | Gemini 2.5 Pro | $13.75 |
| Reasoning-heavy tasks | Gemini 2.5 Pro (thinking) | $13.75+ |

When Not to Use Gemini


Related: Compare all model pricing in our complete LLM API pricing comparison

Conclusion

Google Gemini API pricing in 2026 is designed to win developers at every scale. The free tier eliminates the barrier to entry. Flash models provide competitive quality at budget prices. Gemini 2.5 Pro undercuts GPT-5.4 and Claude Sonnet 4.6 on both input and output while delivering comparable benchmark performance.

The gaps: no batch discount, cache storage costs, and slightly lower coding benchmarks than GPT-5.4. For most workloads — especially those starting from zero or operating at moderate scale — Gemini offers the best value among Western AI API providers.

For teams using multiple providers, TokenMix.ai provides a single API that routes requests to Gemini, OpenAI, Anthropic, or DeepSeek based on cost and quality requirements — with unified billing and automatic failover across all providers.


FAQ

How much does the Google Gemini API cost?

Gemini API pricing ranges from free (1,500 requests/day) to $1.25/$10.00 per million tokens (Gemini 2.5 Pro). The budget Gemini 2.5 Flash model costs $0.15/$0.60 per million tokens. Gemini 2.0 Flash Lite is the cheapest at $0.01/$0.02 per million tokens. All models are available on the free tier.

Is Google Gemini API free?

Yes, partially. Google AI Studio offers 1,500 free requests per day with no credit card required. All Gemini models are available on the free tier at the same quality as paid access. Rate limits are lower on the free tier (15 requests per minute). For production workloads exceeding these limits, paid access is required.

How does Gemini API pricing compare to OpenAI?

Gemini 2.5 Pro ($1.25/$10.00) is 50% cheaper on input and 33% cheaper on output than GPT-5.4 ($2.50/$15.00). Gemini 2.5 Flash ($0.15/$0.60) is comparable to GPT-5.4 Nano ($0.20/$1.25) with a much larger context window (1M vs 400K). Google also offers a free tier that OpenAI does not. However, OpenAI offers batch discounts (50%) and automatic caching that Google lacks.

What is the cheapest Gemini model?

Gemini 2.0 Flash Lite at $0.01 per million input tokens and $0.02 per million output tokens. This is among the cheapest model pricing from any major provider. Flash Lite is designed for simple tasks like classification, routing, and basic extraction. For more capable budget options, Gemini 2.5 Flash at $0.15/$0.60 offers significantly better quality.

Is Vertex AI more expensive than Google AI Studio for Gemini?

Per-token pricing is identical between Vertex AI and Google AI Studio. However, Vertex AI may incur additional costs for Google Cloud infrastructure: network egress, logging, VPC networking, and support plans. For API costs alone, both platforms charge the same. Vertex AI adds value through SLAs, data governance, and enterprise features.

Can I use Gemini for free in production?

The free tier allows 1,500 requests per day, which supports small-scale production applications. However, free tier data may be used by Google for product improvement. For production use requiring data privacy, use the paid tier or Vertex AI. Rate limits on the free tier (15 RPM) may also be insufficient for production traffic. Evaluate whether your application fits within these constraints before relying on the free tier. TokenMix.ai tracks real-time availability across all providers.


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: Google AI Pricing, Google Cloud Vertex AI, TokenMix.ai