TokenMix Research Lab · 2026-04-06

Google Gemini API Pricing 2026: 3.1 Pro, Flash, Batch Costs

Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30

Google Gemini API pricing now has three cost levers that matter: model tier, prompt length, and processing mode. Gemini 3.1 Pro is $2/$12 (input/output) per 1M tokens for prompts under 200K tokens. Gemini 3.1 Flash-Lite is $0.25/$1.50. Batch and Flex can cut the price of several paid routes by 50%.

According to Google's official Gemini API pricing page, Gemini 3.1 Pro Preview costs $2 input and $12 output per 1M tokens for prompts up to 200K tokens, then $4 input and $18 output above 200K. Gemini 3.1 Flash-Lite Preview costs $0.25 input and $1.50 output for text, image, and video input. Gemini 3 Flash Preview costs $0.50 input and $3 output. Google also states that paid tier includes context caching, Batch API with 50% cost reduction, higher rate limits, and content not used to improve products.

Quick Answer

| Question | Answer |
| --- | --- |
| Cheapest current Gemini text route | Gemini 3.1 Flash-Lite Preview at $0.25 input and $1.50 output per 1M tokens. |
| Best Gemini reasoning route | Gemini 3.1 Pro Preview at $2/$12 under 200K prompt tokens. |
| Main hidden pricing rule | Gemini Pro prices increase above 200K prompt tokens. |
| Best cost lever | Batch or Flex for async workloads; context caching for repeated prefixes. |
| Search grounding cost | Gemini 3 includes 5,000 prompts per month free, shared across Gemini 3 models, then $14 per 1,000 search queries. |
| Best production rule | Use Flash-Lite for bulk work, Flash for fast multimodal work, Pro for hard reasoning. |

Confirmed Gemini Pricing Facts

| Claim | Status | Practical meaning | Source |
| --- | --- | --- | --- |
| Gemini 3.1 Pro is $2/$12 under 200K prompt tokens | Confirmed | Premium Gemini reasoning is cheaper than GPT-5.4 on both input and output ($2/$12 versus $2.50/$15). | Google pricing |
| Gemini 3.1 Pro becomes $4/$18 above 200K | Confirmed | Long prompts change the bill materially. | Google pricing |
| Gemini 3.1 Flash-Lite is $0.25/$1.50 | Confirmed | This is the cheap high-volume Gemini route. | Google pricing |
| Gemini 3 Flash is $0.50/$3 | Confirmed | Better default when Flash-Lite is too light. | Google pricing |
| Batch API can reduce cost by 50% | Confirmed | Use it for offline labeling, summarization, and evaluation. | Google pricing |
| Paid tier content is not used to improve products | Confirmed on pricing page | Important for production privacy posture. | Google pricing |
| Cheapest model is always best | False | Quality, latency, grounding, cache, and prompt length change the decision. | TokenMix analysis |

Gemini Price Table

All prices are per 1M tokens in USD.

| Gemini model | Standard input | Context cache read | Standard output | Main prompt rule | Best use |
| --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Flash-Lite Preview | $0.25 | $0.025 | $1.50 | Text/image/video input; audio input costs more | Bulk simple tasks, translation, extraction |
| Gemini 3 Flash Preview | $0.50 | $0.05 | $3.00 | Text/image/video input; audio input costs more | Fast multimodal work |
| Gemini 3.1 Pro Preview | $2.00 | $0.20 | $12.00 | Price shown for prompts <= 200K | Complex reasoning and coding |
| Gemini 2.5 Pro | $1.25 | $0.125 | $10.00 | Price shown for prompts <= 200K | Stable Pro route |
| Gemini 3 Pro Image Preview | $2.00 text/image | Not listed in the same text table | $12 text, $120 image tokens | Image output has separate token math | Image generation and multimodal output |

One correction to older guides: many Gemini pricing pages still focus on Gemini 2.0 or 2.5 Flash-Lite. The pricing questions that matter now concern Gemini 3.1 Pro, Gemini 3.1 Flash-Lite, Gemini 3 Flash, Batch, Flex, and context cache storage.

Standard vs Batch vs Flex vs Priority

Google exposes multiple processing modes. The right mode depends on latency and reliability needs.

| Mode | Cost profile | Best for | Avoid when |
| --- | --- | --- | --- |
| Standard | Normal price | Live apps and normal production traffic | You can wait and cost matters more. |
| Batch | Often 50% of standard input/output | Offline jobs, evaluations, bulk summarization | A user is waiting for the response. |
| Flex | Similar discount profile for supported Gemini 3.1 routes | Lower-priority production jobs | You need predictable latency. |
| Priority | Higher price | Latency-sensitive or high-priority work | Cost is the main constraint. |

Gemini 3.1 Pro Preview

| Mode | Input <=200K | Input >200K | Output <=200K | Output >200K |
| --- | --- | --- | --- | --- |
| Standard | $2.00 | $4.00 | $12.00 | $18.00 |
| Batch | $1.00 | $2.00 | $6.00 | $9.00 |
| Flex | $1.00 | $2.00 | $6.00 | $9.00 |
| Priority | $3.60 | $7.20 | $21.60 | $32.40 |

Gemini 3.1 Flash-Lite Preview

| Mode | Text/image/video input | Audio input | Output |
| --- | --- | --- | --- |
| Standard | $0.25 | $0.50 | $1.50 |
| Batch | $0.125 | $0.25 | $0.75 |
| Flex | $0.125 | $0.25 | $0.75 |
| Priority | $0.45 | $0.90 | $2.70 |

Gemini 3 Flash Preview

| Mode | Text/image/video input | Audio input | Output |
| --- | --- | --- | --- |
| Standard | $0.50 | $1.00 | $3.00 |
| Batch | $0.25 | $0.50 | $1.50 |
| Flex | $0.25 | $0.50 | $1.50 |

Batch is the cleanest lever when the work is not interactive. If you are running nightly evaluation or bulk enrichment, paying Standard pricing usually wastes half the budget.
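To make the mode comparison concrete, here is a minimal sketch that prices one job in each Gemini 3.1 Pro Preview mode using the table above. The job size (50M input tokens, 5M output tokens) is a hypothetical nightly evaluation run, and the sketch assumes every individual prompt stays under 200K tokens.

```python
# USD per 1M tokens (input, output) for Gemini 3.1 Pro Preview, prompts <= 200K,
# copied from the mode table above.
PRO_MODE_PRICES = {
    "standard": (2.00, 12.00),
    "batch": (1.00, 6.00),
    "flex": (1.00, 6.00),
    "priority": (3.60, 21.60),
}


def job_cost(mode: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a job in the given processing mode."""
    input_price, output_price = PRO_MODE_PRICES[mode]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical nightly evaluation: 50M input tokens, 5M output tokens in total.
standard = job_cost("standard", 50_000_000, 5_000_000)
batch = job_cost("batch", 50_000_000, 5_000_000)
print(f"standard=${standard:.2f} batch=${batch:.2f} saved=${standard - batch:.2f}")
```

The same function can price Flex or Priority by changing the mode key, which makes it easy to see what a latency guarantee costs for a given workload.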

Long Context Pricing

Gemini Pro pricing changes above 200K prompt tokens. Do not use the under-200K price for every workload.

| Model | Prompt <=200K input | Prompt >200K input | Prompt <=200K output | Prompt >200K output |
| --- | --- | --- | --- | --- |
| Gemini 3.1 Pro Preview | $2.00 | $4.00 | $12.00 | $18.00 |
| Gemini 2.5 Pro | $1.25 | $2.50 | $10.00 | $15.00 |

Cost example:

| Workload | Gemini 3.1 Pro under 200K | Gemini 3.1 Pro over 200K |
| --- | --- | --- |
| 100K input, 5K output | $0.260 | Not applicable |
| 300K input, 5K output | Not applicable | $1.290 |
| 900K input, 20K output | Not applicable | $3.960 |

The rule is simple: Gemini Pro can be very competitive for normal prompts. For very long prompts, check the >200K tier before you promise savings.
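The tier rule can be sketched as a small function. This is an illustration under the assumption used in the cost examples above: once the prompt exceeds 200K tokens, the whole request (input and output) is billed at the higher rate. The dictionary keys are labels for this sketch, not API model IDs.

```python
# USD per 1M tokens (input, output) from the long-context table above.
PRO_TIERS = {
    "Gemini 3.1 Pro Preview": {"short": (2.00, 12.00), "long": (4.00, 18.00)},
    "Gemini 2.5 Pro": {"short": (1.25, 10.00), "long": (2.50, 15.00)},
}
LONG_PROMPT_THRESHOLD = 200_000  # prompt tokens


def pro_request_cost(model: str, prompt_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, picking the tier from the prompt length."""
    tier = "long" if prompt_tokens > LONG_PROMPT_THRESHOLD else "short"
    input_price, output_price = PRO_TIERS[model][tier]
    return (prompt_tokens * input_price + output_tokens * output_price) / 1_000_000


for prompt, output in [(100_000, 5_000), (300_000, 5_000), (900_000, 20_000)]:
    cost = pro_request_cost("Gemini 3.1 Pro Preview", prompt, output)
    print(f"{prompt:>7} in / {output:>6} out: ${cost:.3f}")
```

Running the three workloads from the cost example through this function is a quick way to check an estimate before committing to a long-context design.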

Context Caching Costs

Google context caching has two costs: cache read token pricing and hourly storage pricing.

| Model | Cache read <=200K | Cache read >200K | Storage |
| --- | --- | --- | --- |
| Gemini 3.1 Pro Preview | $0.20 | $0.40 | $4.50 per 1M tokens per hour |
| Gemini 2.5 Pro | $0.125 | $0.25 | $4.50 per 1M tokens per hour |
| Gemini 3.1 Flash-Lite Preview | $0.025 text/image/video | $0.05 audio | $1.00 per 1M tokens per hour |
| Gemini 3 Flash Preview | $0.05 text/image/video | $0.10 audio | $1.00 per 1M tokens per hour |

Cache math for a repeated 100K-token prefix on Gemini 3.1 Pro:

| Calls in one hour | No cache input cost | Cache read token cost | Storage cost estimate | Practical note |
| --- | --- | --- | --- | --- |
| 1 | $0.200 | Not useful | Extra storage | Do not cache for one call. |
| 5 | $1.000 | $0.100 read-side equivalent | $0.450 for 100K tokens/hour | Starts to help if reuse is real. |
| 20 | $4.000 | $0.400 read-side equivalent | $0.450 for 100K tokens/hour | Stronger fit. |

This is different from Anthropic's simple 0.1x cache read model. With Gemini, storage duration matters.
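The cache math above can be turned into a break-even estimate. The sketch below is a simplification under stated assumptions: it uses the Gemini 3.1 Pro rates for prompts under 200K, assumes the cache is stored for exactly one hour, and ignores any cost of writing the prefix into the cache the first time.

```python
import math


def cache_comparison(prefix_tokens, calls_per_hour, input_price=2.00,
                     cache_read_price=0.20, storage_price_per_hour=4.50):
    """Return (uncached, cached) USD cost for the repeated prefix over one hour."""
    millions = prefix_tokens / 1_000_000
    uncached = calls_per_hour * millions * input_price
    cached = calls_per_hour * millions * cache_read_price + millions * storage_price_per_hour
    return uncached, cached


def break_even_calls(input_price=2.00, cache_read_price=0.20, storage_price_per_hour=4.50):
    """Smallest whole number of calls per hour at which caching is strictly cheaper.

    The prefix size cancels out, because every term scales with it.
    """
    saving_per_call = input_price - cache_read_price
    return math.floor(storage_price_per_hour / saving_per_call) + 1


for calls in (1, 5, 20):
    uncached, cached = cache_comparison(100_000, calls)
    print(f"{calls:>2} calls/hour: no cache ${uncached:.3f}, cached ${cached:.3f}")
print("break-even calls per hour:", break_even_calls())
```

Under these assumptions, a prefix reused only once or twice an hour costs more with caching than without it, which is why reuse frequency should be measured before enabling the cache.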

Grounding and Search Costs

Gemini 3 pricing includes search-related add-ons. These can matter more than tokens in search-heavy agents.

| Feature | Free allowance | Paid price | Risk |
| --- | --- | --- | --- |
| Grounding with Google Search for Gemini 3 | 5,000 prompts per month shared across Gemini 3 | $14 per 1,000 search queries | One prompt can trigger more than one search query. |
| Grounding with Google Maps | 5,000 prompts per month shared across Gemini 3 | $14 per 1,000 search queries for Gemini 3.1 Pro | Search-heavy local workflows can exceed token cost. |
| Gemini 2.5 Pro Google Search grounding | 1,500 requests per day (RPD) free | $35 per 1,000 grounded prompts | Different from Gemini 3 pricing. |

If a Gemini app uses grounding on every request, track search queries separately from model tokens.
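A rough sketch of why search queries deserve their own line item: it splits one grounded request into token cost and search cost. It assumes the free monthly allowance is already used up and that the request runs on Gemini 3 Flash at Standard prices; the two-queries-per-prompt figure is a hypothetical example, not a measured average.

```python
SEARCH_PRICE_PER_1000_QUERIES = 14.00  # Gemini 3 paid grounding rate from the table above


def grounded_request_cost(input_tokens, output_tokens, search_queries,
                          input_price=0.50, output_price=3.00):
    """Return (token_cost, search_cost) in USD for one grounded request."""
    token_cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    search_cost = search_queries * SEARCH_PRICE_PER_1000_QUERIES / 1000
    return token_cost, search_cost


# Hypothetical simple reply (500 input / 200 output tokens) that triggers 2 search queries.
token_cost, search_cost = grounded_request_cost(500, 200, search_queries=2)
print(f"tokens=${token_cost:.6f} search=${search_cost:.6f}")
```

For short prompts like this one, the search charge is far larger than the token charge, so logging the number of search queries per request is worth the extra instrumentation.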

Cost per Task

These examples use Standard pricing and no cache unless noted.

| Task | Token shape | Flash-Lite | Gemini 3 Flash | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| Simple reply | 500 input / 200 output | $0.000425 | $0.000850 | $0.003400 |
| Support answer | 2K input / 800 output | $0.001700 | $0.003400 | $0.013600 |
| RAG answer | 8K input / 500 output | $0.002750 | $0.005500 | $0.022000 |
| Code explanation | 20K input / 3K output | $0.009500 | $0.019000 | $0.076000 |
| Long Pro review | 300K input / 5K output | Not ideal | Not ideal | $1.290000 |

Monthly cost at 10,000 simple replies per day (30-day month):

| Model | Daily cost | Monthly cost |
| --- | --- | --- |
| Gemini 3.1 Flash-Lite | $4.25 | $127.50 |
| Gemini 3 Flash | $8.50 | $255.00 |
| Gemini 3.1 Pro | $34.00 | $1,020.00 |
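The per-task and monthly figures above follow from one formula, sketched here so it can be rerun with your own token shapes. It assumes Standard pricing, no caching, prompts under 200K tokens, and a 30-day month; the dictionary keys are labels for this sketch, not API model IDs.

```python
# USD per 1M tokens (input, output), Standard mode, from the tables in this guide.
PRICES = {
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Return the USD cost of sending the same request shape every day for a month."""
    return task_cost(model, input_tokens, output_tokens) * requests_per_day * days


for model in PRICES:
    per_reply = task_cost(model, 500, 200)
    per_month = monthly_cost(model, 500, 200, requests_per_day=10_000)
    print(f"{model}: ${per_reply:.6f} per reply, ${per_month:,.2f} per month")
```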

This is why routing matters. Use Flash-Lite for simple work, then escalate.
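One way to picture the routing idea is a lookup from task type to model tier. Everything in this sketch is hypothetical: the task labels, the routing table, and the 90/8/2 traffic split are illustrative, and it assumes every request has the simple-reply token shape (500 input / 200 output) so the per-reply costs from the table apply.

```python
# USD cost of one simple reply (500 input / 200 output tokens), from the table above.
COST_PER_SIMPLE_REPLY = {"flash-lite": 0.000425, "flash": 0.000850, "pro": 0.003400}

# Hypothetical routing policy: labels and targets are illustrative only.
ROUTES = {"simple": "flash-lite", "multimodal": "flash", "hard_reasoning": "pro"}


def route(task_kind: str) -> str:
    """Pick a tier for a task label, falling back to the cheapest tier."""
    return ROUTES.get(task_kind, "flash-lite")


def daily_cost(traffic: dict[str, int]) -> float:
    """Return the USD cost per day for a mapping of task label to request count."""
    return sum(COST_PER_SIMPLE_REPLY[route(kind)] * count for kind, count in traffic.items())


# Hypothetical split of 10,000 daily requests.
mixed = daily_cost({"simple": 9_000, "multimodal": 800, "hard_reasoning": 200})
all_pro = 10_000 * COST_PER_SIMPLE_REPLY["pro"]
print(f"routed=${mixed:.3f}/day all-pro=${all_pro:.2f}/day")
```

A production router would also need a way to detect when a cheap answer is not good enough and escalate, which this sketch leaves out.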

Gemini vs OpenAI vs Claude vs DeepSeek

| Model | Input | Cache read | Output | Best use |
| --- | --- | --- | --- | --- |
| Gemini 3.1 Flash-Lite | $0.25 | $0.025 | $1.50 | Cheap Gemini route |
| Gemini 3 Flash | $0.50 | $0.05 | $3.00 | Fast multimodal route |
| Gemini 3.1 Pro | $2.00 | $0.20 | $12.00 | Premium Gemini route |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 | Budget OpenAI route |
| GPT-5.4 | $2.50 | $0.25 | $15.00 | OpenAI default |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | Balanced Claude route |
| DeepSeek V4 Flash | $0.14 miss | $0.0028 hit | $0.28 | Lowest-cost text route |

Gemini is strongest when you need Google's multimodal ecosystem, grounding, long-context Pro, or low-cost Flash-Lite. It is not automatically the cheapest overall route once DeepSeek cache hits enter the calculation.

When TokenMix.ai Fits

TokenMix.ai helps when Gemini is one route in a larger model policy. A real production app often needs Gemini for multimodal work, Claude for writing or coding quality, DeepSeek for low-cost reasoning, and OpenAI for ecosystem compatibility.

| Need | Direct Gemini API | TokenMix.ai unified API |
| --- | --- | --- |
| Only Gemini models | Good fit | Optional |
| Gemini plus OpenAI/Claude/DeepSeek | Multiple integrations | One OpenAI-compatible access layer |
| Cost-aware routing | Build yourself | Centralize routing policy |
| Payment flexibility | Google billing path | Useful when direct billing is hard |
| Price comparison | Manual | Compare across providers |

For the broader comparison, use the AI API pricing hub. For OpenAI-compatible setup, read Gemini OpenAI-Compatible API.

Final Recommendation

Start with Gemini 3.1 Flash-Lite for high-volume simple tasks. Use Gemini 3 Flash when Flash-Lite is too weak. Use Gemini 3.1 Pro only when the request needs stronger reasoning, and check the >200K prompt tier before running long-context jobs.

FAQ

How much does Gemini 3.1 Pro cost?

Gemini 3.1 Pro Preview costs $2 input and $12 output per 1M tokens for prompts up to 200K tokens. Above 200K prompt tokens, it costs $4 input and $18 output.

What is the cheapest Gemini API model?

Gemini 3.1 Flash-Lite Preview is the cheapest current Gemini text route in this guide. It costs $0.25 input and $1.50 output per 1M tokens for text, image, and video input.

Does Gemini Batch API save 50%?

Yes, for supported paid Gemini routes. Gemini 3.1 Pro Batch is $1/$6 under 200K prompt tokens, compared with $2/$12 in Standard mode.

Is Gemini context caching free?

No. Gemini context caching has cache read token pricing and hourly storage pricing. For Gemini 3.1 Pro, cache reads are $0.20 per 1M tokens under 200K and storage is $4.50 per 1M tokens per hour.

Does Gemini pricing increase above 200K tokens?

Yes, for Gemini Pro models. Gemini 3.1 Pro increases from $2/$12 to $4/$18 above 200K prompt tokens.

How much does Gemini grounding with Google Search cost?

For Gemini 3, Google's pricing page lists 5,000 prompts per month free, shared across Gemini 3, then $14 per 1,000 search queries. A single prompt can trigger more than one search query.

Is Gemini cheaper than GPT-5.4?

Gemini 3.1 Pro is cheaper than GPT-5.4 on both standard input and output for prompts under 200K tokens: $2/$12 versus $2.50/$15. Gemini 3.1 Flash-Lite is much cheaper for simple tasks.

Should I use Gemini directly or through a gateway?

Use Gemini directly if your app only needs Google models and native Gemini features. Use a gateway such as TokenMix.ai when Gemini is one route among OpenAI, Claude, DeepSeek, and other model families.
