TokenMix Research Lab · 2026-04-06

Google Gemini API Pricing 2026: 3.1 Pro, Flash, Batch Costs
Last Updated: 2026-04-30
Author: TokenMix Research Lab
Data checked: 2026-04-30
Google Gemini API pricing now has three cost levers that matter: model tier, prompt length, and processing mode. Gemini 3.1 Pro costs $2 input and $12 output per 1M tokens for prompts up to 200K tokens. Gemini 3.1 Flash-Lite costs $0.25/$1.50. Batch and Flex cut the price of several paid routes by 50%.
According to Google's official Gemini API pricing page, Gemini 3.1 Pro Preview costs $2 input and $12 output per 1M tokens for prompts up to 200K tokens, then $4 input and $18 output above 200K. Gemini 3.1 Flash-Lite Preview costs $0.25 input and $1.50 output for text, image, and video input. Gemini 3 Flash Preview costs $0.50 input and $3 output. Google also states that the paid tier includes context caching, the Batch API with a 50% cost reduction, higher rate limits, and a commitment that content is not used to improve products.
Table of Contents
- Quick Answer
- Confirmed Gemini Pricing Facts
- Gemini Price Table
- Standard vs Batch vs Flex vs Priority
- Long Context Pricing
- Context Caching Costs
- Grounding and Search Costs
- Cost per Task
- Gemini vs OpenAI vs Claude vs DeepSeek
- When TokenMix.ai Fits
- Final Recommendation
- FAQ
- Related Articles
- Sources
Quick Answer
| Question | Answer |
|---|---|
| Cheapest current Gemini text route | Gemini 3.1 Flash-Lite Preview at $0.25 input and $1.50 output per 1M tokens. |
| Best Gemini reasoning route | Gemini 3.1 Pro Preview at $2/$12 under 200K prompt tokens. |
| Main hidden pricing rule | Gemini Pro prices increase above 200K prompt tokens. |
| Best cost lever | Batch or Flex for async workloads; context caching for repeated prefixes. |
| Search grounding cost | Gemini 3 includes 5,000 free prompts per month, shared across Gemini 3 models, then $14 per 1,000 search queries. |
| Best production rule | Use Flash-Lite for bulk work, Flash for fast multimodal work, Pro for hard reasoning. |
Confirmed Gemini Pricing Facts
| Claim | Status | Practical meaning | Source |
|---|---|---|---|
| Gemini 3.1 Pro is $2/$12 under 200K prompt tokens | Confirmed | Premium Gemini reasoning is cheaper than GPT-5.4 on both input ($2 vs $2.50) and output ($12 vs $15). | Google pricing |
| Gemini 3.1 Pro becomes $4/$18 above 200K | Confirmed | Long prompts change the bill materially. | Google pricing |
| Gemini 3.1 Flash-Lite is $0.25/$1.50 | Confirmed | This is the cheap high-volume Gemini route. | Google pricing |
| Gemini 3 Flash is $0.50/$3 | Confirmed | Better default when Flash-Lite is too light. | Google pricing |
| Batch API can reduce cost by 50% | Confirmed | Use it for offline labeling, summarization, and evaluation. | Google pricing |
| Paid tier content is not used to improve products | Confirmed on pricing page | Important for production privacy posture. | Google pricing |
| Cheapest model is always best | False | Quality, latency, grounding, cache, and prompt length change the decision. | TokenMix analysis |
Gemini Price Table
All prices are per 1M tokens in USD.
| Gemini model | Standard input | Context cache read | Standard output | Main prompt rule | Best use |
|---|---|---|---|---|---|
| Gemini 3.1 Flash-Lite Preview | $0.25 | $0.025 | $1.50 | Text/image/video input; audio input costs more | Bulk simple tasks, translation, extraction |
| Gemini 3 Flash Preview | $0.50 | $0.05 | $3.00 | Text/image/video input; audio input costs more | Fast multimodal work |
| Gemini 3.1 Pro Preview | $2.00 | $0.20 | $12.00 | Price shown for prompts <= 200K | Complex reasoning and coding |
| Gemini 2.5 Pro | $1.25 | $0.125 | $10.00 | Price shown for prompts <= 200K | Stable Pro route |
| Gemini 3 Pro Image Preview | $2.00 text/image | Not listed in the same text table | $12 text, $120 image tokens | Image output has separate token math | Image generation and multimodal output |
The important correction: older Gemini pricing guides often focused on Gemini 2.0 or 2.5 Flash-Lite. The pricing questions that matter now concern Gemini 3.1 Pro, Gemini 3.1 Flash-Lite, Gemini 3 Flash, Batch, Flex, and context cache storage.
Standard vs Batch vs Flex vs Priority
Google exposes multiple processing modes. The right mode depends on latency and reliability needs.
| Mode | Cost profile | Best for | Avoid when |
|---|---|---|---|
| Standard | Normal price | Live apps and normal production traffic | You can wait and cost matters more. |
| Batch | Often 50% of standard input/output | Offline jobs, evaluations, bulk summarization | A user is waiting for the response. |
| Flex | Same prices as Batch on the Gemini 3 and 3.1 routes listed below | Lower-priority production jobs | You need predictable latency. |
| Priority | Higher price | Latency-sensitive or high-priority work | Cost is the main constraint. |
Gemini 3.1 Pro Preview
| Mode | Input <=200K | Input >200K | Output <=200K | Output >200K |
|---|---|---|---|---|
| Standard | $2.00 | $4.00 | $12.00 | $18.00 |
| Batch | $1.00 | $2.00 | $6.00 | $9.00 |
| Flex | $1.00 | $2.00 | $6.00 | $9.00 |
| Priority | $3.60 | $7.20 | $21.60 | $32.40 |
Gemini 3.1 Flash-Lite Preview
| Mode | Text/image/video input | Audio input | Output |
|---|---|---|---|
| Standard | $0.25 | $0.50 | $1.50 |
| Batch | $0.125 | $0.25 | $0.75 |
| Flex | $0.125 | $0.25 | $0.75 |
| Priority | $0.45 | $0.90 | $2.70 |
Gemini 3 Flash Preview
| Mode | Text/image/video input | Audio input | Output |
|---|---|---|---|
| Standard | $0.50 | $1.00 | $3.00 |
| Batch | $0.25 | $0.50 | $1.50 |
| Flex | $0.25 | $0.50 | $1.50 |
Batch is the cleanest lever when the work is not interactive. If you are running nightly evaluation or bulk enrichment, paying Standard pricing is usually wasteful.
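To make the mode comparison concrete, here is a minimal Python sketch that prices one job under each mode using the Gemini 3.1 Pro Preview rates from the table above. The function name `job_cost` and the 50M/5M token workload are illustrative assumptions, not figures from Google.

```python
# Per-1M-token prices (USD) for Gemini 3.1 Pro Preview, prompts <= 200K,
# copied from the mode table above as (input, output).
PRO_UNDER_200K = {
    "standard": (2.00, 12.00),
    "batch": (1.00, 6.00),
    "flex": (1.00, 6.00),
    "priority": (3.60, 21.60),
}


def job_cost(mode: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a job in the given processing mode."""
    input_price, output_price = PRO_UNDER_200K[mode]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical nightly evaluation job: 50M input and 5M output tokens in total,
# assuming every individual prompt stays at or under 200K tokens.
for mode in PRO_UNDER_200K:
    print(f"{mode:>8}: ${job_cost(mode, 50_000_000, 5_000_000):,.2f}")
```

Under these assumptions the same job costs $160 in Standard mode, $80 in Batch or Flex, and $288 in Priority, so the choice of mode can matter as much as the choice of model.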
Long Context Pricing
Gemini Pro pricing changes above 200K prompt tokens. Do not use the under-200K price for every workload.
| Model | Prompt <=200K input | Prompt >200K input | Prompt <=200K output | Prompt >200K output |
|---|---|---|---|---|
| Gemini 3.1 Pro Preview | $2.00 | $4.00 | $12.00 | $18.00 |
| Gemini 2.5 Pro | $1.25 | $2.50 | $10.00 | $15.00 |
Cost example:
| Workload | Gemini 3.1 Pro under 200K | Gemini 3.1 Pro over 200K |
|---|---|---|
| 100K input, 5K output | $0.260 | Not applicable |
| 300K input, 5K output | Not applicable | $1.290 |
| 900K input, 20K output | Not applicable | $3.960 |
The rule is simple: Gemini Pro can be very competitive for normal prompts. For very long prompts, check the >200K tier before you promise savings.
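The tier rule can be sketched in a few lines of Python. This assumes, as the cost examples above do, that a prompt over 200K tokens is billed entirely at the higher rate for both input and output. The function name `pro_request_cost` is illustrative.

```python
# Gemini 3.1 Pro Preview Standard prices per 1M tokens (USD), from the table above.
TIER_LIMIT = 200_000    # prompt-token threshold between the two tiers
UNDER = (2.00, 12.00)   # (input, output) for prompts <= 200K
OVER = (4.00, 18.00)    # (input, output) for prompts > 200K


def pro_request_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request; prompt length selects the tier."""
    input_price, output_price = UNDER if prompt_tokens <= TIER_LIMIT else OVER
    return (prompt_tokens * input_price + output_tokens * output_price) / 1_000_000


# Reproduce the three workloads from the cost example table.
for prompt, output in [(100_000, 5_000), (300_000, 5_000), (900_000, 20_000)]:
    print(f"{prompt:>7} in / {output:>6} out: ${pro_request_cost(prompt, output):.3f}")
```

Running the sketch reproduces the $0.260, $1.290, and $3.960 figures in the table, which is a quick way to check any long-context estimate before committing to it.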
Context Caching Costs
Google context caching has two costs: cache read token pricing and hourly storage pricing.
Pro models (cache read price depends on prompt length):
| Model | Cache read <=200K | Cache read >200K | Storage |
|---|---|---|---|
| Gemini 3.1 Pro Preview | $0.20 | $0.40 | $4.50 per 1M tokens per hour |
| Gemini 2.5 Pro | $0.125 | $0.25 | $4.50 per 1M tokens per hour |
Flash models (cache read price depends on input type):
| Model | Cache read, text/image/video | Cache read, audio | Storage |
|---|---|---|---|
| Gemini 3.1 Flash-Lite Preview | $0.025 | $0.05 | $1.00 per 1M tokens per hour |
| Gemini 3 Flash Preview | $0.05 | $0.10 | $1.00 per 1M tokens per hour |
Cache math for a repeated 100K-token prefix on Gemini 3.1 Pro:
| Calls in one hour | Input cost without cache | Cache read cost | Storage cost (one hour) | Practical note |
|---|---|---|---|---|
| 1 | $0.200 | $0.020 | $0.450 | Do not cache for one call. |
| 5 | $1.000 | $0.100 | $0.450 | Starts to help if reuse is real. |
| 20 | $4.000 | $0.400 | $0.450 | Stronger fit. |
This is different from Anthropic's simple 0.1x cache read model. With Gemini, storage duration matters.
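The cache math above can be turned into a rough break-even check. The sketch below assumes exactly one hour of storage and ignores the cost of creating the cache, which the table does not list, so treat the result as a lower bound. The function names are illustrative.

```python
# Gemini 3.1 Pro Preview prices per 1M tokens (USD), prompts <= 200K.
PREFIX_TOKENS = 100_000
INPUT_PRICE = 2.00        # standard input
CACHE_READ_PRICE = 0.20   # cached tokens read
STORAGE_PRICE = 4.50      # per 1M tokens per hour


def hourly_prefix_cost(calls: int, cached: bool) -> float:
    """USD cost of sending the shared prefix `calls` times within one hour."""
    millions = PREFIX_TOKENS / 1_000_000
    if not cached:
        return calls * millions * INPUT_PRICE
    # Assumption: one hour of storage; the initial cache write is not counted.
    return calls * millions * CACHE_READ_PRICE + millions * STORAGE_PRICE


def break_even_calls(max_calls: int = 1_000) -> int:
    """Smallest number of calls per hour at which caching is cheaper."""
    for calls in range(1, max_calls + 1):
        if hourly_prefix_cost(calls, cached=True) < hourly_prefix_cost(calls, cached=False):
            return calls
    raise ValueError("caching never pays off within max_calls")


print(break_even_calls())
```

Under these assumptions caching starts to pay off at three calls per hour for a 100K-token prefix. Shorter reuse windows or a non-trivial cache creation cost push the break-even point higher.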
Grounding and Search Costs
Gemini 3 pricing includes search-related add-ons. These can matter more than tokens in search-heavy agents.
| Feature | Free allowance | Paid price | Risk |
|---|---|---|---|
| Grounding with Google Search for Gemini 3 | 5,000 prompts per month shared across Gemini 3 | $14 per 1,000 search queries | One prompt can trigger more than one search query. |
| Grounding with Google Maps | 5,000 prompts per month shared across Gemini 3 | $14 per 1,000 search queries for Gemini 3.1 Pro | Search-heavy local workflows can exceed token cost. |
| Gemini 2.5 Pro Google Search grounding | 1,500 RPD free | $35 per 1,000 grounded prompts | Different from Gemini 3 pricing. |
If a Gemini app uses grounding on every request, track search queries separately from model tokens.
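A rough way to budget for this is to estimate search queries from prompt volume. The sketch below is an assumption-heavy approximation: it treats the free allowance as covering the first 5,000 grounded prompts each month and bills every search query triggered by later prompts. The exact accounting on a real invoice may differ, and `queries_per_prompt` is a number you would have to measure in your own application.

```python
# Gemini 3 grounding with Google Search, figures from the table above.
FREE_PROMPTS_PER_MONTH = 5_000
PRICE_PER_1000_QUERIES = 14.00  # USD


def monthly_grounding_cost(grounded_prompts: int, queries_per_prompt: float) -> float:
    """Estimate monthly search grounding cost in USD.

    Assumption: the first 5,000 grounded prompts are free, and every search
    query triggered by the remaining prompts is billed.
    """
    billable_prompts = max(0, grounded_prompts - FREE_PROMPTS_PER_MONTH)
    billable_queries = billable_prompts * queries_per_prompt
    return billable_queries / 1_000 * PRICE_PER_1000_QUERIES


# Hypothetical agent: 105,000 grounded prompts per month, 2 searches per prompt.
print(f"${monthly_grounding_cost(105_000, 2.0):,.2f}")
```

In this hypothetical case the estimate is $2,800 per month for search alone, which illustrates why query counts deserve their own line in cost tracking.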
Cost per Task
These examples use Standard pricing and no cache unless noted.
| Task | Token shape | Flash-Lite | Gemini 3 Flash | Gemini 3.1 Pro |
|---|---|---|---|---|
| Simple reply | 500 input / 200 output | $0.000425 | $0.000850 | $0.003400 |
| Support answer | 2K input / 800 output | $0.001700 | $0.003400 | $0.013600 |
| RAG answer | 8K input / 500 output | $0.002750 | $0.005500 | $0.022000 |
| Code explanation | 20K input / 3K output | $0.009500 | $0.019000 | $0.076000 |
| Long Pro review | 300K input / 5K output | Not ideal | Not ideal | $1.290000 |
Monthly cost at 10,000 simple replies per day, assuming a 30-day month:
| Model | Daily cost | Monthly cost |
|---|---|---|
| Gemini 3.1 Flash-Lite | $4.25 | $127.50 |
| Gemini 3 Flash | $8.50 | $255.00 |
| Gemini 3.1 Pro | $34.00 | $1,020.00 |
This is why routing matters. Use Flash-Lite for simple work, then escalate.
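The per-task and monthly figures above come from straightforward arithmetic, sketched here in Python so you can plug in your own token shapes. The prices are the Standard rates from this guide, the 30-day month is an assumption, and the function names are illustrative.

```python
# Standard prices per 1M tokens (USD) as (input, output), prompts <= 200K.
PRICES = {
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at Standard pricing, no cache."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """USD cost of a steady daily volume over `days` days (30 by assumption)."""
    return task_cost(model, input_tokens, output_tokens) * requests_per_day * days


# Simple reply: 500 input / 200 output tokens, 10,000 requests per day.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 500, 200, 10_000):,.2f} per month")
```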
Gemini vs OpenAI vs Claude vs DeepSeek
| Model | Input | Cache read | Output | Best use |
|---|---|---|---|---|
| Gemini 3.1 Flash-Lite | $0.25 | $0.025 | $1.50 | Cheap Gemini route |
| Gemini 3 Flash | $0.50 | $0.05 | $3.00 | Fast multimodal route |
| Gemini 3.1 Pro | $2.00 | $0.20 | $12.00 | Premium Gemini route |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 | Budget OpenAI route |
| GPT-5.4 | $2.50 | $0.25 | $15.00 | OpenAI default |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | Balanced Claude route |
| DeepSeek V4 Flash | $0.14 miss | $0.0028 hit | $0.28 | Lowest-cost text route |
Gemini is strongest when you need Google's multimodal ecosystem, grounding, long-context Pro, or low-cost Flash-Lite. It is not automatically the cheapest overall route once DeepSeek cache hits enter the calculation.
When TokenMix.ai Fits
TokenMix.ai helps when Gemini is one route in a larger model policy. A real production app often needs Gemini for multimodal work, Claude for writing or coding quality, DeepSeek for low-cost reasoning, and OpenAI for ecosystem compatibility.
| Need | Direct Gemini API | TokenMix.ai unified API |
|---|---|---|
| Only Gemini models | Good fit | Optional |
| Gemini plus OpenAI/Claude/DeepSeek | Multiple integrations | One OpenAI-compatible access layer |
| Cost-aware routing | Build yourself | Centralize routing policy |
| Payment flexibility | Google billing path | Useful when direct billing is hard |
| Price comparison | Manual | Compare across providers |
For the broader comparison, use the AI API pricing hub. For OpenAI-compatible setup, read Gemini OpenAI-Compatible API.
Final Recommendation
Start with Gemini 3.1 Flash-Lite for high-volume simple tasks. Use Gemini 3 Flash when Flash-Lite is too weak. Use Gemini 3.1 Pro only when the request needs stronger reasoning, and check the >200K prompt tier before running long-context jobs.
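This recommendation can be expressed as a small routing sketch. The difficulty labels (`simple`, `medium`, `hard`) and the `Route` type are hypothetical; a real system would need its own way to classify requests, which this example does not attempt.

```python
from dataclasses import dataclass

LONG_CONTEXT_LIMIT = 200_000  # prompt tokens; Pro prices rise above this


@dataclass
class Route:
    model: str
    long_context_tier: bool  # True when the >200K Pro prices would apply


def pick_route(difficulty: str, prompt_tokens: int) -> Route:
    """Pick a Gemini route from a hypothetical difficulty label."""
    if difficulty == "simple":
        return Route("Gemini 3.1 Flash-Lite", False)
    if difficulty == "medium":
        return Route("Gemini 3 Flash", False)
    if difficulty == "hard":
        # Flag long prompts so the caller can check the higher tier first.
        return Route("Gemini 3.1 Pro", prompt_tokens > LONG_CONTEXT_LIMIT)
    raise ValueError(f"unknown difficulty label: {difficulty!r}")


print(pick_route("hard", 300_000))
```

Returning the tier flag alongside the model keeps the long-context check visible to the caller instead of burying it inside a cost estimate.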
FAQ
How much does Gemini 3.1 Pro cost?
Gemini 3.1 Pro Preview costs $2 input and $12 output per 1M tokens for prompts up to 200K tokens. Above 200K prompt tokens, it costs $4 input and $18 output.
What is the cheapest Gemini API model?
Gemini 3.1 Flash-Lite Preview is the cheapest current Gemini text route in this guide. It costs $0.25 input and $1.50 output per 1M tokens for text, image, and video input.
Does Gemini Batch API save 50%?
Yes, for supported paid Gemini routes. Gemini 3.1 Pro Batch is $1/$6 under 200K prompt tokens, compared with $2/$12 in Standard mode.
Is Gemini context caching free?
No. Gemini context caching has cache read token pricing and hourly storage pricing. For Gemini 3.1 Pro, cache reads are $0.20 per 1M tokens under 200K and storage is $4.50 per 1M tokens per hour.
Does Gemini pricing increase above 200K tokens?
Yes, for Gemini Pro models. Gemini 3.1 Pro increases from $2/$12 to $4/$18 above 200K prompt tokens.
How much does Gemini grounding with Google Search cost?
For Gemini 3, Google's pricing page lists 5,000 prompts per month free, shared across Gemini 3, then $14 per 1,000 search queries. A single prompt can trigger more than one search query.
Is Gemini cheaper than GPT-5.4?
Gemini 3.1 Pro is cheaper than GPT-5.4 on standard input and output under 200K prompts: $2/$12 versus $2.50/$15. Gemini 3.1 Flash-Lite is much cheaper for simple tasks.
Should I use Gemini directly or through a gateway?
Use Gemini directly if your app only needs Google models and native Gemini features. Use a gateway such as TokenMix.ai when Gemini is one route among OpenAI, Claude, DeepSeek, and other model families.
Related Articles
- AI API Pricing 2026: 16 Models, Cache, Batch, Routing Hub
- Gemini OpenAI-Compatible API: 6 Setup Checks Before Switching
- GPT-5 API Pricing 2026: 5.5, 5.4, Mini Costs, Batch Math
- Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared
- DeepSeek API Pricing 2026: V4 Costs, Cache Hits, R1 Changes
- OpenAI-Compatible API Gateway: 9 Providers, One SDK Guide
- AI API Gateway 2026: 7 LLM Routing and Fallback Options
- Official Authorized AI API Access 2026: 7 Verification Checks