TokenMix Research Lab · 2026-06-04

OpenAI API Cost 2026: GPT-5.5, 5.4, Nano, 50% Batch Savings
Last Updated: 2026-06-04 · Author: TokenMix Research Lab · Data verified: 2026-06-04 (OpenAI official pricing, model catalog, model comparison table, prompt caching guide, rate-limit guide, Batch API guide, GPT-5.5/5.4 model pages, and tool pricing)
OpenAI API cost in June 2026 has one clear rule: use GPT-5.5 only when quality pays for a 2x premium over GPT-5.4; otherwise use GPT-5.4 mini/nano, Batch, Flex, and caching.
OpenAI's current pricing page lists gpt-5.5 at $5.00 input, $0.50 cached input, and $30.00 output per 1M short-context tokens; gpt-5.4 is exactly half at $2.50, $0.25, and $15.00; gpt-5.4-mini is $0.75/$0.075/$4.50; and gpt-5.4-nano is $0.20/$0.02/$1.25 (OpenAI pricing). Batch and Flex pricing are listed at 50% of Standard for the same flagship rows, while Priority is a premium lane: gpt-5.5 Priority is $12.50/$1.25/$75.00 and gpt-5.4 Priority is $5.00/$0.50/$30.00 (OpenAI pricing). OpenAI's model guide says to start with gpt-5.5 for complex reasoning and coding, but choose smaller variants such as gpt-5.4-mini or gpt-5.4-nano when optimizing for latency and cost (OpenAI models). Prompt caching can reduce input cost by up to 90% and latency by up to 80% when prefixes match (Prompt caching); Batch offers 50% lower cost, higher separate limits, and a 24-hour turnaround window for async work (Batch API).
Table of Contents
- Quick Verdict
- Current Price Table
- Standard Batch Flex Priority
- $10 Token Buying Power
- Monthly Cost Projection
- Cost Per Task
- Prompt Caching Math
- Tool and Hidden Cost Factors
- Rate Limits and Usage Limits
- Optimization Playbook
- Use Case Matrix
- Risks and Caveats
- Final Recommendation
- FAQ
- Sources
- Related Articles
Quick Verdict
| Claim | Status | Source |
|---|---|---|
| gpt-5.5 standard short-context pricing is $5 input, $0.50 cached input, $30 output per 1M tokens | Confirmed | OpenAI pricing |
| gpt-5.4 standard short-context pricing is $2.50 input, $0.25 cached input, $15 output per 1M tokens | Confirmed | OpenAI pricing |
| gpt-5.4-mini standard pricing is $0.75 input, $0.075 cached input, $4.50 output | Confirmed | OpenAI pricing |
| gpt-5.4-nano standard pricing is $0.20 input, $0.02 cached input, $1.25 output | Confirmed | OpenAI pricing |
| Batch pricing is 50% lower than synchronous APIs | Confirmed | Batch API, OpenAI pricing |
| Flex pricing is listed at the same 50% off level as Batch for the current flagship rows | Confirmed | OpenAI pricing |
| Priority is a cost-saving tier | False | Priority rows are priced above Standard in the current pricing table |
| Prompt caching can reduce input token costs by up to 90% | Confirmed | Prompt caching |
| Prompt caching requires exact prefix matches to help | Confirmed | Prompt caching |
| gpt-5.4-nano is the cheapest OpenAI model overall | False | This article only claims it is the cheapest model in the current GPT-5.4 flagship family table |
| gpt-5.5 and gpt-5.4 compare table lists Free TPM as unavailable | Confirmed | Compare models |
| Regional processing adds a 10% uplift for eligible models released on or after March 5, 2026 | Confirmed | OpenAI pricing |
| Long context can make GPT-5.5 and GPT-5.4 materially more expensive than the headline short-context price | Confirmed | OpenAI pricing |
| OpenAI may keep pushing more workloads toward Batch/Flex economics | Speculation | Pricing layout emphasizes Batch/Flex, but no future guarantee is published |
Current Price Table
This is the current official flagship pricing surface checked on June 4, 2026. "Nano" in this article means gpt-5.4-nano, not the older gpt-5-nano covered separately in OpenAI API Cheapest Model 2026.
| Model | Standard input / 1M | Cached input / 1M | Output / 1M | Long-context input / 1M | Long-context output / 1M | Status |
|---|---|---|---|---|---|---|
| gpt-5.5 | $5.00 | $0.50 | $30.00 | $10.00 | $45.00 | Confirmed |
| gpt-5.5-pro | $30.00 | Not listed | $180.00 | $60.00 | $270.00 | Confirmed |
| gpt-5.4 | $2.50 | $0.25 | $15.00 | $5.00 | $22.50 | Confirmed |
| gpt-5.4-mini | $0.75 | $0.075 | $4.50 | Not listed | Not listed | Confirmed |
| gpt-5.4-nano | $0.20 | $0.02 | $1.25 | Not listed | Not listed | Confirmed |
| gpt-5.4-pro | $30.00 | Not listed | $180.00 | $60.00 | $270.00 | Confirmed |
The headline: gpt-5.5 is 2x gpt-5.4 at Standard short-context prices. gpt-5.5-pro and gpt-5.4-pro are the expensive precision lanes, not default API choices.
Cost calculation 1: 100M input tokens and 20M output tokens cost $500 + $600 = $1,100 on gpt-5.5. The same workload costs $250 + $300 = $550 on gpt-5.4, $75 + $90 = $165 on gpt-5.4-mini, and $20 + $25 = $45 on gpt-5.4-nano.
Standard Batch Flex Priority
The pricing table now makes the lane decision explicit. Standard is default. Batch is cheap but async. Flex is cheap for eligible lower-priority workloads. Priority is expensive for latency-sensitive work.
| Lane | gpt-5.5 input/output | gpt-5.4 input/output | gpt-5.4-mini input/output | gpt-5.4-nano input/output | Best for | Status |
|---|---|---|---|---|---|---|
| Standard | $5.00 / $30.00 | $2.50 / $15.00 | $0.75 / $4.50 | $0.20 / $1.25 | Normal sync API calls | Confirmed |
| Batch | $2.50 / $15.00 | $1.25 / $7.50 | $0.375 / $2.25 | $0.10 / $0.625 | Eval, extraction, offline jobs | Confirmed |
| Flex | $2.50 / $15.00 | $1.25 / $7.50 | $0.375 / $2.25 | $0.10 / $0.625 | Cost-sensitive traffic that can tolerate lower priority | Confirmed |
| Priority | $12.50 / $75.00 | $5.00 / $30.00 | $1.50 / $9.00 | Not listed | Premium latency lane | Confirmed |
Cost calculation 2: a 100M input / 20M output gpt-5.4 job costs $550 on Standard. The same token volume costs $275 on Batch or Flex. Priority costs $1,100. If the task can wait, Batch is the cleanest 50% cut. If it cannot wait, test Flex only if your SLA can tolerate the tier behavior.
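The lane arithmetic above can be sketched as a lookup over the four pricing rows. The dictionary copies the gpt-5.4 column from the lane table; the names `LANE_PRICES_GPT54` and `job_cost` are illustrative and not part of any OpenAI SDK, and the sketch assumes no cached input.

```python
# Per-1M-token (input, output) prices for gpt-5.4, copied from the lane table.
LANE_PRICES_GPT54 = {
    "standard": (2.50, 15.00),
    "batch": (1.25, 7.50),
    "flex": (1.25, 7.50),
    "priority": (5.00, 30.00),
}


def job_cost(lane, input_tokens, output_tokens, prices=LANE_PRICES_GPT54):
    """Return the USD cost of a job on the given lane (no caching assumed)."""
    input_per_m, output_per_m = prices[lane]
    return (
        input_tokens / 1_000_000 * input_per_m
        + output_tokens / 1_000_000 * output_per_m
    )


# Same 100M input / 20M output job priced on every lane.
for lane in LANE_PRICES_GPT54:
    print(lane, job_cost(lane, 100_000_000, 20_000_000))
```

Swapping in another model's column reproduces the rest of the table in the same way.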
For the narrow GPT-5.5 tier deep dive, use GPT-5.5 Batch vs Flex vs Priority. This article is the broader cost router across the OpenAI family.
$10 Token Buying Power
This table is the easiest way to feel the price gap. Output tokens are the bill killer.
| Model and lane | $10 buys input tokens | $10 buys cached input tokens | $10 buys output tokens | Status |
|---|---|---|---|---|
| gpt-5.5 Standard | 2M | 20M | 0.33M | Confirmed math |
| gpt-5.5 Batch/Flex | 4M | 40M | 0.67M | Confirmed math |
| gpt-5.4 Standard | 4M | 40M | 0.67M | Confirmed math |
| gpt-5.4 Batch/Flex | 8M | 80M | 1.33M | Confirmed math |
| gpt-5.4-mini Standard | 13.33M | 133.33M | 2.22M | Confirmed math |
| gpt-5.4-mini Batch/Flex | 26.67M | 266.67M | 4.44M | Confirmed math |
| gpt-5.4-nano Standard | 50M | 500M | 8M | Confirmed math |
| gpt-5.4-nano Batch/Flex | 100M | 1,000M | 16M | Confirmed math |
Cost calculation 3: if your agent spends $10 on gpt-5.5 Standard output, it buys about 333K output tokens. The same $10 buys 8M output tokens on gpt-5.4-nano Standard. That is a 24x output-token spread inside the current flagship family.
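The buying-power figures are just a budget divided by a per-1M-token price. A minimal sketch, with `tokens_for_budget` as a hypothetical helper name:

```python
def tokens_for_budget(budget_usd, price_per_m):
    """Number of tokens a budget buys at a given per-1M-token price."""
    return budget_usd / price_per_m * 1_000_000


gpt55_out = tokens_for_budget(10, 30.00)  # gpt-5.5 Standard output
nano_out = tokens_for_budget(10, 1.25)    # gpt-5.4-nano Standard output

# Output tokens per $10 on each model, then the spread between them.
print(round(gpt55_out), round(nano_out), round(nano_out / gpt55_out))
```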
Monthly Cost Projection
The real question is not "which model is cheapest?" It is "which model clears the task at the lowest monthly failure-adjusted cost?"
| Monthly workload | Token shape | gpt-5.5 Standard | gpt-5.4 Standard | gpt-5.4-mini Standard | gpt-5.4-nano Standard | Cheapest listed lane |
|---|---|---|---|---|---|---|
| Small SaaS support bot | 10M in / 2M out | $110 | $55 | $16.50 | $4.50 | Nano |
| Medium RAG assistant | 100M in / 20M out | $1,100 | $550 | $165 | $45 | Nano |
| Output-heavy writer | 50M in / 50M out | $1,750 | $875 | $262.50 | $72.50 | Nano |
| Developer agent runs | 2B in / 500M out | $25,000 | $12,500 | $3,750 | $1,025 | Nano |
| Long-context audit | 20M long in / 2M out | $290 on long-context gpt-5.5 | $145 on long-context gpt-5.4 | Not listed | Not listed | GPT-5.4 long |
| Offline eval batch | 100M in / 20M out | $550 on Batch | $275 on Batch | $82.50 on Batch | $22.50 on Batch | Nano Batch |
Do not read this as "always use nano." Read it as "do the quality test before paying the flagship premium." If nano fails, move up. If mini fails, move up again. If 5.4 fails and 5.5 saves engineering time or user churn, the premium can be rational.
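One way to put a number on "failure-adjusted cost" is an escalation cascade: every task starts on the cheapest model, and only failures move up a tier. The sketch below uses the Standard monthly costs from the 100M in / 20M out row. The pass rates are hypothetical placeholders, not measured results, and the model assumes a failed attempt is still billed in full and that detecting a failure costs nothing.

```python
def cascade_cost(tiers):
    """Expected cost when all traffic starts on the first tier and only
    failures escalate to the next one.

    tiers: list of (cost_if_all_traffic_ran_here, pass_rate), ordered from
    cheapest to most expensive.
    """
    expected = 0.0
    reaching = 1.0  # fraction of traffic that reaches the current tier
    for cost, pass_rate in tiers:
        expected += reaching * cost
        reaching *= 1.0 - pass_rate
    return expected


# Costs come from the monthly table; pass rates are HYPOTHETICAL.
tiers = [
    (45.0, 0.70),    # gpt-5.4-nano
    (165.0, 0.80),   # gpt-5.4-mini
    (550.0, 0.90),   # gpt-5.4
    (1100.0, 1.00),  # gpt-5.5 absorbs whatever is left
]
print(round(cascade_cost(tiers), 2))
```

With these placeholder rates the cascade comes to roughly $134 a month, against $550 for sending everything to gpt-5.4. Replace the pass rates with numbers from your own eval before trusting the result.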
The cross-provider version of this math is in Cheapest Frontier LLM API 2026, where OpenAI has to compete against Claude, DeepSeek, Gemini, and Groq on cost-per-task rather than raw model branding.
Cost Per Task
| Task | Assumed tokens | gpt-5.5 | gpt-5.4 | gpt-5.4-mini | gpt-5.4-nano | Practical pick |
|---|---|---|---|---|---|---|
| Classify support ticket | 1K in / 100 out | $0.0080 | $0.0040 | $0.0012 | $0.000325 | Nano first |
| Extract invoice fields | 4K in / 300 out | $0.0290 | $0.0145 | $0.00435 | $0.001175 | Nano or mini |
| Summarize 50K-token doc | 50K in / 1K out | $0.2800 | $0.1400 | $0.0420 | $0.01125 | Mini if nano weak |
| Generate long answer | 3K in / 2K out | $0.0750 | $0.0375 | $0.01125 | $0.00310 | Mini or 5.4 |
| Agent planning step | 20K in / 4K out | $0.2200 | $0.1100 | $0.0330 | $0.00900 | 5.4 or 5.5 after eval |
| Long-context legal scan | 300K in / 5K out | $3.2250 long gpt-5.5 | $1.6125 long gpt-5.4 | Not listed | Not listed | 5.4 long unless quality fails |
Cost calculation 4: one million support-ticket classifications at 1K input and 100 output each cost about $8,000 on gpt-5.5, $4,000 on gpt-5.4, $1,200 on gpt-5.4-mini, and $325 on gpt-5.4-nano. A routing mistake here is not academic. It is a monthly budget line.
Here is the tiny calculator behind the tables:
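
    def openai_cost(input_tokens, output_tokens, input_per_m, output_per_m,
                    cached_input_tokens=0, cached_per_m=None):
        """Return the USD cost of a workload given per-1M-token prices."""
        # Tokens not served from cache are billed at the full input rate.
        uncached_input = max(input_tokens - cached_input_tokens, 0)
        # Fall back to the full input rate when no cached rate is given.
        cached_rate = input_per_m if cached_per_m is None else cached_per_m
        return (
            uncached_input / 1_000_000 * input_per_m
            + cached_input_tokens / 1_000_000 * cached_rate
            + output_tokens / 1_000_000 * output_per_m
        )

    # gpt-5.4 Standard: 100M input (half of it cached), 20M output
    print(openai_cost(100_000_000, 20_000_000, 2.50, 15.00, 50_000_000, 0.25))
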
The output is $437.50 for gpt-5.4 when half of the input is cached. Without caching, the same 100M/20M workload is $550.
Prompt Caching Math
OpenAI says prompt caching works automatically on recent models, but it only helps when the prefix matches. Put static instructions, schemas, tool definitions, examples, and shared context first. Put user-specific data last.
| Workload | Model | Baseline cost | With 50% cached input | With 90% cached input | What changed | Status |
|---|---|---|---|---|---|---|
| 100M in / 20M out | gpt-5.5 | $1,100 | $875 | $695 | Cached input drops from $5 to $0.50 / 1M | Confirmed math |
| 100M in / 20M out | gpt-5.4 | $550 | $437.50 | $347.50 | Cached input drops from $2.50 to $0.25 / 1M | Confirmed math |
| 100M in / 20M out | gpt-5.4-mini | $165 | $131.25 | $104.25 | Cached input drops from $0.75 to $0.075 / 1M | Confirmed math |
| 100M in / 20M out | gpt-5.4-nano | $45 | $36 | $28.80 | Cached input drops from $0.20 to $0.02 / 1M | Confirmed math |
Caching does not fix output-heavy costs. If the model generates long text, the output column still dominates. If your prompt is mostly dynamic, cache hits may be low. OpenAI states exact prefix matches are required, so a timestamp at the top of the prompt can sabotage the discount.
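The ordering rule can be illustrated without calling any API. The sketch below builds two prompts and measures how much leading text they share, first with the static block on top and then with the timestamp on top. It is a character-level proxy only: real caching matches on tokens and follows the thresholds in OpenAI's caching guide. `STATIC_PREFIX`, `build_prompt`, and `shared_prefix_len` are illustrative names.

```python
STATIC_PREFIX = (
    "You are a support classifier.\n"
    "Return JSON with fields: category, urgency.\n"
)


def build_prompt(user_text, timestamp, static_first=True):
    """Assemble a prompt; static_first keeps the shared prefix stable."""
    dynamic = f"Time: {timestamp}\nTicket: {user_text}\n"
    return STATIC_PREFIX + dynamic if static_first else dynamic + STATIC_PREFIX


def shared_prefix_len(a, b):
    """Length of the common leading run of characters in two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Static block first: the whole instruction block is shared.
good = shared_prefix_len(
    build_prompt("Refund please", "2026-06-04T10:00"),
    build_prompt("Login broken", "2026-06-04T10:05"),
)
# Timestamp first: the prompts diverge almost immediately.
bad = shared_prefix_len(
    build_prompt("Refund please", "2026-06-04T10:00", static_first=False),
    build_prompt("Login broken", "2026-06-04T10:05", static_first=False),
)
print(good, bad)
```

With the static block first, the two prompts share the entire instruction block. With the timestamp first, they diverge inside the timestamp itself, before any of the instructions can be reused.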
Tool and Hidden Cost Factors
The model token price is not the whole invoice once tools enter the request.
| Cost factor | Official price or rule | Real impact | Status |
|---|---|---|---|
| Web search all models | $10 / 1K calls plus search content tokens at model rates | Adds $0.01 per search before token costs | Confirmed |
| Web search preview, reasoning models | $10 / 1K calls plus search content tokens at model rates | Same base call fee for reasoning preview path | Confirmed |
| Web search preview, non-reasoning models | $25 / 1K calls, search content tokens free | Higher call fee, different token handling | Confirmed |
| Containers / Code Interpreter 1 GB | $0.03 per 20-minute session container | Cheap per session, expensive if leaked across users | Confirmed |
| Containers / Code Interpreter 64 GB | $1.92 per 20-minute session container | Heavy compute lane, not a token-only bill | Confirmed |
| File search storage | $0.10 / GB per day, 1 GB free | Long-lived indexes become daily recurring cost | Confirmed |
| File search tool call | $2.50 / 1K calls | Retrieval is not just storage | Confirmed |
| Realtime audio gpt-realtime-2 | $32 audio input, $0.40 cached, $64 audio output per 1M audio tokens | Voice agents can dwarf text-only costs | Confirmed |
| Regional processing | 10% uplift for eligible post-March 5, 2026 models | Compliance routing can change unit economics | Confirmed |
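
To see how the non-token lines add up, here is a rough estimator built from the fees in the table above. It assumes the 1 GB free storage allowance is applied to the stored size each day, which is a reading of the table rather than a documented billing rule. The function name and the example volumes are hypothetical.

```python
def monthly_tool_fees(web_searches, file_search_calls, stored_gb,
                      days=30, container_sessions_1gb=0):
    """Estimate non-token tool fees in USD from the listed prices."""
    web = web_searches / 1000 * 10.00            # $10 per 1K web search calls
    retrieval = file_search_calls / 1000 * 2.50  # $2.50 per 1K file search calls
    # ASSUMPTION: the first 1 GB of stored data is free on every billed day.
    storage = max(stored_gb - 1, 0) * 0.10 * days
    containers = container_sessions_1gb * 0.03   # $0.03 per 20-minute 1 GB session
    return web + retrieval + storage + containers


# Hypothetical month: 50K searches, 200K file search calls, 11 GB indexed.
print(monthly_tool_fees(web_searches=50_000, file_search_calls=200_000, stored_gb=11))
```

In this made-up month the tool fees alone are about $1,030, before search content tokens or any model tokens are counted.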
If your stack routes across vendors, this is where a gateway pays for itself. AI API Gateway 2026 covers fallback, budget caps, observability, and model routing. TokenMix vs OpenRouter vs Portkey vs LiteLLM covers gateway cost tradeoffs.
Rate Limits and Usage Limits
Do not confuse price with capacity. A workload can be affordable and still fail rate limits.
| Limit concept | What OpenAI says | Cost implication | Status |
|---|---|---|---|
| RPM | Requests per minute | High call count can throttle even with small prompts | Confirmed |
| RPD | Requests per day | Daily request ceilings can block batch-like sync traffic | Confirmed |
| TPM | Tokens per minute | Long prompts and long outputs hit capacity before request count | Confirmed |
| TPD | Tokens per day | Large daily volume needs tier planning | Confirmed |
| IPM | Images per minute | Image-heavy apps need separate capacity math | Confirmed |
| Organization/project scope | Limits apply at org and project level, not user level | One noisy project can affect shared quota | Confirmed |
| Shared limits | Some model families share limits | Fallback inside the same family may not add capacity | Confirmed |
| Unsuccessful requests | Failed retries still contribute to per-minute limit | Retry storms waste capacity | Confirmed |
| Batch queue limit | Pending batch tokens count against queue limit until completion | Async jobs still need queue planning | Confirmed |
OpenAI's model compare page lists gpt-5.5 and gpt-5.4 Tier 1 TPM at 500K and Tier 5 TPM at 40M, with Free TPM shown as unavailable for those models (Compare models). Treat your account dashboard as the source of truth before a launch.
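A quick capacity check makes the price-versus-capacity distinction concrete. The sketch assumes input and output tokens both count toward TPM, which is a simplification, so check the rate-limit guide and your dashboard for the exact accounting. `max_requests_per_minute` is a hypothetical helper.

```python
def max_requests_per_minute(tpm_limit, input_tokens, output_tokens, rpm_limit=None):
    """Upper bound on sustained requests per minute under a TPM cap
    (and an optional RPM cap)."""
    by_tokens = tpm_limit // (input_tokens + output_tokens)
    return by_tokens if rpm_limit is None else min(by_tokens, rpm_limit)


# Tier 1 TPM of 500K from the compare page; token shape from the
# "Agent planning step" row (20K in / 4K out).
print(max_requests_per_minute(500_000, 20_000, 4_000))
```

Under these assumptions Tier 1 sustains about 20 agent planning steps per minute, however cheap each step is.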
Optimization Playbook
| Lever | Typical saving | Effort | Use when | Caveat | Status |
|---|---|---|---|---|---|
| Move from gpt-5.5 to gpt-5.4 | 50% on Standard short-context text | Low | Quality delta is small | Need eval on hard tasks | Confirmed math |
| Move from gpt-5.4 to gpt-5.4-mini | 70% on input, 70% on output | Medium | Task is well-defined | Reasoning may degrade | Confirmed math |
| Move from mini to nano | 73% input, 72% output | Medium | Classification, routing, extraction | Requires guardrails | Confirmed math |
| Batch async work | 50% | Medium | Eval, offline extraction, bulk generation | 24-hour turnaround window | Confirmed |
| Flex eligible traffic | 50% listed in pricing table | Medium | Lower-priority cost-sensitive calls | Behavior details depend on tier docs/account | Likely |
| Prompt caching | Up to 90% input cost reduction | Low to medium | Static prefix repeats | Exact prefix match required | Confirmed |
| Shorten outputs | Linear output savings | Low | Model over-generates | May hurt completeness | Confirmed math |
| Route by task | 50-95% possible | High | Mixed workload | Requires eval and observability | Likely |
| Avoid unnecessary tools | $2.50-$25 per 1K calls avoided | Medium | Search/retrieval/tool calls are overused | May reduce capability | Confirmed math |
| Regional processing only where needed | Avoid 10% uplift | Medium | Compliance allows global processing | Compliance may require uplift | Confirmed |
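
The model-switch percentages in the first three rows can be re-derived from the Standard prices. A minimal sketch, with `PRICES` and `saving_pct` as illustrative names:

```python
PRICES = {  # Standard (input, output) per 1M tokens, from the price table
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}


def saving_pct(old, new):
    """Rounded percent saved on (input, output) when moving from old to new."""
    return tuple(
        round((1 - new_price / old_price) * 100)
        for old_price, new_price in zip(PRICES[old], PRICES[new])
    )


print(saving_pct("gpt-5.5", "gpt-5.4"))
print(saving_pct("gpt-5.4", "gpt-5.4-mini"))
print(saving_pct("gpt-5.4-mini", "gpt-5.4-nano"))
```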
Use Case Matrix
| Use case | Start with | Escalate to | Avoid | Why |
|---|---|---|---|---|
| Ticket classification | gpt-5.4-nano | gpt-5.4-mini | gpt-5.5 by default | Cheap output and simple schema |
| Field extraction | gpt-5.4-nano | gpt-5.4-mini | Pro models | Validate against gold set |
| Customer support answer | gpt-5.4-mini | gpt-5.4 | Nano without eval | Quality and tone matter |
| Coding assistant | gpt-5.4 | gpt-5.5 | Nano for planning | OpenAI positions 5.5 for complex coding |
| Long-horizon agent | gpt-5.4 with routing | gpt-5.5 for hard steps | One model for every step | Mixed tasks need routing |
| Offline eval | Batch gpt-5.4-mini or gpt-5.4 | Batch gpt-5.5 | Standard sync | Batch halves eligible cost |
| Realtime voice | Realtime model family | Text model plus TTS/STT only if latency allows | Token-only forecast | Audio has separate pricing |
| Compliance-region app | Eligible regional endpoint | Standard global if policy allows | Ignoring uplift | Regional processing can add 10% |
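
The first rows of the matrix translate directly into a routing table. This is a sketch of the idea, not a production router: the use-case keys and `pick_model` are hypothetical, and deciding that a first attempt "failed" is left to your own eval or validator.

```python
# (start_with, escalate_to) pairs taken from the matrix above.
ROUTES = {
    "ticket_classification": ("gpt-5.4-nano", "gpt-5.4-mini"),
    "field_extraction": ("gpt-5.4-nano", "gpt-5.4-mini"),
    "support_answer": ("gpt-5.4-mini", "gpt-5.4"),
    "coding_assistant": ("gpt-5.4", "gpt-5.5"),
}


def pick_model(use_case, first_attempt_failed=False):
    """Return the starting model, or the escalation model after a failure."""
    start, escalate = ROUTES[use_case]
    return escalate if first_attempt_failed else start


print(pick_model("ticket_classification"))
print(pick_model("coding_assistant", first_attempt_failed=True))
```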
Risks and Caveats
| Risk | What goes wrong | Mitigation | Status |
|---|---|---|---|
| Using old price tables | You publish GPT-5.2 or old nano numbers after docs changed | Re-check official pricing before every publish or update | Confirmed |
| Treating nano as universally best | Quality misses cause retries, escalations, or support cost | Run eval before routing production traffic | Likely |
| Ignoring output tokens | Long answers dominate invoice | Cap output and summarize in stages | Confirmed math |
| Misusing Priority | Premium lane used for batchable jobs | Reserve Priority for latency-sensitive user paths | Confirmed |
| Batch where sync is needed | User waits or product breaks | Use Batch only for async jobs | Confirmed |
| Cache miss assumptions | Prompt changes prevent savings | Static prefix first, dynamic content last | Confirmed |
| Long-context surprise | Large prompts trigger long-context prices | Split, retrieve, or summarize before sending | Confirmed |
| Tool fee blind spot | Search, file search, or containers create extra bill lines | Track tool call counts separately | Confirmed |
| Rate-limit retry storm | Failed retries count against per-minute limit | Add jitter, budget caps, and circuit breakers | Confirmed |
| Future price changes | Current cost plan becomes stale | Re-verify pricing before large commitments | Speculation |
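
For the retry-storm row, a common mitigation is capped exponential backoff with jitter plus a hard attempt limit. The sketch below is generic Python and makes no OpenAI SDK calls. `fn` stands for whatever function issues your request, and catching bare `Exception` is a simplification: production code would normally retry only on rate-limit or transient errors.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call fn(); on failure wait a jittered exponential delay and retry.

    Gives up after max_attempts so failures cannot snowball into a storm.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempt budget exhausted: surface the error
            # Full jitter: wait between 0 and the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists so the delay can be swapped out in tests. A budget cap or circuit breaker would sit one level above this function.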
Final Recommendation
Use gpt-5.5 only for the hardest coding, reasoning, and professional-work paths. Default cost-sensitive production traffic to gpt-5.4, gpt-5.4-mini, or gpt-5.4-nano, then add Batch, Flex, caching, output caps, and routing.
FAQ
How much does GPT-5.5 API cost in 2026?
gpt-5.5 standard short-context pricing is $5 input, $0.50 cached input, and $30 output per 1M tokens. Long-context pricing is higher at $10 input, $1 cached input, and $45 output.
Is GPT-5.4 cheaper than GPT-5.5?
Yes. The current official table lists gpt-5.4 at exactly half of gpt-5.5 standard short-context prices: $2.50 input and $15 output versus $5 and $30.
What is the cheapest GPT-5.4 family model?
gpt-5.4-nano is the cheapest model in the current GPT-5.4 family table. It costs $0.20 input, $0.02 cached input, and $1.25 output per 1M tokens.
Does Batch API really save 50%?
Yes. OpenAI says Batch API has 50% lower cost than synchronous APIs and a 24-hour turnaround window. Use it for async jobs such as evals, classification, embeddings, and bulk generation.
Is Flex cheaper than Standard?
In the current pricing table, Flex rows for the latest flagship models match Batch-level prices. Treat this as Confirmed for listed prices, but verify account eligibility and behavior before routing production traffic.
Is Priority worth it?
Only for latency-sensitive paths where speed is worth the premium. It is not a cost-saving tier; the listed Priority prices are above Standard.
How much can prompt caching save?
OpenAI says prompt caching can reduce input token costs by up to 90% and latency by up to 80%. It only helps when prompts share exact prefixes, so prompt structure matters.
Should I use GPT-5.5 for every production call?
No. Use GPT-5.5 for hard reasoning, coding, and professional tasks where quality pays for the premium. For simple or high-volume work, test gpt-5.4-mini or gpt-5.4-nano first.
Sources
- OpenAI Pricing - official Standard, Batch, Flex, Priority, tool, image, video, audio, and regional pricing
- OpenAI Models - official model selection guidance and flagship model positioning
- OpenAI Compare Models - official model pricing, context, endpoints, and TPM tier comparison
- GPT-5.5 Model - official GPT-5.5 model page
- GPT-5.4 Model - official GPT-5.4 model page
- GPT-5.4 Mini Model - official mini model page
- GPT-5.4 Nano Model - official nano model page
- OpenAI Prompt Caching - official caching behavior and savings guidance
- OpenAI Batch API - official 50% cost discount, separate limits, and 24-hour turnaround
- OpenAI Rate Limits - official RPM, RPD, TPM, TPD, IPM, headers, tiers, and retry guidance
Related Articles
- GPT-5.5 Batch vs Flex vs Priority: 50% Off API Math (2026)
- OpenAI API Cheapest Model 2026: GPT-5 Nano Cost Math Table
- Cheapest Frontier LLM API 2026: DeepSeek vs Claude vs GPT Cost
- GPT-5.5 vs Opus 4.7 vs DeepSeek V4 (2026): 50x Price Gap Tested
- AI API Gateway 2026: Routing, Fallbacks, Observability, and Cost Control