TokenMix Research Lab · 2026-06-04

OpenAI API Cost 2026: GPT-5.5, 5.4, Nano, 50% Batch Savings

Last Updated: 2026-06-04
Author: TokenMix Research Lab
Data verified: 2026-06-04 - OpenAI official pricing, model catalog, model comparison table, prompt caching guide, rate-limit guide, Batch API guide, GPT-5.5/5.4 model pages, and tool pricing

OpenAI API cost in June 2026 has one clear rule: use GPT-5.5 only when quality pays for a 2x premium over GPT-5.4; otherwise use GPT-5.4 mini/nano, Batch, Flex, and caching.

OpenAI's current pricing page lists gpt-5.5 at $5.00 input, $0.50 cached input, and $30.00 output per 1M short-context tokens; gpt-5.4 is exactly half at $2.50, $0.25, and $15.00; gpt-5.4-mini is $0.75/$0.075/$4.50; and gpt-5.4-nano is $0.20/$0.02/$1.25 (OpenAI pricing). Batch and Flex pricing is listed at 50% of Standard for the same flagship rows, while Priority is a premium lane: gpt-5.5 Priority is $12.50/$1.25/$75.00 and gpt-5.4 Priority is $5.00/$0.50/$30.00 (OpenAI pricing).

OpenAI's model guide says to start with gpt-5.5 for complex reasoning and coding, but to choose smaller variants such as gpt-5.4-mini or gpt-5.4-nano when optimizing for latency and cost (OpenAI models). Prompt caching can reduce input cost by up to 90% and latency by up to 80% when prefixes match (Prompt caching). Batch offers 50% lower cost, a separate pool of higher rate limits, and a 24-hour turnaround window for async work (Batch API).

Quick Verdict

| Claim | Status | Source |
|---|---|---|
| gpt-5.5 standard short-context pricing is $5 input, $0.50 cached input, $30 output per 1M tokens | Confirmed | OpenAI pricing |
| gpt-5.4 standard short-context pricing is $2.50 input, $0.25 cached input, $15 output per 1M tokens | Confirmed | OpenAI pricing |
| gpt-5.4-mini standard pricing is $0.75 input, $0.075 cached input, $4.50 output | Confirmed | OpenAI pricing |
| gpt-5.4-nano standard pricing is $0.20 input, $0.02 cached input, $1.25 output | Confirmed | OpenAI pricing |
| Batch pricing is 50% lower than synchronous APIs | Confirmed | Batch API, OpenAI pricing |
| Flex pricing is listed at the same 50% off level as Batch for the current flagship rows | Confirmed | OpenAI pricing |
| Priority is a cost-saving tier | False | Priority rows are priced above Standard in the current pricing table |
| Prompt caching can reduce input token costs by up to 90% | Confirmed | Prompt caching |
| Prompt caching requires exact prefix matches to help | Confirmed | Prompt caching |
| gpt-5.4-nano is the cheapest OpenAI model overall | False | This article only claims it is the cheapest model in the current GPT-5.4 flagship family table |
| gpt-5.5 and gpt-5.4 compare table lists Free TPM as unavailable | Confirmed | Compare models |
| Regional processing adds a 10% uplift for eligible models released on or after March 5, 2026 | Confirmed | OpenAI pricing |
| Long context can make GPT-5.5 and GPT-5.4 materially more expensive than the headline short-context price | Confirmed | OpenAI pricing |
| OpenAI may keep pushing more workloads toward Batch/Flex economics | Speculation | Pricing layout emphasizes Batch/Flex, but no future guarantee is published |

Current Price Table

This is the current official flagship pricing surface checked on June 4, 2026. "Nano" in this article means gpt-5.4-nano, not the older gpt-5-nano covered separately in "OpenAI API Cheapest Model 2026".

| Model | Standard input / 1M | Cached input / 1M | Output / 1M | Long-context input / 1M | Long-context output / 1M | Status |
|---|---|---|---|---|---|---|
| gpt-5.5 | $5.00 | $0.50 | $30.00 | $10.00 | $45.00 | Confirmed |
| gpt-5.5-pro | $30.00 | Not listed | $180.00 | $60.00 | $270.00 | Confirmed |
| gpt-5.4 | $2.50 | $0.25 | $15.00 | $5.00 | $22.50 | Confirmed |
| gpt-5.4-mini | $0.75 | $0.075 | $4.50 | Not listed | Not listed | Confirmed |
| gpt-5.4-nano | $0.20 | $0.02 | $1.25 | Not listed | Not listed | Confirmed |
| gpt-5.4-pro | $30.00 | Not listed | $180.00 | $60.00 | $270.00 | Confirmed |

The headline: gpt-5.5 is 2x gpt-5.4 at Standard short-context prices. gpt-5.5-pro and gpt-5.4-pro are the expensive precision lanes, not default API choices.

Cost calculation 1: 100M input tokens and 20M output tokens cost $500 + $600 = $1,100 on gpt-5.5. The same workload costs $250 + $300 = $550 on gpt-5.4, $75 + $90 = $165 on gpt-5.4-mini, and $20 + $25 = $45 on gpt-5.4-nano.

Standard, Batch, Flex, Priority

The pricing table now makes the lane decision explicit. Standard is default. Batch is cheap but async. Flex is cheap for eligible lower-priority workloads. Priority is expensive for latency-sensitive work.

| Lane | gpt-5.5 input/output | gpt-5.4 input/output | gpt-5.4-mini input/output | gpt-5.4-nano input/output | Best for | Status |
|---|---|---|---|---|---|---|
| Standard | $5.00 / $30.00 | $2.50 / $15.00 | $0.75 / $4.50 | $0.20 / $1.25 | Normal sync API calls | Confirmed |
| Batch | $2.50 / $15.00 | $1.25 / $7.50 | $0.375 / $2.25 | $0.10 / $0.625 | Eval, extraction, offline jobs | Confirmed |
| Flex | $2.50 / $15.00 | $1.25 / $7.50 | $0.375 / $2.25 | $0.10 / $0.625 | Cost-sensitive traffic that can tolerate lower priority | Confirmed |
| Priority | $12.50 / $75.00 | $5.00 / $30.00 | $1.50 / $9.00 | Not listed | Premium latency lane | Confirmed |

Cost calculation 2: a 100M input / 20M output gpt-5.4 job costs $550 on Standard. The same token volume costs $275 on Batch or Flex and $1,100 on Priority. If the task can wait, Batch is the cleanest 50% cut. If it cannot wait, test Flex, but only if your SLA can tolerate lower-priority processing.
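
The lane comparison above can be reproduced with a small lookup. This is a minimal sketch: the prices are copied from this article's lane table for gpt-5.4, and `LANE_PRICES` and `lane_cost` are illustrative names, not part of any OpenAI SDK.

```python
# (input, output) dollars per 1M tokens for gpt-5.4, copied from the lane table.
LANE_PRICES = {
    "standard": (2.50, 15.00),
    "batch": (1.25, 7.50),
    "flex": (1.25, 7.50),
    "priority": (5.00, 30.00),
}

def lane_cost(lane, input_tokens, output_tokens):
    """Return the dollar cost of a job on the given lane."""
    input_per_m, output_per_m = LANE_PRICES[lane]
    return (
        input_tokens / 1_000_000 * input_per_m
        + output_tokens / 1_000_000 * output_per_m
    )

# Same 100M input / 20M output job on every lane.
for lane in LANE_PRICES:
    print(f"{lane:>8}: ${lane_cost(lane, 100_000_000, 20_000_000):,.2f}")
```

Swapping in another model's row from the table is enough to repeat the comparison for gpt-5.5, mini, or nano.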

For the narrow GPT-5.5 tier deep dive, see "GPT-5.5 Batch vs Flex vs Priority". This article is the broader cost router across the OpenAI family.

$10 Token Buying Power

This table is the easiest way to feel the price gap. Output tokens are the bill killer.

| Model and lane | $10 buys input tokens | $10 buys cached input tokens | $10 buys output tokens | Status |
|---|---|---|---|---|
| gpt-5.5 Standard | 2M | 20M | 0.33M | Confirmed math |
| gpt-5.5 Batch/Flex | 4M | 40M | 0.67M | Confirmed math |
| gpt-5.4 Standard | 4M | 40M | 0.67M | Confirmed math |
| gpt-5.4 Batch/Flex | 8M | 80M | 1.33M | Confirmed math |
| gpt-5.4-mini Standard | 13.33M | 133.33M | 2.22M | Confirmed math |
| gpt-5.4-mini Batch/Flex | 26.67M | 266.67M | 4.44M | Confirmed math |
| gpt-5.4-nano Standard | 50M | 500M | 8M | Confirmed math |
| gpt-5.4-nano Batch/Flex | 100M | 1,000M | 16M | Confirmed math |

Cost calculation 3: if your agent spends $10 on gpt-5.5 Standard output, it buys about 333K output tokens. The same $10 buys 8M output tokens on gpt-5.4-nano Standard. That is a 24x output-token spread inside the current flagship family.
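
The buying-power figures are a single division. A minimal sketch, using the Standard output prices from this article's price table; `tokens_for_budget` is an illustrative helper name:

```python
def tokens_for_budget(budget_usd, price_per_m):
    """Return how many tokens a budget buys at a per-1M-token price."""
    return budget_usd / price_per_m * 1_000_000

gpt55_out = tokens_for_budget(10, 30.00)  # gpt-5.5 Standard output
nano_out = tokens_for_budget(10, 1.25)    # gpt-5.4-nano Standard output

print(f"gpt-5.5 Standard: {gpt55_out:,.0f} output tokens")
print(f"gpt-5.4-nano Standard: {nano_out:,.0f} output tokens")
print(f"spread: {nano_out / gpt55_out:.0f}x")
```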

Monthly Cost Projection

The real question is not "which model is cheapest?" It is "which model clears the task at the lowest monthly failure-adjusted cost?"

| Monthly workload | Token shape | gpt-5.5 Standard | gpt-5.4 Standard | gpt-5.4-mini Standard | gpt-5.4-nano Standard | Cheapest listed lane |
|---|---|---|---|---|---|---|
| Small SaaS support bot | 10M in / 2M out | $110 | $55 | $16.50 | $4.50 | Nano |
| Medium RAG assistant | 100M in / 20M out | $1,100 | $550 | $165 | $45 | Nano |
| Output-heavy writer | 50M in / 50M out | $1,750 | $875 | $262.50 | $72.50 | Nano |
| Developer agent runs | 2B in / 500M out | $25,000 | $12,500 | $3,750 | $1,025 | Nano |
| Long-context audit | 20M long in / 2M out | $290 on long-context gpt-5.5 | $145 on long-context gpt-5.4 | Not listed | Not listed | GPT-5.4 long |
| Offline eval batch | 100M in / 20M out | $550 on Batch | $275 on Batch | $82.50 on Batch | $22.50 on Batch | Nano Batch |

Do not read this as "always use nano." Read it as "do the quality test before paying the flagship premium." If nano fails, move up. If mini fails, move up again. If 5.4 fails and 5.5 saves engineering time or user churn, the premium can be rational.
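
One way to put a number on "failure-adjusted cost" is an escalation cascade: try the cheap model first, and pay for the next model only when the cheap attempt fails. The sketch below is an illustration under stated assumptions, not a measured result. The per-attempt costs are computed from the Standard prices in the price table for a 20K input / 4K output step; the pass rates are hypothetical placeholders that only an eval on your own workload can supply, and `cascade_cost` is an illustrative name. It also ignores tasks that still fail after the last step.

```python
def cascade_cost(steps):
    """Expected cost per task when each failed attempt escalates to the next model.

    steps: list of (cost_per_attempt, pass_rate), ordered cheapest first.
    Every attempt is paid for, including the ones that fail.
    """
    expected = 0.0
    reach_probability = 1.0  # chance a task gets as far as this step
    for cost, pass_rate in steps:
        expected += reach_probability * cost
        reach_probability *= 1.0 - pass_rate
    return expected

# Costs: 20K in / 4K out at Standard prices (nano $0.009, gpt-5.4 $0.11).
# Pass rates (0.70, 0.95) are HYPOTHETICAL; replace them with eval results.
nano_then_54 = cascade_cost([(0.009, 0.70), (0.11, 0.95)])
only_54 = cascade_cost([(0.11, 0.95)])

print(f"nano -> gpt-5.4 cascade: ${nano_then_54:.4f} per task")
print(f"gpt-5.4 only:            ${only_54:.4f} per task")
```

With these placeholder rates the cascade is cheaper, but a low enough nano pass rate reverses the result, which is why the eval comes first.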

The cross-provider version of this math is in "Cheapest Frontier LLM API 2026", where OpenAI has to compete against Claude, DeepSeek, Gemini, and Groq on cost-per-task rather than raw model branding.

Cost Per Task

| Task | Assumed tokens | gpt-5.5 | gpt-5.4 | gpt-5.4-mini | gpt-5.4-nano | Practical pick |
|---|---|---|---|---|---|---|
| Classify support ticket | 1K in / 100 out | $0.0080 | $0.0040 | $0.0012 | $0.000325 | Nano first |
| Extract invoice fields | 4K in / 300 out | $0.0290 | $0.0145 | $0.00435 | $0.001175 | Nano or mini |
| Summarize 50K-token doc | 50K in / 1K out | $0.2800 | $0.1400 | $0.0420 | $0.01125 | Mini if nano weak |
| Generate long answer | 3K in / 2K out | $0.0750 | $0.0375 | $0.01125 | $0.00310 | Mini or 5.4 |
| Agent planning step | 20K in / 4K out | $0.2200 | $0.1100 | $0.0330 | $0.00900 | 5.4 or 5.5 after eval |
| Long-context legal scan | 300K in / 5K out | $3.2250 long gpt-5.5 | $1.6125 long gpt-5.4 | Not listed | Not listed | 5.4 long unless quality fails |

Cost calculation 4: one million support-ticket classifications at 1K input and 100 output each cost about $8,000 on gpt-5.5, $4,000 on gpt-5.4, $1,200 on gpt-5.4-mini, and $325 on gpt-5.4-nano. A routing mistake here is not academic. It is a monthly budget line.

Here is the tiny calculator behind the tables:

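```python
def openai_cost(input_tokens, output_tokens, input_per_m, output_per_m,
                cached_input_tokens=0, cached_per_m=None):
    """Return the dollar cost of a workload given per-1M-token prices."""
    # Cached tokens are a subset of input tokens, billed at the cached rate.
    uncached_input = max(input_tokens - cached_input_tokens, 0)
    # With no cached rate supplied, cached tokens fall back to the full input rate.
    cached_rate = input_per_m if cached_per_m is None else cached_per_m
    return (
        uncached_input / 1_000_000 * input_per_m
        + cached_input_tokens / 1_000_000 * cached_rate
        + output_tokens / 1_000_000 * output_per_m
    )

# gpt-5.4 Standard: 100M input (50M of it cached) and 20M output.
print(openai_cost(100_000_000, 20_000_000, 2.50, 15.00, 50_000_000, 0.25))
```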

The call prints 437.5, so gpt-5.4 costs $437.50 when half of the input is cached. Without caching, the same 100M/20M workload is $550.

Prompt Caching Math

OpenAI says prompt caching works automatically on recent models, but it only helps when the prefix matches. Put static instructions, schemas, tool definitions, examples, and shared context first. Put user-specific data last.

| Workload | Model | Baseline cost | With 50% cached input | With 90% cached input | What changed | Status |
|---|---|---|---|---|---|---|
| 100M in / 20M out | gpt-5.5 | $1,100 | $875 | $695 | Cached input drops from $5 to $0.50 / 1M | Confirmed math |
| 100M in / 20M out | gpt-5.4 | $550 | $437.50 | $347.50 | Cached input drops from $2.50 to $0.25 / 1M | Confirmed math |
| 100M in / 20M out | gpt-5.4-mini | $165 | $131.25 | $104.25 | Cached input drops from $0.75 to $0.075 / 1M | Confirmed math |
| 100M in / 20M out | gpt-5.4-nano | $45 | $36 | $28.80 | Cached input drops from $0.20 to $0.02 / 1M | Confirmed math |

Caching does not fix output-heavy costs. If the model generates long text, the output column still dominates. If your prompt is mostly dynamic, cache hits may be low. OpenAI states exact prefix matches are required, so a timestamp at the top of the prompt can sabotage the discount.
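
The timestamp problem is easy to see by measuring how much of two consecutive prompts is identical from the first character onward. The sketch below is a rough illustration: real caching operates on tokens rather than characters, so the character count is only a proxy, and the prompt text and function names are made up for the example.

```python
import os

STATIC_PREFIX = "You are a support classifier.\nLabels: billing, bug, other.\n"

def cache_friendly(user_text):
    """Static instructions first, per-request data last."""
    return STATIC_PREFIX + "Ticket: " + user_text

def cache_hostile(user_text, timestamp):
    """A per-request timestamp at the top changes the prefix on every call."""
    return f"Time: {timestamp}\n" + STATIC_PREFIX + "Ticket: " + user_text

def shared_prefix_chars(a, b):
    """Length of the common leading substring (a character-level proxy for tokens)."""
    return len(os.path.commonprefix([a, b]))

good = shared_prefix_chars(
    cache_friendly("refund please"),
    cache_friendly("app crashes"),
)
bad = shared_prefix_chars(
    cache_hostile("refund please", "2026-06-04T10:00:00"),
    cache_hostile("app crashes", "2026-06-04T10:00:07"),
)
print(f"shared prefix, static-first prompt: {good} chars")
print(f"shared prefix, timestamp-first prompt: {bad} chars")
```

In the timestamp-first layout the shared prefix ends inside the timestamp, so none of the static instructions after it can match.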

Tool and Hidden Cost Factors

The model token price is not the whole invoice once tools enter the request.

| Cost factor | Official price or rule | Real impact | Status |
|---|---|---|---|
| Web search all models | $10 / 1K calls plus search content tokens at model rates | Adds $0.01 per search before token costs | Confirmed |
| Web search preview, reasoning models | $10 / 1K calls plus search content tokens at model rates | Same base call fee for reasoning preview path | Confirmed |
| Web search preview, non-reasoning models | $25 / 1K calls, search content tokens free | Higher call fee, different token handling | Confirmed |
| Containers / Code Interpreter 1 GB | $0.03 per 20-minute session container | Cheap per session, expensive if leaked across users | Confirmed |
| Containers / Code Interpreter 64 GB | $1.92 per 20-minute session container | Heavy compute lane, not a token-only bill | Confirmed |
| File search storage | $0.10 / GB per day, 1 GB free | Long-lived indexes become daily recurring cost | Confirmed |
| File search tool call | $2.50 / 1K calls | Retrieval is not just storage | Confirmed |
| Realtime audio gpt-realtime-2 | $32 audio input, $0.40 cached, $64 audio output per 1M audio tokens | Voice agents can dwarf text-only costs | Confirmed |
| Regional processing | 10% uplift for eligible post-March 5, 2026 models | Compliance routing can change unit economics | Confirmed |
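
A rough monthly estimate of tool fees can be built from the rows above. This sketch covers only web search calls, file search calls, and file search storage, and it assumes the "1 GB free" allowance is simply subtracted from stored volume before billing. The constant and function names are illustrative, and model and search-content tokens are billed on top of this figure.

```python
# Fees from the table above, converted from "per 1K calls" to per call.
WEB_SEARCH_PER_CALL = 10.00 / 1000
FILE_SEARCH_PER_CALL = 2.50 / 1000
FILE_STORAGE_PER_GB_DAY = 0.10
FILE_STORAGE_FREE_GB = 1.0  # assumption: the free GB is deducted before billing

def monthly_tool_fees(web_searches, file_searches, stored_gb, days=30):
    """Tool fees only; model tokens and search content tokens are extra."""
    billable_gb = max(stored_gb - FILE_STORAGE_FREE_GB, 0.0)
    return (
        web_searches * WEB_SEARCH_PER_CALL
        + file_searches * FILE_SEARCH_PER_CALL
        + billable_gb * FILE_STORAGE_PER_GB_DAY * days
    )

# Example month: 200K web searches, 500K file searches, 11 GB stored.
print(f"${monthly_tool_fees(200_000, 500_000, 11):,.2f}")
```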

If your stack routes across vendors, this is where a gateway pays for itself. "AI API Gateway 2026" covers fallback, budget caps, observability, and model routing. "TokenMix vs OpenRouter vs Portkey vs LiteLLM" covers gateway cost tradeoffs.

Rate Limits and Usage Limits

Do not confuse price with capacity. A workload can be affordable and still fail rate limits.

| Limit concept | What OpenAI says | Cost implication | Status |
|---|---|---|---|
| RPM | Requests per minute | High call count can throttle even with small prompts | Confirmed |
| RPD | Requests per day | Daily request ceilings can block batch-like sync traffic | Confirmed |
| TPM | Tokens per minute | Long prompts and long outputs hit capacity before request count | Confirmed |
| TPD | Tokens per day | Large daily volume needs tier planning | Confirmed |
| IPM | Images per minute | Image-heavy apps need separate capacity math | Confirmed |
| Organization/project scope | Limits apply at org and project level, not user level | One noisy project can affect shared quota | Confirmed |
| Shared limits | Some model families share limits | Fallback inside the same family may not add capacity | Confirmed |
| Unsuccessful requests | Failed retries still contribute to per-minute limit | Retry storms waste capacity | Confirmed |
| Batch queue limit | Pending batch tokens count against queue limit until completion | Async jobs still need queue planning | Confirmed |

OpenAI's model compare page lists gpt-5.5 and gpt-5.4 Tier 1 TPM at 500K and Tier 5 TPM at 40M, with Free TPM shown as unavailable for those models (Compare models). Treat your account dashboard as the source of truth before a launch.
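
The TPM figures translate directly into a minimum wall-clock time for a job. The sketch below is a back-of-envelope lower bound only: it assumes input and output tokens both count toward TPM, that every minute runs at the full limit, and that no other limit (RPM, daily caps) binds first. `minutes_to_process` is an illustrative name.

```python
def minutes_to_process(total_tokens, tpm_limit):
    """Lower bound on minutes needed if every minute runs at the full TPM limit."""
    return total_tokens / tpm_limit

job_tokens = 100_000_000 + 20_000_000  # 100M input + 20M output

tier1 = minutes_to_process(job_tokens, 500_000)      # Tier 1 TPM from the compare page
tier5 = minutes_to_process(job_tokens, 40_000_000)   # Tier 5 TPM from the compare page

print(f"Tier 1: at least {tier1:.0f} minutes")
print(f"Tier 5: at least {tier5:.0f} minutes")
```

A job that costs the same on both tiers can still differ by hours in completion time, which is the capacity side of the planning problem.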

Optimization Playbook

| Lever | Typical saving | Effort | Use when | Caveat | Status |
|---|---|---|---|---|---|
| Move from gpt-5.5 to gpt-5.4 | 50% on Standard short-context text | Low | Quality delta is small | Need eval on hard tasks | Confirmed math |
| Move from gpt-5.4 to gpt-5.4-mini | 70% on input, 70% on output | Medium | Task is well-defined | Reasoning may degrade | Confirmed math |
| Move from mini to nano | 73% input, 72% output | Medium | Classification, routing, extraction | Requires guardrails | Confirmed math |
| Batch async work | 50% | Medium | Eval, offline extraction, bulk generation | 24-hour turnaround window | Confirmed |
| Flex eligible traffic | 50% listed in pricing table | Medium | Lower-priority cost-sensitive calls | Behavior details depend on tier docs/account | Likely |
| Prompt caching | Up to 90% input cost reduction | Low to medium | Static prefix repeats | Exact prefix match required | Confirmed |
| Shorten outputs | Linear output savings | Low | Model over-generates | May hurt completeness | Confirmed math |
| Route by task | 50-95% possible | High | Mixed workload | Requires eval and observability | Likely |
| Avoid unnecessary tools | $2.50-$25 per 1K calls avoided | Medium | Search/retrieval/tool calls are overused | May reduce capability | Confirmed math |
| Regional processing only where needed | Avoid 10% uplift | Medium | Compliance allows global processing | Compliance may require uplift | Confirmed |
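
The step-down percentages in the first three rows follow from the Standard prices in the price table. A minimal sketch for re-deriving them when prices change; `PRICES`, `saving_pct`, and `step_down` are illustrative names:

```python
# (input, output) Standard dollars per 1M tokens, copied from the price table.
PRICES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def saving_pct(old_price, new_price):
    """Percentage saved when moving from old_price to new_price."""
    return (1 - new_price / old_price) * 100

def step_down(old_model, new_model):
    """Return (input saving %, output saving %) for a model downgrade."""
    old_in, old_out = PRICES[old_model]
    new_in, new_out = PRICES[new_model]
    return saving_pct(old_in, new_in), saving_pct(old_out, new_out)

steps = [
    ("gpt-5.5", "gpt-5.4"),
    ("gpt-5.4", "gpt-5.4-mini"),
    ("gpt-5.4-mini", "gpt-5.4-nano"),
]
for old, new in steps:
    in_pct, out_pct = step_down(old, new)
    print(f"{old} -> {new}: {in_pct:.0f}% input, {out_pct:.0f}% output")
```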

Use Case Matrix

| Use case | Start with | Escalate to | Avoid | Why |
|---|---|---|---|---|
| Ticket classification | gpt-5.4-nano | gpt-5.4-mini | gpt-5.5 by default | Cheap output and simple schema |
| Field extraction | gpt-5.4-nano | gpt-5.4-mini | Pro models | Validate against gold set |
| Customer support answer | gpt-5.4-mini | gpt-5.4 | Nano without eval | Quality and tone matter |
| Coding assistant | gpt-5.4 | gpt-5.5 | Nano for planning | OpenAI positions 5.5 for complex coding |
| Long-horizon agent | gpt-5.4 with routing | gpt-5.5 for hard steps | One model for every step | Mixed tasks need routing |
| Offline eval | Batch gpt-5.4-mini or gpt-5.4 | Batch gpt-5.5 | Standard sync | Batch halves eligible cost |
| Realtime voice | Realtime model family | Text model plus TTS/STT only if latency allows | Token-only forecast | Audio has separate pricing |
| Compliance-region app | Eligible regional endpoint | Standard global if policy allows | Ignoring uplift | Regional processing can add 10% |
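
In code, the first four rows of the matrix reduce to a lookup with an escalation flag. This is a sketch only: the use-case keys and `pick_model` are hypothetical names, and how a "failed" attempt is detected (schema validation, an eval score, a user signal) is left to the application.

```python
# (start_with, escalate_to) pairs copied from the use case matrix.
ROUTES = {
    "ticket_classification": ("gpt-5.4-nano", "gpt-5.4-mini"),
    "field_extraction": ("gpt-5.4-nano", "gpt-5.4-mini"),
    "customer_support": ("gpt-5.4-mini", "gpt-5.4"),
    "coding_assistant": ("gpt-5.4", "gpt-5.5"),
}

def pick_model(use_case, previous_attempt_failed=False):
    """Return the model for a use case, escalating once after a failed attempt."""
    start_with, escalate_to = ROUTES[use_case]
    return escalate_to if previous_attempt_failed else start_with

print(pick_model("ticket_classification"))
print(pick_model("coding_assistant", previous_attempt_failed=True))
```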

Risks and Caveats

| Risk | What goes wrong | Mitigation | Status |
|---|---|---|---|
| Using old price tables | You publish GPT-5.2 or old nano numbers after docs changed | Re-check official pricing before every publish or update | Confirmed |
| Treating nano as universally best | Quality misses cause retries, escalations, or support cost | Run eval before routing production traffic | Likely |
| Ignoring output tokens | Long answers dominate invoice | Cap output and summarize in stages | Confirmed math |
| Misusing Priority | Premium lane used for batchable jobs | Reserve Priority for latency-sensitive user paths | Confirmed |
| Batch where sync is needed | User waits or product breaks | Use Batch only for async jobs | Confirmed |
| Cache miss assumptions | Prompt changes prevent savings | Static prefix first, dynamic content last | Confirmed |
| Long-context surprise | Large prompts trigger long-context prices | Split, retrieve, or summarize before sending | Confirmed |
| Tool fee blind spot | Search, file search, or containers create extra bill lines | Track tool call counts separately | Confirmed |
| Rate-limit retry storm | Failed retries count against per-minute limit | Add jitter, budget caps, and circuit breakers | Confirmed |
| Future price changes | Current cost plan becomes stale | Re-verify pricing before large commitments | Speculation |
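
For the retry-storm row, one common mitigation is capped exponential backoff with full jitter plus a hard retry budget. The sketch below is generic and not tied to any SDK: `send` stands for whatever zero-argument callable issues the request, and catching every `Exception` is a simplification; production code would retry only on rate-limit or transient errors.

```python
import random
import time

def call_with_backoff(send, max_retries=4, base_seconds=1.0,
                      cap_seconds=30.0, sleep=time.sleep):
    """Call send() until it succeeds or the retry budget is spent.

    send: zero-argument callable that raises on a retryable failure.
    sleep: injectable so tests can run without real waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except Exception:
            if attempt == max_retries:
                raise  # budget spent: surface the error instead of looping
            # Exponential ceiling (1s, 2s, 4s, ...) capped at cap_seconds.
            ceiling = min(cap_seconds, base_seconds * 2 ** attempt)
            # Full jitter: a random wait in [0, ceiling] spreads clients apart.
            sleep(random.uniform(0.0, ceiling))
```

The fixed retry budget matters as much as the jitter: because failed requests still count against per-minute limits, unbounded retries can keep a project throttled.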

Final Recommendation

Use gpt-5.5 only for the hardest coding, reasoning, and professional-work paths. Default cost-sensitive production traffic to gpt-5.4, gpt-5.4-mini, or gpt-5.4-nano, then add Batch, Flex, caching, output caps, and routing.

FAQ

How much does GPT-5.5 API cost in 2026?

gpt-5.5 standard short-context pricing is $5 input, $0.50 cached input, and $30 output per 1M tokens. Long-context pricing is higher at $10 input, $1 cached input, and $45 output.

Is GPT-5.4 cheaper than GPT-5.5?

Yes. The current official table lists gpt-5.4 at exactly half of gpt-5.5 standard short-context prices: $2.50 input and $15 output versus $5 and $30.

What is the cheapest GPT-5.4 family model?

gpt-5.4-nano is the cheapest model in the current GPT-5.4 family table. It costs $0.20 input, $0.02 cached input, and $1.25 output per 1M tokens.

Does Batch API really save 50%?

Yes. OpenAI says Batch API has 50% lower cost than synchronous APIs and a 24-hour turnaround window. Use it for async jobs such as evals, classification, embeddings, and bulk generation.

Is Flex cheaper than Standard?

In the current pricing table, Flex rows for the latest flagship models match Batch-level prices. Treat this as Confirmed for listed prices, but verify account eligibility and behavior before routing production traffic.

Is Priority worth it?

Only for latency-sensitive paths where speed is worth the premium. It is not a cost-saving tier; the listed Priority prices are above Standard.

How much can prompt caching save?

OpenAI says prompt caching can reduce input token costs by up to 90% and latency by up to 80%. It only helps when prompts share exact prefixes, so prompt structure matters.

Should I use GPT-5.5 for every production call?

No. Use GPT-5.5 for hard reasoning, coding, and professional tasks where quality pays for the premium. For simple or high-volume work, test gpt-5.4-mini or gpt-5.4-nano first.
