TokenMix Research Lab · 2026-05-21

GPT-5.5 Batch vs Flex vs Priority: 50% Off API Math (2026)

Last Updated: 2026-05-21 · Data Checked: 2026-05-21 · Author: TokenMix Research Lab

GPT-5.5 ships with five distinct pricing tiers, and many teams are paying Standard rates for workloads that would run at 50% off on Batch or Flex. The fastest cost win available right now is not a model migration; it's a tier migration on the same model.

Per OpenAI's GPT-5.5 launch post, GPT-5.5 Standard API pricing is $5 input / $30 output per million tokens. Batch API and Flex both cut that to $2.50 / $15, the same per-token rate as the older GPT-5.4 Standard tier. Priority costs 2.5× the Standard rate, and Codex Fast also charges 2.5× the Standard rate in exchange for 1.5× generation speed. This piece breaks down exactly what each tier guarantees, when to switch, and the per-task cost math for 4 realistic workloads. All numbers are cross-referenced against OpenAI's official pricing documentation and our own GPT-5.5 launch coverage.

Quick Verdict: All 5 GPT-5.5 Tiers Side-by-Side
What Each Tier Actually Guarantees (SLA + Latency)
Standard vs Batch: Where the 50% Off Comes From
Flex Tier: The Middle Ground for Latency-Tolerant Production
Priority Tier: When 2.5× Cost Actually Pays Off
Cost Per Task: 4 Real Workloads Calculated
Tier Selection Decision Matrix
GPT-5.5 Batch vs Claude / Gemini Batch Pricing
Migration: Standard → Batch in 30 Minutes
FAQ
Related Articles
Sources

Quick Verdict: All 5 GPT-5.5 Tiers Side-by-Side

GPT-5.5 Standard at $5/$30 is the API list price. Batch and Flex cut that by 50% to $2.50/$15. Priority costs 2.5× Standard ($12.50/$75) for SLA-grade latency. Codex Fast also costs 2.5× Standard, in exchange for 1.5× generation speed.

Tier	Input ($/MTok)	Output ($/MTok)	Multiplier vs Standard	Typical Latency	SLA
Batch API	$2.50	$15.00	0.5×	≤24h (async)	Within batch window
Flex	$2.50	$15.00	0.5×	Best-effort, may queue	None
Standard	$5.00	$30.00	1×	<2s first token	None published
Priority	$12.50	$75.00	2.5×	Guaranteed low-latency	Throughput floor
Codex Fast	$12.50	$75.00	2.5× cost, 1.5× speed	Fastest synchronous	Codex IDE only

The 50% Batch/Flex math: $5 × 0.5 = $2.50 input, $30 × 0.5 = $15 output. GPT-5.5 Batch is priced exactly the same as the previous GPT-5.4 Standard tier — meaning you can run the new flagship at the old flagship's price if you can tolerate async completion.
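The whole tier table follows from two Standard rates and one multiplier per tier. A minimal sketch of that arithmetic, assuming the Standard rates and multipliers listed above are the only inputs (the names `STANDARD_RATES`, `TIER_MULTIPLIERS`, and `tier_rates` are illustrative, not from any SDK):

```python
# Standard GPT-5.5 list prices in USD per million tokens (from the table above).
STANDARD_RATES = {"input": 5.00, "output": 30.00}

# Price multipliers relative to Standard, as listed in the tier table.
TIER_MULTIPLIERS = {
    "batch": 0.5,
    "flex": 0.5,
    "standard": 1.0,
    "priority": 2.5,
    "codex_fast": 2.5,
}


def tier_rates(tier: str) -> dict:
    """Return the per-MTok input/output rates for a tier."""
    m = TIER_MULTIPLIERS[tier]
    return {kind: rate * m for kind, rate in STANDARD_RATES.items()}


if __name__ == "__main__":
    for tier in TIER_MULTIPLIERS:
        r = tier_rates(tier)
        print(f"{tier:<11} ${r['input']:.2f} / ${r['output']:.2f}")
```

Keeping the multipliers in one place means a future price change to Standard propagates to every tier automatically.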

What Each Tier Actually Guarantees (SLA + Latency)

Tier choice is a latency-versus-cost trade-off, not a quality trade-off. The model weights are identical across Standard, Batch, Flex, and Priority — only the serving layer differs.

Batch API

Async only: submit a job, get results within 24 hours
Submitted as a batch job through the Batch API endpoint, not as real-time Responses or Chat Completions calls
No partial results — you wait for the whole batch
Best for: overnight processing, weekly evaluations, large-scale embeddings, offline classification, bulk dataset annotation

Flex

Real-time-ish: requests go through the same endpoint as Standard but sit at lower priority in the queue
May see queueing delays when GPU capacity is constrained
No SLA on first-token latency — typically still <5s, but can spike
Best for: production workloads where occasional 5-30s delays are acceptable (background agents, low-traffic chatbots, content generation pipelines)

Standard

Default tier: low-latency synchronous, typically <2s first token
No published uptime or latency SLA, but reliability is high in practice
Best for: user-facing chat, real-time copilots, default for any production traffic

Priority

Guaranteed throughput floor + lower latency variance
2.5× the Standard cost
Best for: latency-critical paths (voice assistants, financial workflows, anything where p99 latency matters more than per-token cost)

Codex Fast

2.5× the Standard cost for 1.5× generation speed
Codex IDE workflow only, not exposed in public API
Best for: developers willing to pay for faster autocomplete inside Codex

The five tiers cover the cost-versus-latency Pareto frontier OpenAI is willing to expose. There is no "even cheaper than Batch" option — the floor is $2.50/$15 per MTok for GPT-5.5.

Standard vs Batch: Where the 50% Off Comes From

The Batch API has been around since the GPT-4 era; OpenAI extended the same 50% discount to GPT-5.5 at launch. The likely mechanism is straightforward: async serving lets OpenAI pack jobs onto otherwise-idle GPU capacity (off-peak hours, gaps between bursts of real-time traffic), and it passes that efficiency back to customers as a discount.

Practical example — 1 million input tokens + 200,000 output tokens per day for a content moderation pipeline:

Tier	Daily Cost	Monthly Cost (30 days)	Annual
Standard	1 × $5 + 0.2 × $30 = $11.00	$330	$3,960
Batch	1 × $2.50 + 0.2 × $15 = $5.50	$165	$1,980
Savings	$5.50	$165	$1,980/year
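The table above is a daily cost multiplied out to a 30-day month and then to 12 such months. A small sketch that reproduces it, assuming token volumes are expressed in millions of tokens and using the article's 360-day "annual" convention (function names are illustrative):

```python
# Per-MTok rates (input, output) in USD for the two tiers compared above.
RATES = {
    "standard": (5.00, 30.00),
    "batch": (2.50, 15.00),
}


def daily_cost(tier: str, input_mtok: float, output_mtok: float) -> float:
    """Daily spend for a given volume, in millions of tokens per day."""
    in_rate, out_rate = RATES[tier]
    return input_mtok * in_rate + output_mtok * out_rate


def projection(tier: str, input_mtok: float, output_mtok: float,
               days_per_month: int = 30) -> tuple:
    """Return (daily, monthly, annual) cost, rounded to cents.

    Annual is 12 x the monthly figure, matching the table above.
    """
    day = daily_cost(tier, input_mtok, output_mtok)
    month = day * days_per_month
    return round(day, 2), round(month, 2), round(month * 12, 2)


# 1M input + 200K output tokens per day, as in the moderation example.
std = projection("standard", 1.0, 0.2)
bat = projection("batch", 1.0, 0.2)
savings = tuple(round(s - b, 2) for s, b in zip(std, bat))
print(std, bat, savings)
```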

The catch: Batch jobs return within 24 hours, not instantly. If your moderation needs to flag posts before they go live, Batch is wrong. If you batch-process the day's content overnight to generate reports, Batch is exactly right.

For workloads that don't need real-time responses, defaulting to Standard means paying twice what those calls would cost on Batch. The migration is a different endpoint call, not a different model.

Flex Tier: The Middle Ground for Latency-Tolerant Production

Flex is OpenAI's newer middle-tier offering, sitting between Batch (async) and Standard (real-time). The per-token cost matches Batch — $2.50 / $15 per MTok — but the request is served synchronously through the same endpoint as Standard, just at lower queue priority.
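Because Flex shares Standard's endpoint, switching is a per-request setting rather than a new pipeline. A minimal sketch, assuming the tier is selected through the Responses API's `service_tier` request parameter (OpenAI uses `"flex"` and `"priority"` values for these tiers on other models; confirm GPT-5.5 support in the current docs). `build_request` is a hypothetical helper that only assembles keyword arguments:

```python
# Hypothetical helper: builds keyword arguments for a Responses API call.
# Assumes the tier is chosen with the `service_tier` request parameter.
def build_request(prompt: str, tier: str = "default",
                  max_output_tokens: int = 500) -> dict:
    allowed = {"default", "flex", "priority"}
    if tier not in allowed:
        # Batch is not a per-request tier; it uses the separate Batch API.
        raise ValueError(f"unsupported tier: {tier!r}")
    return {
        "model": "gpt-5.5",
        "input": prompt,
        "max_output_tokens": max_output_tokens,
        "service_tier": tier,
    }


# Same prompt, same model; only the service tier changes.
standard_req = build_request("Summarize this document...")
flex_req = build_request("Summarize this document...", tier="flex")
print(flex_req["service_tier"])
```

With the official `openai` Python SDK, the dict would be unpacked into `client.responses.create(**flex_req)`. Since Flex requests can queue, a longer per-request timeout (for example via `client.with_options(timeout=...)`) is usually a sensible companion change.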

When Flex wins over Batch:

You need a response within seconds (typically 5-30s), not within 24 hours
Your traffic is bursty and async pipelines add operational complexity
You're already on Standard but most requests are background tasks

When Flex falls short:

p99 latency matters: Flex requests can queue indefinitely during peak demand
User-facing real-time chat: occasional 30s delays feel broken
Hard SLA requirements: Flex has none

Real numbers: A background research agent running ~500 GPT-5.5 calls per day at 8K input + 2K output per call:

Tier	Per-call cost	Daily	Monthly
Standard	(0.008 × $5) + (0.002 × $30) = $0.10	$50	$1,500
Flex	(0.008 × $2.50) + (0.002 × $15) = $0.05	$25	$750

Same model, same prompts, same output quality. Difference: requests can queue during peak hours. For a background agent that runs every 15 minutes and isn't blocking a user, this is the highest-leverage cost cut available.

Priority Tier: When 2.5× Cost Actually Pays Off

Priority is where the math flips: you pay 2.5× the Standard per-token rate in exchange for latency guarantees and throughput stability. At $12.50 input / $75 output per MTok, this is the most expensive way to serve GPT-5.5 through the API.

Justification check — Priority makes sense only if:

p99 latency (not average) is critical (voice assistants, real-time translation, financial trade copilots)
You've measured Standard tier latency variance and it's hurting user experience
The cost differential is small compared to user-impact cost (e.g., enterprise contracts with latency SLAs to end customers)

Justification check — Priority is wrong if:

You're paying it "just in case" without measuring Standard variance first
Your traffic volume is low (variance is unlikely to bite at <1000 req/day)
You're using Priority because it's the most expensive, not because you measured the need

Real math for a voice assistant doing 5M input + 1M output per month:

Tier	Monthly	Notes
Batch	(5 × $2.50) + (1 × $15) = $27.50	Async is impossible for voice
Flex	(5 × $2.50) + (1 × $15) = $27.50	Latency variance kills UX
Standard	(5 × $5) + (1 × $30) = $55	Baseline
Priority	(5 × $12.50) + (1 × $75) = $137.50	+$82.50/month over Standard

Priority costs $82.50/month more than Standard here, or $82.50 × 3 = $247.50 per quarter. If Priority's latency guarantee prevents even one production incident per quarter that would have cost more than $247.50, it pays for itself. Otherwise, Standard with a fallback strategy is more efficient.
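The break-even test is a simple expected-value comparison. A sketch under the voice-assistant volumes above; the incident costs in the usage lines are hypothetical placeholders you would replace with your own estimates:

```python
# Monthly volume in millions of tokens (voice assistant example above).
INPUT_MTOK, OUTPUT_MTOK = 5.0, 1.0

RATES = {  # USD per MTok (input, output)
    "standard": (5.00, 30.00),
    "priority": (12.50, 75.00),
}


def monthly_cost(tier: str) -> float:
    in_rate, out_rate = RATES[tier]
    return INPUT_MTOK * in_rate + OUTPUT_MTOK * out_rate


premium_per_month = monthly_cost("priority") - monthly_cost("standard")
premium_per_quarter = premium_per_month * 3


def priority_pays_off(incidents_avoided_per_quarter: float,
                      cost_per_incident: float) -> bool:
    """True if the expected avoided incident cost exceeds the quarterly premium."""
    return incidents_avoided_per_quarter * cost_per_incident > premium_per_quarter


print(premium_per_month, premium_per_quarter)
# Hypothetical: one avoided incident worth $300 per quarter clears the bar.
print(priority_pays_off(1, 300.0))
```

Because the premium scales linearly with volume, the break-even incident cost grows with traffic; rerun the numbers whenever usage changes.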

Cost Per Task: 4 Real Workloads Calculated

The per-MTok rate is meaningless without task-level context. Below are 4 realistic GPT-5.5 workloads at Standard, Batch, Flex, and Priority pricing.

Workload 1: Customer Support Triage (real-time chat)

Inputs: 4K input + 600 output per conversation, 10,000 conversations/month.

Tier	Cost/conversation	Monthly	Notes
Standard	$0.038	$380	Recommended default
Flex	$0.019	$190	Risky — p99 latency matters
Batch	—	—	Async impossible for chat
Priority	$0.095	$950	Only if enterprise SLA required

Workload 2: Document Summarization (overnight batch)

Inputs: 80K input + 4K output per document, 5,000 docs/month, runs nightly.

Tier	Cost/doc	Monthly	Notes
Standard	$0.520	$2,600	Wasteful — Batch is identical quality
Batch	$0.260	$1,300	Correct default
Flex	$0.260	$1,300	Works but no reason to skip Batch
Priority	$1.300	$6,500	Wrong tier for async work

Workload 3: Code Generation (Codex IDE)

Inputs: 12K input + 3K output per generation, 50,000 generations/month.

Tier	Cost/gen	Monthly	Notes
Standard	$0.150	$7,500	Default
Codex Fast	$0.375	$18,750	1.5× speed at 2.5× cost — measure user willingness
Batch	—	—	Codex needs sync, can't batch
Priority	$0.375	$18,750	Codex teams typically pick Codex Fast over Priority

Workload 4: Long-Context RAG (research agent)

Inputs: 800K input + 8K output per query, 1,000 queries/month.

Tier	Cost/query	Monthly	Notes
Standard	$4.24	$4,240	Default for production research
Flex	$2.12	$2,120	If 30s queue acceptable, big savings
Batch	$2.12	$2,120	If async OK, even simpler
Priority	$10.60	$10,600	Wrong tier unless real-time-critical

Pattern: For 2 of the 4 workloads (overnight summarization and the long-context research agent), Batch or Flex saves 50% with no quality difference. Only real-time chat and Codex IDE work need Standard or above.
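All four tables come from the same formula: (input tokens × input rate + output tokens × output rate) / 1,000,000, multiplied by monthly volume. A sketch that regenerates them, assuming the per-MTok rates from the tier table; each workload lists only the tiers priced above, so infeasible combinations (such as Batch for chat) are simply omitted:

```python
# Per-MTok rates (input, output) in USD, from the tier table.
RATES = {
    "standard": (5.00, 30.00),
    "batch": (2.50, 15.00),
    "flex": (2.50, 15.00),
    "priority": (12.50, 75.00),
    "codex_fast": (12.50, 75.00),
}

# (name, input tokens/task, output tokens/task, tasks/month, tiers priced)
WORKLOADS = [
    ("support triage", 4_000, 600, 10_000, ["standard", "flex", "priority"]),
    ("doc summarization", 80_000, 4_000, 5_000,
     ["standard", "batch", "flex", "priority"]),
    ("code generation", 12_000, 3_000, 50_000,
     ["standard", "codex_fast", "priority"]),
    ("long-context RAG", 800_000, 8_000, 1_000,
     ["standard", "flex", "batch", "priority"]),
]


def per_task(tier: str, in_tok: int, out_tok: int) -> float:
    """USD cost of one task on the given tier."""
    in_rate, out_rate = RATES[tier]
    return (in_tok * in_rate + out_tok * out_rate) / 1_000_000


table = {}
for name, in_tok, out_tok, volume, tiers in WORKLOADS:
    for tier in tiers:
        cost = per_task(tier, in_tok, out_tok)
        table[(name, tier)] = (round(cost, 3), round(cost * volume, 2))
        print(f"{name:<18} {tier:<11} ${cost:.3f}/task  "
              f"${cost * volume:,.2f}/month")
```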

Tier Selection Decision Matrix

Your situation	Recommended Tier	Why
Need response within a conversation turn	Standard	Default low-latency, no surprises
Can wait up to 24h, doing bulk processing	Batch	50% off, no UX risk
Production traffic, latency-tolerant (5-30s OK)	Flex	50% off, sync endpoint
Voice / real-time / financial / p99-critical	Priority	2.5× cost buys latency floor
Codex IDE workflow, willing to pay for speed	Codex Fast	Codex-only, not general API
Mixed workload (chat + batch jobs)	Hybrid	Standard for chat, Batch for offline jobs
Unsure	Start Standard	Measure variance first, then optimize down to Flex/Batch

The default recommendation: start every new GPT-5.5 integration on Standard, measure usage patterns for 2 weeks, then migrate latency-tolerant traffic to Batch/Flex. Most teams find 40-60% of their volume qualifies for the 50% discount.
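The decision matrix can be encoded as a small routing function, which is useful once traffic is tagged by latency tolerance. A simplified sketch: the function name, its parameters, and the exact thresholds (30 seconds for Flex, 24 hours for Batch) are our reading of the matrix rows, not an official rule set:

```python
def recommend_tier(user_waiting: bool, max_wait_seconds: float,
                   p99_critical: bool = False,
                   codex_speed_premium_ok: bool = False) -> str:
    """Map the decision matrix above onto a tier name (a simplification)."""
    if codex_speed_premium_ok:
        return "codex_fast"  # Codex IDE only; not in the general API
    if p99_critical:
        return "priority"  # 2.5x cost buys a latency floor
    if not user_waiting and max_wait_seconds >= 24 * 3600:
        return "batch"  # bulk/offline work that can wait up to 24h
    if max_wait_seconds >= 30:
        return "flex"  # latency-tolerant production traffic
    return "standard"  # default, including the "unsure" case


print(recommend_tier(user_waiting=False, max_wait_seconds=86_400))
```

Returning Standard as the fall-through case mirrors the "start Standard, measure, then optimize down" advice.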

GPT-5.5 Batch vs Claude / Gemini Batch Pricing

OpenAI is not the only vendor offering tier-based discounts. Here's how GPT-5.5 Batch stacks up against Claude Opus 4.7 and Gemini 3.5 Flash on equivalent low-priority pricing.

Model	Standard ($/MTok in/out)	Batch / Equivalent ($/MTok in/out)	Discount
GPT-5.5	$5 / $30	$2.50 / $15	50%
Claude Opus 4.7	$5 / $25	$2.50 / $12.50 (per Anthropic Batch docs)	50%
Gemini 3.5 Flash	$1.50 / $9	$0.75 / $4.50 (per Google AI pricing)	50%
Gemini 3.5 Flash Flex	—	$0.75 / $4.50	50%
Gemini 3.1 Pro Preview (≤200K)	$2 / $12	$1 / $6	50%

Observations:

All three major vendors offer 50% Batch discounts — this is industry standard, not OpenAI-specific
Gemini 3.5 Flash Batch at $0.75 / $4.50 is the cheapest frontier-tier option by a wide margin (per our Gemini 3.5 Flash launch coverage)
GPT-5.5 Batch matches Claude Opus 4.7 Batch on input ($2.50) but costs more on output: $15 vs $12.50, which makes Claude about 17% cheaper on output at the Batch tier

For pure cost per token at the Batch tier, the ranking from cheapest to most expensive is Gemini 3.5 Flash, Claude Opus 4.7, then GPT-5.5. On agentic and coding tasks, quality differences across these three favor GPT-5.5, but for translation, classification, summarization, and embedding-replacement tasks, Gemini 3.5 Flash Batch is hard to beat on performance per dollar.
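The ranking holds for any input/output mix, since each vendor's rates are no higher than the next one's on both axes, but the size of the gap depends on the mix. A sketch for one job, using the Batch rates from the table; the 100M-input / 10M-output volume is an illustrative assumption:

```python
# Batch-tier rates (input, output) in USD per MTok, from the table above.
BATCH_RATES = {
    "GPT-5.5": (2.50, 15.00),
    "Claude Opus 4.7": (2.50, 12.50),
    "Gemini 3.5 Flash": (0.75, 4.50),
}


def batch_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    in_rate, out_rate = BATCH_RATES[model]
    return input_mtok * in_rate + output_mtok * out_rate


# Illustrative job: 100M input + 10M output tokens.
costs = {m: batch_cost(m, 100, 10) for m in BATCH_RATES}
ranking = sorted(costs, key=costs.get)  # cheapest first
for model in ranking:
    print(f"{model:<17} ${costs[model]:,.2f}")
```

For input-heavy jobs like this one, GPT-5.5 and Claude Opus 4.7 land close together; the output-rate difference matters more as output share grows.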

Migration: Standard → Batch in 30 Minutes

Switching GPT-5.5 traffic from Standard to Batch is a code change, not a model change. The model ID stays gpt-5.5 — only the endpoint and request shape differ.

Standard call (synchronous):

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    input="Summarize this document...",
    max_output_tokens=500
)
print(response.output_text)  # convenience accessor: concatenated text of all output items

Batch call (async):

# 1. Prepare a JSONL file of requests (one line per document)
import json
import time
from openai import OpenAI

client = OpenAI()
# Assumes `documents` is your own list of document strings, defined earlier.

with open("batch_requests.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        f.write(json.dumps({
            "custom_id": f"req-{i}",  # used to match results back to inputs
            "method": "POST",
            "url": "/v1/responses",
            "body": {
                "model": "gpt-5.5",
                "input": f"Summarize: {doc}",
                "max_output_tokens": 500
            }
        }) + "\n")

# 2. Upload the file and submit the batch
with open("batch_requests.jsonl", "rb") as fh:
    batch_file = client.files.create(file=fh, purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/responses",
    completion_window="24h"
)

# 3. Poll until the batch reaches a terminal state (often well under 24h)
while True:
    batch = client.batches.retrieve(batch.id)
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)

if batch.status == "completed" and batch.output_file_id:
    results = client.files.content(batch.output_file_id).text
else:
    print(f"Batch ended with status: {batch.status}")

Migration checklist:

Identify which call paths in your code are latency-tolerant (don't block a user response)
Refactor those paths to enqueue requests rather than call synchronously
Add a polling worker (or webhook handler if available) to consume Batch outputs
Run side-by-side for 1 week — measure cost-per-completed-job and any quality delta
Cut over fully once cost validation completes

For mixed workloads where part of traffic is real-time chat and part is offline reports, run both Standard and Batch in parallel. Many production stacks already follow this pattern with Celery / RQ / Cloud Tasks queues — Batch API maps cleanly onto the same architecture.
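The hybrid pattern boils down to one routing decision per request: send it now, or append it to the next batch file. A minimal in-process sketch, where `HybridDispatcher` is a hypothetical class and `send_sync` stands in for your Standard-tier call; in production the pending list would live in your Celery / RQ / Cloud Tasks queue rather than in memory:

```python
import json
from typing import Callable, List


class HybridDispatcher:
    """Hypothetical router: real-time requests go out synchronously,
    latency-tolerant ones are queued as Batch API JSONL lines."""

    def __init__(self, send_sync: Callable[[dict], str]):
        self.send_sync = send_sync  # caller-supplied Standard-tier call
        self.pending: List[str] = []  # JSONL lines awaiting a batch upload

    def submit(self, custom_id: str, prompt: str, realtime: bool):
        body = {"model": "gpt-5.5", "input": prompt, "max_output_tokens": 500}
        if realtime:
            return self.send_sync(body)  # a user is waiting: Standard tier
        self.pending.append(json.dumps({
            "custom_id": custom_id, "method": "POST",
            "url": "/v1/responses", "body": body,
        }))
        return None  # result arrives later via the batch output file

    def drain(self) -> str:
        """Return queued requests as one JSONL payload and clear the queue."""
        payload = "".join(line + "\n" for line in self.pending)
        self.pending.clear()
        return payload


dispatcher = HybridDispatcher(send_sync=lambda body: f"sync:{body['input']}")
chat_reply = dispatcher.submit("chat-1", "Hi!", realtime=True)
dispatcher.submit("report-1", "Summarize yesterday's tickets", realtime=False)
dispatcher.submit("report-2", "Summarize last week's tickets", realtime=False)
batch_payload = dispatcher.drain()
print(chat_reply)
print(batch_payload, end="")
```

The drained payload has the same line format as `batch_requests.jsonl` in the migration example, so it can be written to disk and submitted the same way on a schedule.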

For teams routing GPT-5.5 alongside Claude and Gemini through a single endpoint, the TokenMix.ai unified API exposes the same Batch / Flex tier semantics across providers, so you can pick the cheapest qualified tier per workload without writing provider-specific code paths.

FAQ

Is GPT-5.5 Batch the same quality as GPT-5.5 Standard?

Yes. Same model weights, same outputs. The only difference is serving infrastructure — Batch runs jobs asynchronously, often during off-peak GPU windows. No quality degradation has been documented in OpenAI's launch post or independent testing.

What's the difference between Batch and Flex?

Batch is async (submit, wait up to 24h, receive results). Flex is synchronous (same endpoint as Standard) but lower priority — can queue during peak demand. Both cost the same: $2.50 / $15 per MTok. Use Batch for bulk offline work, Flex for production workloads that tolerate occasional latency spikes.

Can I use Batch for real-time chat?

No. Batch jobs return within 24 hours, not within a conversation turn. For chat, use Standard, or Flex if you can tolerate 5-30 second queue delays.

Is Priority tier worth 2.5× Standard cost?

Only if p99 latency matters more than per-token cost. Voice assistants, financial workflows, and applications with end-user SLAs are typical Priority candidates. For most chat and content generation, Standard latency is sufficient.

How does GPT-5.5 Batch compare to Claude Opus 4.7 Batch?

GPT-5.5 Batch: $2.50 input / $15 output. Claude Opus 4.7 Batch: $2.50 / $12.50. Claude is 17% cheaper on output. Choose based on task fit — GPT-5.5 leads on Terminal-Bench, Opus 4.7 leads on SWE-Bench Pro.

Can I mix tiers in one application?

Yes, and you should. Route real-time user requests to Standard, route background tasks to Batch, route latency-critical paths to Priority. Most production stacks find 40-60% of total volume can move to Batch or Flex with no UX impact.

Does Codex Fast count as a fifth tier?

Codex Fast is a Codex-IDE-only pricing variant — 2.5× cost for 1.5× generation speed. It's not exposed in the public Responses or Chat Completions API, so most developers won't encounter it directly.

What happens if a Batch job fails partway through?

Failed individual requests within a batch are reported with error details (in the batch's error file) while the remaining requests complete normally; the batch finishes once every request has been processed, or expires if the 24h window closes first. You re-submit only the failed requests, not the whole batch. OpenAI does not charge for failed requests in a completed batch.
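Re-submitting only the failures means matching result lines back to the original requests by `custom_id`. A sketch, assuming each result line carries `custom_id`, a `response` object with `status_code` and `body`, and an `error` field (this matches the batch result format as we understand it; verify against OpenAI's current Batch docs). Function names and sample data are illustrative:

```python
import json


def split_results(result_jsonl: str) -> tuple:
    """Split batch result lines into {custom_id: body} and failed custom_ids."""
    ok, failed = {}, set()
    for line in result_jsonl.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        resp = rec.get("response") or {}
        if rec.get("error") or resp.get("status_code") != 200:
            failed.add(rec["custom_id"])
        else:
            ok[rec["custom_id"]] = resp.get("body")
    return ok, failed


def resubmission_jsonl(request_jsonl: str, failed: set) -> str:
    """Rebuild a JSONL payload containing only the failed requests."""
    keep = [line for line in request_jsonl.splitlines()
            if line.strip() and json.loads(line)["custom_id"] in failed]
    return "\n".join(keep) + ("\n" if keep else "")


# Illustrative data: three requests, one success, two different failures.
request_lines = "\n".join(
    json.dumps({"custom_id": f"req-{i}", "method": "POST",
                "url": "/v1/responses",
                "body": {"model": "gpt-5.5", "input": f"doc {i}"}})
    for i in range(3))
result_lines = "\n".join([
    json.dumps({"custom_id": "req-0",
                "response": {"status_code": 200, "body": {"id": "a"}},
                "error": None}),
    json.dumps({"custom_id": "req-1",
                "response": {"status_code": 500, "body": {}}, "error": None}),
    json.dumps({"custom_id": "req-2", "response": None,
                "error": {"code": "server_error", "message": "example"}}),
])
ok, failed = split_results(result_lines)
retry_payload = resubmission_jsonl(request_lines, failed)
print(sorted(ok), sorted(failed))
```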

Sources

OpenAI GPT-5.5 launch post — Standard / Batch / Flex / Priority pricing structure
OpenAI GPT-5.5 Instant announcement — free-tier rollout
Anthropic API pricing — Claude Opus 4.7 Batch comparison
Google AI Gemini API pricing — Gemini 3.5 Flash Batch comparison
TokenMix.ai pricing observability (internal cross-vendor pricing tracker)

By TokenMix Research Lab · Published 2026-05-21 · Last Updated 2026-05-21 · Data Checked 2026-05-21