TokenMix Research Lab · 2026-05-21

GPT-5.5 Batch vs Flex vs Priority: 50% Off API Math (2026)
Last Updated: 2026-05-21 · Data Checked: 2026-05-21 · Author: TokenMix Research Lab
GPT-5.5 ships with five distinct pricing tiers, and most teams are paying Standard rates for workloads that would run at 50% off on Batch or Flex. The fastest cost win available right now is not a model migration — it's a tier migration on the same model.
Per OpenAI's GPT-5.5 launch post, GPT-5.5 Standard API pricing is $5 input / $30 output per million tokens. Batch API and Flex both cut that to $2.50 / $15, the same per-token rate as the older GPT-5.4 Standard tier. Priority costs 2.5× Standard ($12.50 / $75). Codex Fast also charges 2.5× the Standard rate, for 1.5× generation speed. This piece breaks down what each tier guarantees, when to switch, and the per-task cost math for 4 realistic workloads. All numbers are cross-referenced against OpenAI's official pricing documentation and our own GPT-5.5 launch coverage.
Table of Contents
- Quick Verdict: All 5 GPT-5.5 Tiers Side-by-Side
- What Each Tier Actually Guarantees (SLA + Latency)
- Standard vs Batch: Where the 50% Off Comes From
- Flex Tier: The Middle Ground for Latency-Tolerant Production
- Priority Tier: When 2.5× Cost Actually Pays Off
- Cost Per Task: 4 Real Workloads Calculated
- Tier Selection Decision Matrix
- GPT-5.5 Batch vs Claude / Gemini Batch Pricing
- Migration: Standard → Batch in 30 Minutes
- FAQ
- Related Articles
- Sources
Quick Verdict: All 5 GPT-5.5 Tiers Side-by-Side
GPT-5.5 Standard at $5/$30 is the API list price. Batch and Flex cut that 50% to $2.50/$15. Priority charges 2.5× Standard ($12.50/$75) for SLA-grade latency. Codex Fast charges the same 2.5× for 1.5× generation speed.
| Tier | Input ($/MTok) | Output ($/MTok) | Multiplier vs Standard | Typical Latency | SLA / Availability |
|---|---|---|---|---|---|
| Batch API | $2.50 | $15.00 | 0.5× | ≤24h (async) | Within batch window |
| Flex | $2.50 | $15.00 | 0.5× | Best-effort, may queue | None |
| Standard | $5.00 | $30.00 | 1× | <2s first token | None published |
| Priority | $12.50 | $75.00 | 2.5× | Guaranteed low-latency | Throughput floor |
| Codex Fast | $12.50 | $75.00 | 2.5× cost, 1.5× speed | Fastest synchronous | Codex IDE only |
The 50% Batch/Flex math: $5 × 0.5 = $2.50 input, $30 × 0.5 = $15 output. GPT-5.5 Batch is priced exactly the same as the previous GPT-5.4 Standard tier — meaning you can run the new flagship at the old flagship's price if you can tolerate async completion.
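The whole tier table reduces to one Standard price pair plus a multiplier per tier. A minimal Python sketch of that derivation, using only the rates quoted above; the names `STANDARD_RATES`, `TIER_MULTIPLIER`, and `tier_rates` are illustrative and not part of any OpenAI SDK:

```python
# Standard list price in USD per million tokens, as quoted in this article.
STANDARD_RATES = {"input": 5.00, "output": 30.00}

# Multiplier vs Standard for each tier (illustrative names, values from the table).
TIER_MULTIPLIER = {
    "batch": 0.5,
    "flex": 0.5,
    "standard": 1.0,
    "priority": 2.5,
    "codex_fast": 2.5,
}


def tier_rates(tier: str) -> dict[str, float]:
    """Return the input/output $/MTok rates for a tier."""
    multiplier = TIER_MULTIPLIER[tier]
    return {kind: rate * multiplier for kind, rate in STANDARD_RATES.items()}


# Print one line per tier to compare against the table.
for tier in TIER_MULTIPLIER:
    rates = tier_rates(tier)
    print(f"{tier:<10} ${rates['input']:.2f} in / ${rates['output']:.2f} out per MTok")
```

Keeping the multipliers separate from the base price means a future list-price change only touches `STANDARD_RATES`, assuming the tier ratios stay the same.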
What Each Tier Actually Guarantees (SLA + Latency)
Tier choice is a latency-versus-cost trade-off, not a quality trade-off. The model weights are identical across Standard, Batch, Flex, and Priority — only the serving layer differs.
Batch API
- Async only: submit a job, get results within 24 hours
- Submit via Batch API endpoint, not real-time Responses or Chat Completions
- No partial results — you wait for the whole batch
- Best for: overnight processing, weekly evaluations, large-scale embeddings, offline classification, bulk dataset annotation
Flex
- Real-time-ish: requests are served via the same endpoint as Standard, but lower priority in the queue
- May see queueing delays when GPU capacity is constrained
- No SLA on first-token latency — typically still <5s, but can spike
- Best for: production workloads where occasional 5-30s delays are acceptable (background agents, low-traffic chatbots, content generation pipelines)
Standard
- Default tier: low-latency synchronous, typically <2s first token
- No published uptime or latency SLA but practical reliability is high
- Best for: user-facing chat, real-time copilots, default for any production traffic
Priority
- Guaranteed throughput floor + lower latency variance
- 2.5× the Standard cost
- Best for: latency-critical paths (voice assistants, financial workflows, anything where p99 latency matters more than per-token cost)
Codex Fast
- 2.5× the Standard cost for 1.5× generation speed
- Codex IDE workflow only, not exposed in public API
- Best for: developers willing to pay for faster autocomplete inside Codex
The five tiers cover the cost-versus-latency Pareto frontier OpenAI is willing to expose. There is no "even cheaper than Batch" option — the floor is $2.50/$15 per MTok for GPT-5.5.
Standard vs Batch: Where the 50% Off Comes From
The Batch API has been around since the GPT-4 era; OpenAI extended the same 50% discount to GPT-5.5 at launch. The mechanism is straightforward: async serving lets OpenAI pack jobs onto otherwise-idle GPU capacity (off-peak hours, between bursts of real-time traffic) and pass that efficiency back as a discount.
Practical example — 1 million input tokens + 200,000 output tokens per day for a content moderation pipeline:
| Tier | Daily Cost | Monthly Cost (30 days) | Annual |
|---|---|---|---|
| Standard | 1 × $5 + 0.2 × $30 = $11.00 | $330 | $3,960 |
| Batch | 1 × $2.50 + 0.2 × $15 = $5.50 | $165 | $1,980 |
| Savings | $5.50 | $165 | $1,980/year |
The catch: Batch jobs return within 24 hours, not instantly. If your moderation needs to flag posts before they go live, Batch is wrong. If you batch-process the day's content overnight to generate reports, Batch is exactly right.
For workloads that don't need real-time responses, defaulting to Standard is just throwing 50% of the API budget away. The migration is a different endpoint call, not a different model.
Flex Tier: The Middle Ground for Latency-Tolerant Production
Flex is OpenAI's newer middle-tier offering, sitting between Batch (async) and Standard (real-time). The per-token cost matches Batch — $2.50 / $15 per MTok — but the request is served synchronously through the same endpoint as Standard, just at lower queue priority.
When Flex wins over Batch:
- You need a response within a single conversation turn (5-30 seconds), not 24 hours
- Your traffic is bursty and async pipelines add operational complexity
- You're already on Standard but most requests are background tasks
When Flex falls short:
- p99 latency matters: Flex requests can queue indefinitely during peak demand
- User-facing real-time chat: occasional 30s delays feel broken
- Hard SLA requirements: Flex has none
Real numbers: A background research agent running ~500 GPT-5.5 calls per day at 8K input + 2K output per call:
| Tier | Per-call cost | Daily | Monthly |
|---|---|---|---|
| Standard | (0.008 × $5) + (0.002 × $30) = $0.10 | $50 | $1,500 |
| Flex | (0.008 × $2.50) + (0.002 × $15) = $0.05 | $25 | $750 |
Same model, same prompts, same output quality. Difference: requests can queue during peak hours. For a background agent that runs every 15 minutes and isn't blocking a user, this is the highest-leverage cost cut available.
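Because Flex can queue, a background worker usually wants a bounded retry and a fallback to Standard. The sketch below shows only that control flow, and several pieces are assumptions: `send` is a hypothetical callable that performs one request on a given tier, `FlexUnavailable` is a hypothetical stand-in for whatever timeout or capacity error your client surfaces, and the tier strings `"flex"` and `"default"` assume the OpenAI SDK's `service_tier` request parameter applies to GPT-5.5 the same way it does to earlier models.

```python
import time
from typing import Callable


class FlexUnavailable(Exception):
    """Hypothetical stand-in for a timeout or capacity error on a Flex request."""


def call_flex_then_standard(
    send: Callable[[str], str],
    flex_attempts: int = 2,
    backoff_seconds: float = 2.0,
) -> tuple[str, str]:
    """Try Flex first; fall back to Standard if Flex keeps failing.

    `send(tier)` is a hypothetical callable that performs one request on the
    given tier and returns the response text. Returns (text, tier_used).
    """
    for attempt in range(flex_attempts):
        try:
            return send("flex"), "flex"
        except FlexUnavailable:
            # Back off exponentially, but do not sleep after the last attempt.
            if attempt < flex_attempts - 1:
                time.sleep(backoff_seconds * (2 ** attempt))
    # Flex did not answer: pay Standard rates for this one request.
    return send("default"), "default"
```

In a real integration, `send` would wrap the API call and translate the client's timeout or rate-limit exception into `FlexUnavailable`. Logging `tier_used` per request shows how often the fallback fires, which is the number that decides whether Flex is actually saving money.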
Priority Tier: When 2.5× Cost Actually Pays Off
Priority is where the math flips — you pay 2.5× more per token in exchange for latency guarantees and throughput stability. At $12.50 input / $75 output per MTok, this is the most expensive way to serve GPT-5.5.
Justification check — Priority makes sense only if:
- p99 latency (not average) is critical (voice assistants, real-time translation, financial trade copilots)
- You've measured Standard tier latency variance and it's hurting user experience
- The cost differential is small compared to user-impact cost (e.g., enterprise contracts with latency SLAs to end customers)
Justification check — Priority is wrong if:
- You're paying it "just in case" without measuring Standard variance first
- Your traffic volume is low (variance is unlikely to bite at <1000 req/day)
- You're using Priority because it's the most expensive, not because you measured the need
Real math for a voice assistant doing 5M input + 1M output per month:
| Tier | Monthly |
|---|---|
| Batch | (5 × $2.50) + (1 × $15) = $27.50 ← async impossible for voice |
| Flex | (5 × $2.50) + (1 × $15) = $27.50 ← latency variance kills UX |
| Standard | (5 × $5) + (1 × $30) = $55 |
| Priority | (5 × $12.50) + (1 × $75) = $137.50 |
Priority costs $137.50 − $55 = $82.50 more per month than Standard, or $82.50 × 3 = $247.50 per quarter. If its latency guarantee prevents even one production incident per quarter that would cost more than $247.50, it pays for itself. Otherwise, Standard with a fallback strategy is more efficient.
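The break-even check can be written out as a few lines of Python. This is a sketch using the voice-assistant volumes and rates from the table above; `priority_pays_off` and its parameters are illustrative names, and the incident cost is something you would have to estimate yourself.

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 in_rate: float, out_rate: float) -> float:
    """Monthly spend in USD for volumes given in millions of tokens."""
    return input_mtok * in_rate + output_mtok * out_rate


# Voice assistant example: 5M input + 1M output tokens per month.
standard = monthly_cost(5, 1, 5.00, 30.00)
priority = monthly_cost(5, 1, 12.50, 75.00)

monthly_premium = priority - standard    # extra spend for Priority per month
quarterly_premium = monthly_premium * 3  # extra spend per quarter


def priority_pays_off(incidents_prevented_per_quarter: int,
                      cost_per_incident: float) -> bool:
    """True if the avoided incident cost exceeds the quarterly Priority premium."""
    return incidents_prevented_per_quarter * cost_per_incident > quarterly_premium


print(f"Standard ${standard:.2f}/mo, Priority ${priority:.2f}/mo, "
      f"premium ${quarterly_premium:.2f}/quarter")
```

The premium scales linearly with volume, so at 100× this traffic the same check compares incident cost against $24,750 per quarter rather than $247.50.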
Cost Per Task: 4 Real Workloads Calculated
The per-MTok rate is meaningless without task-level context. Below are 4 realistic GPT-5.5 workloads at Standard, Batch, Flex, and Priority pricing.
Workload 1: Customer Support Triage (real-time chat)
Inputs: 4K input + 600 output per conversation, 10,000 conversations/month.
| Tier | Cost/conversation | Monthly | Notes |
|---|---|---|---|
| Standard | $0.038 | $380 | Recommended default |
| Flex | $0.019 | $190 | Risky — p99 latency matters |
| Batch | — | — | Async impossible for chat |
| Priority | $0.095 | $950 | Only if enterprise SLA required |
Workload 2: Document Summarization (overnight batch)
Inputs: 80K input + 4K output per document, 5,000 docs/month, runs nightly.
| Tier | Cost/doc | Monthly | Notes |
|---|---|---|---|
| Standard | $0.520 | $2,600 | Wasteful — Batch is identical quality |
| Batch | $0.260 | $1,300 | Correct default |
| Flex | $0.260 | $1,300 | Works but no reason to skip Batch |
| Priority | $1.300 | $6,500 | Wrong tier for async work |
Workload 3: Code Generation (Codex IDE)
Inputs: 12K input + 3K output per generation, 50,000 generations/month.
| Tier | Cost/gen | Monthly | Notes |
|---|---|---|---|
| Standard | $0.150 | $7,500 | Default |
| Codex Fast | $0.375 | $18,750 | 1.5× speed at 2.5× cost — measure user willingness |
| Batch | — | — | Codex needs sync, can't batch |
| Priority | $0.375 | $18,750 | Codex teams typically pick Codex Fast over Priority |
Workload 4: Long-Context RAG (research agent)
Inputs: 800K input + 8K output per query, 1,000 queries/month.
| Tier | Cost/query | Monthly | Notes |
|---|---|---|---|
| Standard | $4.24 | $4,240 | Default for production research |
| Flex | $2.12 | $2,120 | If 30s queue acceptable, big savings |
| Batch | $2.12 | $2,120 | If async OK, even simpler |
| Priority | $10.60 | $10,600 | Wrong tier unless real-time-critical |
Pattern: For 2 of the 4 workloads (document summarization and long-context RAG), Batch or Flex saves 50% with no quality difference. Real-time chat and the Codex IDE need Standard or above.
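The four tables above all come from the same formula: tokens per task times the tier's rate, times tasks per month. A short Python sketch that recomputes them from the article's inputs, so you can substitute your own volumes; the names `RATES`, `WORKLOADS`, `cost_per_task`, and `monthly_cost` are illustrative:

```python
# (input $/MTok, output $/MTok) per tier, from the pricing table in this article.
RATES = {
    "standard": (5.00, 30.00),
    "batch": (2.50, 15.00),
    "flex": (2.50, 15.00),
    "priority": (12.50, 75.00),
}

# (name, input tokens per task, output tokens per task, tasks per month)
WORKLOADS = [
    ("Customer support triage", 4_000, 600, 10_000),
    ("Document summarization", 80_000, 4_000, 5_000),
    ("Code generation", 12_000, 3_000, 50_000),
    ("Long-context RAG", 800_000, 8_000, 1_000),
]


def cost_per_task(tier: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one task on the given tier."""
    in_rate, out_rate = RATES[tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def monthly_cost(tier: str, input_tokens: int, output_tokens: int,
                 tasks_per_month: int) -> float:
    """USD cost of a month of identical tasks on the given tier."""
    return cost_per_task(tier, input_tokens, output_tokens) * tasks_per_month


for name, in_tok, out_tok, per_month in WORKLOADS:
    print(name)
    for tier in RATES:
        per_task = cost_per_task(tier, in_tok, out_tok)
        total = monthly_cost(tier, in_tok, out_tok, per_month)
        print(f"  {tier:<9} ${per_task:>7.3f}/task  ${total:>10,.2f}/month")
```

The script prices every tier for every workload, including combinations the tables rule out on latency grounds (Batch for live chat, for example). Cost is only half of the decision.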
Tier Selection Decision Matrix
| Your situation | Recommended Tier | Why |
|---|---|---|
| Need response within a conversation turn | Standard | Default low-latency, no surprises |
| Can wait up to 24h, doing bulk processing | Batch | 50% off, no UX risk |
| Production traffic, latency-tolerant (5-30s OK) | Flex | 50% off, sync endpoint |
| Voice / real-time / financial / p99-critical | Priority | 2.5× cost buys latency floor |
| Codex IDE workflow, willing to pay for speed | Codex Fast | Codex-only, not general API |
| Mixed workload (chat + batch jobs) | Hybrid | Standard for chat, Batch for offline jobs |
| Unsure | Start Standard | Measure variance first, then optimize down to Flex/Batch |
The default recommendation: start every new GPT-5.5 integration on Standard, measure usage patterns for 2 weeks, then migrate latency-tolerant traffic to Batch/Flex. Most teams find 40-60% of their volume qualifies for the 50% discount.
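The matrix can be encoded as a small routing function applied per call path. This is one possible reading of the table, not an official rule set: the function name, its parameters, and the 30-second and 24-hour thresholds are assumptions taken from the latency ranges quoted in this article.

```python
def recommend_tier(
    max_wait_seconds: float,
    p99_critical: bool = False,
    codex_ide_wants_speed: bool = False,
) -> str:
    """Map a call path's latency tolerance to a tier from the decision matrix."""
    if codex_ide_wants_speed:
        return "codex_fast"  # Codex-only, not available through the general API
    if p99_critical:
        return "priority"    # voice, real-time, financial paths
    if max_wait_seconds >= 24 * 3600:
        return "batch"       # bulk work that can wait a day
    if max_wait_seconds >= 30:
        return "flex"        # tolerates occasional 5-30s queueing
    return "standard"        # default low-latency tier


# A "hybrid" setup is just this function applied to each call path separately.
call_paths = {
    "live chat reply": recommend_tier(max_wait_seconds=2),
    "nightly report": recommend_tier(max_wait_seconds=24 * 3600),
    "background agent step": recommend_tier(max_wait_seconds=60),
    "voice assistant turn": recommend_tier(max_wait_seconds=1, p99_critical=True),
}
for path, tier in call_paths.items():
    print(f"{path:<22} -> {tier}")
```

Treat the thresholds as starting points and replace them with the latency budget you actually measured during the two-week Standard baseline.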
GPT-5.5 Batch vs Claude / Gemini Batch Pricing
OpenAI is not the only vendor offering tier-based discounts. Here's how GPT-5.5 Batch stacks up against Claude Opus 4.7 and Gemini 3.5 Flash on equivalent low-priority pricing.
| Model | Standard ($/MTok in/out) | Batch / Equivalent ($/MTok in/out) | Discount |
|---|---|---|---|
| GPT-5.5 | $5 / $30 | $2.50 / $15 | 50% |
| Claude Opus 4.7 | $5 / $25 | $2.50 / $12.50 (per Anthropic Batch docs) | 50% |
| Gemini 3.5 Flash | $1.50 / $9 | $0.75 / $4.50 (per Google AI pricing) | 50% |
| Gemini 3.5 Flash Flex | — | $0.75 / $4.50 | 50% |
| Gemini 3.1 Pro Preview (≤200K) | $2 / $12 | $1 / $6 | 50% |
Observations:
- All three major vendors offer 50% Batch discounts — this is industry standard, not OpenAI-specific
- Gemini 3.5 Flash Batch at $0.75 / $4.50 is the cheapest frontier-tier option by a wide margin (per our Gemini 3.5 Flash launch coverage)
- GPT-5.5 Batch matches Claude Opus 4.7 Batch on input ($2.50) but costs more on output ($15 vs $12.50): Claude is about 17% cheaper on output at the Batch tier
For pure cost-per-token at the Batch tier, the ranking from cheapest to most expensive is Gemini 3.5 Flash, Claude Opus 4.7, GPT-5.5. Quality differences across these three on agentic and coding tasks favor GPT-5.5, but for translation, classification, summarization, and embedding-replacement tasks, Gemini 3.5 Flash Batch is hard to beat on per-dollar performance.
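To see how the ranking plays out for a specific job, price the same token volume against each vendor's Batch rate. A minimal sketch using the rates from the comparison table; `BATCH_RATES`, `batch_job_cost`, and `rank_by_cost` are illustrative names, and the 10M/2M job size is an arbitrary example:

```python
# Batch-tier rates as (input $/MTok, output $/MTok), from the comparison table.
BATCH_RATES = {
    "GPT-5.5": (2.50, 15.00),
    "Claude Opus 4.7": (2.50, 12.50),
    "Gemini 3.5 Flash": (0.75, 4.50),
}


def batch_job_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """USD cost of a Batch job with volumes given in millions of tokens."""
    in_rate, out_rate = BATCH_RATES[model]
    return input_mtok * in_rate + output_mtok * out_rate


def rank_by_cost(input_mtok: float, output_mtok: float) -> list[tuple[str, float]]:
    """Models sorted from cheapest to most expensive for the given job."""
    costs = [(m, batch_job_cost(m, input_mtok, output_mtok)) for m in BATCH_RATES]
    return sorted(costs, key=lambda pair: pair[1])


# Example job: 10M input tokens, 2M output tokens.
for model, cost in rank_by_cost(10, 2):
    print(f"{model:<18} ${cost:.2f}")

# Output-price gap between GPT-5.5 Batch and Claude Opus 4.7 Batch.
gap = (15.00 - 12.50) / 15.00
print(f"Claude output discount vs GPT-5.5: {gap:.1%}")
```

Because the input rates for GPT-5.5 and Claude Opus 4.7 are identical, the gap between them depends entirely on how output-heavy the job is.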
Migration: Standard → Batch in 30 Minutes
Switching GPT-5.5 traffic from Standard to Batch is a code change, not a model change. The model ID stays gpt-5.5 — only the endpoint and request shape differ.
Standard call (synchronous):
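from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.5",
    input="Summarize this document...",
    max_output_tokens=500,
)
# output_text joins the text parts of the response; indexing output[0]
# directly can land on a non-message item and fail.
print(response.output_text)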
Batch call (async):
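# 1. Prepare a JSONL file of requests
import json
import time

from openai import OpenAI

client = OpenAI()
documents = ["First document text...", "Second document text..."]  # placeholder inputs

with open("batch_requests.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        f.write(json.dumps({
            "custom_id": f"req-{i}",  # used to match results back to inputs
            "method": "POST",
            "url": "/v1/responses",
            "body": {
                "model": "gpt-5.5",
                "input": f"Summarize: {doc}",
                "max_output_tokens": 500,
            },
        }) + "\n")

# 2. Upload and submit batch
with open("batch_requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/responses",
    completion_window="24h",
)

# 3. Poll until the batch reaches a terminal state (typically <2h for most batches)
while True:
    batch = client.batches.retrieve(batch.id)
    if batch.status == "completed":
        results = client.files.content(batch.output_file_id).text
        break
    if batch.status in ("failed", "expired", "cancelled"):
        # Without this check the loop would never exit on a failed batch
        raise RuntimeError(f"Batch ended with status {batch.status}")
    time.sleep(60)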
Migration checklist:
- Identify which call paths in your code are latency-tolerant (don't block a user response)
- Refactor those paths to enqueue requests rather than call synchronously
- Add a polling worker (or webhook handler if available) to consume Batch outputs
- Run side-by-side for 1 week — measure cost-per-completed-job and any quality delta
- Cut over fully once cost validation completes
For mixed workloads where part of traffic is real-time chat and part is offline reports, run both Standard and Batch in parallel. Many production stacks already follow this pattern with Celery / RQ / Cloud Tasks queues — Batch API maps cleanly onto the same architecture.
For teams routing GPT-5.5 alongside Claude and Gemini through a single endpoint, the TokenMix.ai unified API exposes the same Batch / Flex tier semantics across providers, so you can pick the cheapest qualified tier per workload without writing provider-specific code paths.
FAQ
Is GPT-5.5 Batch the same quality as GPT-5.5 Standard?
Yes. Same model weights, same outputs. The only difference is serving infrastructure — Batch runs jobs asynchronously, often during off-peak GPU windows. No quality degradation has been documented in OpenAI's launch post or independent testing.
What's the difference between Batch and Flex?
Batch is async (submit, wait up to 24h, receive results). Flex is synchronous (same endpoint as Standard) but lower priority — can queue during peak demand. Both cost the same: $2.50 / $15 per MTok. Use Batch for bulk offline work, Flex for production workloads that tolerate occasional latency spikes.
Can I use Batch for real-time chat?
No. Batch jobs return within 24 hours, not within a conversation turn. For chat, use Standard, or Flex if you can tolerate 5-30 second queue delays.
Is Priority tier worth 2.5× Standard cost?
Only if p99 latency matters more than per-token cost. Voice assistants, financial workflows, and applications with end-user SLAs are typical Priority candidates. For most chat and content generation, Standard latency is sufficient.
How does GPT-5.5 Batch compare to Claude Opus 4.7 Batch?
GPT-5.5 Batch: $2.50 input / $15 output. Claude Opus 4.7 Batch: $2.50 / $12.50. Claude is 17% cheaper on output. Choose based on task fit — GPT-5.5 leads on Terminal-Bench, Opus 4.7 leads on SWE-Bench Pro.
Can I mix tiers in one application?
Yes, and you should. Route real-time user requests to Standard, route background tasks to Batch, route latency-critical paths to Priority. Most production stacks find 40-60% of total volume can move to Batch or Flex with no UX impact.
Does Codex Fast count as a fifth tier?
Codex Fast is a Codex-IDE-only pricing variant — 2.5× cost for 1.5× generation speed. It's not exposed in the public Responses or Chat Completions API, so most developers won't encounter it directly.
What happens if a Batch job fails partway through?
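Failed individual requests are reported with error details in the batch's error file (error_file_id), separate from the output file of successful responses. The batch finishes once every request has been processed; if the 24h window closes first, the batch expires and any results already completed are still returned. You re-submit only the failed requests, not the whole batch. OpenAI does not charge for failed requests in a completed batch.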
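Re-submitting only the failures starts with splitting a results file by `custom_id`. The sketch below assumes each JSONL line carries `custom_id`, a `response` object with a `status_code` and `body`, and an `error` field that is null on success. That matches the Batch result format as we understand it, but verify it against a real results file before relying on it. `split_batch_results` is an illustrative name.

```python
import json


def split_batch_results(jsonl_text: str) -> tuple[dict[str, dict], list[str]]:
    """Separate successful responses from failed request IDs.

    Returns (succeeded, failed_ids), where `succeeded` maps custom_id to the
    response body and `failed_ids` lists the custom_ids to re-submit.
    """
    succeeded: dict[str, dict] = {}
    failed_ids: list[str] = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") is None and response.get("status_code") == 200:
            succeeded[record["custom_id"]] = response.get("body", {})
        else:
            failed_ids.append(record["custom_id"])
    return succeeded, failed_ids


# Two hand-written sample lines in the assumed format.
sample = "\n".join([
    json.dumps({"custom_id": "req-0", "error": None,
                "response": {"status_code": 200, "body": {"id": "resp_a"}}}),
    json.dumps({"custom_id": "req-1", "response": None,
                "error": {"code": "server_error", "message": "try again"}}),
])
ok, retry = split_batch_results(sample)
print(f"{len(ok)} succeeded, re-submit: {retry}")
```

Filtering the original request file down to the IDs in `failed_ids` then gives the input for the follow-up batch.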
Related Articles
- GPT-5.5 (Spud) Released: $5/$30 API Pricing & Benchmarks 2026
- Gemini 3.5 Flash Released at I/O 2026: $1.50/$9 API Pricing
- Claude API Pricing 2026: Opus, Sonnet, Haiku Costs Compared
- OpenAI API Pricing 2026: GPT-5.5, Realtime, Image Costs
- AI API Gateway 2026: Routing, Fallbacks, Observability
Sources
- OpenAI GPT-5.5 launch post — Standard / Batch / Flex / Priority pricing structure
- OpenAI GPT-5.5 Instant announcement — free-tier rollout
- Anthropic API pricing — Claude Opus 4.7 Batch comparison
- Google AI Gemini API pricing — Gemini 3.5 Flash Batch comparison
- TokenMix.ai pricing observability (internal cross-vendor pricing tracker)
By TokenMix Research Lab · Published 2026-05-21 · Last Updated 2026-05-21 · Data Checked 2026-05-21