TokenMix Research Lab · 2026-05-21

GPT-5.5 Batch vs Flex vs Priority: 50% Off API Math (2026)

Last Updated: 2026-05-21 · Data Checked: 2026-05-21 · Author: TokenMix Research Lab

GPT-5.5 ships with five distinct pricing tiers, and most teams are paying Standard rates for workloads that would run at 50% off on Batch or Flex. The fastest cost win available right now is not a model migration — it's a tier migration on the same model.

Per OpenAI's GPT-5.5 launch post, GPT-5.5 Standard API pricing is $5 input / $30 output per million tokens. Batch API and Flex both cut that to $2.50 / $15, the same per-token rate as the older GPT-5.4 Standard tier. Priority costs 2.5× the Standard rate ($12.50 / $75). Codex Fast charges 2.5× the Standard cost for 1.5× generation speed. This piece breaks down what each tier guarantees, when to switch, and the per-task cost math at 4 realistic workload sizes. All numbers are cross-referenced against OpenAI's official pricing documentation and our own GPT-5.5 launch coverage.

Quick Verdict: All 5 GPT-5.5 Tiers Side-by-Side

GPT-5.5 Standard at $5/$30 is the API list price. Batch and Flex cut that by 50% to $2.50/$15. Priority costs 2.5× Standard for SLA-grade latency. Codex Fast costs 2.5× Standard for 1.5× generation speed.

| Tier | Input ($/MTok) | Output ($/MTok) | Multiplier vs Standard | Typical Latency | SLA |
| --- | --- | --- | --- | --- | --- |
| Batch API | $2.50 | $15.00 | 0.5× | ≤24h (async) | Within batch window |
| Flex | $2.50 | $15.00 | 0.5× | Best-effort, may queue | None |
| Standard | $5.00 | $30.00 | 1× | <2s first token | None published |
| Priority | $12.50 | $75.00 | 2.5× | Guaranteed low-latency | Throughput floor |
| Codex Fast | $12.50 | $75.00 | 2.5× cost, 1.5× speed | Fastest synchronous | Codex IDE only |

The 50% Batch/Flex math: $5 × 0.5 = $2.50 input, $30 × 0.5 = $15 output. GPT-5.5 Batch is priced exactly the same as the previous GPT-5.4 Standard tier — meaning you can run the new flagship at the old flagship's price if you can tolerate async completion.
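The multiplier arithmetic can be sketched as a small rate-card helper. This is only an illustration: the tier labels and the `rate_card` function are made up for this example and are not part of any OpenAI SDK; the prices and multipliers are the ones quoted in this article.

```python
# Standard list price in USD per million tokens (MTok), as quoted in this article.
STANDARD = {"input": 5.00, "output": 30.00}

# Multiplier vs Standard for each tier (labels are illustrative, not API values).
MULTIPLIERS = {
    "batch": 0.5,
    "flex": 0.5,
    "standard": 1.0,
    "priority": 2.5,
    "codex_fast": 2.5,
}


def rate_card() -> dict[str, dict[str, float]]:
    """Derive each tier's per-MTok rates by scaling the Standard list price."""
    return {
        tier: {kind: price * mult for kind, price in STANDARD.items()}
        for tier, mult in MULTIPLIERS.items()
    }


if __name__ == "__main__":
    for tier, rates in rate_card().items():
        print(f"{tier:<11} ${rates['input']:.2f} in / ${rates['output']:.2f} out")
```

Keeping the multipliers separate from the list price means a future price change only touches `STANDARD`.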

What Each Tier Actually Guarantees (SLA + Latency)

Tier choice is a latency-versus-cost trade-off, not a quality trade-off. The model weights are identical across Standard, Batch, Flex, and Priority — only the serving layer differs.

Batch API

Flex

Standard

Priority

Codex Fast

The five tiers cover the cost-versus-latency Pareto frontier OpenAI is willing to expose. There is no "even cheaper than Batch" option — the floor is $2.50/$15 per MTok for GPT-5.5.

Standard vs Batch: Where the 50% Off Comes From

The Batch API has been around since the GPT-4 era; OpenAI extended the same 50% discount to GPT-5.5 at launch. The mechanism is straightforward: async serving lets OpenAI pack jobs onto otherwise-idle GPU capacity (off-peak hours, between bursts of real-time traffic), and it passes that efficiency back as a discount.

Practical example — 1 million input tokens + 200,000 output tokens per day for a content moderation pipeline:

| Tier | Daily Cost | Monthly Cost (30 days) | Annual |
| --- | --- | --- | --- |
| Standard | 1 × $5 + 0.2 × $30 = $11.00 | $330 | $3,960 |
| Batch | 1 × $2.50 + 0.2 × $15 = $5.50 | $165 | $1,980 |
| Savings | $5.50 | $165 | $1,980/year |

The catch: Batch jobs return within 24 hours, not instantly. If your moderation needs to flag posts before they go live, Batch is wrong. If you batch-process the day's content overnight to generate reports, Batch is exactly right.

For workloads that don't need real-time responses, defaulting to Standard is just throwing 50% of the API budget away. The migration is a different endpoint call, not a different model.

Flex Tier: The Middle Ground for Latency-Tolerant Production

Flex is OpenAI's newer middle-tier offering, sitting between Batch (async) and Standard (real-time). The per-token cost matches Batch — $2.50 / $15 per MTok — but the request is served synchronously through the same endpoint as Standard, just at lower queue priority.

When Flex wins over Batch:

When Flex falls short:

Real numbers: A background research agent running ~500 GPT-5.5 calls per day at 8K input + 2K output per call:

| Tier | Per-call cost | Daily | Monthly |
| --- | --- | --- | --- |
| Standard | (0.008 × $5) + (0.002 × $30) = $0.10 | $50 | $1,500 |
| Flex | (0.008 × $2.50) + (0.002 × $15) = $0.05 | $25 | $750 |

Same model, same prompts, same output quality. Difference: requests can queue during peak hours. For a background agent that runs every 15 minutes and isn't blocking a user, this is the highest-leverage cost cut available.
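A minimal sketch of routing a background call to Flex first and falling back to Standard when Flex has no capacity. It assumes the tier is chosen with a `service_tier` request parameter (`"flex"` or `"default"`), as OpenAI documents for Flex processing on earlier models; whether GPT-5.5 accepts it should be checked against the current API reference. `send` and `TierUnavailable` are hypothetical stand-ins for the real SDK call and its capacity error, so the example runs without network access.

```python
import time
from typing import Callable, Sequence


class TierUnavailable(Exception):
    """Hypothetical error raised by `send` when a tier has no capacity."""


def call_with_fallback(
    send: Callable[[dict], str],
    request: dict,
    tiers: Sequence[str] = ("flex", "default"),
    retries_per_tier: int = 2,
    base_delay: float = 1.0,
) -> tuple[str, str]:
    """Try each tier in order; return (tier_used, response_text)."""
    for tier in tiers:
        for attempt in range(retries_per_tier):
            try:
                # Copy the request and tag it with the tier being tried.
                return tier, send({**request, "service_tier": tier})
            except TierUnavailable:
                # Back off exponentially before the next attempt.
                time.sleep(base_delay * 2**attempt)
    raise RuntimeError("all tiers exhausted")
```

With the OpenAI Python SDK, `send` could wrap something like `client.responses.create(**kwargs).output_text`, typically with a longer client timeout because Flex requests may queue.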

Priority Tier: When 2.5× Cost Actually Pays Off

Priority is where the math flips — you pay 2.5× more per token in exchange for latency guarantees and throughput stability. At $12.50 input / $75 output per MTok, this is the most expensive way to serve GPT-5.5.

Justification check — Priority makes sense only if:

  1. p99 latency (not average) is critical (voice assistants, real-time translation, financial trade copilots)
  2. You've measured Standard tier latency variance and it's hurting user experience
  3. The cost differential is small compared to user-impact cost (e.g., enterprise contracts with latency SLAs to end customers)

Justification check — Priority is wrong if:

  1. You're paying it "just in case" without measuring Standard variance first
  2. Your traffic volume is low (variance is unlikely to bite at <1000 req/day)
  3. You're using Priority because it's the most expensive, not because you measured the need

Real math for a voice assistant doing 5M input + 1M output per month:

| Tier | Monthly | Note |
| --- | --- | --- |
| Batch | (5 × $2.50) + (1 × $15) = $27.50 | async impossible for voice |
| Flex | (5 × $2.50) + (1 × $15) = $27.50 | latency variance kills UX |
| Standard | (5 × $5) + (1 × $30) = $55 | |
| Priority | (5 × $12.50) + (1 × $75) = $137.50 | |

Priority costs $137.50 − $55 = $82.50 more per month than Standard, or $82.50 × 3 = $247.50 per quarter. If its latency guarantee prevents even one production incident per quarter that would cost more than $247.50, it pays for itself. Otherwise, Standard with a fallback strategy is more efficient.
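The break-even reasoning for the voice-assistant example can be written out as a short calculation. This is a sketch using the rates and volumes quoted in this article; `priority_pays_off` and its argument are hypothetical names, and the incident cost is something you would have to estimate yourself.

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 input_rate: float, output_rate: float) -> float:
    """Monthly USD cost for token volumes given in millions of tokens."""
    return input_mtok * input_rate + output_mtok * output_rate


# Voice-assistant volumes from the example: 5M input + 1M output per month.
standard = monthly_cost(5, 1, 5.00, 30.00)    # Standard rates
priority = monthly_cost(5, 1, 12.50, 75.00)   # Priority rates

monthly_premium = priority - standard
quarterly_premium = monthly_premium * 3


def priority_pays_off(avoided_incident_cost_per_quarter: float) -> bool:
    """True when the incident cost Priority avoids exceeds its quarterly premium."""
    return avoided_incident_cost_per_quarter > quarterly_premium


if __name__ == "__main__":
    print(f"Standard ${standard:.2f}/month, Priority ${priority:.2f}/month")
    print(f"Premium ${monthly_premium:.2f}/month, ${quarterly_premium:.2f}/quarter")
```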

Cost Per Task: 4 Real Workloads Calculated

The per-MTok rate is meaningless without task-level context. Below are 4 realistic GPT-5.5 workloads at Standard, Batch, Flex, and Priority pricing.

Workload 1: Customer Support Triage (real-time chat)

Inputs: 4K input + 600 output per conversation, 10,000 conversations/month.

| Tier | Cost/conversation | Monthly | Notes |
| --- | --- | --- | --- |
| Standard | $0.038 | $380 | Recommended default |
| Flex | $0.019 | $190 | Risky: p99 latency matters |
| Batch | n/a | n/a | Async impossible for chat |
| Priority | $0.095 | $950 | Only if enterprise SLA required |

Workload 2: Document Summarization (overnight batch)

Inputs: 80K input + 4K output per document, 5,000 docs/month, runs nightly.

| Tier | Cost/doc | Monthly | Notes |
| --- | --- | --- | --- |
| Standard | $0.520 | $2,600 | Wasteful: Batch is identical quality |
| Batch | $0.260 | $1,300 | Correct default |
| Flex | $0.260 | $1,300 | Works but no reason to skip Batch |
| Priority | $1.300 | $6,500 | Wrong tier for async work |

Workload 3: Code Generation (Codex IDE)

Inputs: 12K input + 3K output per generation, 50,000 generations/month.

| Tier | Cost/gen | Monthly | Notes |
| --- | --- | --- | --- |
| Standard | $0.150 | $7,500 | Default |
| Codex Fast | $0.375 | $18,750 | 1.5× speed at 2.5× cost; measure user willingness |
| Batch | n/a | n/a | Codex needs sync, can't batch |
| Priority | $0.375 | $18,750 | Codex teams typically pick Codex Fast over Priority |

Workload 4: Long-Context RAG (research agent)

Inputs: 800K input + 8K output per query, 1,000 queries/month.

| Tier | Cost/query | Monthly | Notes |
| --- | --- | --- | --- |
| Standard | $4.24 | $4,240 | Default for production research |
| Flex | $2.12 | $2,120 | If 30s queue acceptable, big savings |
| Batch | $2.12 | $2,120 | If async OK, even simpler |
| Priority | $10.60 | $10,600 | Wrong tier unless real-time-critical |

Pattern: For 2 of the 4 workloads (document summarization and long-context RAG), Batch or Flex saves 50% with no quality difference. Real-time chat and the Codex IDE need Standard or above.
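The per-task figures in the four workload tables can be reproduced with a few lines of arithmetic. This is a sketch using the Standard rates quoted in this article; the workload labels and the `per_task_cost` helper are illustrative names.

```python
# Standard rates in USD per million tokens, as quoted in this article.
INPUT_RATE, OUTPUT_RATE = 5.00, 30.00

# (name, input tokens per task, output tokens per task, tasks per month)
WORKLOADS = [
    ("Support triage", 4_000, 600, 10_000),
    ("Doc summarization", 80_000, 4_000, 5_000),
    ("Code generation", 12_000, 3_000, 50_000),
    ("Long-context RAG", 800_000, 8_000, 1_000),
]


def per_task_cost(input_tokens: int, output_tokens: int, multiplier: float = 1.0) -> float:
    """USD cost of one task; multiplier is 0.5 for Batch/Flex, 2.5 for Priority."""
    base = (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000
    return round(base * multiplier, 4)


if __name__ == "__main__":
    for name, tokens_in, tokens_out, tasks in WORKLOADS:
        std = per_task_cost(tokens_in, tokens_out)
        half = per_task_cost(tokens_in, tokens_out, 0.5)
        print(f"{name:<18} standard ${std:.3f}/task (${std * tasks:,.0f}/mo)  "
              f"batch/flex ${half:.3f}/task (${half * tasks:,.0f}/mo)")
```

Swapping in your own token counts per task is usually more informative than comparing per-MTok rates directly.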

Tier Selection Decision Matrix

| Your situation | Recommended Tier | Why |
| --- | --- | --- |
| Need response within a conversation turn | Standard | Default low-latency, no surprises |
| Can wait up to 24h, doing bulk processing | Batch | 50% off, no UX risk |
| Production traffic, latency-tolerant (5-30s OK) | Flex | 50% off, sync endpoint |
| Voice / real-time / financial / p99-critical | Priority | 2.5× cost buys latency floor |
| Codex IDE workflow, willing to pay for speed | Codex Fast | Codex-only, not general API |
| Mixed workload (chat + batch jobs) | Hybrid | Standard for chat, Batch for offline jobs |
| Unsure | Start Standard | Measure variance first, then optimize down to Flex/Batch |

The default recommendation: start every new GPT-5.5 integration on Standard, measure usage patterns for 2 weeks, then migrate latency-tolerant traffic to Batch/Flex. Most teams find 40-60% of their volume qualifies for the 50% discount.
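The decision matrix can be approximated as a routing function. This is a rough sketch, not a definitive policy: `pick_tier` and its parameters are hypothetical, the 5-second and 24-hour thresholds are read off the matrix rows, and Codex Fast is left out because it is not a general API tier.

```python
def pick_tier(max_wait_seconds: float, p99_critical: bool = False) -> str:
    """Map a latency budget to a tier, following the decision matrix.

    Thresholds are assumptions taken from the matrix rows; tune them to
    your own latency measurements.
    """
    if p99_critical:
        return "priority"          # tail latency matters more than cost
    if max_wait_seconds >= 24 * 3600:
        return "batch"             # bulk work that can wait for the async window
    if max_wait_seconds >= 5:
        return "flex"              # sync call that tolerates queueing
    return "standard"              # must answer within a conversation turn


if __name__ == "__main__":
    for budget in (1, 30, 24 * 3600):
        print(budget, pick_tier(budget))
```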

GPT-5.5 Batch vs Claude / Gemini Batch Pricing

OpenAI is not the only vendor offering tier-based discounts. Here's how GPT-5.5 Batch stacks up against Claude Opus 4.7 and Gemini 3.5 Flash on equivalent low-priority pricing.

| Model | Standard ($/MTok in/out) | Batch / Equivalent ($/MTok in/out) | Discount |
| --- | --- | --- | --- |
| GPT-5.5 | $5 / $30 | $2.50 / $15 | 50% |
| Claude Opus 4.7 | $5 / $25 | $2.50 / $12.50 (per Anthropic Batch docs) | 50% |
| Gemini 3.5 Flash | $1.50 / $9 | $0.75 / $4.50 (per Google AI pricing) | 50% |
| Gemini 3.5 Flash Flex | | $0.75 / $4.50 | 50% |
| Gemini 3.1 Pro Preview (≤200K) | $2 / $12 | $1 / $6 | 50% |

Observations:

For pure cost-per-token at the Batch tier, the ranking from cheapest to most expensive is Gemini 3.5 Flash, Claude Opus 4.7, GPT-5.5 (Claude and GPT-5.5 tie on input at $2.50; Claude is cheaper on output). Quality differences across these three on agentic and coding tasks favor GPT-5.5, but for translation, classification, summarization, and embedding-replacement tasks, Gemini 3.5 Flash Batch is hard to beat on per-dollar performance.
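To see how the Batch-tier rates translate into a monthly bill, the comparison can be run against a concrete job mix. The sketch below uses the Batch prices from the comparison table and the overnight summarization workload from earlier in this article (80K input + 4K output, 5,000 docs/month); the function names are illustrative.

```python
# Batch-tier rates in USD per million tokens (input, output), from the comparison table.
BATCH_RATES = {
    "GPT-5.5": (2.50, 15.00),
    "Claude Opus 4.7": (2.50, 12.50),
    "Gemini 3.5 Flash": (0.75, 4.50),
}


def monthly_batch_cost(model: str, input_tokens: int, output_tokens: int, jobs: int) -> float:
    """Monthly USD cost of `jobs` identical batch jobs on the given model."""
    in_rate, out_rate = BATCH_RATES[model]
    per_job = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return round(per_job * jobs, 2)


def cheapest_first(input_tokens: int, output_tokens: int, jobs: int) -> list[tuple[str, float]]:
    """Return (model, monthly cost) pairs sorted from cheapest to most expensive."""
    costs = {
        model: monthly_batch_cost(model, input_tokens, output_tokens, jobs)
        for model in BATCH_RATES
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for model, cost in cheapest_first(80_000, 4_000, 5_000):
        print(f"{model:<17} ${cost:,.2f}/month")
```

Cost is only one axis here; the ranking says nothing about output quality on a given task.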

Migration: Standard → Batch in 30 Minutes

Switching GPT-5.5 traffic from Standard to Batch is a code change, not a model change. The model ID stays gpt-5.5 — only the endpoint and request shape differ.

Standard call (synchronous):

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    input="Summarize this document...",
    max_output_tokens=500
)
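# output_text joins the text parts of the response; indexing output[0]
# directly can hit a non-message item (for example a reasoning item)
print(response.output_text)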

Batch call (async):

import json
import time

from openai import OpenAI

client = OpenAI()
documents = ["First document text...", "Second document text..."]  # placeholder inputs

# 1. Prepare a JSONL file of requests (one JSON object per line)
with open("batch_requests.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        f.write(json.dumps({
            "custom_id": f"req-{i}",  # used later to match results to inputs
            "method": "POST",
            "url": "/v1/responses",
            "body": {
                "model": "gpt-5.5",
                "input": f"Summarize: {doc}",
                "max_output_tokens": 500
            }
        }) + "\n")

# 2. Upload and submit batch
with open("batch_requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/responses",
    completion_window="24h"
)

# 3. Poll for completion (typically <2h for most batches)
while True:
    batch = client.batches.retrieve(batch.id)
    if batch.status == "completed":
        results = client.files.content(batch.output_file_id).text
        break
    # Stop on terminal failure states instead of polling forever
    if batch.status in ("failed", "expired", "cancelled"):
        raise RuntimeError(f"Batch ended with status: {batch.status}")
    time.sleep(60)

Migration checklist:

  1. Identify which call paths in your code are latency-tolerant (don't block a user response)
  2. Refactor those paths to enqueue requests rather than call synchronously
  3. Add a polling worker (or webhook handler if available) to consume Batch outputs
  4. Run side-by-side for 1 week — measure cost-per-completed-job and any quality delta
  5. Cut over fully once cost validation completes
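Step 3 of the checklist, consuming Batch outputs, might look like the sketch below. It assumes each line of the output file is a JSON object with `custom_id`, `response` (carrying `status_code` and `body`) and `error` fields, which is the shape OpenAI documents for Batch results; `split_batch_results` is a hypothetical helper, and the sample lines are made up for illustration.

```python
import json


def split_batch_results(jsonl_text: str) -> tuple[dict, dict]:
    """Split Batch result lines into successes and failures, keyed by custom_id."""
    succeeded, failed = {}, {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        custom_id = record["custom_id"]
        response = record.get("response") or {}
        if record.get("error") is None and response.get("status_code") == 200:
            succeeded[custom_id] = response.get("body")
        else:
            # Keep whatever error detail is available for logging or retry.
            failed[custom_id] = record.get("error") or response.get("body")
    return succeeded, failed


if __name__ == "__main__":
    sample = "\n".join([
        json.dumps({"custom_id": "req-0", "response": {"status_code": 200, "body": {"id": "resp_1"}}, "error": None}),
        json.dumps({"custom_id": "req-1", "response": None, "error": {"code": "batch_expired"}}),
    ])
    ok, bad = split_batch_results(sample)
    print(sorted(ok), sorted(bad))
```

Keying by `custom_id` matters because result lines are not guaranteed to come back in input order.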

For mixed workloads where part of traffic is real-time chat and part is offline reports, run both Standard and Batch in parallel. Many production stacks already follow this pattern with Celery / RQ / Cloud Tasks queues — Batch API maps cleanly onto the same architecture.

For teams routing GPT-5.5 alongside Claude and Gemini through a single endpoint, the TokenMix.ai unified API exposes the same Batch / Flex tier semantics across providers, so you can pick the cheapest qualified tier per workload without writing provider-specific code paths.

FAQ

Is GPT-5.5 Batch the same quality as GPT-5.5 Standard?

Yes. Same model weights, same outputs. The only difference is serving infrastructure — Batch runs jobs asynchronously, often during off-peak GPU windows. No quality degradation has been documented in OpenAI's launch post or independent testing.

What's the difference between Batch and Flex?

Batch is async (submit, wait up to 24h, receive results). Flex is synchronous (same endpoint as Standard) but lower priority — can queue during peak demand. Both cost the same: $2.50 / $15 per MTok. Use Batch for bulk offline work, Flex for production workloads that tolerate occasional latency spikes.

Can I use Batch for real-time chat?

No. Batch jobs return within 24 hours, not within a conversation turn. For chat, use Standard, or Flex if you can tolerate 5-30 second queue delays.

Is Priority tier worth 2.5× Standard cost?

Only if p99 latency matters more than per-token cost. Voice assistants, financial workflows, and applications with end-user SLAs are typical Priority candidates. For most chat and content generation, Standard latency is sufficient.

How does GPT-5.5 Batch compare to Claude Opus 4.7 Batch?

GPT-5.5 Batch: $2.50 input / $15 output. Claude Opus 4.7 Batch: $2.50 / $12.50. Claude is 17% cheaper on output. Choose based on task fit — GPT-5.5 leads on Terminal-Bench, Opus 4.7 leads on SWE-Bench Pro.

Can I mix tiers in one application?

Yes, and you should. Route real-time user requests to Standard, route background tasks to Batch, route latency-critical paths to Priority. Most production stacks find 40-60% of total volume can move to Batch or Flex with no UX impact.

Does Codex Fast count as a fifth tier?

Codex Fast is a Codex-IDE-only pricing variant — 2.5× cost for 1.5× generation speed. It's not exposed in the public Responses or Chat Completions API, so most developers won't encounter it directly.

What happens if a Batch job fails partway through?

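Individual requests can fail without failing the whole batch. The batch completes once every request has been processed; if it does not finish within the 24h window it expires, and the results of requests that did finish are still returned. Successful responses land in the output file and failed requests are reported with error codes in a separate error file (the batch's error_file_id), so you re-submit only the failed requests, not the whole batch. OpenAI does not charge for failed requests in a completed batch.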
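Re-submitting only the failed requests can be sketched as a filter over the original JSONL request lines. `build_retry_lines` is a hypothetical helper; it assumes you kept the original request file and have collected the `custom_id` values of the failed requests.

```python
import json
from typing import Iterable


def build_retry_lines(request_lines: Iterable[str], failed_ids: Iterable[str]) -> list[str]:
    """Return only the original JSONL request lines whose custom_id failed."""
    failed = set(failed_ids)
    return [
        line.rstrip("\n")
        for line in request_lines
        if line.strip() and json.loads(line)["custom_id"] in failed
    ]


if __name__ == "__main__":
    original = [
        json.dumps({"custom_id": f"req-{i}", "method": "POST", "url": "/v1/responses"})
        for i in range(3)
    ]
    for line in build_retry_lines(original, ["req-1"]):
        print(line)
```

The returned lines can be written to a new JSONL file and submitted as a fresh batch, reusing the same `custom_id` values so downstream matching stays unchanged.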

By TokenMix Research Lab · Published 2026-05-21 · Last Updated 2026-05-21 · Data Checked 2026-05-21