TokenMix Research Lab · 2026-04-07

OpenAI Batch API 2026: 50% Off Every Model, 24-Hour Guide

OpenAI Batch API Pricing Guide: How to Get 50% Off GPT-5.4 and Every Model (2026)

Last Updated: 2026-04-29
Author: TokenMix Research Lab

Batch API gives flat 50% off every OpenAI model — GPT-5.4 drops to $1.25/$7.50, Nano to $0.10/$0.625 — in exchange for a 24-hour completion window. Stacks with prompt caching for up to 75% total savings.

OpenAI's Batch API gives you a flat 50% discount on every model (GPT-5.4, Mini, Nano, o3, o4-mini, and embeddings) in exchange for accepting a 24-hour completion window. That turns GPT-5.4 from $2.50/$15.00 into $1.25/$7.50 per million input/output tokens. For async workloads like batch classification, content generation, data processing, and evaluation pipelines, this is one of the largest single cost reductions available without changing models. TokenMix.ai's cost tracking shows teams using the Batch API save 35-48% on total monthly API spend when at least half their workload qualifies for batch processing.

This guide covers exactly how OpenAI Batch API pricing works, which workloads qualify, step-by-step implementation, and real cost savings calculations.


OpenAI Batch API Pricing: Complete Price Table

Every model gets exactly 50% off — GPT-5.4 batch ($1.25/$7.50) is cheaper than Claude Sonnet standard ($3/$15) on input. Nano batch ($0.10/$0.625) competes with Groq Llama on price, with GPT-class quality.

Every OpenAI model gets exactly 50% off through the Batch API. No exceptions, no complex tier calculations.

All prices are in USD per 1M tokens.

| Model | Standard Input | Batch Input (50% off) | Standard Output | Batch Output (50% off) |
| --- | --- | --- | --- | --- |
| GPT-5.4 | $2.50 | $1.25 | $15.00 | $7.50 |
| GPT-5.4 Mini | $0.75 | $0.375 | $4.50 | $2.25 |
| GPT-5.4 Nano | $0.20 | $0.10 | $1.25 | $0.625 |
| o3 | $2.50 | $1.25 | $15.00 | $7.50 |
| o4-mini | $0.75 | $0.375 | $4.50 | $2.25 |
| o3-deep-research | $2.50 | $1.25 | $15.00 | $7.50 |
| o4-mini-deep-research | $0.75 | $0.375 | $4.50 | $2.25 |
| text-embedding-3-large | $0.13 | $0.065 | n/a | n/a |
| text-embedding-3-small | $0.02 | $0.01 | n/a | n/a |

GPT-5.4 batch pricing at $1.25/$7.50 undercuts Claude Sonnet's standard pricing ($3/$15) on both sides: less than half the input price and exactly half the output price. This changes competitive dynamics significantly: teams that previously chose cheaper alternatives for cost reasons can now use GPT-5.4 at budget-tier pricing.

GPT-5.4 Nano at batch pricing ($0.10/$0.625) becomes one of the cheapest production APIs available, competing directly with Groq's Llama pricing while offering GPT-class quality.


How the Batch API Works

Three steps: prepare a JSONL file (up to 50K requests), upload + create batch job, poll status and download results. Most batches finish in 1-6 hours despite the 24-hour SLA window.

The Batch API is conceptually simple: you submit a file containing multiple API requests, OpenAI processes them in bulk within a 24-hour window, and you download the results.

The Three-Step Process

Step 1: Prepare a JSONL file. Each line in the file is a complete API request — model, messages, temperature, max_tokens, and a custom ID for tracking. One file can contain up to 50,000 requests.

Step 2: Upload and create a batch. Upload the JSONL file to OpenAI's Files API, then create a batch job pointing to that file. You specify the endpoint (e.g., /v1/chat/completions) and the completion window (currently only 24h is available).

Step 3: Poll and download results. The batch processes asynchronously. You poll the batch status or set up a webhook. Once complete, download the output file — another JSONL file with responses matched to your custom IDs.
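Because one file holds at most 50,000 requests, larger workloads have to be split across several files and batch jobs. The following is a minimal sketch of that split, assuming the requests are already in memory as a list. The helper name chunk_requests is hypothetical and not part of the OpenAI SDK.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Per-file request cap quoted in this guide.
MAX_REQUESTS_PER_FILE = 50_000


def chunk_requests(requests: Sequence[T],
                   limit: int = MAX_REQUESTS_PER_FILE) -> Iterator[List[T]]:
    """Yield consecutive slices of `requests`, each with at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(requests), limit):
        yield list(requests[start:start + limit])
```

Each yielded chunk would become its own JSONL file and its own batch job.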

Processing Guarantees


Which Workloads Qualify for Batch Processing

40-60% of typical team workloads are batch-eligible: nightly classification, content pipelines, evaluation runs, embedding generation. Most teams underestimate this. Real-time chat, agents, and search are NOT batch candidates.

The only requirement: your workload does not need real-time responses. If you can wait up to 24 hours, you qualify.

| Workload | Batch Suitable? | Why |
| --- | --- | --- |
| Nightly data classification | Yes | Scheduled, no real-time need |
| Content generation pipeline | Yes | Write today, publish tomorrow |
| Evaluation/testing runs | Yes | No user waiting for results |
| Training data generation | Yes | Purely async |
| Embedding generation | Yes | Batch indexing is standard |
| Customer support chatbot | No | Real-time response needed |
| Live coding assistant | No | User waiting for each response |
| Interactive search | No | Latency-sensitive |
| Agent loops (multi-step) | Partial | Individual steps need chaining |

TokenMix.ai analysis of typical team workloads shows that 40-60% of total API requests are batch-eligible — scheduled processing, evaluation pipelines, content generation, data enrichment, and embedding indexing. Most teams underestimate how much of their workload can be shifted to batch.


Step-by-Step: Implementing the Batch API

Five-step implementation: prepare JSONL with custom_id per request, upload to Files API with purpose="batch", create batch job pointing to file_id, poll batch status, download output JSONL with response per custom_id.

Prerequisites

Step 1: Prepare Your JSONL Request File

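Create a .jsonl file where each line is a JSON object containing four fields: custom_id (your own tracking ID), method (POST), url (the target endpoint, such as /v1/chat/completions), and body (the request parameters you would normally send to that endpoint, such as model, messages, and max_tokens).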

Example structure for each line:

{"custom_id": "req-001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5.4-mini", "messages": [{"role": "system", "content": "Classify the sentiment as positive, negative, or neutral."}, {"role": "user", "content": "This product exceeded my expectations."}], "max_tokens": 50}}

Each line must be valid JSON. No trailing commas, no line breaks within a single request.
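Writing these lines by hand is error-prone, so it is usually safer to generate them. The sketch below builds one request line per input text in the structure shown above and writes them to disk. The function names and the req-001 style ID scheme are illustrative assumptions, not part of the OpenAI SDK.

```python
import json
from typing import Iterable, List


def build_batch_lines(texts: Iterable[str], model: str, system_prompt: str,
                      max_tokens: int = 50) -> List[str]:
    """Return one JSON string per request, ready to be joined with newlines."""
    lines = []
    for index, text in enumerate(texts, start=1):
        request = {
            "custom_id": f"req-{index:03d}",  # hypothetical scheme; IDs must be unique
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": text},
                ],
                "max_tokens": max_tokens,
            },
        }
        # json.dumps escapes embedded newlines, so each request stays on one line.
        lines.append(json.dumps(request))
    return lines


def write_batch_file(path: str, lines: List[str]) -> None:
    """Write the requests as JSONL: one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

Serializing with json.dumps rather than string formatting avoids the trailing-comma and embedded-line-break problems mentioned above.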

Step 2: Upload the File

Use the Files API to upload your JSONL file with purpose set to "batch":

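from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the request file; purpose="batch" marks it as Batch API input.
# The with-block closes the file handle once the upload finishes.
with open("batch_requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")
file_id = batch_file.id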

Step 3: Create the Batch Job

batch_job = client.batches.create(
    input_file_id=file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
batch_id = batch_job.id

Step 4: Monitor Batch Status

status = client.batches.retrieve(batch_id)
print(f"Status: {status.status}")
print(f"Completed: {status.request_counts.completed}")
print(f"Failed: {status.request_counts.failed}")
print(f"Total: {status.request_counts.total}")

Possible statuses: validating, in_progress, finalizing, completed, failed, expired, cancelling, cancelled.
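A single status check is rarely enough in a pipeline, so the call is normally wrapped in a polling loop. The sketch below takes the retrieve function as a parameter so it can be exercised without network access. The function name, the 60-second interval, and the poll cap are assumptions chosen for illustration.

```python
import time
from typing import Any, Callable

# Statuses after which a batch will not change again.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(retrieve: Callable[[str], Any], batch_id: str,
                   poll_seconds: float = 60.0, max_polls: int = 1500,
                   sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call `retrieve(batch_id)` until the batch reaches a terminal status."""
    for _ in range(max_polls):
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"batch {batch_id} still running after {max_polls} polls")
```

With the client from Step 2, usage would look like `final = wait_for_batch(client.batches.retrieve, batch_id)`. The default of 1,500 polls at 60 seconds covers slightly more than the 24-hour window.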

Step 5: Download Results

if status.status == "completed":
    result_file = client.files.content(status.output_file_id)
    results = result_file.text
    # Each line is a JSON response with custom_id, response, and error fields

Each result line contains the custom_id you provided, the complete API response, and any error information if that specific request failed.
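To turn the downloaded text into something usable, each line has to be parsed and matched back to its custom_id rather than to its position in the file. The sketch below assumes each output line carries custom_id, a response object with status_code and body, and an error field, and that body is a chat completion. The function name is hypothetical.

```python
import json
from typing import Dict, Optional


def parse_batch_output(jsonl_text: str) -> Dict[str, Optional[str]]:
    """Map each custom_id to the assistant's reply, or None if that request failed."""
    replies: Dict[str, Optional[str]] = {}
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue  # tolerate blank lines at the end of the file
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            replies[record["custom_id"]] = None
            continue
        # Assumed chat-completion shape: body.choices[0].message.content
        message = response["body"]["choices"][0]["message"]
        replies[record["custom_id"]] = message.get("content")
    return replies
```

Calling `parse_batch_output(result_file.text)` on the Step 5 download would give a dictionary keyed by the IDs from the request file.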

Error Handling

import json

# Per-request failures are written to a separate error file
if status.error_file_id:
    errors = client.files.content(status.error_file_id).text
    for line in errors.strip().split('\n'):
        error = json.loads(line)  # one JSON object per failed request
        print(f"Request {error['custom_id']} failed: {error['error']}")
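Once the failed IDs are known, a common follow-up is to resubmit only those requests. A minimal sketch, assuming the original request lines are still available and that every error line carries a custom_id. Both helper names are hypothetical.

```python
import json
from typing import List, Set


def failed_ids(error_jsonl: str) -> Set[str]:
    """Collect the custom_id of every line in the error file."""
    return {json.loads(line)["custom_id"]
            for line in error_jsonl.split("\n") if line.strip()}


def select_retries(original_lines: List[str], failed: Set[str]) -> List[str]:
    """Keep only the original request lines whose custom_id failed."""
    return [line for line in original_lines
            if json.loads(line)["custom_id"] in failed]
```

The lines returned by select_retries can be written to a new JSONL file and submitted as a fresh batch job.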

Cost Savings Calculations: Real Numbers

Median team saves $800-$3,000/month on batch-eligible work; a content pipeline at 5K req/day on GPT-5.4 saves $1,087.50/month, or about $13K/year, because the flat 50% discount scales linearly with volume.

Here is how much the Batch API saves across different workload volumes and models. All figures assume a 30-day month and the per-million-token prices from the price table above.

Scenario 1: 10,000 Classification Requests per Day

Task profile: 800 input tokens, 30 output tokens per request (240M input and 9M output tokens per month).

| Model | Standard Monthly Cost | Batch Monthly Cost | Monthly Savings |
| --- | --- | --- | --- |
| GPT-5.4 | $735.00 | $367.50 | $367.50 (50%) |
| GPT-5.4 Mini | $220.50 | $110.25 | $110.25 (50%) |
| GPT-5.4 Nano | $59.25 | $29.63 | $29.62 (50%) |

Scenario 2: 5,000 Content Generation Requests per Day

Task profile: 1,000 input tokens, 800 output tokens per request (150M input and 120M output tokens per month).

| Model | Standard Monthly Cost | Batch Monthly Cost | Monthly Savings |
| --- | --- | --- | --- |
| GPT-5.4 | $2,175.00 | $1,087.50 | $1,087.50 (50%) |
| GPT-5.4 Mini | $652.50 | $326.25 | $326.25 (50%) |
| GPT-5.4 Nano | $180.00 | $90.00 | $90.00 (50%) |

Scenario 3: 1,000 Document Processing Requests per Day

Task profile: 15,000 input tokens, 1,000 output tokens per request (450M input and 30M output tokens per month).

| Model | Standard Monthly Cost | Batch Monthly Cost | Monthly Savings |
| --- | --- | --- | --- |
| GPT-5.4 | $1,575.00 | $787.50 | $787.50 (50%) |
| GPT-5.4 Mini | $472.50 | $236.25 | $236.25 (50%) |
| GPT-5.4 Nano | $127.50 | $63.75 | $63.75 (50%) |

The savings grow linearly with volume. A team processing 5,000 content generation requests per day on GPT-5.4 saves $1,087.50 per month, or $13,050 per year, just by switching to the Batch API.

TokenMix.ai cost tracking data shows the median team saves $800-$3,000 per month by moving eligible workloads to batch processing.
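To run the same estimate for your own traffic, the arithmetic is short enough to script. This sketch uses the standard per-million-token prices quoted in this guide and assumes a 30-day month and a fixed token profile per request. The dictionary keys are labels chosen for this example, not confirmed API model identifiers.

```python
from typing import Dict, Tuple

# (input, output) USD per 1M tokens at standard rates, as quoted in this guide.
STANDARD_PRICES: Dict[str, Tuple[float, float]] = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}
BATCH_MULTIPLIER = 0.5  # flat 50% batch discount


def monthly_cost(model: str, requests_per_day: int, input_tokens: int,
                 output_tokens: int, batch: bool = False, days: int = 30) -> float:
    """Estimated monthly spend in USD for a fixed per-request token profile."""
    price_in, price_out = STANDARD_PRICES[model]
    requests = requests_per_day * days
    cost = (requests * input_tokens * price_in
            + requests * output_tokens * price_out) / 1_000_000
    return cost * BATCH_MULTIPLIER if batch else cost
```

Calling monthly_cost twice, once with batch=False and once with batch=True, gives the standard cost and the batch cost for the same workload; the difference is the monthly saving.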


Batch API vs Standard API: Full Comparison

Trade-offs: 50% price cut for 24-hour latency. Batch keeps tool use, vision, JSON mode, and prompt caching; loses streaming and multi-turn within a single batch. Higher rate limits than standard tiers.

| Dimension | Standard API | Batch API |
| --- | --- | --- |
| Pricing | Full price | 50% off all models |
| Latency | 1-30 seconds | 1-24 hours |
| Max Requests per File | 1 | 50,000 |
| Rate Limits | Standard tier limits | Separate, higher limits |
| Streaming | Supported | Not supported |
| Tool Use / Function Calling | Supported | Supported |
| Vision (Image Input) | Supported | Supported |
| Response Format (JSON mode) | Supported | Supported |
| Prompt Caching | Supported | Supported (stacks with batch) |
| SLA | Standard SLA | 24-hour completion guarantee |
| Error Handling | Per-request | Batch-level with individual error reporting |
| Minimum Volume | 1 request | No minimum (1+ requests) |

Combining Batch API with Prompt Caching

Cache + batch stack multiplicatively: GPT-5.4 cached input drops 75% from $2.50/M to $0.625/M; Nano batch + 80% cache hit reaches $0.06/M, close to Groq Llama 8B pricing at GPT-class quality.

This is where the math gets interesting. Prompt caching and batch pricing stack.

Standard GPT-5.4 input: $2.50/M tokens.

With prompt caching (50% off cached tokens): Effective $1.25/M on cached portions.

With Batch API (50% off everything): $1.25/M input.

With both (cache + batch): $0.625/M on cached input tokens.

That is a 75% reduction from the standard $2.50 rate.

For workloads with high cache hit rates (repeated system prompts, RAG with shared context), the combined discount makes GPT-5.4 competitive with budget models on input cost.

| Effective Input Cost per 1M Tokens | No Cache | 50% Cache Hit | 80% Cache Hit |
| --- | --- | --- | --- |
| GPT-5.4 Standard | $2.50 | $1.875 | $1.50 |
| GPT-5.4 Batch | $1.25 | $0.9375 | $0.75 |
| GPT-5.4 Mini Standard | $0.75 | $0.5625 | $0.45 |
| GPT-5.4 Mini Batch | $0.375 | $0.28125 | $0.225 |
| GPT-5.4 Nano Standard | $0.20 | $0.15 | $0.12 |
| GPT-5.4 Nano Batch | $0.10 | $0.075 | $0.06 |

GPT-5.4 Nano with Batch API and 80% cache hit: $0.06/M input tokens. That is within a cent of Groq Llama 8B's $0.05/M input, from a GPT-class model. TokenMix.ai's real-time pricing dashboard shows this as the effective cost after all discounts are applied.
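The blended figures follow from weighting cached and uncached tokens separately and then applying the batch discount to the whole bill. A small sketch of that formula, assuming the 50% cached-token discount and the flat 50% batch discount quoted in this guide. The function name is illustrative.

```python
def effective_input_price(standard_price: float, cache_hit_rate: float,
                          batch: bool = False, cache_discount: float = 0.5) -> float:
    """Blended USD price per 1M input tokens.

    cache_discount=0.5 is the cached-token discount quoted in this guide.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    base = standard_price * (0.5 if batch else 1.0)         # flat batch discount
    cached_part = cache_hit_rate * (1.0 - cache_discount)   # cached tokens, reduced rate
    uncached_part = 1.0 - cache_hit_rate                    # remaining tokens, full rate
    return base * (cached_part + uncached_part)
```

For example, `effective_input_price(0.20, 0.8, batch=True)` corresponds to the GPT-5.4 Nano batch figure at an 80% cache hit rate. The 75% headline reduction applies only when every input token is a cache hit (cache_hit_rate=1.0).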


Limitations and When Not to Use Batch

Five hard constraints: 24-hour max window (no faster), no streaming, max 50K requests/file, no multi-turn within batch, jobs immutable after creation. Real-time apps and chained agent steps must use standard API.

Hard limitations:

When batch does not make sense:

When batch is optimal:


Should You Use Batch API or Standard?

Audit your traffic — most teams have 40-60% batch-eligible work running through standard API at full price. Move evaluation suites, nightly pipelines, content generation to batch first; save 25-48% of total monthly API spend.

| Your Situation | Recommendation | Expected Savings |
| --- | --- | --- |
| >50% of workload is non-real-time | Move non-RT to Batch API | 25-48% total API cost |
| All workload is real-time | Standard API only | 0% (Batch not applicable) |
| Running nightly data pipelines | Batch API for all pipelines | 50% on pipeline costs |
| Evaluation suite (>500 test cases) | Batch API | 50% on eval costs |
| Content generation at scale | Batch API + prompt caching | 50-75% combined savings |
| Mixed: real-time + batch eligible | Split workload across both | 20-35% total savings |

The simplest optimization most teams miss: Audit your API calls and identify which ones do not need real-time responses. Move those to Batch API. TokenMix.ai data shows the average team has 40-60% batch-eligible traffic they are running through the standard API at full price.
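The link between batch-eligible share and total savings is simple multiplication: only the share of spend that moves to batch gets the 50% discount. A minimal sketch, assuming the share is measured in spend rather than request count and that prompt caching is ignored. Both function names are hypothetical.

```python
def total_savings_fraction(batch_share: float, batch_discount: float = 0.5) -> float:
    """Fraction of total API spend saved when `batch_share` of spend moves to batch."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    return batch_share * batch_discount


def required_batch_share(target_savings: float, batch_discount: float = 0.5) -> float:
    """Share of spend that must move to batch to reach `target_savings` overall."""
    share = target_savings / batch_discount
    if not 0.0 <= share <= 1.0:
        raise ValueError("target is not reachable with the batch discount alone")
    return share
```

Under these assumptions, moving half of total spend to batch saves a quarter of the bill, and reaching the upper end of the ranges in the table requires almost all spend to be batch-eligible.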


What's the Bottom Line on Batch API Pricing?

Batch API is the simplest cost optimization in the LLM market: flat 50% off, no quality reduction, no model changes, just 24-hour latency tolerance. Combined with prompt caching, GPT-5.4 Nano hits $0.06/M input, cheaper than virtually every alternative.

The only trade-off is accepting up to 24 hours of latency. At GPT-5.4 batch pricing of $1.25/$7.50, you get frontier-model quality at mid-tier pricing.

Three action items:

  1. Audit your current API usage and identify batch-eligible workloads.
  2. Start with evaluation and testing pipelines — they are always batch-eligible.
  3. Track cost savings using TokenMix.ai's cost dashboard to measure real impact versus projections.

For real-time pricing comparisons between batch and standard costs across all providers and models, check TokenMix.ai.


FAQ

How much does OpenAI Batch API save compared to standard pricing?

Exactly 50% on all models. GPT-5.4 drops from $2.50/$15.00 to $1.25/$7.50 per million tokens. GPT-5.4 Mini drops from $0.75/$4.50 to $0.375/$2.25. This is a flat discount with no volume requirements or complex tier structures.

How long does a batch job take to complete?

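OpenAI targets completion within 24 hours, and most batches finish in 1-6 hours depending on size and current system load. A batch that does not finish inside the window ends in the expired status, so check for that status when polling. There is no way to request faster processing: the 24-hour window is the only option.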

Can I use GPT-5.4 Batch API pricing for real-time applications?

No. The Batch API is asynchronous only — you submit requests in bulk and retrieve results later. For real-time applications, you must use the standard API at full price. The Batch API is designed for workloads that can tolerate hours of latency.

Does the Batch API 50% discount stack with prompt caching?

Yes. Prompt caching and batch pricing are independent discounts that stack. With both active, cached input tokens on GPT-5.4 cost $0.625 per million — a 75% reduction from the standard $2.50 rate. This makes GPT-5.4 competitive with budget models for workloads with high prompt reuse.

What is the maximum number of requests in a batch?

Each batch file supports up to 50,000 requests. For larger volumes, create multiple batch jobs. You can run several batch jobs concurrently; the practical ceiling is your account's batch rate limits.

Is GPT-5.4 batch pricing cheaper than DeepSeek V4?

On input tokens, GPT-5.4 batch ($1.25/M) is more expensive than DeepSeek V4 ($0.30/M). On output tokens, GPT-5.4 batch ($7.50/M) is significantly more expensive than DeepSeek V4 ($0.50/M). DeepSeek V4 remains cheaper per token even against batch pricing. However, GPT-5.4 Nano batch ($0.10/$0.625) is cheaper than DeepSeek V4 on input ($0.10 vs $0.30) and about 25% more expensive on output ($0.625 vs $0.50).


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: TokenMix.ai Cost Tracker, OpenAI Batch API Documentation, OpenAI Pricing