TokenMix Research Lab · 2026-04-17

GPT-5.5 (Spud) Released: $5/$30 API Pricing & Benchmarks 2026

Last Updated: 2026-05-14 · Author: TokenMix Research Lab

OpenAI shipped GPT-5.5 on April 23, 2026 — codename Spud confirmed in the official launch post. The model is live in the API at gpt-5.5, priced at $5 input / $30 output per million tokens, with a 1M-token context window. GPT-5.5 Pro is also live at $30 / $180. Terminal-Bench 2.0 jumped from 75.1% (GPT-5.4) to 82.7% in a single release. GPT-5.5 Instant followed for free-tier users on May 5, 2026. Below is everything confirmed at launch, the real benchmark deltas against Claude Opus 4.7 and Gemini 3 Pro, and how to call gpt-5.5 today through TokenMix.ai with no waitlist.

Quick Verdict: GPT-5.5 Launch Snapshot
What Actually Shipped on April 23, 2026
GPT-5.5 API Pricing: All Tiers, Batch & Flex Discounts
Real Benchmark Results vs GPT-5.4, Claude Opus 4.7, Gemini 3 Pro
Cost Per Task: GPT-5.5 vs Competitors on Realistic Workloads
How to Access GPT-5.5 API Today (Direct vs TokenMix)
Migration Checklist: GPT-5.4 → GPT-5.5
FAQ

Quick Verdict: GPT-5.5 Launch Snapshot

Claim	Status	Source
Launch date: April 23, 2026	Confirmed	OpenAI launch post
Codename "Spud"	Confirmed at launch	OpenAI launch post (acknowledged the codename publicly)
API model ID: `gpt-5.5` and `gpt-5.5-pro`	Confirmed	OpenAI pricing & API docs
Pricing: $5 input / $30 output per MTok	Confirmed	OpenAI launch post
GPT-5.5 Pro: $30 / $180 per MTok	Confirmed	OpenAI launch post
Context window: 1M tokens (API), 400K (Codex)	Confirmed	OpenAI launch post
GPT-5.5 Instant on free tier	Confirmed (May 5, 2026)	GPT-5.5 Instant announcement
Beats Claude Opus 4.7 on Terminal-Bench 2.0	Confirmed (82.7% vs 69.4%)	OpenAI launch post (third-party scored)
Beats Gemini 3 Pro on FrontierMath Tier 4	Confirmed (35.4% vs 16.7%)	OpenAI launch post
Trails Gemini 3 Pro on ARC-AGI-1	Confirmed (95.0% vs 98.0%)	OpenAI launch post
Trails Claude Opus 4.7 on SWE-Bench Pro Public	Confirmed (58.6% vs 64.3%)	OpenAI launch post (memorization caveat flagged)

Bottom line: GPT-5.5 is the new agentic-coding leader (Terminal-Bench 82.7%) and dominates long-context retrieval at 512K-1M tokens (74.0% on MRCR v2 8-needle vs 32.2% for Claude Opus 4.6). It is not the across-the-board winner: Claude Opus 4.7 still leads on SWE-Bench Pro, and Gemini 3 Pro leads on ARC-AGI-1.

What Actually Shipped on April 23, 2026

Spud is no longer speculation. As confirmed in OpenAI's launch post, GPT-5.5 is now generally available across three surfaces:

ChatGPT: GPT-5.5 Thinking rolled out to Plus, Pro, Business and Enterprise users on April 23. GPT-5.5 Pro went to Pro, Business and Enterprise the same day. GPT-5.5 Instant — a free-tier variant — followed on May 5, 2026.

Codex: GPT-5.5 is available on Plus, Pro, Business, Enterprise, Edu and Go plans, with a 400K context window. A "Fast" mode generates output 1.5× faster at 2.5× the cost.

API: gpt-5.5 and gpt-5.5-pro are exposed through the Responses and Chat Completions endpoints. Microsoft Foundry offers access at parity. Both models support the full 1M-token context window via the API.

Launch Date	Surface	What Released
April 23, 2026	ChatGPT (Plus / Pro / Biz / Ent)	GPT-5.5 Thinking, GPT-5.5 Pro
April 23, 2026	Codex	GPT-5.5 with 400K context
April 23, 2026	API (Responses, Chat Completions)	`gpt-5.5`, `gpt-5.5-pro`
May 5, 2026	ChatGPT Free	GPT-5.5 Instant

The "Spud" codename was used internally during training and publicly acknowledged after release. GPT-5.5 is served on NVIDIA GB200 and GB300 NVL72 systems. Per-token latency matches GPT-5.4 despite the capability jump, because OpenAI explicitly co-designed the inference infrastructure during training. According to TokenMix.ai upstream uptime tracking, gpt-5.5 has been served stably since the day-of-launch ramp.

GPT-5.5 API Pricing: All Tiers, Batch & Flex Discounts

Per OpenAI's pricing page, referenced in the launch post, GPT-5.5 costs twice as much as GPT-5.4 for both input and output tokens. OpenAI argues that GPT-5.5 uses fewer tokens per task in Codex, which partially offsets the higher per-token rate.
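OpenAI's offset argument can be checked with one division. The sketch below is a minimal illustration, assuming standard list prices from the pricing table and that GPT-5.5 shrinks input and output tokens by the same fraction; `task_cost` and `break_even_token_ratio` are hypothetical helpers, not part of any SDK.

```python
from typing import Dict, Tuple

# Standard list prices in $ per million tokens (input, output), from the pricing table
PRICES: Dict[str, Tuple[float, float]] = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.5": (5.00, 30.00),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at standard (non-batch) list prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def break_even_token_ratio(old: str, new: str, input_tokens: int, output_tokens: int) -> float:
    """Fraction of the old model's tokens the new model may use (same in/out mix) at equal cost."""
    return task_cost(old, input_tokens, output_tokens) / task_cost(new, input_tokens, output_tokens)


if __name__ == "__main__":
    # Example: a 30K-in / 8K-out coding task
    ratio = break_even_token_ratio("gpt-5.4", "gpt-5.5", 30_000, 8_000)
    print(f"GPT-5.5 must use at most {ratio:.0%} of GPT-5.4's tokens to break even")
```

Because both rates exactly double, the break-even comes out at 50% for any input/output mix: GPT-5.5 has to finish a task with no more than half of GPT-5.4's tokens before it costs the same.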

Standard API Pricing

Model	Input ($/MTok)	Output ($/MTok)	Context	Notes
gpt-5.5	$5.00	$30.00	1M	Standard API
gpt-5.5-pro	$30.00	$180.00	1M	High-accuracy variant
gpt-5.4	$2.50	$15.00	1.05M	Previous flagship
gpt-5.4-mini	$0.40	$1.60	1M	Budget option
Claude Opus 4.7	$5.00	$25.00	200K	Per Anthropic API docs (2026-05-14)
Gemini 3 Pro (≤200K prompt)	$2.00	$12.00	1M+	Per Google AI pricing (2026-05-14)
Gemini 3 Pro (>200K prompt)	$4.00	$18.00	1M+	Tier kicks in past 200K input

Note: Google's pricing page lists this model under the API ID gemini-3.1-pro-preview. This article follows OpenAI's launch post and calls it "Gemini 3 Pro"; both labels refer to the same model.

Modal Pricing Variants (GPT-5.5)

Mode	Multiplier vs Standard	Use Case
Batch API	0.5× (50% off)	Non-realtime jobs (overnight, offline embeddings, eval)
Flex	0.5× (50% off)	Latency-tolerant production traffic
Standard	1×	Default, low-latency synchronous
Priority	2.5×	Latency-critical, guaranteed throughput
Codex Fast	2.5× cost, 1.5× speed	Codex IDE workflows

Translation: if you can tolerate a 24-hour completion window, Batch puts GPT-5.5 at $2.50 / $15 per MTok, exactly matching the old GPT-5.4 standard pricing. According to TokenMix.ai pricing observability, this is the sweet spot for batch summarization and large-scale classification workloads.
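The tier arithmetic is a single multiplier on the standard rate. A minimal sketch, assuming the multipliers in the mode table apply equally to input and output tokens; `MODE_MULTIPLIER` and `effective_rates` are illustrative names, not an SDK API. Codex Fast is left out because the table scopes it to Codex IDE workflows rather than API calls.

```python
from typing import Dict, Tuple

# gpt-5.5 standard list price, $ per million tokens (input, output)
STANDARD_RATES: Tuple[float, float] = (5.00, 30.00)

# Multipliers from the mode table; assumed to apply equally to input and output
MODE_MULTIPLIER: Dict[str, float] = {
    "batch": 0.5,
    "flex": 0.5,
    "standard": 1.0,
    "priority": 2.5,
}


def effective_rates(mode: str) -> Tuple[float, float]:
    """Return (input, output) $/MTok for gpt-5.5 under the given processing mode."""
    multiplier = MODE_MULTIPLIER[mode]
    return STANDARD_RATES[0] * multiplier, STANDARD_RATES[1] * multiplier


if __name__ == "__main__":
    for mode in MODE_MULTIPLIER:
        price_in, price_out = effective_rates(mode)
        print(f"{mode:>8}: ${price_in:.2f} in / ${price_out:.2f} out per MTok")
```

Batch and Flex both land at $2.50 / $15, while Priority lands at $12.50 / $75 per MTok.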

Real Benchmark Results vs GPT-5.4, Claude Opus 4.7, Gemini 3 Pro

These are not projections. All numbers below come from OpenAI's launch announcement, which published benchmark scores at release, including third-party-scored results; Expert-SWE is an internal OpenAI eval.

Agentic Coding (GPT-5.5's Strongest Domain)

Benchmark	GPT-5.5	GPT-5.4	Claude Opus 4.7	Gemini 3 Pro
Terminal-Bench 2.0	82.7%	75.1%	69.4%	68.5%
Expert-SWE (Internal)	73.1%	68.5%	—	—
SWE-Bench Pro (Public)*	58.6%	57.7%	64.3%	54.2%

*SWE-Bench Pro Public has known memorization issues — OpenAI flagged this in the launch post. Treat with caution.

Knowledge Work & Tool Use

Benchmark	GPT-5.5	GPT-5.4	GPT-5.5 Pro	Claude Opus 4.7	Gemini 3 Pro
GDPval (Win/Tie)	84.9%	83.0%	82.3%	80.3%	67.3%
FinanceAgent v1.1	60.0%	56.0%	—	64.4%	59.7%
OSWorld-Verified	78.7%	75.0%	—	78.0%	—
BrowseComp	84.4%	82.7%	90.1%	79.3%	85.9%
Tau2-bench Telecom (raw)	98.0%	92.8%	—	—	—
Toolathlon	55.6%	54.6%	—	—	48.8%

Math & Science

Benchmark	GPT-5.5	GPT-5.4	GPT-5.5 Pro	Claude Opus 4.7	Gemini 3 Pro
GPQA Diamond	93.6%	92.8%	—	94.2%	94.3%
FrontierMath Tier 1-3	51.7%	47.6%	52.4%	43.8%	36.9%
FrontierMath Tier 4	35.4%	27.1%	39.6%	22.9%	16.7%
GeneBench	25.0%	19.0%	33.2%	—	—
BixBench	80.5%	74.0%	—	—	—
Humanity's Last Exam (tools)	52.2%	52.1%	57.2%	54.7%	51.4%
ARC-AGI-1	95.0%	93.7%	—	93.5%	98.0%
ARC-AGI-2	85.0%	73.3%	83.3%	75.8%	77.1%

Long Context (Where GPT-5.5 Pulls Decisively Ahead)

Benchmark	GPT-5.5	GPT-5.4	Claude Opus 4.7 (4.6 where marked)
Graphwalks BFS 256K f1	73.7%	62.5%	76.9%
Graphwalks BFS 1M f1	45.4%	9.4%	41.2% (Opus 4.6)
MRCR v2 8-needle 256K-512K	81.5%	57.5%	—
MRCR v2 8-needle 512K-1M	74.0%	36.6%	32.2% (Opus 4.6)

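The 1M-token reasoning result is the most striking single delta in this launch. GPT-5.4 collapsed to 9.4% on Graphwalks BFS at 1M context; GPT-5.5 holds 45.4%, slightly ahead of Claude Opus 4.6 (41.2%). At 256K, Opus 4.7 still edges ahead (76.9% vs 73.7%), so GPT-5.5's advantage shows up at the largest context sizes. For RAG over very large corpora or full-codebase analysis, this is the first OpenAI model that does not require manual chunking.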

Cost Per Task: GPT-5.5 vs Competitors on Realistic Workloads

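Per-token pricing alone is misleading because token efficiency varies by model. OpenAI claims GPT-5.5 uses fewer tokens per task than GPT-5.4, but the table below applies the same token counts to every model, so it reflects list-price cost only. It covers three common workflows plus a Batch variant, computed from public pricing as of 2026-05-14.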

Workload	Tokens (in / out)	GPT-5.5	GPT-5.4	Claude Opus 4.7	Gemini 3 Pro
Single Codex bug fix	30K / 8K	$0.39	$0.20	$0.35	$0.16
Long-context RAG	800K / 4K	$4.12	$2.06	N/A*	$3.27 (>200K tier)
Standard summarization at scale	200B / 50B	$2.50M	$1.25M	$2.25M	$1.0M**
Same workload, Batch tier (50% off)	200B / 50B	$1.25M	$625K	TBD***	$500K**

*Claude Opus 4.7 is capped at 200K context. Workloads requiring more than 200K input tokens must be split or routed to GPT-5.5 / Gemini 3 Pro.
**Gemini 3 Pro's >200K tier is triggered per prompt, not by total volume. These figures assume every summarization prompt stays at or under 200K input tokens: 200B × $2 + 50B × $12 per MTok = $400K + $600K = $1.0M, or $500K with Batch. If every prompt exceeded 200K, the >200K rates would apply: 200B × $4 + 50B × $18 = $800K + $900K = $1.7M, or $850K with Batch.
***Anthropic documents Batch processing, but its discount was not re-verified for this article.
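Because Gemini's tier depends on the size of each prompt, workload cost has to be computed per request and then multiplied out. A minimal sketch, assuming the list prices above, identical token counts for every model (as in the table), and a hypothetical split of the summarization workload into 10M requests of 20K input / 5K output tokens; the helper names and dictionary keys are illustrative, not provider model IDs.

```python
from typing import Dict, Tuple

# List prices in $ per million tokens (input, output), from this article's pricing table.
# Dict keys are illustrative labels, not provider model IDs.
FLAT_PRICES: Dict[str, Tuple[float, float]] = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "claude-opus-4.7": (5.00, 25.00),
}
# Gemini 3 Pro switches tier based on the size of each individual prompt
GEMINI_TIER_LIMIT = 200_000
GEMINI_SHORT: Tuple[float, float] = (2.00, 12.00)  # prompt <= 200K input tokens
GEMINI_LONG: Tuple[float, float] = (4.00, 18.00)   # prompt > 200K input tokens


def rates(model: str, prompt_tokens: int) -> Tuple[float, float]:
    """(input, output) $/MTok that apply to one prompt of this size."""
    if model == "gemini-3-pro":
        return GEMINI_SHORT if prompt_tokens <= GEMINI_TIER_LIMIT else GEMINI_LONG
    return FLAT_PRICES[model]


def workload_cost(model: str, prompt_tokens: int, output_tokens: int,
                  num_requests: int = 1, batch: bool = False) -> float:
    """Dollars for num_requests identical requests; batch=True halves the price.

    The 50% batch discount is applied to every model here, which the article
    has not verified for Claude Opus 4.7.
    """
    price_in, price_out = rates(model, prompt_tokens)
    per_request = (prompt_tokens * price_in + output_tokens * price_out) / 1_000_000
    return per_request * num_requests * (0.5 if batch else 1.0)


if __name__ == "__main__":
    print(f"Bug fix (30K/8K), GPT-5.5:   ${workload_cost('gpt-5.5', 30_000, 8_000):.2f}")
    print(f"RAG (800K/4K), Gemini 3 Pro: ${workload_cost('gemini-3-pro', 800_000, 4_000):.2f}")
    # 10M requests of 20K in / 5K out = 200B input / 50B output tokens
    for m in ("gpt-5.5", "claude-opus-4.7", "gemini-3-pro"):
        print(f"Summarization, {m}: ${workload_cost(m, 20_000, 5_000, 10_000_000):,.0f}")
```

With these inputs it reproduces the table's $0.39 bug-fix and $3.27 long-context figures. Changing the per-request split leaves the flat-priced models unchanged and only moves the Gemini number if some prompts cross 200K.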

Three observations:

Claude Opus 4.7 is no longer the premium-priced outlier. At $5 input / $25 output, it matches GPT-5.5 ($5 / $30) on input and undercuts it by about 17% on output, which puts blended costs roughly 10% below GPT-5.5 on the workloads above. The old assumption that "Opus is 3× the price" has been out of date since Anthropic's Opus 4.5 price reset.
Gemini 3 Pro wins short coding tasks on price ($0.16 vs GPT-5.5's $0.39), but its >200K tier doubles the input price. For 1M-context workloads, its discount relative to GPT-5.5 narrows to about 20% ($3.27 vs $4.12), down from roughly 60% on short tasks.
GPT-5.5 Batch tier ($2.50 / $15) re-creates old GPT-5.4 standard pricing. For non-realtime production, this is the most defensible cost-per-quality slot in the lineup.

Through TokenMix.ai's unified API, the same gpt-5.5 model is exposed alongside Claude Opus 4.7 and Gemini 3 Pro behind one key. Switching for cost-sensitive workloads becomes a one-parameter change rather than a refactor.

How to Access GPT-5.5 API Today (Direct vs TokenMix)

Two routes, different tradeoffs.

Direct via OpenAI

Requires an OpenAI organization with a valid payment method. Tier rate limits apply: Tier 1 accounts get ~30K TPM on gpt-5.5, scaling with usage. Since launch, availability has had no regional cap.

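```python
from openai import OpenAI

# Reads OPENAI_API_KEY from the environment by default
client = OpenAI()
r = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Explain Ramsey numbers."}],
)
print(r.choices[0].message.content)
```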
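On a Tier 1 account, a client-side token budget keeps bursts under the ~30K TPM limit instead of relying on 429 retries. A minimal sketch, assuming you can estimate a request's tokens before sending it; `TokenBudget` is a hypothetical helper, not part of the OpenAI SDK, and OpenAI's server-side accounting may differ, so leave some headroom.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TokenBudget:
    """Hypothetical client-side sliding-window limiter for tokens per minute (TPM).

    This only approximates OpenAI's server-side accounting, so set tpm_limit
    somewhat below your published tier limit.
    """

    def __init__(self, tpm_limit: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tpm_limit = tpm_limit
        self.window_s = window_s
        self.clock = clock
        # (timestamp, tokens) for each request still inside the window, oldest first
        self.events: Deque[Tuple[float, int]] = deque()

    def _used(self, now: float) -> int:
        # Drop requests that have aged out of the window, then sum what remains
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait before `tokens` more fit under the limit (0.0 if they fit now)."""
        if tokens > self.tpm_limit:
            raise ValueError("request is larger than the whole per-minute budget")
        now = self.clock()
        used = self._used(now)
        if used + tokens <= self.tpm_limit:
            return 0.0
        # Find the earliest moment enough old requests expire to make room
        freed = 0
        for ts, spent in self.events:
            freed += spent
            if used - freed + tokens <= self.tpm_limit:
                return ts + self.window_s - now
        return self.window_s  # not reached while tokens <= tpm_limit

    def record(self, tokens: int) -> None:
        """Log the tokens a request actually consumed, timestamped now."""
        self.events.append((self.clock(), tokens))
```

In a request loop, call `time.sleep(budget.wait_time(estimated_tokens))`, send the request, then `budget.record(r.usage.total_tokens)`. The OpenAI Python SDK also retries rate-limited requests on its own (configurable via `max_retries`); the budget only makes those retries rarer.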

Via TokenMix.ai (OpenAI-Compatible Endpoint)

Same SDK, one base URL change. TokenMix.ai routes the request to OpenAI upstream with no markup on the GPT-5.5 standard tier (Batch and Flex pricing are passed through), plus aggregated rate limits and a single bill across 170+ models.

```python
from openai import OpenAI

# Only base_url and api_key differ from a direct OpenAI call
client = OpenAI(base_url="https://api.tokenmix.ai/v1", api_key="tkmx-...")
r = client.chat.completions.create(
    model="gpt-5.5",  # Same model ID
    messages=[{"role": "user", "content": "Explain Ramsey numbers."}],
)
print(r.choices[0].message.content)
```

Capability	Direct OpenAI	TokenMix.ai
Same OpenAI SDK	Yes	Yes (base_url change only)
Access Claude Opus 4.7 + Gemini 3 Pro with one key	No	Yes
Batch / Flex / Priority pricing tiers	Yes	Yes (passed through)
Multi-region failover	Manual	Automatic
Setup time	Tier 1 verification + payment	Single-page signup

Migration Checklist: GPT-5.4 → GPT-5.5

Most production GPT-5.4 callers can switch to GPT-5.5 with a model-ID change. The real questions are cost and prompt portability.

Action	When	Effort
Run eval suite on `gpt-5.5` vs current	Day 1	2-4 hours
Measure token-count delta vs `gpt-5.4`	Day 1-2	2 hours
Decide: Standard, Batch, or Flex tier per workload	Day 2-3	Half day
Re-tune system prompts that depended on 5.4 token economy	Day 3-5	1-3 days
Route long-context (>200K) workloads to `gpt-5.5` from Claude Opus 4.7	Week 2	1 day
Move overnight jobs to Batch tier (50% off)	Week 2	1 day
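The long-context routing step can be a small pure function. A sketch under stated assumptions: context limits come from this article (1M for gpt-5.5 via the API, 200K for Claude Opus 4.7), the prompt token count is an estimate computed upstream, the window is assumed to hold the output as well, and `claude-opus-4.7` is an illustrative ID, so check your provider's actual model name.

```python
from typing import Dict

# Context limits in tokens, per this article's pricing table (illustrative IDs)
CONTEXT_LIMITS: Dict[str, int] = {
    "claude-opus-4.7": 200_000,
    "gpt-5.5": 1_000_000,
}


def pick_model(prompt_tokens: int, preferred: str = "claude-opus-4.7",
               long_context: str = "gpt-5.5", reserve_for_output: int = 8_000) -> str:
    """Keep the preferred model when the prompt fits; otherwise fall back to the long-context model."""
    if prompt_tokens + reserve_for_output <= CONTEXT_LIMITS[preferred]:
        return preferred
    if prompt_tokens + reserve_for_output <= CONTEXT_LIMITS[long_context]:
        return long_context
    raise ValueError(f"{prompt_tokens} tokens exceeds every configured context window; chunk the input")
```

Reserving output headroom means a 195K-token prompt already routes to gpt-5.5 rather than risking a truncated Opus response.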

If your code hardcodes gpt-5.4, move the model ID behind a config flag now. Per TokenMix.ai integration patterns, the cleanest approach is to put the model ID in an environment variable and roll forward once eval data confirms a win.
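A minimal sketch of that pattern, assuming hypothetical variable names `LLM_MODEL` and `LLM_MODEL_<WORKLOAD>`; pick whatever fits your config system. A per-workload override lets you roll one workload forward to gpt-5.5 while the rest stay on gpt-5.4.

```python
import os

# Fallback when no environment override is set
DEFAULT_MODEL = "gpt-5.4"


def model_id(workload: str = "default") -> str:
    """Resolve the model: LLM_MODEL_<WORKLOAD> first, then LLM_MODEL, then DEFAULT_MODEL."""
    per_workload = os.environ.get(f"LLM_MODEL_{workload.upper()}")
    return per_workload or os.environ.get("LLM_MODEL", DEFAULT_MODEL)
```

Call sites then pass `model=model_id("summarize")` to `client.chat.completions.create`, and rolling forward is a matter of setting `LLM_MODEL=gpt-5.5` in the deployment rather than editing code.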

FAQ

When was GPT-5.5 released?

GPT-5.5 launched on April 23, 2026, with simultaneous availability in ChatGPT, Codex, and the API. GPT-5.5 Instant — the free-tier variant for ChatGPT — followed on May 5, 2026. The internal codename "Spud" was confirmed in OpenAI's launch post.

How much does the GPT-5.5 API cost?

Standard pricing is $5 per million input tokens and $30 per million output tokens. GPT-5.5 Pro costs $30 input / $180 output per million tokens. Batch and Flex tiers cut standard pricing by 50%. Priority tier costs 2.5× standard for guaranteed throughput.

Is GPT-5.5 better than Claude Opus 4.7?

GPT-5.5 wins decisively on Terminal-Bench 2.0 (82.7% vs 69.4%) and leads long-context retrieval at 512K-1M tokens (74.0% on MRCR v2 8-needle vs 32.2% for Opus 4.6, the only Opus figure reported). Claude Opus 4.7 still leads on SWE-Bench Pro Public (64.3% vs 58.6%), GPQA Diamond (94.2% vs 93.6%) and FinanceAgent v1.1 (64.4% vs 60.0%). For agentic coding workflows, GPT-5.5 is now ahead. On SWE-Bench Pro, Opus 4.7 retains the edge, subject to the memorization caveat.

Is GPT-5.5 better than Gemini 3 Pro?

GPT-5.5 leads Gemini 3 Pro on every benchmark reported here except ARC-AGI-1 (98.0% vs 95.0%), BrowseComp (85.9% vs 84.4%) and GPQA Diamond (94.3% vs 93.6%). On math (FrontierMath Tier 4: 35.4% vs 16.7%) and tool use (Toolathlon: 55.6% vs 48.8%), GPT-5.5 is clearly ahead. Cost-wise, Gemini 3 Pro at $2/$12 per MTok is 60% cheaper for prompts up to 200K tokens; above that, its $4/$18 tier narrows the gap.

What is GPT-5.5's context window?

1M tokens via the API for both gpt-5.5 and gpt-5.5-pro. Codex exposes a 400K-token window. At 512K-1M context, GPT-5.5 holds 74.0% on MRCR v2 8-needle retrieval — up from 36.6% on GPT-5.4. This is the first OpenAI model where 1M-token RAG is practically usable.

Should I migrate from GPT-5.4 to GPT-5.5?

For agentic coding, long-context RAG, or complex tool-use workflows: yes, run an eval. For high-volume short prompts where GPT-5.4 hits accuracy targets: stay on 5.4 unless your benchmarks show a meaningful lift — GPT-5.5 is 2× the per-token cost. Batch and Flex tiers narrow the gap.

Can I use GPT-5.5 through TokenMix.ai?

Yes. gpt-5.5 and gpt-5.5-pro are available through the TokenMix.ai unified API with the same OpenAI-compatible SDK — only the base_url changes. Through the TokenMix.ai platform, the same key calls GPT-5.5, Claude Opus 4.7, Gemini 3 Pro, DeepSeek V4 and 170+ other models with aggregated rate limits.

Is "GPT-6" the same as "GPT-5.5"?

OpenAI confirmed the name is GPT-5.5, not GPT-6. The Spud codename produced the GPT-5.5 release. A future generational version may be named GPT-6, but that has not been announced.