TokenMix Research Lab · 2026-04-17

GPT-5.5 (Spud) Released: $5/$30 API Pricing & Benchmarks 2026
Last Updated: 2026-05-14 Author: TokenMix Research Lab
OpenAI shipped GPT-5.5 on April 23, 2026 — codename Spud confirmed in the official launch post. The model is live in the API at gpt-5.5, priced at $5 input / $30 output per million tokens, with a 1M-token context window. GPT-5.5 Pro is also live at $30 / $180. Terminal-Bench 2.0 jumped from 75.1% (GPT-5.4) to 82.7% in a single release. GPT-5.5 Instant followed for free-tier users on May 5, 2026. Below is everything confirmed at launch, the real benchmark deltas against Claude Opus 4.7 and Gemini 3 Pro, and how to call gpt-5.5 today through TokenMix.ai with no waitlist.
Table of Contents
- Quick Verdict: GPT-5.5 Launch Snapshot
- What Actually Shipped on April 23, 2026
- GPT-5.5 API Pricing: All Tiers, Batch & Flex Discounts
- Real Benchmark Results vs GPT-5.4, Claude Opus 4.7, Gemini 3 Pro
- Cost Per Task: GPT-5.5 vs Competitors on Realistic Workloads
- How to Access GPT-5.5 API Today (Direct vs TokenMix)
- Migration Checklist: GPT-5.4 → GPT-5.5
- FAQ
Quick Verdict: GPT-5.5 Launch Snapshot
| Claim | Status | Source |
|---|---|---|
| Launch date: April 23, 2026 | Confirmed | OpenAI launch post |
| Codename "Spud" | Confirmed at launch | OpenAI launch post (acknowledged the codename publicly) |
API model ID: gpt-5.5 and gpt-5.5-pro |
Confirmed | OpenAI pricing & API docs |
| Pricing: $5 input / $30 output per MTok | Confirmed | OpenAI launch post |
| GPT-5.5 Pro: $30 / $180 per MTok | Confirmed | OpenAI launch post |
| Context window: 1M tokens (API), 400K (Codex) | Confirmed | OpenAI launch post |
| GPT-5.5 Instant on free tier | Confirmed (May 5, 2026) | GPT-5.5 Instant announcement |
| Beats Claude Opus 4.7 on Terminal-Bench 2.0 | Confirmed (82.7% vs 69.4%) | OpenAI launch post (third-party scored) |
| Beats Gemini 3 Pro on FrontierMath Tier 4 | Confirmed (35.4% vs 16.7%) | OpenAI launch post |
| Trails Gemini 3 Pro on ARC-AGI-1 | Confirmed (95.0% vs 98.0%) | OpenAI launch post |
| Trails Claude Opus 4.7 on SWE-Bench Pro Public | Confirmed (58.6% vs 64.3%) | OpenAI launch post (memorization caveat flagged) |
Bottom line: GPT-5.5 is the new agentic-coding leader (Terminal-Bench 82.7%) and dominates long-context retrieval at 512K-1M tokens (74.0% vs Opus 4.6 32.2%). It is not the across-the-board winner — Claude Opus 4.7 still leads on SWE-Bench Pro, and Gemini 3 Pro leads on ARC-AGI-1.
What Actually Shipped on April 23, 2026
Spud is no longer speculation. As confirmed in OpenAI's launch post, GPT-5.5 is now generally available across three surfaces:
ChatGPT: GPT-5.5 Thinking rolled out to Plus, Pro, Business and Enterprise users on April 23. GPT-5.5 Pro went to Pro, Business and Enterprise the same day. GPT-5.5 Instant — a free-tier variant — followed on May 5, 2026.
Codex: GPT-5.5 is available on Plus, Pro, Business, Enterprise, Edu and Go plans, with a 400K context window. A "Fast" mode generates 1.5× faster at 2.5× the cost.
API: gpt-5.5 and gpt-5.5-pro are exposed through Responses and Chat Completions endpoints. Microsoft Foundry has parity access. Both models support the full 1M-token context window via the API.
| Launch Date | Surface | What Released |
|---|---|---|
| April 23, 2026 | ChatGPT (Plus / Pro / Biz / Ent) | GPT-5.5 Thinking, GPT-5.5 Pro |
| April 23, 2026 | Codex | GPT-5.5 with 400K context |
| April 23, 2026 | API (Responses, Chat Completions) | gpt-5.5, gpt-5.5-pro |
| May 5, 2026 | ChatGPT Free | GPT-5.5 Instant |
The "Spud" codename was used internally during training and publicly acknowledged after release. The model uses NVIDIA GB200 and GB300 NVL72 systems for serving. Per-token latency matches GPT-5.4 despite the intelligence jump — OpenAI explicitly co-designed inference infrastructure during training. According to TokenMix.ai upstream uptime tracking, gpt-5.5 has been served stably since the day-of-launch ramp.
GPT-5.5 API Pricing: All Tiers, Batch & Flex Discounts
Per OpenAI's pricing page referenced in the launch post, GPT-5.5 is twice the price of GPT-5.4 on inputs and twice on outputs. OpenAI's argument is that GPT-5.5 uses fewer tokens per task in Codex, partially offsetting the higher per-token rate.
Standard API Pricing
| Model | Input ($/MTok) | Output ($/MTok) | Context | Notes |
|---|---|---|---|---|
| gpt-5.5 | $5.00 | $30.00 | 1M | Standard API |
| gpt-5.5-pro | $30.00 | $180.00 | 1M | High-accuracy variant |
| gpt-5.4 | $2.50 | $15.00 | 1.05M | Previous flagship |
| gpt-5.4-mini | $0.40 | $1.60 | 1M | Budget option |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K | Per Anthropic API docs (2026-05-14) |
| Gemini 3 Pro (≤200K prompt) | $2.00 | $12.00 | 1M+ | Per Google AI pricing (2026-05-14) |
| Gemini 3 Pro (>200K prompt) | $4.00 | $18.00 | 1M+ | Tier kicks in past 200K input |
Note: OpenAI's launch post labels the Gemini comparison column "Gemini 3 Pro". Google's own pricing page calls the model "Gemini 3 Pro" with API ID gemini-3.1-pro-preview. They are the same model.
Modal Pricing Variants (GPT-5.5)
| Mode | Multiplier vs Standard | Use Case |
|---|---|---|
| Batch API | 0.5× (50% off) | Non-realtime jobs (overnight, offline embeddings, eval) |
| Flex | 0.5× (50% off) | Latency-tolerant production traffic |
| Standard | 1× | Default, low-latency synchronous |
| Priority | 2.5× | Latency-critical, guaranteed throughput |
| Codex Fast | 2.5× cost, 1.5× speed | Codex IDE workflows |
Translation: if you can tolerate 24-hour completion, Batch puts GPT-5.5 at $2.50 / $15 per MTok — exactly matching old GPT-5.4 standard pricing. According to TokenMix.ai pricing observability, this is the sweet spot for batch summarization or large-scale classification workloads.
Real Benchmark Results vs GPT-5.4, Claude Opus 4.7, Gemini 3 Pro
These are not projections. All numbers below are from OpenAI's launch announcement, which published third-party benchmark scores at release.
Agentic Coding (GPT-5.5's Strongest Domain)
| Benchmark | GPT-5.5 | GPT-5.4 | Claude Opus 4.7 | Gemini 3 Pro |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 75.1% | 69.4% | 68.5% |
| Expert-SWE (Internal) | 73.1% | 68.5% | — | — |
| SWE-Bench Pro (Public)* | 58.6% | 57.7% | 64.3% | 54.2% |
*SWE-Bench Pro Public has known memorization issues — OpenAI flagged this in the launch post. Treat with caution.
Knowledge Work & Tool Use
| Benchmark | GPT-5.5 | GPT-5.4 | GPT-5.5 Pro | Claude Opus 4.7 | Gemini 3 Pro |
|---|---|---|---|---|---|
| GDPval (Win/Tie) | 84.9% | 83.0% | 82.3% | 80.3% | 67.3% |
| FinanceAgent v1.1 | 60.0% | 56.0% | — | 64.4% | 59.7% |
| OSWorld-Verified | 78.7% | 75.0% | — | 78.0% | — |
| BrowseComp | 84.4% | 82.7% | 90.1% | 79.3% | 85.9% |
| Tau2-bench Telecom (raw) | 98.0% | 92.8% | — | — | — |
| Toolathlon | 55.6% | 54.6% | — | — | 48.8% |
Math & Science
| Benchmark | GPT-5.5 | GPT-5.4 | GPT-5.5 Pro | Claude Opus 4.7 | Gemini 3 Pro |
|---|---|---|---|---|---|
| GPQA Diamond | 93.6% | 92.8% | — | 94.2% | 94.3% |
| FrontierMath Tier 1-3 | 51.7% | 47.6% | 52.4% | 43.8% | 36.9% |
| FrontierMath Tier 4 | 35.4% | 27.1% | 39.6% | 22.9% | 16.7% |
| GeneBench | 25.0% | 19.0% | 33.2% | — | — |
| BixBench | 80.5% | 74.0% | — | — | — |
| Humanity's Last Exam (tools) | 52.2% | 52.1% | 57.2% | 54.7% | 51.4% |
| ARC-AGI-1 | 95.0% | 93.7% | — | 93.5% | 98.0% |
| ARC-AGI-2 | 85.0% | 73.3% | 83.3% | 75.8% | 77.1% |
Long Context (Where GPT-5.5 Pulls Decisively Ahead)
| Benchmark | GPT-5.5 | GPT-5.4 | Claude Opus 4.7 |
|---|---|---|---|
| Graphwalks BFS 256K f1 | 73.7% | 62.5% | 76.9% |
| Graphwalks BFS 1M f1 | 45.4% | 9.4% | 41.2% (Opus 4.6) |
| MRCR v2 8-needle 256K-512K | 81.5% | 57.5% | — |
| MRCR v2 8-needle 512K-1M | 74.0% | 36.6% | 32.2% (Opus 4.6) |
The 1M-token reasoning result is the most striking single delta in this launch. GPT-5.4 collapsed to 9.4% on Graphwalks BFS at 1M context. GPT-5.5 holds 45.4%. For RAG over very large corpora or full-codebase analysis, this is the first model that does not require manual chunking.
Cost Per Task: GPT-5.5 vs Competitors on Realistic Workloads
Per-token pricing alone is misleading because token efficiency varies. OpenAI claims GPT-5.5 uses fewer tokens per task than GPT-5.4. Below are costs for three common workflows, computed from public pricing as of 2026-05-14.
| Workload | Tokens (in / out) | GPT-5.5 | GPT-5.4 | Claude Opus 4.7 | Gemini 3 Pro |
|---|---|---|---|---|---|
| Single Codex bug fix | 30K / 8K | $0.39 | $0.20 | $0.35 | $0.16 |
| Long-context RAG | 800K / 4K | $4.12 | $2.06 | N/A* | $3.27 (>200K tier) |
| Standard summarization at scale | 200B / 50B | $2.50M | $1.25M | $1.25M | $800K** |
| Same workload, Batch tier (50% off) | 200B / 50B | $1.25M | $625K | TBD*** | $400K |
*Claude Opus 4.7 capped at 200K context. Workloads requiring >200K input must split or use GPT-5.5 / Gemini 3 Pro. **Gemini 3 Pro at 200B in / 50B out hits the >200K tier: 200B × $4 + 50B × $18 = $800K + $900K = $1.7M; with Batch 50% off = $850K. Numbers reflect estimated mix of ≤200K and >200K prompts. ***Anthropic publishes Batch processing in their docs left-nav but tier discount specifics not re-verified here.
Three observations:
- Claude Opus 4.7 is no longer the premium-priced outlier. At $5 input / $25 output, it now sits within 10% of GPT-5.5 ($5 / $30). The old assumption that "Opus is 3× the price" is out of date as of Anthropic's Opus 4.5 reset.
- Gemini 3 Pro wins short coding tasks on price ($0.16 vs GPT-5.5's $0.39) but its >200K tier doubles input price. For 1M-context workloads, the gap to GPT-5.5 narrows to ~20% cheaper, not 60%.
- GPT-5.5 Batch tier ($2.50 / $15) re-creates old GPT-5.4 standard pricing. For non-realtime production, this is the most defensible cost-per-quality slot in the lineup.
Through TokenMix.ai's unified API, the same gpt-5.5 model is exposed alongside Claude Opus 4.7 and Gemini 3 Pro behind one key. Switching for cost-sensitive workloads becomes a one-parameter change rather than a refactor.
How to Access GPT-5.5 API Today (Direct vs TokenMix)
Two routes, different tradeoffs.
Direct via OpenAI
Requires an OpenAI organization with valid payment method. Tier rate limits apply — Tier 1 accounts get ~30K TPM on gpt-5.5, scaling with usage. No regional cap on availability after launch.
from openai import OpenAI
client = OpenAI()
r = client.chat.completions.create(
model="gpt-5.5",
messages=[{"role": "user", "content": "Explain Ramsey numbers."}]
)
Via TokenMix.ai (OpenAI-Compatible Endpoint)
Same SDK, one base URL change. TokenMix.ai routes the request to OpenAI upstream with no markup on GPT-5.5 standard tier (Batch/Flex pricing pass-through), plus aggregated rate limits and a single bill across 170+ models.
from openai import OpenAI
client = OpenAI(base_url="https://api.tokenmix.ai/v1", api_key="tkmx-...")
r = client.chat.completions.create(
model="gpt-5.5", # Same model ID
messages=[{"role": "user", "content": "Explain Ramsey numbers."}]
)
| Capability | Direct OpenAI | TokenMix.ai |
|---|---|---|
| Same OpenAI SDK | Yes | Yes (base_url change only) |
| Access Claude Opus 4.7 + Gemini 3 Pro with one key | No | Yes |
| Batch / Flex / Priority pricing tiers | Yes | Yes (passed through) |
| Multi-region failover | Manual | Automatic |
| Setup time | Tier 1 verification + payment | Single-page signup |
Migration Checklist: GPT-5.4 → GPT-5.5
Most production GPT-5.4 callers can switch to GPT-5.5 with a model-ID change. The real questions are cost and prompt portability.
| Action | When | Effort |
|---|---|---|
Run eval suite on gpt-5.5 vs current |
Day 1 | 2-4 hours |
Measure token-count delta vs gpt-5.4 |
Day 1-2 | 2 hours |
| Decide: Standard, Batch, or Flex tier per workload | Day 2-3 | Half day |
| Re-tune system prompts that depended on 5.4 token economy | Day 3-5 | 1-3 days |
Route long-context (>200K) workloads to gpt-5.5 from Claude Opus 4.7 |
Week 2 | 1 day |
| Move overnight jobs to Batch tier (50% off) | Week 2 | 1 day |
If your code is hardcoded to gpt-5.4, abstract behind a config flag now. Per TokenMix.ai integration patterns, the cleanest fix is putting the model ID in environment variables and rolling forward when eval data confirms a win.
FAQ
When was GPT-5.5 released?
GPT-5.5 launched on April 23, 2026, with simultaneous availability in ChatGPT, Codex, and the API. GPT-5.5 Instant — the free-tier variant for ChatGPT — followed on May 5, 2026. The internal codename "Spud" was confirmed in OpenAI's launch post.
How much does the GPT-5.5 API cost?
Standard pricing is $5 per million input tokens and $30 per million output tokens. GPT-5.5 Pro costs $30 input / $180 output per million tokens. Batch and Flex tiers cut standard pricing by 50%. Priority tier costs 2.5× standard for guaranteed throughput.
Is GPT-5.5 better than Claude Opus 4.7?
GPT-5.5 wins decisively on Terminal-Bench 2.0 (82.7% vs 69.4%) and dominates long-context retrieval beyond 256K tokens. Claude Opus 4.7 still leads on SWE-Bench Pro Public (64.3% vs 58.6%) and GPQA Diamond (94.2% vs 93.6%). For agentic coding workflows, GPT-5.5 is now ahead. For pure code-completion benchmarks, Opus 4.7 retains the edge.
Is GPT-5.5 better than Gemini 3 Pro?
GPT-5.5 leads on every category except ARC-AGI-1 (Gemini 3 Pro: 98.0% vs GPT-5.5: 95.0%) and BrowseComp (Gemini 85.9% vs GPT-5.5 84.4%). On math (FrontierMath Tier 4: 35.4% vs 16.7%) and tool use (Toolathlon: 55.6% vs 48.8%), GPT-5.5 is clearly ahead. Cost-wise, Gemini 3 Pro at $2/$12 per MTok is 60% cheaper.
What is GPT-5.5's context window?
1M tokens via the API for both gpt-5.5 and gpt-5.5-pro. Codex exposes a 400K-token window. At 512K-1M context, GPT-5.5 holds 74.0% on MRCR v2 8-needle retrieval — up from 36.6% on GPT-5.4. This is the first OpenAI model where 1M-token RAG is practically usable.
Should I migrate from GPT-5.4 to GPT-5.5?
For agentic coding, long-context RAG, or complex tool-use workflows: yes, run an eval. For high-volume short prompts where GPT-5.4 hits accuracy targets: stay on 5.4 unless your benchmarks show a meaningful lift — GPT-5.5 is 2× the per-token cost. Batch and Flex tiers narrow the gap.
Can I use GPT-5.5 through TokenMix.ai?
Yes. gpt-5.5 and gpt-5.5-pro are available through the TokenMix.ai unified API with the same OpenAI-compatible SDK — only the base_url changes. Through the TokenMix.ai platform, the same key calls GPT-5.5, Claude Opus 4.7, Gemini 3 Pro, DeepSeek V4 and 170+ other models with aggregated rate limits.
Is "GPT-6" the same as "GPT-5.5"?
OpenAI confirmed the name is GPT-5.5, not GPT-6. The Spud codename produced the GPT-5.5 release. A future generational version may be named GPT-6, but that has not been announced.