TokenMix Research Lab · 2026-04-25

gpt-4-1106-preview: Retired March 2026 — Migration Guide
Last Updated: 2026-04-25
Author: TokenMix Research Lab
gpt-4-1106-preview was retired from OpenAI's API on March 26, 2026, as the final step of the GPT-4 legacy deprecation wave. If you're searching for it, you're likely either migrating a legacy codebase or trying to reproduce results on a model that's no longer callable. This guide covers what the model was, why it mattered, what replaced it, and the concrete migration path for code that still references it. All data verified against OpenAI's official deprecations page and model retirement announcements as of April 2026.
Table of Contents
- What Happened to gpt-4-1106-preview
- Why It Mattered Historically
- Direct Replacements
- Migration Path for Existing Code
- Supported LLM Providers and Model Routing
- When You'd Still Want 128K Context
- Behavior Differences After Migration
- FAQ
What Happened to gpt-4-1106-preview
Timeline:
- November 6, 2023: Released as the first GPT-4 Turbo preview at OpenAI DevDay
- February 13, 2026: GPT-4 (original, non-preview) retired from OpenAI's active API
- March 26, 2026: Preview variants including
gpt-4-1106-previewretired
As of April 2026, calls to gpt-4-1106-preview return an error. The model identifier no longer resolves to any active endpoint. OpenAI's deprecation notices pointed developers to migrate to gpt-4o or gpt-4.1.
Why It Mattered Historically
gpt-4-1106-preview was a landmark release for three reasons:
1. First GPT-4 with 128K context. Before this, GPT-4 was limited to 8K tokens. The 128K jump enabled document analysis, long-form content, and multi-file code analysis in ways that previously required chunking workarounds.
2. First JSON mode. Native structured output generation. Before this, developers relied on fragile "please return JSON" prompt engineering.
3. First GPT-4 Turbo pricing reduction. The Turbo series lowered API cost per token vs the original GPT-4, opening up use cases that were economically prohibitive.
For almost two years (late 2023 through early 2026), this model powered a large share of GPT-4-based applications.
Direct Replacements
OpenAI's official migration recommendations:
| Legacy use case | Recommended replacement |
|---|---|
| General chat / reasoning | gpt-4o or gpt-4.1 |
| 128K context needed | gpt-4.1 (supports 1M context) or gpt-5.5 (1M context) |
| JSON mode | gpt-4o, gpt-5.4, gpt-5.5 (all have native structured output) |
| Cost-optimized | gpt-5.4-mini ($0.25/$1.00 per MTok) |
| Frontier capability | gpt-5.5 ($5/$30 per MTok) |
| Cross-provider diversification | Claude Opus 4.7, DeepSeek V4-Pro |
The simplest migration: replace gpt-4-1106-preview with gpt-4.1 in your model identifier. Most existing prompts work unchanged. GPT-4.1 is stronger across the board with 1M context vs the retired 128K.
Migration Path for Existing Code
Step 1 — Identify all references:
grep -r "gpt-4-1106-preview" .
Check Python files, config files, environment variables, CI/CD pipelines, feature flags.
Step 2 — Choose replacement based on workload:
For routine replacement with minimum code changes:
# Before
model = "gpt-4-1106-preview"
# After — direct replacement
model = "gpt-4.1" # or "gpt-4o" for slightly cheaper
For cost optimization:
# Before
model = "gpt-4-1106-preview"
# After — use tiered routing
if task_complexity == "simple":
model = "gpt-5.4-mini" # $0.25/$1.00
elif task_complexity == "standard":
model = "gpt-5.4" # $2.50/$15
else:
model = "gpt-5.5" # $5/$30 frontier
For cross-provider diversification:
# Before — OpenAI-only dependency
client = OpenAI()
response = client.chat.completions.create(model="gpt-4-1106-preview", ...)
# After — aggregator with fallback
from openai import OpenAI
client = OpenAI(
api_key="your-tokenmix-key",
base_url="https://api.tokenmix.ai/v1",
)
# Now has access to gpt-5.4, Claude Opus 4.7, DeepSeek V4-Pro, Kimi K2.6
# Through one API key
Step 3 — Test output quality on your specific tasks:
Run 50-100 representative prompts through old (snapshot outputs from before March 26) and new models. Compare quality. GPT-4.1 is generally better, but specific prompts may need minor adjustments.
Step 4 — Migrate prompts if needed:
Some prompts tuned for GPT-4-1106's quirks need adjustments for newer models. Common patterns:
- Remove explicit "please return JSON" instructions — newer models handle structured output better natively
- Simplify verbose system prompts — newer models follow instructions more reliably
- Adjust temperature if output verbosity differs
Step 5 — Deploy and monitor:
Roll out gradually if high-traffic. Monitor output quality, user feedback, any regressions.
Supported LLM Providers and Model Routing
Post-retirement, gpt-4-1106-preview is not accessible anywhere through legitimate means. Replacements are:
- OpenAI direct — gpt-4.1, gpt-4o, gpt-5.4, gpt-5.5
- Azure OpenAI — same GPT-4 family minus the retired versions
- AWS Bedrock — Anthropic Claude (not GPT-4, different provider)
- Google Vertex AI — Gemini models or Model Garden partnerships
- OpenAI-compatible aggregators — TokenMix.ai, OpenRouter, and similar
Through TokenMix.ai, you can replace gpt-4-1106-preview with any of gpt-4.1, gpt-5.4, gpt-5.5, Claude Opus 4.7, DeepSeek V4-Pro, Kimi K2.6, or 300+ other models through a single API key — useful for comparing which replacement gives best results on your specific prompts before committing to a migration.
from openai import OpenAI
client = OpenAI(
api_key="your-tokenmix-key",
base_url="https://api.tokenmix.ai/v1",
)
# Test multiple candidates
candidates = ["gpt-4.1", "gpt-5.4", "gpt-5.5", "claude-opus-4-7"]
for model in candidates:
response = client.chat.completions.create(
model=model,
messages=your_test_messages,
)
print(f"{model}: {response.choices[0].message.content[:200]}")
When You'd Still Want 128K Context
If you specifically needed 128K context and nothing more:
- gpt-4.1 — 1M context (overkill but same price point as GPT-4 Turbo)
- gpt-4o — 128K context, more recent training
- Claude Haiku 4.5 — 200K+ context, strong reasoning, $0.80/$4.00
- DeepSeek V4-Flash — 128K+ context, cheapest option at $0.14/$0.28
- Gemini 2.5 Flash — 1M context, good balance
For most teams, jumping to 1M context with gpt-4.1 or Claude Opus 4.7 is worth it. The extra capacity rarely hurts, often helps.
Behavior Differences After Migration
What you should expect to change when migrating from gpt-4-1106-preview:
Better:
- More reliable instruction-following
- Better tool calling / function calling
- Stronger code generation
- Fewer hallucinations on structured outputs
- Better long-context reasoning (not just longer context)
- Lower cost per capability at most tiers
Different (may need prompt adjustments):
- Slightly different response style
- Different refusal patterns (newer safety tuning)
- Different default verbosity
- Different default structure for list outputs
Potentially worse:
- Some very specific GPT-4-1106 quirks won't reproduce — if your prompt relied on a specific way the old model worded things, you'll need to adjust
- Legacy prompt hacks may backfire
Plan a 1-2 week stabilization period where you monitor output and adjust prompts for behavior changes.
FAQ
Is there any way to still use gpt-4-1106-preview?
No. The model was fully retired on March 26, 2026. API calls error out. Third parties don't have unauthorized access either.
What about Azure OpenAI — did it deprecate on the same timeline?
Similar but with some variance. Azure OpenAI tracks OpenAI deprecations but sometimes with 2-4 week lag. Check Azure's specific model retirement schedule for your region. As of April 2026, Azure's GPT-4 preview variants are also retired or scheduled for retirement.
Is my old output still good or should I regenerate with gpt-4.1?
For most purposes, old output from gpt-4-1106 is fine if stored. For new generations on updated content, use gpt-4.1 or gpt-4o.
What's the cheapest direct replacement?
gpt-5.4-mini at $0.25/$1.00 per MTok, roughly 10× cheaper than gpt-4-1106's original pricing. Capability is equivalent or better for most tasks.
Does the migration break JSON mode?
No. gpt-4o, gpt-4.1, gpt-5.4, and gpt-5.5 all support native JSON mode (response_format={"type": "json_object"}) with the same or better quality than gpt-4-1106.
Should I migrate to a non-OpenAI alternative?
Consider it. Claude Opus 4.7 ($5/$25) and DeepSeek V4-Pro ($1.74/$3.48) are both credible alternatives. Access all of them alongside gpt-4.1 through TokenMix.ai for A/B testing before committing.
What about 128K context specifically?
If you only needed 128K, jumping to 1M (gpt-4.1 or gpt-5.5) is free upside. Both support your old 128K-length prompts natively. No downside.
How long before other GPT-4 variants retire?
gpt-4o and gpt-4.1 are on OpenAI's current-tier list as of April 2026. Expect them to remain supported for at least 18-24 months. gpt-4 (original) and all preview variants are retired. Plan your long-term tech stack with newer GPT-5.x models as primary.
Does Azure's retirement timeline differ?
Yes, typically 2-4 weeks offset from OpenAI direct. Always check Azure's specific model retirement page for your region's schedule.
Related Articles
- Ultimate LLM Comparison Hub 2026: Every Major Model Benchmarked
- text-embedding-3-small: $0.02/MTok, 1536 Dims, MTEB 62.26 Guide
- GPT-5 Nano: $0.05/$0.40 Pricing, 400K Context, Should You Still Use It?
- gpt-4o-transcribe: Speech-to-Text API Guide ($0.006/Min, 2026)
- claude-sonnet-4-5-20250929 vs 4-20250514: Version Diff Guide
Author: TokenMix Research Lab | Last Updated: April 25, 2026 | Data Sources: OpenAI Deprecations API docs, OpenAI retiring GPT-4o announcement, Azure OpenAI model retirements, GPT-4 1106 Preview review (Telnyx), TokenMix.ai migration path