gpt-4-1106-preview: Retired March 2026 — Migration Guide
gpt-4-1106-preview was retired from OpenAI's API on March 26, 2026, as the final step of the GPT-4 legacy deprecation wave. If you're searching for it, you're likely either migrating a legacy codebase or trying to reproduce results on a model that's no longer callable. This guide covers what the model was, why it mattered, what replaced it, and the concrete migration path for code that still references it. All data verified against OpenAI's official deprecations page and model retirement announcements as of April 2026.
November 6, 2023: Released as the first GPT-4 Turbo preview at OpenAI DevDay
February 13, 2026: GPT-4 (original, non-preview) retired from OpenAI's active API
March 26, 2026: Preview variants including gpt-4-1106-preview retired
As of April 2026, calls to gpt-4-1106-preview return an error. The model identifier no longer resolves to any active endpoint. OpenAI's deprecation notices pointed developers to migrate to gpt-4o or gpt-4.1.
Why It Mattered Historically
gpt-4-1106-preview was a landmark release for three reasons:
1. First GPT-4 with 128K context. Before this, GPT-4 was limited to 8K tokens. The 128K jump enabled document analysis, long-form content, and multi-file code analysis in ways that previously required chunking workarounds.
2. First JSON mode. Native structured output generation. Before this, developers relied on fragile "please return JSON" prompt engineering.
3. First GPT-4 Turbo pricing reduction. The Turbo series lowered API cost per token vs the original GPT-4, opening up use cases that were economically prohibitive.
For almost two years (late 2023 through early 2026), this model powered a large share of GPT-4-based applications.
Direct Replacements
OpenAI's official migration recommendations:
| Legacy use case | Recommended replacement |
| --- | --- |
| General chat / reasoning | gpt-4o or gpt-4.1 |
| 128K context needed | gpt-4.1 (1M context) or gpt-5.5 (1M context) |
| JSON mode | gpt-4o, gpt-5.4, gpt-5.5 (all have native structured output) |
| Cost-optimized | gpt-5.4-mini ($0.25 input per MTok) |
| Frontier capability | gpt-5.5 ($5/$30 per MTok) |
| Cross-provider diversification | Claude Opus 4.7, DeepSeek V4-Pro |
The simplest migration: replace gpt-4-1106-preview with gpt-4.1 in your model identifier. Most existing prompts work unchanged, and GPT-4.1 is stronger across the board, with a 1M context window versus the retired model's 128K.
For routine replacement with minimum code changes:
# Before
model = "gpt-4-1106-preview"
# After — direct replacement
model = "gpt-4.1" # or "gpt-4o" for slightly cheaper
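Before shipping the swap, it helps to confirm no stray references to the retired identifier remain in config files, env files, or code. A minimal sketch of such a scan; `find_model_refs` is an illustrative helper, not part of any library:

```python
import os

RETIRED_ID = "gpt-4-1106-preview"

def find_model_refs(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every reference to the retired id."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Extensions to scan are an assumption; extend for your repo.
            if not name.endswith((".py", ".json", ".yaml", ".yml", ".env")):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    for lineno, line in enumerate(f, start=1):
                        if RETIRED_ID in line:
                            hits.append((path, lineno, line.rstrip()))
            except OSError:
                continue  # skip unreadable files
    return hits
```

Running this over your repository root gives you the full list of call sites to update in one pass.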
For cost optimization:
# Before
model = "gpt-4-1106-preview"
# After — use tiered routing
if task_complexity == "simple":
    model = "gpt-5.4-mini"  # $0.25 input per MTok
elif task_complexity == "standard":
    model = "gpt-5.4"  # $2.50 input per MTok
else:
    model = "gpt-5.5"  # $5/$30 frontier
For cross-provider diversification:
# Before — OpenAI-only dependency
client = OpenAI()
response = client.chat.completions.create(model="gpt-4-1106-preview", ...)
# After — aggregator with fallback
from openai import OpenAI
client = OpenAI(
    api_key="your-tokenmix-key",
    base_url="https://api.tokenmix.ai/v1",
)
# Now has access to gpt-5.4, Claude Opus 4.7, DeepSeek V4-Pro, Kimi K2.6
# through one API key
Step 3 — Test output quality on your specific tasks:
Compare 50-100 representative prompts: stored outputs from the old model (snapshots captured before March 26) against fresh generations from the replacement. GPT-4.1 is generally better, but specific prompts may need minor adjustments.
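One way to structure that comparison, assuming you stored the old model's outputs before retirement. The metrics here are placeholders for whatever task-specific quality checks you actually use:

```python
def compare_outputs(old: dict[str, str], new: dict[str, str]) -> dict[str, float]:
    """Compare stored old-model outputs against new-model outputs, prompt by prompt.

    Returns crude drift indicators; substitute real task-specific evaluation.
    """
    shared = old.keys() & new.keys()
    if not shared:
        return {"exact_match_rate": 0.0, "mean_length_ratio": 0.0}
    # How often the new model reproduces the old output verbatim.
    exact = sum(1 for p in shared if old[p].strip() == new[p].strip())
    # Whether the new model is systematically more or less verbose.
    ratios = [len(new[p]) / max(len(old[p]), 1) for p in shared]
    return {
        "exact_match_rate": exact / len(shared),
        "mean_length_ratio": sum(ratios) / len(ratios),
    }
```

A mean length ratio far from 1.0 is a hint that default verbosity shifted, which feeds directly into the prompt adjustments in Step 4.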
Step 4 — Migrate prompts if needed:
Some prompts tuned for GPT-4-1106's quirks need adjustments for newer models. Common patterns:
Simplify verbose system prompts — newer models follow instructions more reliably
Adjust temperature if output verbosity differs
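As an illustration of the first pattern, a system prompt written defensively for the 2023 model can usually be cut down for newer models. Both prompts below are invented examples, not taken from any real deployment:

```python
# Typical 2023-era defensive prompt: the format requirement is repeated
# several times because early models drifted from it.
legacy_system_prompt = (
    "You are a helpful assistant. You MUST respond only in JSON. "
    "Do not add any text before or after the JSON. Remember: JSON only. "
    "If you cannot answer, still return JSON with an 'error' key."
)

# Newer models follow a single clear instruction reliably.
modern_system_prompt = (
    "Respond with a single JSON object; use an 'error' key if you cannot answer."
)

assert len(modern_system_prompt) < len(legacy_system_prompt)
```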
Step 5 — Deploy and monitor:
Roll out gradually if high-traffic. Monitor output quality, user feedback, any regressions.
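A common way to roll out gradually is deterministic bucketing on a stable user id, so each user consistently sees the same model while you ramp the percentage. A sketch under that assumption; hash-based bucketing is an application-side pattern, not an OpenAI feature, and the two model ids are just example candidates:

```python
import hashlib

def pick_model(user_id: str, rollout_percent: int,
               new_model: str = "gpt-4.1",
               current_model: str = "gpt-4o") -> str:
    """Route rollout_percent of users to new_model, deterministically per user."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0-99 per user
    return new_model if bucket < rollout_percent else current_model
```

Because the bucket depends only on the user id, raising `rollout_percent` from 10 to 50 keeps every already-migrated user on the new model.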
Supported LLM Providers and Model Routing
Post-retirement, gpt-4-1106-preview is not accessible anywhere through legitimate means. Replacements are:
OpenAI direct — gpt-4.1, gpt-4o, gpt-5.4, gpt-5.5
Azure OpenAI — same GPT-4 family minus the retired versions
AWS Bedrock — Anthropic Claude (not GPT-4, different provider)
Google Vertex AI — Gemini models or Model Garden partnerships
OpenAI-compatible aggregators — TokenMix.ai, OpenRouter, and similar
Through TokenMix.ai, you can replace gpt-4-1106-preview with any of gpt-4.1, gpt-5.4, gpt-5.5, Claude Opus 4.7, DeepSeek V4-Pro, Kimi K2.6, or 300+ other models through a single API key — useful for comparing which replacement gives best results on your specific prompts before committing to a migration.
from openai import OpenAI
client = OpenAI(
api_key="your-tokenmix-key",
base_url="https://api.tokenmix.ai/v1",
)
# Test multiple candidates
candidates = ["gpt-4.1", "gpt-5.4", "gpt-5.5", "claude-opus-4-7"]
for model in candidates:
    response = client.chat.completions.create(
        model=model,
        messages=your_test_messages,
    )
    print(f"{model}: {response.choices[0].message.content[:200]}")
When You'd Still Want 128K Context
If you specifically needed 128K context and nothing more:
gpt-4.1 — 1M context (overkill but same price point as GPT-4 Turbo)
gpt-4o — 128K context, more recent training
Claude Haiku 4.5 — 200K+ context, strong reasoning, $0.80/$4.00
DeepSeek V4-Flash — 128K+ context, cheapest option at $0.14/$0.28
Gemini 2.5 Flash — 1M context, good balance
For most teams, jumping to 1M context with gpt-4.1 or Claude Opus 4.7 is worth it. The extra capacity rarely hurts, often helps.
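If you want a quick sanity check that your existing long prompts fit a candidate's window, a rough chars-per-token estimate is usually close enough for English text. The limits below restate the figures quoted above as a hardcoded dict; they are an assumption, not an API lookup:

```python
# Approximate context windows (tokens) from the comparison above.
CONTEXT_LIMITS = {
    "gpt-4.1": 1_000_000,
    "gpt-4o": 128_000,
    "gemini-2.5-flash": 1_000_000,
}

def fits_context(prompt: str, model: str, reply_budget: int = 4_096) -> bool:
    """Rough check: ~4 characters per token for English text."""
    est_tokens = len(prompt) // 4 + 1
    return est_tokens + reply_budget <= CONTEXT_LIMITS[model]
```

For precise counts, use a real tokenizer; this heuristic is only for triage when choosing between 128K and 1M tiers.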
Behavior Differences After Migration
What you should expect to change when migrating from gpt-4-1106-preview:
Better:
More reliable instruction-following
Better tool calling / function calling
Stronger code generation
Fewer hallucinations on structured outputs
Better long-context reasoning (not just longer context)
Lower cost per capability at most tiers
Different (may need prompt adjustments):
Slightly different response style
Different refusal patterns (newer safety tuning)
Different default verbosity
Different default structure for list outputs
Potentially worse:
Some very specific GPT-4-1106 quirks won't reproduce — if your prompt relied on a specific way the old model worded things, you'll need to adjust
Legacy prompt hacks may backfire
Plan a 1-2 week stabilization period where you monitor output and adjust prompts for behavior changes.
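During that stabilization window, even a crude counter over logged outputs can surface shifted refusal patterns early. A sketch; the marker phrases are illustrative guesses, not an official list, so tune them against your own traffic:

```python
# Phrases that often signal a refusal; adjust for your domain.
REFUSAL_MARKERS = ("i can't help with", "i cannot assist", "i'm unable to")

def refusal_rate(outputs: list[str]) -> float:
    """Fraction of logged outputs that look like refusals."""
    if not outputs:
        return 0.0
    hits = sum(
        1 for text in outputs
        if any(marker in text.lower() for marker in REFUSAL_MARKERS)
    )
    return hits / len(outputs)
```

Comparing this rate week over week against your pre-migration baseline is a cheap signal that the newer safety tuning is affecting your traffic.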
FAQ
Is there any way to still use gpt-4-1106-preview?
No. The model was fully retired on March 26, 2026; API calls now return an error, and no third party legitimately offers access either.
What about Azure OpenAI — did it deprecate on the same timeline?
Similar but with some variance. Azure OpenAI tracks OpenAI deprecations but sometimes with 2-4 week lag. Check Azure's specific model retirement schedule for your region. As of April 2026, Azure's GPT-4 preview variants are also retired or scheduled for retirement.
Is my old output still good or should I regenerate with gpt-4.1?
For most purposes, old output from gpt-4-1106 is fine if stored. For new generations on updated content, use gpt-4.1 or gpt-4o.
What's the cheapest direct replacement?
gpt-5.4-mini at $0.25 input per MTok, roughly 10× cheaper than gpt-4-1106's original pricing. Capability is equivalent or better for most tasks.
Does the migration break JSON mode?
No. gpt-4o, gpt-4.1, gpt-5.4, and gpt-5.5 all support native JSON mode (response_format={"type": "json_object"}) with the same or better quality than gpt-4-1106.
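The request shape carries over unchanged; only the model id moves. A sketch of the parameters, with illustrative messages:

```python
# Same JSON-mode request that worked with gpt-4-1106-preview, pointed at gpt-4.1.
request_params = {
    "model": "gpt-4.1",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "List three primary colors."},
    ],
}
# client.chat.completions.create(**request_params)
```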
Should I migrate to a non-OpenAI alternative?
Consider it. Claude Opus 4.7 ($5/$25 per MTok) and DeepSeek V4-Pro ($3.48 output per MTok) are both credible alternatives. Access both alongside gpt-4.1 through TokenMix.ai for A/B testing before committing.
What about 128K context specifically?
If you only needed 128K, jumping to 1M (gpt-4.1 or gpt-5.5) is free upside. Both support your old 128K-length prompts natively. No downside.
How long before other GPT-4 variants retire?
gpt-4o and gpt-4.1 are on OpenAI's current-tier list as of April 2026. Expect them to remain supported for at least 18-24 months. gpt-4 (original) and all preview variants are retired. Plan your long-term tech stack with newer GPT-5.x models as primary.