TokenMix Research Lab · 2026-04-24

Claude Rate Exceeded Error: 5 Fixes That Work 2026

"Rate exceeded" errors on Claude's API — officially rate_limit_error with HTTP 429 — are the most common production pain point with Anthropic. Three sub-categories: requests per minute (RPM) limit, tokens per minute (TPM) limit, and daily quota exhaustion. Different triggers, same error message, different fixes. This guide covers the 5 fixes ranked by impact: tier upgrade, exponential backoff, model fallback, batch API, multi-provider routing. Plus specific fix for each sub-category and when to stop retrying. All data verified against Anthropic's rate limit docs as of April 24, 2026. TokenMix.ai implements auto-failover so "Rate exceeded" becomes invisible.

Confirmed vs Speculation

| Claim | Status |
|---|---|
| Claude rate limits are per-model, per-tier | Confirmed |
| Tier 1 Opus: 20 RPM | Confirmed |
| Tier 4 Opus: 400 RPM | Confirmed |
| Tier upgrades based on spend threshold | Confirmed |
| retry-after header provided | Sometimes |
| Different limits for extended context mode | Confirmed |
| Rate limit separate from daily quota | Confirmed |

Snapshot note (2026-04-24): Tier thresholds ($40 / $200 / $400) and RPM values (Tier 1-4) reflect Anthropic's published rate-limit tiers at snapshot. Anthropic revises these periodically — check the linked rate-limits docs for current figures before planning tier-upgrade strategy. 1M extended context mode has stricter limits (typically 100 RPM on Tier 4 Opus vs 400 on standard) and is subject to beta changes.

Which "Rate Exceeded" Did You Get?

Check response error body:

| Error message | Meaning | Fix priority |
|---|---|---|
| `"type": "rate_limit_error"` with `"...requests per minute..."` | RPM hit | Backoff + tier upgrade |
| `"type": "rate_limit_error"` with `"...input tokens per minute..."` | TPM hit | Reduce prompt size |
| `"type": "overloaded_error"` | Anthropic capacity issue | Retry + fallback |
| HTTP 529 | Temporary outage | Retry after 1 minute |
| Daily token quota exceeded | Very rare | Contact Anthropic |

Different root cause = different fix. Don't apply blind backoff if the problem is TPM: your prompt is too big, and retrying the same oversized request will just fail again.
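The classification above can be turned into a small dispatcher. A minimal sketch — the fix labels are illustrative strings, and the error body follows Anthropic's documented `{"type": "error", "error": {...}}` envelope:

```python
import json

def classify_rate_error(status_code: int, body: str) -> str:
    """Map a failed Anthropic response to one of the fix categories above."""
    if status_code == 529:
        return "retry-after-1-minute"      # temporary outage
    err = json.loads(body).get("error", {})
    if err.get("type") == "overloaded_error":
        return "retry-plus-fallback"       # capacity issue, not your quota
    if err.get("type") == "rate_limit_error":
        if "tokens per minute" in err.get("message", ""):
            return "reduce-prompt-size"    # TPM hit
        return "backoff-plus-tier"         # RPM hit
    return "unknown"

# Example: a TPM-style 429 body
body = '{"type": "error", "error": {"type": "rate_limit_error", "message": "input tokens per minute exceeded"}}'
print(classify_rate_error(429, body))  # reduce-prompt-size
```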

Fix 1: Tier Upgrade (Biggest Impact)

Anthropic tiers:

| Tier | Min cumulative spend | Opus 4.7 RPM | Sonnet 4.6 RPM | Haiku 4.5 RPM |
|---|---|---|---|---|
| Tier 1 | $0 (signup) | 20 | 50 | 100 |
| Tier 2 | $40 (within 7 days of signup) | 40 | 100 | 200 |
| Tier 3 | $200 | 80 | 200 | 400 |
| Tier 4 | $400 + 7 days | 400 | 1,000 | 2,000 |
| Custom | Enterprise contract | Custom | Custom | Custom |

Fastest unlock: spend $40 in the first week → Tier 2 (2× rate limits). Spend $400 and wait 7 days → Tier 4 (20× the Tier 1 Opus RPM).

Auto-upgrades usually trigger within hours of hitting threshold. Contact Anthropic support if stuck.
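For capacity planning, the thresholds above can be encoded in a small lookup. A sketch using the snapshot figures; note it ignores the 7-day waiting periods that also gate Tier 2 and Tier 4:

```python
# Snapshot figures (2026-04-24); Anthropic revises these periodically.
TIER_THRESHOLDS = [(400, 4), (200, 3), (40, 2), (0, 1)]  # (min cumulative $, tier)
OPUS_RPM = {1: 20, 2: 40, 3: 80, 4: 400}

def tier_for_spend(cumulative_spend: float) -> int:
    """Tier unlocked by cumulative spend alone (time gates not modeled)."""
    for threshold, tier in TIER_THRESHOLDS:
        if cumulative_spend >= threshold:
            return tier
    return 1

print(tier_for_spend(50), OPUS_RPM[tier_for_spend(50)])  # 2 40
```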

Fix 2: Exponential Backoff + Jitter

import time, random
from anthropic import Anthropic, RateLimitError

client = Anthropic()

def call_with_backoff(prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Respect the retry-after header when the API provides it,
            # else fall back to exponential backoff + jitter
            retry_after = (e.response.headers.get("retry-after")
                           if getattr(e, "response", None) else None)
            wait = float(retry_after) if retry_after else 2 ** attempt + random.uniform(0, 1)
            time.sleep(min(wait, 60))

Key: always add jitter (random offset). Without it, many clients retry in lockstep and immediately re-hit the limit.

Fix 3: Downgrade to a Cheaper Model

Opus 4.7 has the most restrictive RPM. If traffic pushes you into the limit:

  1. Identify queries that don't need Opus. Most chat queries work fine on Sonnet 4.6 or Haiku 4.5.
  2. Tier routing: a 70% Haiku / 25% Sonnet / 5% Opus split is typical.
  3. Haiku 4.5 at 2,000 RPM (Tier 4) gives 5× the 400 RPM Opus limit — hard to exhaust.

See Claude Sonnet vs Opus and Haiku vs Sonnet for routing rules.
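A routing rule along those lines can be as simple as a two-flag dispatcher. The Opus and Sonnet slugs match the model IDs used elsewhere in this guide; the Haiku slug is an assumption:

```python
def pick_model(needs_deep_reasoning: bool, needs_tools_or_code: bool) -> str:
    """Route each request to the cheapest model that can handle it,
    reserving Opus (the most restrictive RPM pool) for the hardest queries."""
    if needs_deep_reasoning:
        return "claude-opus-4-7"
    if needs_tools_or_code:
        return "claude-sonnet-4-6"
    return "claude-haiku-4-5"

# Most chat traffic should land on Haiku under this rule.
print(pick_model(False, False))  # claude-haiku-4-5
```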

Fix 4: Batch API (Async Pool)

Anthropic's Batch API has separate rate limit pool:

batch = client.messages.batches.create(
    requests=[
        {"custom_id": f"task-{i}", "params": {
            "model": "claude-opus-4-7",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}]
        }} for i, prompt in enumerate(prompts)
    ]
)
# Runs over hours, doesn't touch sync rate limits
# 50% pricing discount

For async workflows (daily digests, bulk processing), batches bypass sync rate limits and save 50%.
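Batch results aren't returned inline; you poll for completion, then fetch results. A sketch using the SDK's `batches.retrieve` and `batches.results` calls (poll interval is an arbitrary choice):

```python
import time

TERMINAL_STATUSES = {"ended"}  # batch is finished once processing_status reaches this

def is_batch_done(processing_status: str) -> bool:
    return processing_status in TERMINAL_STATUSES

def wait_for_batch(client, batch_id: str, poll_seconds: int = 60) -> dict:
    """Poll until the batch finishes (minutes to hours), then
    collect results keyed by the custom_id set at creation time."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if is_batch_done(batch.processing_status):
            break
        time.sleep(poll_seconds)
    return {r.custom_id: r.result for r in client.messages.batches.results(batch_id)}
```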

Fix 5: Multi-Provider Failover

When Anthropic's rate limit is the bottleneck, fail over to GPT-5.4 or Gemini 3.1 Pro transparently:

Via TokenMix.ai:

# Configure fallback chain in gateway
# Primary: anthropic/claude-opus-4-7
# Fallback 1: openai/gpt-5-4  (on rate limit)
# Fallback 2: google/gemini-3-1-pro

Automatic — your code doesn't handle fallback logic; the gateway swaps models based on provider health.
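If you're not behind a gateway, the same chain can be sketched client-side. The provider names and call functions below are hypothetical stand-ins; a production version would catch rate-limit errors specifically rather than all exceptions:

```python
def call_with_failover(prompt, providers):
    """Try each (name, call_fn) pair in order; return the first success."""
    last_error = None
    for name, call_fn in providers:
        try:
            return name, call_fn(prompt)
        except Exception as e:  # narrow to RateLimitError etc. in production
            last_error = e
    raise last_error

# Hypothetical usage: the primary raises, the fallback answers.
def flaky(prompt): raise RuntimeError("429 rate_limit_error")
def backup(prompt): return f"ok: {prompt}"
print(call_with_failover("hi", [("anthropic", flaky), ("openai", backup)]))
# ('openai', 'ok: hi')
```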

FAQ

How do I check my current tier?

Anthropic console → Settings → Limits. Shows your tier + current RPM/TPM limits per model. Also shown in failed 429 response headers.

Does 1M extended context mode have separate rate limits?

Yes — stricter. Tier 4 Opus 1M context: typically 100 RPM vs 400 on standard. Budget accordingly.

Can I request custom rate limits?

Yes, via Anthropic sales. Negotiable for enterprise contracts with a large monthly commit; below that, tier-based limits are standard.

Why do I get rate limited even though I'm below stated RPM?

Two common causes: (1) burst throttling: limits are enforced continuously (token-bucket style), not over a full rolling minute, so a short burst of concurrent requests can trip them; (2) per-organization limits: multiple API keys in one org draw from the same shared pool.

What about Anthropic's overloaded_error?

Different from a rate limit — it means Anthropic's cluster is at capacity. Usually transient (1-30 seconds). Retry with backoff, or fail over to a different provider via TokenMix.ai.

Does prompt caching help with rate limits?

Cached tokens don't count against input TPM limits. Worth implementing if you have repeated large system prompts. See Claude Opus 4 pricing breakdown.
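A sketch of the request shape with a cache marker on the system prompt — the `cache_control` field follows Anthropic's prompt-caching API; the prompt text is a placeholder:

```python
def cached_system_block(text: str) -> list:
    """System block with an ephemeral cache marker (Messages API shape)."""
    return [{"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}]

request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    # Large repeated prefix: written to cache once, then served from cache
    # on subsequent calls without counting against input TPM.
    "system": cached_system_block("...large repeated system prompt..."),
    "messages": [{"role": "user", "content": "hello"}],
}
```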

Can I pool rate limits across multiple API keys?

Only if same organization. Multiple keys in one Anthropic org share the same pool. Creating multiple orgs to bypass is a TOS gray area.

