TokenMix Research Lab · 2026-06-22

Claude Rate Exceeded 2026: claude.ai Limits & API 429 Fix

Last Updated: 2026-06-22
Author: TokenMix Research Lab
Data verified: 2026-06-22 - Anthropic rate-limit docs, error docs, service tiers, prompt caching docs, Anthropic Help Center usage-limit and API rate-limit articles, May 2026 usage-limit announcement, Claude Code error reference, and TokenMix Claude pricing cluster

"Claude rate exceeded" means two different things. On the claude.ai app it is a subscription usage limit — a 5-hour rolling window plus weekly caps on Pro and Max — that resets on a clock. On the API it is HTTP 429 rate_limit_error across RPM, ITPM, OTPM, spend, workspace, or fast-mode buckets. The fixes are completely different, and this guide covers both.

On the consumer app, Anthropic does not publish exact message counts; usage depends on model, conversation length, attachments, and features, and usage across claude.ai, Claude Code, and Claude Desktop all counts toward the same limit (Anthropic Help Center).

On the API, Anthropic documents three Messages API rate-limit dimensions: requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). If any bucket is exceeded, the API returns HTTP 429 with rate_limit_error and a retry-after header (Anthropic rate limits, Anthropic errors). For most Claude models, cached input reads do not count toward ITPM, while input tokens after the last cache breakpoint and cache creation tokens do count (Anthropic rate limits). So the correct API fix is bucket-aware throttling, token-aware concurrency, prompt caching, and fallback routing, not just "sleep and retry."

Claims below are tagged Confirmed, Likely, or False so you can act on the certain ones first.

Quick Verdict
Claude.ai Rate Exceeded vs API 429
How to Fix claude.ai Rate Exceeded
What 429 Means
RPM ITPM OTPM Explained
Response Headers
Five Fixes That Actually Work
Backoff and Jitter Code
Cost and Capacity Math
Workspace Service Tier and Fast Mode Traps
Fallback Routing
Risks and Caveats
Final Recommendation
FAQ
Sources
Related Articles

Quick Verdict

Claim	Status	Source
"Claude rate exceeded" on claude.ai is a subscription usage limit, not the API 429	Confirmed	Anthropic Help Center
Usage across claude.ai, Claude Code, and Claude Desktop shares one limit	Confirmed	Anthropic Help Center
Pro and Max have weekly caps on top of the 5-hour rolling window	Confirmed	TechCrunch
Anthropic publishes an exact claude.ai message count	False	Help Center lists only factors: model, length, attachments, features
Claude API 429 maps to `rate_limit_error`	Confirmed	Anthropic errors
Anthropic rate limits are measured by RPM, ITPM, and OTPM for Messages API model classes	Confirmed	Anthropic rate limits
A 429 response includes `retry-after` telling how long to wait	Confirmed	Anthropic rate limits
Short bursts can trigger rate limits even when the minute average looks valid	Confirmed	Anthropic rate limits
Cached input reads do not count toward ITPM for most Claude models	Confirmed	Anthropic rate limits
`max_tokens` increases OTPM rate-limit usage before tokens are generated	False	Anthropic says OTPM is evaluated on actual generated output, not `max_tokens`
529 and 429 are the same failure	False	Anthropic errors separates 429 `rate_limit_error` from 529 `overloaded_error`
Priority Tier removes all regular rate limits	False	Service tiers says Priority Tier still observes regular rate limits
Fast mode uses the same Opus rate-limit bucket	False	Anthropic documents dedicated fast mode rate limits and `anthropic-fast-*` headers
Random exponential backoff is enough for production	Likely false	It helps, but headers, token accounting, caching, and fallback are still needed

Claude.ai Rate Exceeded vs API 429

"Rate exceeded" on the claude.ai app and a 429 on the API are different problems with different fixes. The app shows a subscription usage notice tied to time windows; the API returns HTTP 429 rate_limit_error tied to per-minute throughput.

Where you see it	What it actually is	What resets it	Status
claude.ai web, desktop, mobile	Subscription usage limit, often shown as "rate exceeded" or "usage limit reached" with a reset time	5-hour rolling window; weekly caps on Pro and Max	Confirmed
Claude Code on a Pro or Max seat	The same shared usage limit as the app; `/status` shows remaining	Same windows	Confirmed
Developer API (Messages)	HTTP 429 `rate_limit_error` with a `retry-after` header	Per-minute RPM, ITPM, OTPM buckets	Confirmed

The one-line distinction: the consumer app answers "how much can I use in this window," while the API answers "how fast can I push tokens right now." Anthropic states that usage across claude.ai, Claude Code, and Claude Desktop all counts toward the same usage limit (Anthropic Help Center), so a heavy Claude Code session can lock you out of web chat. The next section fixes the app limit; the rest of the guide fixes the API 429.

How to Fix claude.ai Rate Exceeded

On the claude.ai app, "rate exceeded" is a subscription usage cap, so the fix is about the reset clock and your plan, not retry code. Anthropic does not publish exact message counts; usage depends on model, conversation length, attachments, and features (Anthropic Help Center).

Tier	Per-session allowance	Weekly cap	Opus access
Free	5-hour rolling window, smallest reserve	None (5-hour only)	Limited / lighter model
Pro (~$20/mo)	At least 5x Free	Overall + separate Opus weekly	Yes
Max 5x ($100/mo)	~5x Pro	Higher	More Opus headroom
Max 20x ($200/mo)	~20x Pro	Highest	Most Opus headroom

The fixes below are confirmed in Anthropic's Help Center and announcements. None of them is "retry the request" — that is an API pattern, not a consumer one.

Fix	What it does	Best when
Wait for the reset	The 5-hour window resets on a rolling clock; weekly caps reset every 7 days	You are briefly over
Switch to a lighter model	Sonnet or Haiku burn the allowance slower than Opus, which has its own weekly cap	Mid-session lockout
Start a new chat	Drops accumulated context that inflates per-message cost on long threads	Deep conversations
Upgrade tier	Pro gives at least 5x the Free allowance; Max gives roughly 5x or 20x the Pro allowance	Hitting limits often
Buy extra usage (Max)	Max users can purchase more at standard API rates	Occasional overflow
Move to the API	No 5-hour or weekly consumer cap; pay per token instead	Developers and automation

Two timing notes for 2026. First, on May 6, 2026 Anthropic doubled Claude Code's 5-hour limits and removed peak-hour throttling for Pro and Max, after tightening peak-hour usage earlier in the year (Anthropic). Second, the weekly caps for Pro and Max took effect August 28, 2025 and, by Anthropic's estimate, were expected to affect under 5% of subscribers (TechCrunch). If you are a developer who never wants a usage-window lockout, the durable fix is the API plus a gateway fallback, covered below.

What 429 Means

Surface symptom	Likely real cause	Best first check	Status
`rate_limit_error` in API JSON	RPM, ITPM, OTPM, workspace, spend, fast mode, or acceleration	Error body + response headers	Confirmed
`retry-after` present	Server tells exact wait window	Sleep at least that many seconds	Confirmed
Requests fail in bursts	Short-interval enforcement	Concurrency and queue shape	Confirmed
Long prompts fail even with low request count	ITPM exhaustion	Input-token headers	Confirmed
Long completions fail or stall	OTPM pressure	Output-token headers	Confirmed
New org suddenly scales traffic and gets 429	Acceleration limit	Ramp traffic gradually	Confirmed
Claude Code shows `API Error: Request rejected (429)`	Claude Code / API capacity or account limit	Claude Code error reference + account state	Confirmed
529 instead of 429	Anthropic overloaded globally	Retry or fallback provider	Confirmed

The first diagnostic question is not "how long do I sleep?" It is "which bucket did I exceed?"
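One way to answer that question from a response is to compare the remaining and limit values in the rate-limit headers. The sketch below is a heuristic, not Anthropic's own logic: the function name `tightest_bucket` and the `BUCKETS` mapping are hypothetical, and it only covers the three per-minute buckets that expose `-limit` and `-remaining` headers. Spend, workspace, fast mode, and acceleration limits still need the error message and account state.

```python
from typing import Mapping, Optional

# Header prefixes as listed in this guide's Response Headers table.
BUCKETS = {
    "RPM": "anthropic-ratelimit-requests",
    "ITPM": "anthropic-ratelimit-input-tokens",
    "OTPM": "anthropic-ratelimit-output-tokens",
}

def tightest_bucket(headers: Mapping[str, str]) -> Optional[str]:
    """Return the bucket with the smallest remaining/limit ratio, or None."""
    lowered = {key.lower(): value for key, value in headers.items()}
    best_name, best_ratio = None, None
    for name, prefix in BUCKETS.items():
        try:
            limit = float(lowered[prefix + "-limit"])
            remaining = float(lowered[prefix + "-remaining"])
        except (KeyError, ValueError):
            continue  # header missing or not numeric: skip this bucket
        if limit <= 0:
            continue
        ratio = remaining / limit
        if best_ratio is None or ratio < best_ratio:
            best_name, best_ratio = name, ratio
    return best_name
```

A result of `"ITPM"` points toward prompt size and caching, while `"RPM"` points toward concurrency and queue shape. A result of `None` means the headers were absent, so fall back to the error body.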

RPM ITPM OTPM Explained

Bucket	What it measures	What breaks it	Fix
RPM	Requests per minute	Too many API calls, especially bursty parallel calls	Queue, leaky bucket, concurrency cap
ITPM	Input tokens per minute	Large prompts, long RAG contexts, cache writes	Prompt compression, caching, chunking
OTPM	Output tokens per minute	Long generations, many streamed completions	Lower target output, route long jobs to batch
Spend limit	Monthly dollar cap by usage tier or custom org cap	Normal traffic after monthly spend ceiling	Raise cap, wait next month, use cheaper model
Workspace limit	Workspace-level custom cap	One workspace exceeds local cap	Rebalance workspace caps
Fast mode limit	Dedicated fast mode bucket	`speed: "fast"` traffic exceeds preview lane	Fall back to standard mode
Acceleration limit	Sharp traffic increase	Sudden launch or retry storm	Gradual ramp and adaptive backoff

Anthropic warns that a nominal 60 RPM limit can be enforced as 1 request per second, so dumping 60 requests at once can still fail. That is why queue shape matters as much as the published number.
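A minimal pacing sketch makes the point concrete: instead of firing every request at once, space them at 60 / RPM seconds apart. The `Pacer` class and its `wait` method are hypothetical names for illustration, the clock and sleep functions are injectable so the behavior can be tested, and the class is not thread-safe as written.

```python
import time

class Pacer:
    """Spaces calls evenly so a per-minute limit is never spent in one burst."""

    def __init__(self, rpm_limit, clock=time.monotonic, sleep=time.sleep):
        if rpm_limit <= 0:
            raise ValueError("rpm_limit must be positive")
        self.interval = 60.0 / rpm_limit  # e.g. 60 RPM -> 1.0 s between calls
        self.clock = clock
        self.sleep = sleep
        self.next_slot = None  # earliest time the next call may start

    def wait(self):
        """Block until the next slot is free; return the seconds slept."""
        now = self.clock()
        if self.next_slot is None or now >= self.next_slot:
            self.next_slot = now + self.interval
            return 0.0
        delay = self.next_slot - now
        self.sleep(delay)
        self.next_slot += self.interval
        return delay
```

Calling `pacer.wait()` immediately before each API call turns a burst of 60 requests into one request per second under a 60 RPM limit. In practice it is safer to pace slightly below the published limit to leave headroom.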

Response Headers

Header family	Meaning	How to use it
`retry-after`	Seconds to wait before retrying	Treat as minimum sleep time
`anthropic-ratelimit-requests-limit`	Request limit	Size queue and concurrency
`anthropic-ratelimit-requests-remaining`	Remaining requests before rate limit	Slow down before zero
`anthropic-ratelimit-requests-reset`	When request limit replenishes	Schedule retry
`anthropic-ratelimit-input-tokens-limit`	Input-token cap	Gate large prompts
`anthropic-ratelimit-input-tokens-remaining`	Input tokens left, rounded	Refuse large RAG calls before failure
`anthropic-ratelimit-input-tokens-reset`	Input-token reset time	Retry token-heavy work later
`anthropic-ratelimit-output-tokens-limit`	Output-token cap	Cap long completions
`anthropic-ratelimit-output-tokens-remaining`	Output tokens left, rounded	Route long generation elsewhere
`anthropic-fast-*`	Fast mode rate status	Only applies to fast mode preview
`request-id`	Unique request identifier	Include in support/debug logs

Do not parse only the HTTP status. Save the headers. They are the difference between a one-line retry loop and a production throttle.
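Saving the headers can be as simple as filtering them into a log record. This is a sketch under the assumption that your HTTP client exposes response headers as a string mapping; `rate_limit_snapshot` is a hypothetical helper, and the header names come from the table above.

```python
from typing import Dict, Mapping

KEEP_EXACT = ("retry-after", "request-id")
KEEP_PREFIXES = ("anthropic-ratelimit-", "anthropic-fast-")

def rate_limit_snapshot(status: int, headers: Mapping[str, str]) -> Dict[str, object]:
    """Keep only the status and rate-limit related headers, keyed in lowercase."""
    record: Dict[str, object] = {"status": status}
    for key, value in headers.items():
        name = key.lower()
        if name in KEEP_EXACT or name.startswith(KEEP_PREFIXES):
            record[name] = value
    return record
```

Writing this record on every 429 (and periodically on successful calls) gives you the trend of each bucket's remaining capacity, not just the moment of failure.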

Five Fixes That Actually Work

Fix	Solves	Implementation	Confidence
Respect `retry-after`	Ordinary 429	Sleep at least the header value before retry	Confirmed
Add jitter	Retry storms	Add random delay on top of `retry-after` or exponential backoff	Likely
Token-aware queue	ITPM/OTPM	Estimate input/output tokens before dispatch	Confirmed
Prompt caching	ITPM for repeated context	Cache long system prompts, tool definitions, docs, conversation state	Confirmed
Concurrency cap	RPM and bursts	Limit per-model and per-workspace parallel calls	Confirmed
Workspace caps	Multi-team fairness	Set per-workspace spend/rate limits below org maximum	Confirmed
Model fallback	Provider or model saturation	Route to cheaper/faster fallback model	Likely
Batch API	Async non-user-facing work	Move evals, summaries, offline transforms to batches	Confirmed
Priority Tier	Production availability	Use committed-spend tier when SLA matters	Confirmed

The first five are the default fix stack. The last four are architecture choices.
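The token-aware queue row can be sketched as a local budget that refuses to dispatch a request when its estimated tokens would exceed the per-minute allowance. Everything here is an assumption for illustration: `TokenBudget` and `try_reserve` are hypothetical names, the token estimate is supplied by the caller, and the 60-second sliding window is a conservative local approximation that may not match how the server replenishes its limits.

```python
import time
from collections import deque

class TokenBudget:
    """Local sliding 60-second budget for estimated tokens (input or output)."""

    def __init__(self, tokens_per_minute, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock
        self.spent = deque()  # (timestamp, tokens) for recent reservations

    def _used(self, now):
        # Drop reservations older than 60 seconds, then total what is left.
        while self.spent and now - self.spent[0][0] >= 60.0:
            self.spent.popleft()
        return sum(tokens for _, tokens in self.spent)

    def try_reserve(self, estimated_tokens):
        """Reserve tokens if they fit in the window; return True on success."""
        now = self.clock()
        if self._used(now) + estimated_tokens > self.limit:
            return False
        self.spent.append((now, estimated_tokens))
        return True
```

A dispatcher would keep one budget for ITPM and one for OTPM, and hold a request in the queue while `try_reserve` returns `False`. Setting the local limit somewhat below the published one leaves room for estimation error.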

Backoff and Jitter Code

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello"}]
  }' -i

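import random
import time

def retry_delay_seconds(response, attempt):
    """Seconds to wait before retry number `attempt` (0-based)."""
    retry_after = response.headers.get("retry-after")
    try:
        # Prefer the server's wait time when the header is present and numeric.
        base = float(retry_after)
    except (TypeError, ValueError):
        # No usable header: exponential backoff, capped at 60 seconds.
        base = min(60.0, 2 ** attempt)
    # Add 10-40% jitter so parallel workers do not retry in lockstep.
    jitter = random.uniform(0.1, 0.4) * base
    return base + jitter

def should_retry(status_code, error_type):
    """Return True only for failures that can heal after a wait."""
    # 429 rate_limit_error: our own quota; retry after the delay above.
    if status_code == 429 and error_type == "rate_limit_error":
        return True
    # 529 overloaded_error: shared capacity; retry or fall back.
    if status_code == 529:
        return True
    # Bad request, auth, permission, not found, too large: fix the request.
    if status_code in (400, 401, 403, 404, 413):
        return False
    # Any other 5xx is treated as transient.
    return status_code >= 500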

Code rule	Why
Retry 429 only after `retry-after`	Earlier retries fail and amplify load
Retry 529 with provider fallback	It is overloaded capacity, not your quota
Never retry 401/403 blindly	Auth/permission errors do not heal with sleep
Never retry 413 blindly	Request too large must be resized
Log `request-id`	Anthropic support needs it
Store rate-limit headers	Header state tells which limiter fired
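The rules in the table can be combined into one bounded retry loop. This is a sketch, not a drop-in client: `call_with_retries` is a hypothetical helper, `send` is assumed to be any callable you provide that returns `(status_code, headers, body)`, and the delay rule is inlined from the functions above so the block stands alone.

```python
import random
import time

def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out.

    send() must return (status_code, headers, body). The last response
    tuple is returned either way, so the caller can log or fall back.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        retryable = status == 429 or status >= 500  # includes 529
        if not retryable or attempt == max_attempts - 1:
            return status, headers, body
        try:
            # Server-provided wait time is the minimum sleep.
            base = float(headers.get("retry-after"))
        except (TypeError, ValueError):
            # No usable header: exponential backoff, capped at 60 seconds.
            base = min(60.0, 2.0 ** attempt)
        # 10-40% jitter keeps parallel workers from retrying together.
        sleep(base + random.uniform(0.1, 0.4) * base)
```

The maximum attempt count matters as much as the delay: without it, a persistent spend or workspace cap turns into an endless loop. When the loop returns a 429 or 529 after the last attempt, that is the point to hand off to fallback routing.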

Cost and Capacity Math

Scenario	Math	Result	Status
Burst shape	60 RPM can be enforced as 1 RPS	60 calls at once can fail; 1/sec queue passes	Confirmed principle
Cache-aware ITPM	2M ITPM with 80% cache hit	Effective 10M total input tokens/minute	Confirmed example
Workspace cap	Org 40K ITPM, workspace 30K ITPM	Other workspaces still have at least 10K ITPM if unused tokens remain	Confirmed example
Retry storm	100 failed workers retry immediately	100 more failures plus load spike	Likely
Long output	20 calls x 4K output	80K output tokens pressure OTPM	Confirmed math

Cost calculation 1: If your org has 40,000 ITPM and one workspace is capped at 30,000 ITPM, that workspace cannot consume the full org bucket. Anthropic uses this exact pattern to explain workspace limits: the remaining unused tokens are available to other workspaces.

Cost calculation 2: With a 2,000,000 ITPM limit and 80% cache hit rate, Anthropic's docs say you can effectively process 10,000,000 total input tokens per minute because cached reads do not count toward ITPM for most models. That is a 5x effective throughput gain, not a pricing discount alone.
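The arithmetic behind that figure: if a fraction h of input tokens are cache reads that do not count toward ITPM, only (1 - h) of total input is charged against the limit, so total throughput is limit / (1 - h). The helper below is a hypothetical illustration that assumes a steady cache hit rate and ignores the cache creation tokens that still count.

```python
def effective_input_tpm(itpm_limit: float, cache_hit_rate: float) -> float:
    """Total input tokens per minute when cached reads are exempt from ITPM."""
    if not 0.0 <= cache_hit_rate < 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1)")
    # Only the uncached share (1 - hit rate) counts against the limit.
    return itpm_limit / (1.0 - cache_hit_rate)

# 2,000,000 ITPM at an 80% hit rate: about 10,000,000 total input tokens/minute.
print(round(effective_input_tpm(2_000_000, 0.8)))
```

The relationship is steep at the high end: moving from a 50% to an 80% hit rate raises the multiplier from 2x to 5x, which is why cache hit rate is worth logging alongside the rate-limit headers.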

Cost calculation 3: If a job launches 200 parallel requests against a lane that effectively accepts 1 request/second, a naive retry loop can create minutes of self-inflicted 429s. Queueing those 200 requests at 1/sec completes dispatch in about 200 seconds without turning every retry into another rate-limit event.
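A toy simulation shows the size of the difference. It assumes a lane that accepts a fixed number of requests per second and a naive client that resends every pending request once per second; real retry loops and real limiters behave less tidily, so treat the numbers as illustrative only. The function name `simulate` is hypothetical.

```python
def simulate(n_requests: int, accepted_per_second: int = 1):
    """Return ((naive_seconds, naive_429s), (paced_seconds, paced_429s))."""
    # Naive: every pending request is sent each second; the lane accepts
    # only `accepted_per_second` of them and rejects the rest with 429.
    pending, seconds, rejected = n_requests, 0, 0
    while pending > 0:
        accepted = min(pending, accepted_per_second)
        rejected += pending - accepted
        pending -= accepted
        seconds += 1
    naive = (seconds, rejected)
    # Paced: send exactly what the lane accepts, so nothing is rejected.
    paced_seconds = -(-n_requests // accepted_per_second)  # ceiling division
    return naive, (paced_seconds, 0)

print(simulate(200))  # ((200, 19900), (200, 0))
```

In this model both strategies finish in 200 seconds, but the naive one generates 19,900 rejected calls along the way. In production those rejections are not free: each one adds load and can extend the wait the server asks for.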

For token price tradeoffs after the error is fixed, use Claude API Pricing 2026 and Claude API Cache Pricing 2026. A 429 fix that doubles cache hit rate can be worth more than a model downgrade.

Workspace Service Tier and Fast Mode Traps

Feature	Common mistake	Correct read	Source
Usage tier	Assuming higher tier means no limits	Higher tiers raise limits but still enforce them	Anthropic rate limits
Spend limit	Treating 429 as purely RPM/TPM	Spend caps can halt usage until next month	Anthropic rate limits
Workspace limit	Looking only at org limits	Workspace caps can be lower than org caps	Anthropic rate limits
Priority Tier	Expecting no rate limits	Requests still pull from regular rate limits	Anthropic service tiers
Batch API	Using live API for offline jobs	Batch has separate queue limits	Anthropic rate limits
Fast mode	Assuming standard Opus limits apply	Fast mode has dedicated limits and headers	Anthropic rate limits
Claude Platform on AWS	Expecting Anthropic automatic tier advancement	AWS path has different billing/spend behavior	Anthropic rate limits

The dangerous pattern is treating every 429 as a code bug. Sometimes it is a spend ceiling. Sometimes it is a workspace policy. Sometimes it is an acceleration limit. The fix changes with the cause.

Fallback Routing

Primary failure	Better fallback	Why
Claude RPM exhausted	Same Claude model later	If the model is required, delay is safest
Claude ITPM exhausted	Cached prompt or smaller-context model	Reduce input pressure
Claude OTPM exhausted	Shorter output or cheaper long-output model	Reduce output pressure
529 overloaded	Another provider/model	This is shared platform load
Fast mode 429	Standard mode	Different lane
Workspace cap	Different workspace only if policy allows	Avoid bypassing governance
User-facing SLA	Gateway fallback	Protect UX
Batch job	Batch API	Remove from live rate pool

The routing layer matters because one provider's rate limit should not take down the product. TokenMix covers this pattern in AI API Gateway 2026, and the Claude/OpenAI-compatible setup path is covered in Anthropic OpenAI-Compatible API 2026.
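The table above can be encoded as a small decision function in the routing layer. This is a sketch: `choose_fallback` and the returned action labels are hypothetical names, and the `bucket` argument is assumed to come from your own header analysis. Workspace caps are deliberately not rerouted here, since moving traffic to another workspace is a policy decision.

```python
from typing import Optional

def choose_fallback(status: int, bucket: Optional[str] = None,
                    fast_mode: bool = False, offline: bool = False) -> str:
    """Map a failure to a fallback action, following the routing table."""
    if offline:
        return "batch_api"  # non-user-facing work leaves the live pool
    if status == 529:
        return "other_provider_or_model"  # shared platform load
    if status == 429:
        if fast_mode:
            return "standard_mode"  # fast mode has its own lane
        if bucket == "ITPM":
            return "cached_or_smaller_context"
        if bucket == "OTPM":
            return "shorter_output_or_other_model"
        return "delay_same_model"  # RPM or unknown: waiting is safest
    return "no_fallback"
```

Keeping this mapping in one function makes the policy reviewable: a change such as "never leave Claude for regulated workloads" becomes a one-line edit rather than a hunt through retry loops.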

Risks and Caveats

Risk	Status	Mitigation
Published limits are maximum allowed usage, not guaranteed minimums	Confirmed	Build headroom
Headers may show the most restrictive active limiter	Confirmed	Log all headers, not only one
Short bursts can fail under minute limits	Confirmed	Smooth traffic
Cached-read behavior differs for marked models	Confirmed	Check dagger notes in docs
`max_tokens` is mistaken for OTPM usage	False	OTPM counts actual generated output
529 is handled like 429	False	Use fallback and retry separately
Priority Tier is treated as unlimited	False	Still obeys regular limits
Claude Code errors are assumed to be raw API errors	Likely	Check Claude Code docs and account state

Final Recommendation

For Claude 429, do not start with a bigger sleep. Start with the bucket. Log retry-after, the request-limit and token-limit headers (anthropic-ratelimit-*), request ID, model, workspace, cache hit rate, and fast mode state. Then apply queueing, caching, jitter, and fallback in that order.

FAQ

What does Claude rate exceeded mean?

It depends where you see it. On the claude.ai app it means you hit a subscription usage limit (a 5-hour rolling window, or a weekly cap on Pro and Max) and must wait for the reset. On the API it means HTTP 429 rate_limit_error from RPM, ITPM, OTPM, spend caps, workspace caps, fast mode, or acceleration limits.

Why does claude.ai say rate exceeded?

You reached your plan's usage limit for the current window. Anthropic does not publish exact counts; usage rises with the model you pick (Opus burns fastest), conversation length, attachments, and features. Usage from claude.ai, Claude Code, and Claude Desktop all counts toward the same limit.

How long does claude.ai rate exceeded last?

The 5-hour session limit resets on a rolling basis, and the app shows when you can resume. Weekly caps on Pro and Max reset every 7 days. Switching to a lighter model or starting a new chat can let you keep working before the reset.

Is claude.ai rate exceeded the same as API 429?

No. The claude.ai app limit is a subscription usage cap with a reset clock and no HTTP code. The API 429 is a per-minute throughput limit returned as rate_limit_error with a retry-after header. The app fix is to wait, downgrade the model, or upgrade your plan; the API fix is backoff, caching, and fallback.

What is the difference between RPM and TPM for Claude?

RPM counts requests per minute, regardless of their size. For tokens, Anthropic has no single generic TPM bucket: it splits token limits into ITPM, which covers input tokens per minute, and OTPM, which covers output tokens actually generated per minute.

Should I retry every Claude 429?

Retry only after respecting retry-after. Add jitter and a maximum retry count. If the same bucket keeps failing, reduce concurrency, compress prompts, cache input, or route elsewhere.

Does prompt caching help Claude rate limits?

Yes. For most Claude models, cached input reads do not count toward ITPM. Cache creation tokens still count, so caching helps most when repeated context is reused across many requests.
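As a rough illustration, a Messages API request marks reusable context with a `cache_control` block so later requests can read it from cache. The helper name `build_cached_request` is hypothetical, the model id is the one used in this guide's curl example, and minimum cacheable lengths and cache lifetimes are described in Anthropic's prompt caching docs rather than shown here.

```python
def build_cached_request(shared_context: str, question: str) -> dict:
    """Build a Messages API payload whose system prompt is marked cacheable."""
    return {
        "model": "claude-sonnet-4-5",  # same model id as the curl example
        "max_tokens": 256,
        "system": [
            {
                "type": "text",
                "text": shared_context,
                # Marks the end of the reusable prefix (a cache breakpoint).
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Content after the breakpoint still counts toward ITPM.
        "messages": [{"role": "user", "content": question}],
    }
```

The shared context has to be identical across requests for reads to hit the cache, so keep volatile details such as timestamps or user names out of the cached block.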

Why do I get 429 even below my RPM limit?

You may be hitting ITPM, OTPM, workspace limits, spend limits, fast mode limits, or short-interval burst enforcement. Anthropic says a 60 RPM limit can be enforced as 1 request per second.

Is 529 the same as 429?

No. 429 means your account hit a rate limit or acceleration limit. 529 means the API is temporarily overloaded across users.

Does Priority Tier fix Claude rate limits?

No. Priority Tier improves service level and capacity priority, but Anthropic says requests still observe regular rate limits. Use it for production predictability, not unlimited throughput.

What should I log for Claude 429 debugging?

Log status code, error type, message, request-id, retry-after, all anthropic-ratelimit-* headers, model, workspace, cache hit rate, and whether fast mode or batch API was used.

Sources

Anthropic Help Center: How usage and length limits work - official consumer 5-hour and weekly usage limits, shared surfaces
Anthropic: Higher usage limits and a compute deal with SpaceX - May 6, 2026 doubling of Claude Code 5-hour limits, peak-hour throttling removed
TechCrunch: Anthropic unveils new rate limits - weekly caps effective Aug 28, 2025, under 5% of subscribers
Anthropic Rate Limits - official RPM, ITPM, OTPM, spend, workspace, batch, fast mode, and headers
Anthropic Errors - official HTTP error code and error shape reference
Anthropic Service Tiers - official Standard, Priority, and Batch tier behavior
Anthropic Prompt Caching - official prompt caching behavior
Anthropic Help Center: API Rate Limits - official support article on RPM, ITPM, OTPM and retry-after
Claude Code Error Reference - official Claude Code error wording and retry guidance
Claude API Usage and Cost - official usage and cost reference
Anthropic Batch Processing - official batch workflow reference
Anthropic OpenAI SDK Compatibility - official OpenAI-compatible SDK path
Anthropic Status - official status page for outages and overloaded periods