TokenMix Research Lab · 2026-04-25

The Complete Claude Limits Guide 2026: Tokens, Uploads, 5-Hour Cap

Anthropic enforces six distinct usage limits on Claude across the free, Pro, Max, and API tiers, and they interact in non-obvious ways: hitting one doesn't tell you which cap is actually active. This guide covers every Claude limit as of April 2026: the 5-hour rolling cap, context window, output tokens, file uploads, per-tier message quotas, and API rate limits, plus the math on when to upgrade, when to switch to the API, and when to route through an aggregator. Verified against Claude.ai, Claude Desktop, the Claude API, Claude Code, and Amazon Bedrock as of this week.

The Six Limits at a Glance

| Limit | Free | Pro ($20/mo) | Max ($200/mo) | API |
| --- | --- | --- | --- | --- |
| Context window per conversation | 200K | 200K | 1M | 1M (Opus/Sonnet), 200K (Haiku) |
| Output tokens per response | 4K | 8K | 8K | 8K |
| Messages per 5h web cap | ~10-20 | ~45 | ~225 | N/A |
| File uploads per message | 5 files | 5 files | 5 files | N/A (content-based) |
| File size per file | 30MB | 30MB | 30MB | varies by format |
| API rate limit (tokens/min) | N/A | N/A | N/A | tier-dependent |

These interact. Hitting your messages cap doesn't free up context window. Hitting context window doesn't affect daily message count.

Limit 1 — Context Window

The maximum tokens in a single conversation's total history (system prompt + all messages + expected response).

Free/Pro: 200,000 tokens ≈ 150,000 English words ≈ 500 pages of text
Max: 1,000,000 tokens ≈ 750,000 words ≈ several full books' worth
API: 1M for Opus 4.7/Sonnet 4.6; 200K for Haiku 4.5

Common failure mode: you paste a long document, have 10 back-and-forth messages, then get a "context limit exceeded" error. The document + messages + response buffer exceeded your tier's limit.

Fix: summarize or trim older messages, reference files by path instead of pasting their contents, or start a fresh conversation (each conversation gets its own budget).
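You can anticipate this before the error by tracking a rough running token count. A minimal sketch: the ~4 characters-per-token ratio is a heuristic for English prose, not Anthropic's tokenizer (the SDK also exposes a token-counting endpoint for exact figures), and `context_headroom` is our own helper name:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English prose."""
    return len(text) // 4

def context_headroom(messages, limit=200_000, response_buffer=8_192):
    """Approximate tokens left in the window, reserving room for the reply."""
    used = sum(estimate_tokens(m) for m in messages)
    return limit - used - response_buffer
```

When headroom goes negative, you are in the failure mode described above: the next response has no room left in the window.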

Limit 2 — Output Tokens

Max length of a single response.

Free: 4,096 tokens (~3,000 words)
Pro/Max/API: 8,192 tokens (~6,000 words)

Note: Claude often stops earlier when it judges the response complete. Hitting the 8K ceiling is rare unless you explicitly request very long output.

Workaround for longer outputs: ask Claude to continue (it picks up where it left off). For truly long outputs, split the task into chunks.
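On the API, the continue pattern can be automated: the response's stop_reason is "max_tokens" when output was cut off at the cap. A sketch with a generic `call` wrapper, a hypothetical function you would implement over `client.messages.create` that returns the text and stop reason:

```python
def generate_long(call, prompt, max_rounds=4):
    """call(messages) -> (text, stop_reason); loop while the output is truncated."""
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        text, stop_reason = call(messages)
        parts.append(text)
        if stop_reason != "max_tokens":  # the model finished on its own
            break
        # Feed the partial answer back and ask for a seamless continuation
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})
    return "".join(parts)
```

The `max_rounds` guard keeps a runaway task from looping forever; tune it to the length you actually expect.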

Limit 3 — 5-Hour Rolling Message Cap

The most frustrating limit. Applies to Claude.ai web/desktop only, not API.

Mechanics: each message counts against a rolling 5-hour window. Messages age out individually as they pass the 5-hour mark, so capacity restores gradually rather than resetting all at once, and message weighting varies with the model selected.

Observed caps (April 2026): roughly 10-20 messages per window on Free, ~45 on Pro, and ~225 on Max.

When hit, you're locked out of Claude.ai until the oldest messages age out. Max plan users sometimes still hit this during marathon coding sessions.

Fix options: wait for the oldest messages to age out, upgrade to a higher tier, move the workload to the API (which has no 5-hour cap), or route through an aggregator.

Limit 4 — File Uploads

Maximum files per message (web/desktop): 5 files, up to 30MB each, on all tiers.

Supported formats: PDF, DOCX, CSV, TXT, code files, images (PNG, JPG, GIF, WEBP).

Common failure: trying to upload 10 documents at once. Batch them into groups of 5, or combine into a single PDF.
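Batching is mechanical enough to script. A trivial sketch (file names are illustrative):

```python
def upload_batches(paths, per_message=5):
    """Split a file list into groups that fit the per-message upload cap."""
    return [paths[i:i + per_message] for i in range(0, len(paths), per_message)]
```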

API equivalent: file uploads use the Files API or inline base64. Different quota mechanics — content size matters more than file count.
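For inline base64, an image travels as a content block inside the message. A minimal sketch of building one; the dict shape follows the Messages API's image block format, while `image_block` itself is our own helper name:

```python
import base64

def image_block(path, media_type="image/png"):
    """Build an inline base64 image content block for a Messages API request."""
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

The resulting dict goes into a message's content list alongside text blocks; total request size, not file count, is what the API meters.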

Limit 5 — API Rate Limits

Separate system for developer API access. Tiered based on spend and verification:

Tier 1 (new accounts): the most restrictive request- and token-per-minute budgets.

Tier 2-4: limits increase with sustained usage. Tier 4 includes 4,000 req/min and several million tokens/min.

Rate limit errors: HTTP 429 with Retry-After header telling you when to retry.

Workaround pattern:

import time
from anthropic import Anthropic, APIStatusError

client = Anthropic()

def call_with_backoff(**kwargs):
    """Retry messages.create on HTTP 429, honoring the Retry-After header."""
    for attempt in range(5):
        try:
            return client.messages.create(**kwargs)
        except APIStatusError as e:
            if e.status_code == 429:
                # Prefer the server's Retry-After hint; fall back to exponential backoff
                retry_after = int(e.response.headers.get("retry-after", 2 ** attempt))
                time.sleep(retry_after)
                continue
            raise  # non-429 errors aren't retryable here
    raise RuntimeError("Retries exhausted")

Limit 6 — The 529 Overloaded Error

Not technically a "limit": it's capacity shedding. Anthropic returns HTTP 529 when its infrastructure is saturated. This is separate from rate limits: a 429 means you exceeded your own quota, while a 529 means Anthropic's capacity is exhausted regardless of your quota.

529 errors spiked after Claude Opus 4.7's April 16 release. Workarounds: retry with exponential backoff, fall back to Sonnet or Haiku, or route through an aggregator like TokenMix.ai that automatically fails over to GPT-5.5, DeepSeek V4-Pro, or Kimi K2.6 when Anthropic is overloaded.
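The fall-back-a-tier pattern can be written provider-agnostically. A sketch in which `ProviderOverloaded` stands in for the SDK's rate-limit and overloaded errors, and `call` is whatever wrapper you use around `messages.create`:

```python
class ProviderOverloaded(Exception):
    """Stand-in for the SDK's 429 (rate limit) and 529 (overloaded) errors."""

def call_with_fallback(call, chain):
    """Try each model in order; move down the chain when the provider sheds load."""
    last_error = None
    for model in chain:
        try:
            return call(model)
        except ProviderOverloaded as e:
            last_error = e  # remember the failure and try the next model
    raise last_error
```

Combine this with the backoff loop above for production use: back off first, then fall back.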

Choosing the Right Tier

Decision matrix based on observed usage patterns:

Use Free tier if: you send only a handful of messages per day and can live with 4K-token responses.

Use Pro ($20/mo) if: you hit the free cap regularly but stay under roughly 45 messages per 5-hour window.

Use Max ($200/mo) if: you need the 1M-token context window in the web app, or marathon sessions routinely exhaust Pro's cap.

Use API direct if: your workload is programmatic or high-volume; per-minute rate limits replace the 5-hour cap, and cost typically comes in below Max.

Use aggregator (TokenMix.ai, OpenRouter) if: you need automatic failover to other models when Claude is rate-limited, overloaded, or over quota.

Cost Math: Pro/Max vs API

Assume a heavy Pro user who hits caps daily, with an estimated 500K-1M tokens/day of actual usage mixed across models:

Max plan: $200/mo flat

Anthropic API direct:

Via TokenMix.ai: same or slightly lower than Anthropic direct (pay-per-token), plus access to 300+ other models. Useful for cost optimization — route classification to DeepSeek V4-Flash ($0.14/$0.28), agent work to Kimi K2.6 ($0.60/$2.50), frontier tasks to Claude Opus 4.7.

Rule of thumb: if you're hitting Pro limits and considering Max, check the API cost first. For API-accessible workflows it's usually 40-60% cheaper.
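Running that check is simple arithmetic. A sketch where prices are parameters in USD per million tokens; plug in the current list prices for whichever model you use:

```python
def monthly_cost(tokens_in_per_day, tokens_out_per_day, price_in, price_out, days=30):
    """Estimated monthly API spend; price_in/price_out are USD per 1M tokens."""
    daily = (tokens_in_per_day * price_in + tokens_out_per_day * price_out) / 1_000_000
    return days * daily
```

For example, 500K input and 100K output tokens per day at the DeepSeek V4-Flash prices quoted above comes to about $3/month; run the same volume through frontier-model prices and compare the result against the $200 flat fee.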

Context Window Optimization

When context is the binding constraint:

1. Summarize older messages. Claude can summarize the first N messages, then discard them, preserving key facts in a compressed prefix.

2. Use explicit file references instead of pasting content. Instead of pasting a 10K-token file, reference it by path and let Claude Code/Cursor read it on demand.

3. Start fresh conversations for unrelated work. Context is per-conversation; new conversation = fresh budget.

4. Use the largest-context tier. 1M context on Max or API unlocks workflows that 200K can't handle.
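The summarize-and-discard pattern from step 1 can be sketched as a small function. `compact` and its `summarize` callback are our own names; in practice `summarize` would be a call to a cheap model that returns a short string:

```python
def compact(messages, summarize, keep_last=6):
    """Replace older turns with one summary message, keeping recent turns verbatim."""
    if len(messages) <= keep_last:
        return messages
    head, tail = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(head)  # e.g. a cheap-model call returning a short string
    prefix = {"role": "user", "content": f"Summary of earlier conversation: {summary}"}
    return [prefix] + tail
```

Run it whenever estimated usage approaches the window; the compressed prefix preserves key facts at a fraction of the token cost.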

Output Token Optimization

When you need longer output than 8K:

1. Chain responses. Ask for part 1, save, ask for part 2 continuing from part 1.

2. Ask Claude to structure as a document outline first. Then fill in sections one by one, each under 8K.

3. Route to a model with a higher output limit. GPT-5.5 has a 16K output limit, twice Claude's.

Special Cases

Claude Code: separate quota from Claude.ai. Uses API under the hood. Can be rate-limited independently from web.

Cursor with Claude: if using Cursor Pro, Cursor's quota is separate. If using BYOK with Anthropic, you consume your own API tier.

AWS Bedrock Claude: separate quota from Anthropic direct. Useful for teams with AWS commitments that want to avoid cross-billing, but typically not cheaper than direct.

Azure doesn't have Claude: the Microsoft-Anthropic partnership hasn't shipped Claude on Azure as of April 2026.

Monitoring Your Usage

Claude.ai web: Settings → Usage shows rolling usage for the current 5-hour window.

API: /v1/usage endpoint or Anthropic console shows tokens consumed, remaining quota.

Via aggregator: TokenMix.ai dashboard consolidates usage across Claude, GPT, DeepSeek, Kimi — useful for teams routing multiple models through one key.

FAQ

Does the 5-hour limit reset all at once?

No. It rolls — messages from the start of the window age out first. You'll see gradual restoration, not instant reset.
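The aging-out behavior can be modeled in a few lines. A sketch (note the real cap also weights messages by model, which this ignores):

```python
import time
from collections import deque

class RollingCap:
    """Model of a rolling message cap: each send ages out individually."""
    def __init__(self, limit, window_s=5 * 3600):
        self.limit, self.window_s = limit, window_s
        self.sent = deque()  # timestamps of messages still inside the window
    def allow(self, now=None):
        now = time.time() if now is None else now
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()  # gradual restoration, not an all-at-once reset
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```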

Can I check exactly how many messages I have left?

Partially. Claude.ai Settings shows rolling usage. Exact countdown to "cap hit" isn't exposed — partly because weightings change based on model chosen.

Do images count against message count the same as text?

Yes. Each message counts as one, regardless of content type. A message with 5 images = 1 message against cap.

Does API have a 5-hour cap?

No. API uses per-minute rate limits, separate from the web cap. That's why heavy users often move to API.

What happens to my quota if I upgrade mid-cycle?

Upgrade applies immediately with pro-rated pricing. Quota boost reflects new tier right away.

Can I share a Max subscription with my team?

Anthropic's TOS forbids account sharing. Enterprise plans (Team) exist for multi-user access — contact Anthropic sales.

Does context length affect rate limits?

Sort of. Larger context = more tokens = consumes your per-minute token budget faster. Request rate limit stays the same, but you hit token limit sooner with long contexts.

How do aggregators help with Claude limits?

Aggregators like TokenMix.ai provide unified access to Claude plus 300+ other models. When Claude is rate-limited, 529'd, or over-quota, requests transparently route to GPT-5.5, DeepSeek V4-Pro, or Kimi K2.6 based on your configured fallback chain. For production workloads, this eliminates Claude-specific limits as a point of failure.

Is there an unlimited Claude option?

No. Even enterprise contracts with Anthropic have committed capacity rather than truly unlimited usage. The closest to unlimited for production is routing across multiple providers via aggregator — when one is rate-limited, others absorb traffic.


By TokenMix Research Lab · Updated 2026-04-24

Sources: Anthropic pricing, Anthropic API rate limits, Claude.ai plans, Claude API documentation, TokenMix.ai multi-model access