TokenMix Research Lab · 2026-06-08

Token Counting Guide 2026: OpenAI, Claude, Gemini, DeepSeek

Last Updated: 2026-06-08
Author: TokenMix Research Lab
Data verified: 2026-06-08 (OpenAI token help, Claude token counting docs, Gemini token guide, DeepSeek context caching docs, and provider pricing pages)

Token counting is provider-specific. OpenAI, Claude, Gemini, and DeepSeek expose different tokenizers, counters, and usage fields.

OpenAI recommends its Tokenizer tool, or Tiktoken for programmatic tokenization. Anthropic documents messages.count_tokens. Google documents count_tokens and says a Gemini token is about 4 characters. DeepSeek exposes prompt_cache_hit_tokens and prompt_cache_miss_tokens in its usage object. A safe guide must not pretend that one tokenizer exactly predicts every provider.

Quick Verdict
Provider Counter Matrix
Core Formula
5 Token Workloads
Word Estimate Rules
Python Formula
Billing Traps
Search Intent Map
Cost Per Task Calculator
Decision Matrix
Monitoring Checklist
Non-Claims and Caveats
Final Recommendation
FAQ
Sources
Related Articles

Quick Verdict

Claim	Status	Source
OpenAI suggests the Tokenizer tool and Tiktoken for tokenization	Confirmed	OpenAI token help
OpenAI usage metadata includes input, output, cached, and reasoning token categories	Confirmed	OpenAI token help
All active Claude models support token counting	Confirmed	Claude token counting
A Gemini token is about 4 characters, and 100 tokens is about 60-80 English words	Confirmed	Gemini token guide
DeepSeek usage exposes prompt cache hit and miss token counts	Confirmed	DeepSeek context caching
One tokenizer exactly predicts all LLM providers	False	Tokenizers differ by provider and model
Use provider counters for procurement estimates	Likely	Official counters beat generic ratios
Word-count estimates are enough for invoices	False	Exact billing needs provider usage logs

Provider Counter Matrix

Provider	Best counter	What it reports	Status
OpenAI	Tokenizer / Tiktoken / usage metadata	Input, output, cached, reasoning where relevant	Confirmed
Anthropic Claude	`messages.count_tokens`	Input token estimate before send	Confirmed
Google Gemini	`count_tokens`	Input token count	Confirmed
DeepSeek	API usage fields	Cache hit/miss input plus output	Confirmed
Gateway	Upstream usage logs	Depends on route	Likely

Use this with How Many Tokens Is 1,000 Words, LLM API Cost Calculator, and OpenAI API Cost Calculator.
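Because each provider reports usage in a different shape, it can help to map usage payloads onto the same buckets before doing any cost math. The sketch below is a minimal illustration, not an official client. `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` come from the DeepSeek row above. The other field names (`prompt_tokens`, `completion_tokens`, `prompt_tokens_details.cached_tokens`) are assumptions about an OpenAI-style payload and should be checked against the current provider docs. `NormalizedUsage` and `normalize_usage` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NormalizedUsage:
    uncached_input: int
    cached_input: int
    output: int


def normalize_usage(usage: dict) -> NormalizedUsage:
    """Map a provider usage payload onto three billing buckets."""
    output = usage.get("completion_tokens", 0)  # assumed field name
    if "prompt_cache_hit_tokens" in usage:
        # DeepSeek-style: cache hits and misses are reported separately.
        return NormalizedUsage(
            uncached_input=usage.get("prompt_cache_miss_tokens", 0),
            cached_input=usage["prompt_cache_hit_tokens"],
            output=output,
        )
    # Assumed OpenAI-style shape: cached tokens nested under prompt_tokens_details.
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    prompt = usage.get("prompt_tokens", 0)
    return NormalizedUsage(max(prompt - cached, 0), cached, output)


# Illustrative payloads, not real API responses.
deepseek_like = {
    "prompt_cache_hit_tokens": 800,
    "prompt_cache_miss_tokens": 200,
    "completion_tokens": 150,
}
openai_like = {
    "prompt_tokens": 1000,
    "completion_tokens": 150,
    "prompt_tokens_details": {"cached_tokens": 800},
}

print(normalize_usage(deepseek_like))
print(normalize_usage(openai_like))
```

Both sample payloads land in the same three buckets, which is what lets one cost function serve several providers.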

Core Formula

The calculator logic starts provider-neutral: count monthly token volume, apply the provider's current per-million-token rates, then add retries, cache effects, tool calls, and non-token infrastructure. The model-specific price belongs in the final step, not in the mental model.

Input	Meaning	Status
`input_mtok`	Monthly input tokens divided by 1,000,000	Confirmed
`output_mtok`	Monthly output tokens divided by 1,000,000	Confirmed
`cache_hit_mtok`	Cached or reused input tokens where provider exposes a lower price	Confirmed
`retry_rate`	Failed calls divided by total attempted calls	Likely
`tool_calls`	Search, retrieval, shell, SQL, or other tool calls per task	Likely
`word_count`	Rough input estimate only	Likely
`usage_metadata`	Actual provider-reported billing fields	Confirmed

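from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenPrice:
    """Per-million-token prices in one currency."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: Optional[float] = None  # None: no cache discount exposed


def llm_cost(input_tokens, output_tokens, price: TokenPrice, cached_input_tokens=0, retry_rate=0.0):
    """Estimate cost for a token volume, inflated by the retry rate."""
    # input_tokens is the total; the cached share is billed at its own rate.
    uncached_input = max(input_tokens - cached_input_tokens, 0)
    # Fall back to the normal input price when no cached price is published.
    cached_rate = price.input_per_m if price.cached_input_per_m is None else price.cached_input_per_m
    input_cost = uncached_input / 1_000_000 * price.input_per_m
    input_cost += cached_input_tokens / 1_000_000 * cached_rate
    output_cost = output_tokens / 1_000_000 * price.output_per_m
    # Treat retries as a flat multiplier on the whole bill.
    return (input_cost + output_cost) * (1 + retry_rate)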

Plug in provider-specific prices only after you have measured average input, average output, retries, cache hit rate, and tool calls. A model that is cheap per token can still cost more per task if it causes extra retries or longer output.
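To make the retry and output-length point concrete, here is a small comparison under stated assumptions. All prices, token counts, and retry rates are hypothetical placeholders, not quotes from any provider, and `cost_per_task` is an illustrative helper that uses the same flat retry multiplier as the formula above.

```python
def cost_per_task(input_tokens, output_tokens, input_per_m, output_per_m, retry_rate):
    """Cost of one task in dollars, with retries as a flat multiplier."""
    base = (
        input_tokens / 1_000_000 * input_per_m
        + output_tokens / 1_000_000 * output_per_m
    )
    return base * (1 + retry_rate)


# Hypothetical "cheap" model: half the per-token price, but verbose and flaky.
cheap = cost_per_task(2_000, 1_500, input_per_m=0.50, output_per_m=2.00, retry_rate=0.30)

# Hypothetical "pricey" model: double the per-token price, concise and stable.
pricey = cost_per_task(2_000, 500, input_per_m=1.00, output_per_m=4.00, retry_rate=0.02)

print(f"cheap per task:  ${cheap:.5f}")
print(f"pricey per task: ${pricey:.5f}")
```

With these made-up inputs the model with the lower sticker price ends up more expensive per task, which is why measured token shape should come before price comparison.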

5 Token Workloads

These five workloads are intentionally concrete. Replace the numbers with your own logs before procurement.

Workload	Monthly volume	Token/tool shape	Calculator output	Status
Short prompt	500 words	Estimate then provider-count	Good for draft budgets	Likely
Blog article	2,000 words	Long text input	Provider tokenizer required	Likely
RAG chunks	100 chunks	1K tokens/chunk target	Chunk policy matters	Likely
Coding context	50 files	Whitespace and symbols	Word ratios fail	Likely
Multimodal Gemini	1,000 calls	Text plus image/audio	Non-text modalities tokenized too	Confirmed

Scenario math should be written as tokens first and dollars second. That keeps the estimate portable across OpenAI, Claude, Gemini, DeepSeek, Groq, or an OpenAI-compatible gateway.
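One way to keep scenario math in tokens first is to store the workload as a token volume and apply prices only at the end. The sketch below assumes a RAG workload of 100 chunks at a 1K-token target, as in the table. The output volume, the provider names, and every price are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenVolume:
    input_tokens: int
    output_tokens: int


def dollars(volume: TokenVolume, input_per_m: float, output_per_m: float) -> float:
    """Convert a token volume to dollars using per-million-token prices."""
    return (
        volume.input_tokens / 1_000_000 * input_per_m
        + volume.output_tokens / 1_000_000 * output_per_m
    )


# Step 1: tokens. 100 chunks x 1K tokens in; output volume is a placeholder.
rag = TokenVolume(input_tokens=100 * 1_000, output_tokens=20_000)

# Step 2: dollars. (input, output) price per million tokens, all hypothetical.
HYPOTHETICAL_PRICES = {
    "provider_a": (1.00, 4.00),
    "provider_b": (0.25, 1.00),
}

costs = {name: dollars(rag, *price) for name, price in HYPOTHETICAL_PRICES.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:.3f}")
```

The same `TokenVolume` can be repriced for any route without re-measuring the workload.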

Word Estimate Rules

Rule	Use	Status
Gemini: 100 tokens about 60-80 English words	Gemini rough estimate	Confirmed
English prose often tokenizes near 0.75 words/token to 0.8 words/token	Rough cross-provider planning	Likely
Code, JSON, and tables often use more tokens per word than prose	Do not use prose ratio	Likely
Chinese/Japanese token ratios differ from English	Use provider tokenizer	Likely
Exact invoice token counts require API metadata	Billing truth	Confirmed

The safe estimate for planning is a range, not a single number. Exact billing comes from provider usage metadata.

Python Formula

def rough_english_tokens(words, low_words_per_token=0.80, high_words_per_token=0.60):
    # More words per token means fewer tokens, so 0.80 produces the low end
    # of the token range and 0.60 produces the high end.
    low_tokens = words / low_words_per_token
    high_tokens = words / high_words_per_token
    return round(low_tokens), round(high_tokens)

print(rough_english_tokens(1000))  # (1250, 1667): rough range, not invoice truth

For OpenAI, use Tiktoken or the Tokenizer. For Claude and Gemini, use their official token counting APIs. For DeepSeek, read the usage object returned by the API.

Billing Traps

The calculator is only useful if it catches the hidden multipliers. These are the traps that turn cheap demo calls into expensive production months.

Trap	Cost symptom	Fix	Status
Using word count as invoice count	Budget misses bill	Use provider metadata	False assumption
Ignoring cached tokens	Input cost estimate wrong	Separate cache hit/miss	Confirmed
Ignoring reasoning tokens	Advanced model cost surprises	Read usage fields	Confirmed
Using OpenAI tokenizer for Claude/Gemini	Mismatched estimate	Use provider counter	Likely
Code counted like prose	Underestimate	Count actual files	Likely

A cost calculator should show cost per successful task, not only cost per API call. Failed calls, retries, cache misses, and long outputs are still part of the bill.
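The gap between cost per call and cost per successful task can be computed directly from call logs. This is a minimal sketch: the log record shape (`input_tokens`, `output_tokens`, `ok`) and the prices are hypothetical, and it assumes every logged call is billed for the tokens it reports, which depends on the provider and the failure type.

```python
def task_costs(calls, input_per_m, output_per_m):
    """Return (cost per API call, cost per successful task) for a list of call logs."""
    if not calls:
        raise ValueError("no calls logged")
    total = sum(
        c["input_tokens"] / 1_000_000 * input_per_m
        + c["output_tokens"] / 1_000_000 * output_per_m
        for c in calls
    )
    successes = sum(1 for c in calls if c["ok"])
    per_call = total / len(calls)
    # With zero successes the per-task cost is unbounded.
    per_success = total / successes if successes else float("inf")
    return per_call, per_success


# Four attempts, one of which failed after consuming tokens.
logs = [
    {"input_tokens": 1_000, "output_tokens": 500, "ok": True},
    {"input_tokens": 1_000, "output_tokens": 500, "ok": True},
    {"input_tokens": 1_000, "output_tokens": 500, "ok": False},
    {"input_tokens": 1_000, "output_tokens": 500, "ok": True},
]

per_call, per_success = task_costs(logs, input_per_m=1.00, output_per_m=2.00)
print(f"per call: ${per_call:.6f}, per successful task: ${per_success:.6f}")
```

The per-call figure hides the failed attempt; the per-success figure spreads its cost over the three tasks that actually completed.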

Search Intent Map

Search query	What the user really needs	Best answer	Status
`token counting guide`	A current, non-marketing answer	Compare official limits and cost controls	Confirmed
`token counting guide pricing`	Whether this becomes a monthly bill	Use per-task math, not sticker price	Confirmed
`token counting guide free`	Whether a no-cost path exists	Treat free quota as testing capacity	Likely
`token counting guide error`	Why setup fails	Check auth, quota, region, and model access	Likely
`token counting guide alternative`	Whether another route is safer	Compare direct API, gateway, and self-hosting	Likely

This is the reason the article is structured around tables instead of a narrative review. Search traffic for these terms usually comes from blocked developers, not readers browsing AI news.

Cost Per Task Calculator

Cost component	Formula	Why it matters	Status
Input tokens	input MTok x input price	Long prompts dominate retrieval and agents	Confirmed
Output tokens	output MTok x output price	Reasoning and verbose answers compound cost	Confirmed
Retry waste	failed calls x average cost	429 and timeout loops become real spend	Likely
Human review	minutes saved or added x hourly rate	Tooling can shift, not remove, labor cost	Likely
Infrastructure	storage, runners, or hosted platform cost	Non-token cost often appears later	Confirmed

Use this minimum calculator before choosing a provider: 30 days x calls per day x average input tokens / 1,000,000 x input price per million tokens, plus 30 days x calls per day x average output tokens / 1,000,000 x output price per million tokens. Then add retries. If the retry rate is 10%, your effective price is already 1.1x the sticker price before latency or support cost.

Monthly calls	Avg input	Avg output	Token volume	Operational reading
1,000	1K	300	1M in / 0.3M out	Prototype
10,000	2K	600	20M in / 6M out	Small app
100,000	4K	1K	400M in / 100M out	Production workload
1,000,000	2K	500	2B in / 500M out	Procurement problem
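The minimum calculator and the volume table above can be reproduced in a few lines. This sketch treats monthly calls as 30 days x calls per day, and the prices and retry rate in the final example are hypothetical placeholders to be replaced with current provider rates and measured retries.

```python
def monthly_volume(calls, avg_input, avg_output):
    """Return (input tokens, output tokens) for one month of calls."""
    return calls * avg_input, calls * avg_output


def minimum_monthly_cost(calls, avg_input, avg_output, input_per_m, output_per_m, retry_rate=0.0):
    """Token cost for the month, with retries as a flat multiplier."""
    tokens_in, tokens_out = monthly_volume(calls, avg_input, avg_output)
    base = tokens_in / 1_000_000 * input_per_m + tokens_out / 1_000_000 * output_per_m
    return base * (1 + retry_rate)


# (monthly calls, average input tokens, average output tokens) from the table.
ROWS = [
    (1_000, 1_000, 300),
    (10_000, 2_000, 600),
    (100_000, 4_000, 1_000),
    (1_000_000, 2_000, 500),
]

for calls, avg_in, avg_out in ROWS:
    tokens_in, tokens_out = monthly_volume(calls, avg_in, avg_out)
    print(f"{calls:>9,} calls: {tokens_in / 1e6:g}M in / {tokens_out / 1e6:g}M out")

# "Small app" row at hypothetical prices of $1 in / $4 out per million tokens
# and a 10% retry rate.
small_app = minimum_monthly_cost(
    10_000, 2_000, 600, input_per_m=1.00, output_per_m=4.00, retry_rate=0.10
)
print(f"small app estimate: ${small_app:.2f}")
```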

Decision Matrix

If your situation is...	Default move	Why	Confidence
You are still prototyping	Use the lowest-friction official route	Learning speed beats premature optimization	Likely
You have user-facing traffic	Add fallback and spend caps before launch	Users feel quota failures immediately	Confirmed
You have compliance constraints	Prefer direct vendor, cloud marketplace, or audited gateway	Procurement trail matters	Likely
You have high volume but flexible latency	Test batch or async processing	Batch discounts can beat realtime routes	Confirmed where documented
You have unknown token shape	Run a 7-day sample before committing	Average prompts hide tail risk	Likely
You need newest model features	Check direct provider docs first	Gateways and clouds may lag direct release	Likely

The durable rule: do not optimize for the cheapest successful demo. Optimize for the cheapest successful month with logs, retries, fallback, and support.

def pick_route(stage, traffic, compliance, latency_flexible):
    # Rules are checked in priority order; the first match wins.
    # traffic is the expected number of calls per month.
    if stage == "prototype" and traffic < 1000:
        return "official_free_or_low_cost_route"
    if compliance == "strict":
        return "direct_vendor_or_cloud_marketplace"
    if latency_flexible and traffic > 100000:
        return "batch_or_async_route"
    if traffic > 10000:
        return "gateway_with_budget_caps"
    return "direct_api_with_monitoring"

Monitoring Checklist

Metric	Alert threshold	Why	Status
429 rate	>2% sustained	Quota is now user-visible	Confirmed
Retry multiplier	>1.1x	Hidden cost leak	Likely
Fallback rate	>10%	Primary route is unstable	Likely
Output/input ratio	Sudden 2x jump	Prompt or model behavior changed	Likely
Cost per successful task	Week-over-week increase	Real business KPI	Confirmed
Error by model	Any model-specific spike	Route or provider issue	Confirmed
User-level spend	Outlier user >5x median	Abuse or runaway workflow	Likely

The operational test is simple: if you cannot answer which model, user, route, or retry loop created the cost, you are not ready to scale that workflow.
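Several rows of the checklist translate directly into threshold checks. The sketch below covers the three rate-based rows and the user-level spend outlier rule. The metric names, the `THRESHOLDS` dictionary, and both helper functions are hypothetical, and the limits are copied from the table rather than from any provider recommendation.

```python
from statistics import median

# Alert limits taken from the checklist table; metric names are illustrative.
THRESHOLDS = {
    "rate_429": 0.02,          # >2% sustained
    "retry_multiplier": 1.1,   # >1.1x
    "fallback_rate": 0.10,     # >10%
}


def breached(metrics: dict) -> list:
    """Return the names of metrics that exceed their alert threshold."""
    return sorted(
        name for name, limit in THRESHOLDS.items() if metrics.get(name, 0) > limit
    )


def spend_outliers(spend_by_user: dict, factor: float = 5.0) -> list:
    """Return users whose spend exceeds factor x the median user spend."""
    if not spend_by_user:
        return []
    typical = median(spend_by_user.values())
    return sorted(u for u, s in spend_by_user.items() if s > factor * typical)


alerts = breached({"rate_429": 0.05, "retry_multiplier": 1.05, "fallback_rate": 0.10})
outliers = spend_outliers({"a": 1.0, "b": 2.0, "c": 3.0, "d": 40.0})
print(alerts, outliers)
```

The comparisons are strict, so a fallback rate of exactly 10% does not alert; whether the boundary should count is a policy choice.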

Non-Claims and Caveats

Not claimed	Reason	Label
Universal benchmark superiority	No single benchmark covers every workload and provider route	False as a broad claim
Permanent free availability	Free tiers and previews can change	Speculation
Guaranteed model access in every region	Providers gate by region, tier, quota, or account status	False as a broad claim
Refund availability without official text	Refund terms must come from provider policy or support	Speculation
Identical pricing across direct API, cloud, and gateway	Routing layer, region, priority, and batch mode can change cost	False as a broad claim
Production safety from docs alone	Real workloads need logs and failure drills	Confirmed

This article uses official docs for hard numbers and marks forward-looking guidance as Likely or Speculation. If a provider changes a price, model name, rate limit, or credit rule after the data verification date, the conclusion should be rechecked before procurement.

Final Recommendation

Use word ratios only for first-pass planning. For real budgets, count tokens with the provider's own tool or read API usage metadata after calls. Cross-provider token estimates are always approximate.

FAQ

How do I count tokens across providers?

Use each provider's counter: OpenAI Tokenizer/Tiktoken, Claude count_tokens, Gemini count_tokens, and DeepSeek usage fields.

Is one token one word?

No. Google's Gemini token guide says 100 tokens is about 60-80 English words, and tokenization varies by provider and text type.

Can I use OpenAI's tokenizer for Claude?

Only as a rough estimate. For Claude procurement, use Anthropic's count_tokens endpoint.

How does DeepSeek report cached tokens?

DeepSeek usage includes prompt_cache_hit_tokens and prompt_cache_miss_tokens.

Why does code break word ratios?

Code contains symbols, whitespace, identifiers, and punctuation that tokenize differently from prose.
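A quick way to see the problem is to compare a word-based estimate with a character-based one on compact JSON. Both helpers below are rough heuristics, not tokenizers: the 0.75 words-per-token ratio and the 4 characters-per-token rule come from the estimate rules earlier in this guide, and the function names and sample strings are hypothetical.

```python
def words_estimate(text: str, words_per_token: float = 0.75) -> int:
    """Estimate tokens from whitespace-separated words."""
    return round(len(text.split()) / words_per_token)


def chars_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate tokens from character count."""
    return round(len(text) / chars_per_token)


prose = "Token counting is provider specific and word ratios only give a rough planning range."
compact_json = '{"id":1,"tags":["a","b"],"ok":true}'

for label, text in [("prose", prose), ("json", compact_json)]:
    print(f"{label}: words-based={words_estimate(text)}, chars-based={chars_estimate(text)}")
```

The JSON string contains no whitespace, so a word counter sees a single word and predicts about one token, while the character rule predicts several. Neither number is billing truth, but the gap shows why prose ratios should not be applied to code or structured data.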