TokenMix Research Lab · 2026-06-08

Token Counting Guide 2026: OpenAI, Claude, Gemini, DeepSeek
Last Updated: 2026-06-08 | Author: TokenMix Research Lab | Data verified: 2026-06-08 (OpenAI token help, Claude token counting docs, Gemini token guide, DeepSeek context caching docs, and provider pricing pages)
Token counting is provider-specific. OpenAI, Claude, Gemini, and DeepSeek expose different tokenizers, counters, and usage fields.
OpenAI recommends its Tokenizer tool or Tiktoken for programmatic tokenization. Anthropic documents messages.count_tokens. Google documents count_tokens and says a Gemini token is about 4 characters. DeepSeek exposes prompt_cache_hit_tokens and prompt_cache_miss_tokens in its usage object. A safe guide must not pretend that one tokenizer exactly predicts every provider's count.
Table of Contents
- Quick Verdict
- Provider Counter Matrix
- Core Formula
- 5 Token Workloads
- Word Estimate Rules
- Python Formula
- Billing Traps
- Search Intent Map
- Cost Per Task Calculator
- Decision Matrix
- Monitoring Checklist
- Non-Claims and Caveats
- Final Recommendation
- FAQ
- Sources
- Related Articles
Quick Verdict
| Claim | Status | Source |
|---|---|---|
| OpenAI suggests the Tokenizer tool and Tiktoken for tokenization | Confirmed | OpenAI token help |
| OpenAI usage metadata includes input, output, cached, and reasoning token categories | Confirmed | OpenAI token help |
| All active Claude models support token counting | Confirmed | Claude token counting |
| Gemini tokens are about 4 characters and 100 tokens is about 60-80 English words | Confirmed | Gemini token guide |
| DeepSeek usage exposes prompt cache hit and miss token counts | Confirmed | DeepSeek context caching |
| One tokenizer exactly predicts all LLM providers | False | Tokenizers differ by provider and model |
| Use provider counters for procurement estimates | Likely | Official counters beat generic ratios |
| Word-count estimates are enough for invoices | Speculation | Exact billing needs provider usage logs |
Provider Counter Matrix
| Provider | Best counter | What it reports | Status |
|---|---|---|---|
| OpenAI | Tokenizer / Tiktoken / usage metadata | Input, output, cached, reasoning where relevant | Confirmed |
| Anthropic Claude | `messages.count_tokens` | Input token estimate before send | Confirmed |
| Google Gemini | `count_tokens` | Input token count | Confirmed |
| DeepSeek | API usage fields | Cache hit/miss input plus output | Confirmed |
| Gateway | Upstream usage logs | Depends on route | Likely |
Use this matrix alongside the related guides: How Many Tokens Is 1,000 Words, LLM API Cost Calculator, and OpenAI API Cost Calculator.
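Because each provider reports usage under different field names, it can help to map them onto one shape before doing any cost math. The sketch below is a minimal illustration, not an official client. The DeepSeek cache fields are the ones named in this guide; `completion_tokens` and the OpenAI-style `prompt_tokens_details.cached_tokens` layout are assumptions to verify against current provider docs, and both sample payloads are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NormalizedUsage:
    uncached_input: int  # input tokens billed at the normal input rate
    cached_input: int    # input tokens served from a provider cache
    output: int          # output tokens as reported by the provider


def from_deepseek(usage: dict) -> NormalizedUsage:
    # DeepSeek reports cache hits and misses as separate input counts.
    return NormalizedUsage(
        uncached_input=usage.get("prompt_cache_miss_tokens", 0),
        cached_input=usage.get("prompt_cache_hit_tokens", 0),
        output=usage.get("completion_tokens", 0),
    )


def from_openai_chat(usage: dict) -> NormalizedUsage:
    # Assumed shape: cached tokens are a subset of prompt_tokens.
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    prompt = usage.get("prompt_tokens", 0)
    return NormalizedUsage(
        uncached_input=max(prompt - cached, 0),
        cached_input=cached,
        output=usage.get("completion_tokens", 0),
    )


# Hypothetical usage payloads, not real API responses.
deepseek_usage = {
    "prompt_cache_hit_tokens": 800,
    "prompt_cache_miss_tokens": 200,
    "completion_tokens": 150,
}
openai_usage = {
    "prompt_tokens": 1000,
    "completion_tokens": 150,
    "prompt_tokens_details": {"cached_tokens": 800},
}

print(from_deepseek(deepseek_usage))
print(from_openai_chat(openai_usage))
```

Keeping the normalized record free of prices means the same logs can be re-priced later if a provider changes its rates.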
Core Formula
The calculator logic for token counting is provider-neutral first: count monthly token volume, apply the provider's current per-million-token rates, then add retries, cache effects, tool calls, and non-token infrastructure. The model-specific price belongs in the final step, not in the mental model.
| Input | Meaning | Status |
|---|---|---|
| `input_mtok` | Monthly input tokens divided by 1,000,000 | Confirmed |
| `output_mtok` | Monthly output tokens divided by 1,000,000 | Confirmed |
| `cache_hit_mtok` | Cached or reused input tokens where provider exposes a lower price | Confirmed |
| `retry_rate` | Failed calls divided by total attempted calls | Likely |
| `tool_calls` | Search, retrieval, shell, SQL, or other tool calls per task | Likely |
| `word_count` | Rough input estimate only | Likely |
| `usage_metadata` | Actual provider-reported billing fields | Confirmed |
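```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenPrice:
    """Prices in dollars per million tokens."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: Optional[float] = None  # None when no cache discount applies


def llm_cost(input_tokens, output_tokens, price: TokenPrice,
             cached_input_tokens=0, retry_rate=0.0):
    # cached_input_tokens is treated as the cached subset of input_tokens.
    uncached_input = max(input_tokens - cached_input_tokens, 0)
    input_cost = uncached_input / 1_000_000 * price.input_per_m
    if price.cached_input_per_m is not None:
        # Cached tokens are billed at the discounted rate.
        input_cost += cached_input_tokens / 1_000_000 * price.cached_input_per_m
    else:
        # No cache price published: bill cached tokens at the full input rate.
        input_cost += cached_input_tokens / 1_000_000 * price.input_per_m
    output_cost = output_tokens / 1_000_000 * price.output_per_m
    # Approximate retry waste by scaling the total by (1 + retry_rate).
    return (input_cost + output_cost) * (1 + retry_rate)
```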
Apply provider-specific prices only after you have measured average input, average output, retries, cache hit rate, and tool calls. A model that is cheap per token can still cost more overall if it causes extra retries or longer output.
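The cache hit rate is one of the measurements that most changes the effective input price. As a rough sketch, the blended price per million input tokens is a weighted average of the cached and uncached rates. The function name and the prices below are illustrative placeholders, not real provider rates.

```python
def effective_input_price(input_per_m: float, cached_input_per_m: float,
                          cache_hit_rate: float) -> float:
    """Blended price per million input tokens for a cache hit rate in [0, 1]."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    # Weighted average: hits pay the cached rate, misses pay the full rate.
    return cache_hit_rate * cached_input_per_m + (1.0 - cache_hit_rate) * input_per_m


# Placeholder prices in dollars per million tokens.
blended = effective_input_price(input_per_m=2.00, cached_input_per_m=0.50,
                                cache_hit_rate=0.5)
print(f"${blended:.2f} per million input tokens")
```

A measured hit rate from logs is more trustworthy here than an assumed one, since hit rates depend on how stable the prompt prefix is.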
5 Token Workloads
These five workloads are intentionally concrete. Replace the numbers with your own logs before procurement.
| Workload | Sample size | Token shape | Planning note | Status |
|---|---|---|---|---|
| Short prompt | 500 words | Estimate then provider-count | Good for draft budgets | Likely |
| Blog article | 2,000 words | Long text input | Provider tokenizer required | Likely |
| RAG chunks | 100 chunks | 1K tokens/chunk target | Chunk policy matters | Likely |
| Coding context | 50 files | Whitespace and symbols | Word ratios fail | Likely |
| Multimodal Gemini | 1,000 calls | Text plus image/audio | Non-text modalities tokenized too | Confirmed |
Scenario math should be written as tokens first and dollars second. That keeps the estimate portable across OpenAI, Claude, Gemini, DeepSeek, Groq, or an OpenAI-compatible gateway.
Word Estimate Rules
| Rule | Use | Status |
|---|---|---|
| Gemini: 100 tokens about 60-80 English words | Gemini rough estimate | Confirmed |
| English prose often tokenizes at roughly 0.75 to 0.8 words per token | Rough cross-provider planning | Likely |
| Code, JSON, and tables often use more tokens per word than prose | Do not use prose ratio | Likely |
| Chinese/Japanese token ratios differ from English | Use provider tokenizer | Likely |
| Exact invoice token counts require API metadata | Billing truth | Confirmed |
The safe estimate for planning is a range, not a single number. Exact billing comes from provider usage metadata.
Python Formula
```python
def rough_english_tokens(words, low_words_per_token=0.80, high_words_per_token=0.60):
    # More words per token means fewer tokens, so 0.80 gives the low end of the range.
    low_tokens = words / low_words_per_token
    high_tokens = words / high_words_per_token
    return round(low_tokens), round(high_tokens)


print(rough_english_tokens(1000))  # rough range, not invoice truth
```
For OpenAI, use Tiktoken or the Tokenizer. For Claude and Gemini, use their official token counting APIs. For DeepSeek, read the usage object returned by the API.
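When only raw text is available and no provider counter is at hand, the "about 4 characters per token" figure that Google gives for Gemini supports a second rough estimate. The helper below is a hypothetical sketch built on that ratio. It is a planning aid only and will drift for code, JSON, and non-English text.

```python
import math


def rough_tokens_from_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate tokens from character count; for planning, not billing."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so short strings never estimate to zero tokens.
    return math.ceil(len(text) / chars_per_token)


sample = "Token counting is provider-specific."
print(len(sample), rough_tokens_from_chars(sample))
```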
Billing Traps
The calculator is only useful if it catches the hidden multipliers. These are the traps that turn cheap demo calls into expensive production months.
| Trap | Cost symptom | Fix | Status |
|---|---|---|---|
| Using word count as invoice count | Budget misses bill | Use provider metadata | False assumption |
| Ignoring cached tokens | Input cost estimate wrong | Separate cache hit/miss | Confirmed |
| Ignoring reasoning tokens | Advanced model cost surprises | Read usage fields | Confirmed |
| Using OpenAI tokenizer for Claude/Gemini | Mismatched estimate | Use provider counter | Likely |
| Code counted like prose | Underestimate | Count actual files | Likely |
A cost calculator should show cost per successful task, not only cost per API call. Failed calls, retries, cache misses, and long outputs are still part of the bill.
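The difference between cost per call and cost per successful task is easy to see with a small calculation. This is a minimal sketch using a hypothetical log of attempt costs. The function name and the numbers are illustrative, not measured data.

```python
def cost_per_successful_task(call_costs: list[float], successes: int) -> float:
    """Total spend across all attempts divided by tasks that succeeded."""
    if successes <= 0:
        raise ValueError("successes must be positive")
    return sum(call_costs) / successes


# Hypothetical log: 10 attempts at $0.02 each, 8 of which produced a usable result.
attempt_costs = [0.02] * 10
per_call = sum(attempt_costs) / len(attempt_costs)
per_success = cost_per_successful_task(attempt_costs, successes=8)
print(f"per call: ${per_call:.4f}, per successful task: ${per_success:.4f}")
```

The failed attempts still appear in the numerator, which is why the per-task figure is higher than the per-call figure.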
Search Intent Map
| Search query | What the user really needs | Best answer | Status |
|---|---|---|---|
| `token counting guide` | A current, non-marketing answer | Compare official limits and cost controls | Confirmed |
| `token counting guide pricing` | Whether this becomes a monthly bill | Use per-task math, not sticker price | Confirmed |
| `token counting guide free` | Whether a no-cost path exists | Treat free quota as testing capacity | Likely |
| `token counting guide error` | Why setup fails | Check auth, quota, region, and model access | Likely |
| `token counting guide alternative` | Whether another route is safer | Compare direct API, gateway, and self-hosting | Likely |
These queries are why the article is built around tables rather than a narrative review: search traffic for these terms usually comes from developers who are blocked on a problem, not readers browsing AI news.
Cost Per Task Calculator
| Cost component | Formula | Why it matters | Status |
|---|---|---|---|
| Input tokens | input MTok x input price | Long prompts dominate retrieval and agents | Confirmed |
| Output tokens | output MTok x output price | Reasoning and verbose answers compound cost | Confirmed |
| Retry waste | failed calls x average cost | 429 and timeout loops become real spend | Likely |
| Human review | minutes saved or added x hourly rate | Tooling can shift, not remove, labor cost | Likely |
| Infrastructure | storage, runners, or hosted platform cost | Non-token cost often appears later | Confirmed |
Use this minimum calculator before choosing a provider: (30 days x calls per day x average input tokens / 1,000,000) x input price per million tokens, plus (30 days x calls per day x average output tokens / 1,000,000) x output price per million tokens. Then add retries. With a 10% retry rate, your effective price is already 1.1x the list price, before latency or support cost.
| Monthly calls | Avg input | Avg output | Token volume | Operational reading |
|---|---|---|---|---|
| 1,000 | 1K | 300 | 1M in / 0.3M out | Prototype |
| 10,000 | 2K | 600 | 20M in / 6M out | Small app |
| 100,000 | 4K | 1K | 400M in / 100M out | Production workload |
| 1,000,000 | 2K | 500 | 2B in / 500M out | Procurement problem |
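The volume table and the minimum calculator can be sketched as two small functions: one for token volume, one for cost. The function names are hypothetical and the prices are placeholders rather than real provider rates. The retry adjustment follows the (1 + retry rate) convention used in this guide.

```python
def monthly_volume_mtok(monthly_calls: int, avg_input_tokens: int,
                        avg_output_tokens: int) -> tuple[float, float]:
    """Return (input MTok, output MTok) for one month."""
    return (
        monthly_calls * avg_input_tokens / 1_000_000,
        monthly_calls * avg_output_tokens / 1_000_000,
    )


def monthly_cost(input_mtok: float, output_mtok: float, input_per_m: float,
                 output_per_m: float, retry_rate: float = 0.0) -> float:
    """Token cost for the month, scaled by (1 + retry_rate)."""
    return (input_mtok * input_per_m + output_mtok * output_per_m) * (1 + retry_rate)


# "Small app" row: 10,000 calls, 2K input tokens, 600 output tokens.
in_mtok, out_mtok = monthly_volume_mtok(10_000, 2_000, 600)

# Placeholder prices in dollars per million tokens.
cost = monthly_cost(in_mtok, out_mtok, input_per_m=1.00, output_per_m=4.00,
                    retry_rate=0.10)
print(in_mtok, out_mtok, round(cost, 2))
```

Computing the volume first and the dollars second keeps the token figures reusable when comparing several price lists.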
Decision Matrix
| If your situation is... | Default move | Why | Confidence |
|---|---|---|---|
| You are still prototyping | Use the lowest-friction official route | Learning speed beats premature optimization | Likely |
| You have user-facing traffic | Add fallback and spend caps before launch | Users feel quota failures immediately | Confirmed |
| You have compliance constraints | Prefer direct vendor, cloud marketplace, or audited gateway | Procurement trail matters | Likely |
| You have high volume but flexible latency | Test batch or async processing | Batch discounts can beat realtime routes | Confirmed where documented |
| You have unknown token shape | Run a 7-day sample before committing | Average prompts hide tail risk | Likely |
| You need newest model features | Check direct provider docs first | Gateways and clouds may lag direct release | Likely |
The durable rule: do not optimize for the cheapest successful demo. Optimize for the cheapest successful month with logs, retries, fallback, and support.
```python
def pick_route(stage, traffic, compliance, latency_flexible):
    # Rules are checked in order; the first match wins.
    if stage == "prototype" and traffic < 1000:
        return "official_free_or_low_cost_route"
    if compliance == "strict":
        return "direct_vendor_or_cloud_marketplace"
    if latency_flexible and traffic > 100000:
        return "batch_or_async_route"
    if traffic > 10000:
        return "gateway_with_budget_caps"
    return "direct_api_with_monitoring"
```
Monitoring Checklist
| Metric | Alert threshold | Why | Status |
|---|---|---|---|
| 429 rate | >2% sustained | Quota is now user-visible | Confirmed |
| Retry multiplier | >1.1x | Hidden cost leak | Likely |
| Fallback rate | >10% | Primary route is unstable | Likely |
| Output/input ratio | Sudden 2x jump | Prompt or model behavior changed | Likely |
| Cost per successful task | Week-over-week increase | Real business KPI | Confirmed |
| Error by model | Any model-specific spike | Route or provider issue | Confirmed |
| User-level spend | Outlier user >5x median | Abuse or runaway workflow | Likely |
The operational test is simple: if you cannot answer which model, user, route, or retry loop created the cost, you are not ready to scale that workflow.
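The numeric thresholds in the checklist can be turned into a simple alert check. The sketch below is a minimal illustration: the metric names and the snapshot values are hypothetical, and only the thresholds that the checklist states as fixed numbers are included.

```python
# Thresholds copied from the checklist above; tune them against your own baseline.
THRESHOLDS = {
    "rate_429": 0.02,             # 429 rate above 2%
    "retry_multiplier": 1.1,      # retry multiplier above 1.1x
    "fallback_rate": 0.10,        # fallback rate above 10%
    "user_spend_vs_median": 5.0,  # outlier user above 5x median spend
}


def check_alerts(metrics: dict) -> list[str]:
    """Return the names of metrics whose value exceeds its threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if metrics.get(name, 0.0) > limit
    ]


# Hypothetical snapshot of one day's aggregated metrics.
snapshot = {
    "rate_429": 0.035,
    "retry_multiplier": 1.05,
    "fallback_rate": 0.12,
    "user_spend_vs_median": 2.0,
}
print(check_alerts(snapshot))
```

Trend-based rows such as week-over-week cost per successful task need historical data and are left out of this sketch.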
Non-Claims and Caveats
| Not claimed | Reason | Label |
|---|---|---|
| Universal benchmark superiority | No single benchmark covers every workload and provider route | False as a broad claim |
| Permanent free availability | Free tiers and previews can change | Speculation |
| Guaranteed model access in every region | Providers gate by region, tier, quota, or account status | False as a broad claim |
| Refund availability without official text | Refund terms must come from provider policy or support | Speculation |
| Identical pricing across direct API, cloud, and gateway | Routing layer, region, priority, and batch mode can change cost | False as a broad claim |
| Production safety from docs alone | Real workloads need logs and failure drills | Confirmed |
This article uses official docs for hard numbers and marks forward-looking guidance as Likely or Speculation. If a provider changes a price, model name, rate limit, or credit rule after the data verification date, the conclusion should be rechecked before procurement.
Final Recommendation
Use word ratios only for first-pass planning. For real budgets, count tokens with the provider's own tool or read API usage metadata after calls. Cross-provider token estimates are always approximate.
FAQ
How do I count tokens across providers?
Use each provider's counter: OpenAI Tokenizer/Tiktoken, Claude count_tokens, Gemini count_tokens, and DeepSeek usage fields.
Is one token one word?
No. Gemini says 100 tokens is about 60-80 English words, and tokenization varies by provider and text type.
Can I use OpenAI's tokenizer for Claude?
Only as a rough estimate. For Claude procurement, use Anthropic's count_tokens endpoint.
How does DeepSeek report cached tokens?
DeepSeek usage includes prompt_cache_hit_tokens and prompt_cache_miss_tokens.
Why does code break word ratios?
Code contains symbols, whitespace, identifiers, and punctuation that tokenize differently from prose.
What is the safest estimate?
Use a range for drafts and provider usage metadata for billing.
Do cached tokens still count?
Yes, but they may be billed differently depending on the provider.
Sources
- OpenAI Token Help
- OpenAI API Pricing
- Claude Token Counting
- Claude Pricing
- Gemini Token Guide
- Gemini API Pricing
- DeepSeek Context Caching
- DeepSeek Pricing