AI API Cost Per Request in 2026: From $0.001 for Simple Chat to $0.50 for Document Processing

TokenMix Research Lab · 2026-04-13

AI API Cost Per Request: How Much Does Each LLM API Call Actually Cost? (2026)

The real cost of an AI API call ranges from $0.00003 for a simple classification to $0.50 or more for processing a long document with a premium model. Most developers underestimate costs at scale and overestimate costs for simple tasks. This guide breaks down the actual AI API cost per request across 12 models, five common use cases, and three volume tiers. Every price is calculated from official provider rates tracked by TokenMix.ai as of April 2026.

---

Quick Cost Comparison: 12 Models at a Glance

| Model | Provider | Input Price (/1M tokens) | Output Price (/1M tokens) | Simple Chat Cost | Document Cost |
| --- | --- | --- | --- | --- | --- |
| GPT-4.1 nano | OpenAI | $0.10 | $0.40 | $0.00006 | $0.005 |
| Gemini 2.0 Flash | Google | $0.075 | $0.30 | $0.00005 | $0.004 |
| DeepSeek V3 | DeepSeek | $0.14 | $0.28 | $0.00005 | $0.004 |
| Llama 4 Scout (via Groq) | Meta/Groq | $0.11 | $0.34 | $0.00006 | $0.005 |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | $0.0002 | $0.02 |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | $0.0005 | $0.05 |
| Gemini 3.1 Pro | Google | $1.25 | $5.00 | $0.0007 | $0.06 |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | $0.001 | $0.09 |
| DeepSeek V4 | DeepSeek | $0.50 | $2.00 | $0.0003 | $0.025 |
| GPT-5.4 | OpenAI | $2.50 | $10.00 | $0.0014 | $0.12 |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | $0.0018 | $0.17 |
| Claude Opus 4.6 | Anthropic | $15.00 | $75.00 | $0.009 | $0.84 |

*Prices as of April 2026. Live pricing at [TokenMix.ai](https://tokenmix.ai).*

---

How LLM Cost Per Query Is Calculated

Every AI API call has a predictable cost formula. Understanding this formula is the key to accurate budgeting.

**The formula:**

Cost per request = (input tokens ÷ 1,000,000 × input price per 1M) + (output tokens ÷ 1,000,000 × output price per 1M)

**What determines the token count:** the length of the user prompt, the size of the system prompt, any conversation history included in the call, and the length of the model's response (which you can cap with max_tokens).

**Typical token counts by use case:**

| Use Case | Input Tokens | Output Tokens | Total Tokens |
| --- | --- | --- | --- |
| Simple chat (1 question) | 100-300 | 100-300 | 200-600 |
| Code review (100 lines) | 1,500-3,000 | 500-1,500 | 2,000-4,500 |
| Document summary (5 pages) | 4,000-8,000 | 300-800 | 4,300-8,800 |
| Long document processing (20 pages) | 15,000-30,000 | 500-2,000 | 15,500-32,000 |
| Data extraction (structured) | 500-2,000 | 200-500 | 700-2,500 |

**Token-to-word ratio:** In English, 1 token is approximately 0.75 words. A 1,000-word document is roughly 1,333 tokens. Non-English text (Chinese, Japanese, Korean) uses 1.5-3x more tokens per word.
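The cost formula and the token-to-word ratio above can be sketched in a few lines. This is a minimal helper, not a provider SDK; the rates in the example are the GPT-4.1 mini prices from the comparison table.

```python
# Per-request cost: (input/1M * input rate) + (output/1M * output rate).

def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one API call given per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

def estimate_tokens(word_count: int, tokens_per_word: float = 1.333) -> int:
    """Rough English estimate: ~0.75 words per token, i.e. ~1.333 tokens per word."""
    return round(word_count * tokens_per_word)

# Simple chat on GPT-4.1 mini: 200 input + 200 output at $0.40/$1.60 per 1M
print(request_cost(200, 200, 0.40, 1.60))  # 0.0004
print(estimate_tokens(1000))               # 1333
```

Swap in the rates for any model in the table to reproduce the per-request figures below.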

---

Simple Chat Request Costs ($0.001-$0.01)

A simple chat request -- one question, one answer, minimal system prompt -- is the cheapest type of AI API call. This covers customer support responses, FAQ bots, simple Q&A, and classification tasks.

**Assumptions:** 200 input tokens, 200 output tokens (total 400 tokens).

| Model | Input Cost | Output Cost | Total Cost Per Request |
| --- | --- | --- | --- |
| Gemini 2.0 Flash | $0.000015 | $0.000060 | **$0.000075** |
| DeepSeek V3 | $0.000028 | $0.000056 | **$0.000084** |
| Llama 4 Scout (Groq) | $0.000022 | $0.000068 | **$0.000090** |
| GPT-4.1 nano | $0.000020 | $0.000080 | **$0.000100** |
| GPT-4.1 mini | $0.000080 | $0.000320 | **$0.000400** |
| DeepSeek V4 | $0.000100 | $0.000400 | **$0.000500** |
| Claude Haiku 3.5 | $0.000160 | $0.000800 | **$0.000960** |
| Gemini 3.1 Pro | $0.000250 | $0.001000 | **$0.001250** |
| GPT-4.1 | $0.000400 | $0.001600 | **$0.002000** |
| GPT-5.4 | $0.000500 | $0.002000 | **$0.002500** |
| Claude Sonnet 4 | $0.000600 | $0.003000 | **$0.003600** |
| Claude Opus 4.6 | $0.003000 | $0.015000 | **$0.018000** |

**Key insight:** The cheapest models (Gemini Flash, DeepSeek V3, GPT-4.1 nano) cost under $0.0001 per simple chat request. At that price, 10,000 requests cost under $1. For simple tasks, cost is effectively negligible.

---

Code Review Request Costs ($0.01-$0.05)

Code review involves larger inputs (the code being reviewed) and more substantial outputs (detailed feedback). This category also covers code generation, bug detection, and refactoring suggestions.

**Assumptions:** 2,500 input tokens (code + instructions), 1,000 output tokens (review comments).

| Model | Input Cost | Output Cost | Total Cost Per Request |
| --- | --- | --- | --- |
| Gemini 2.0 Flash | $0.000188 | $0.000300 | **$0.000488** |
| DeepSeek V3 | $0.000350 | $0.000280 | **$0.000630** |
| GPT-4.1 nano | $0.000250 | $0.000400 | **$0.000650** |
| GPT-4.1 mini | $0.001000 | $0.001600 | **$0.002600** |
| DeepSeek V4 | $0.001250 | $0.002000 | **$0.003250** |
| Claude Haiku 3.5 | $0.002000 | $0.004000 | **$0.006000** |
| Gemini 3.1 Pro | $0.003125 | $0.005000 | **$0.008125** |
| GPT-4.1 | $0.005000 | $0.008000 | **$0.013000** |
| GPT-5.4 | $0.006250 | $0.010000 | **$0.016250** |
| Claude Sonnet 4 | $0.007500 | $0.015000 | **$0.022500** |
| Claude Opus 4.6 | $0.037500 | $0.075000 | **$0.112500** |

**For code review, model quality matters more.** TokenMix.ai benchmark data shows that flagship models (GPT-4.1, Claude Sonnet 4, DeepSeek V4) catch 30-50% more bugs than budget models on complex codebases. The premium is often worth it for code review tasks.

---

Document Processing Costs ($0.05-$0.50)

Document processing -- summarization, extraction, analysis of multi-page documents -- is where AI API costs per request become significant. Long inputs drive up costs fast.

**Assumptions:** 10,000 input tokens (~7,500 words, ~15 pages), 1,000 output tokens (summary or extracted data).

| Model | Input Cost | Output Cost | Total Cost Per Request |
| --- | --- | --- | --- |
| Gemini 2.0 Flash | $0.000750 | $0.000300 | **$0.001050** |
| GPT-4.1 nano | $0.001000 | $0.000400 | **$0.001400** |
| DeepSeek V3 | $0.001400 | $0.000280 | **$0.001680** |
| GPT-4.1 mini | $0.004000 | $0.001600 | **$0.005600** |
| DeepSeek V4 | $0.005000 | $0.002000 | **$0.007000** |
| Claude Haiku 3.5 | $0.008000 | $0.004000 | **$0.012000** |
| Gemini 3.1 Pro | $0.012500 | $0.005000 | **$0.017500** |
| GPT-4.1 | $0.020000 | $0.008000 | **$0.028000** |
| GPT-5.4 | $0.025000 | $0.010000 | **$0.035000** |
| Claude Sonnet 4 | $0.030000 | $0.015000 | **$0.045000** |
| Claude Opus 4.6 | $0.150000 | $0.075000 | **$0.225000** |

**For large document processing (50+ pages, 50K+ tokens):**

Processing a 50,000-token document on Claude Opus 4.6 costs $0.75 input alone. On Gemini 2.0 Flash, the same document costs $0.00375 input. That is a 200x price difference. For document-heavy workloads, model choice drives costs more than anything else.
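The 200x figure is straightforward to verify from the per-1M rates quoted above:

```python
# Input cost for a 50,000-token document at the two rates from the table:
# Claude Opus 4.6 at $15/M input vs Gemini 2.0 Flash at $0.075/M input.
DOC_TOKENS = 50_000

opus_input = DOC_TOKENS / 1_000_000 * 15.00    # Claude Opus 4.6
flash_input = DOC_TOKENS / 1_000_000 * 0.075   # Gemini 2.0 Flash

print(round(opus_input, 6), round(flash_input, 6),
      round(opus_input / flash_input))  # 0.75 0.00375 200
```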

---

Full Cost-Per-Request Table: 5 Use Cases x 12 Models

| Model | Simple Chat (400 tokens) | Classification (300 tokens) | Code Review (3,500 tokens) | Doc Summary (6,000 tokens) | Long Doc Processing (32,000 tokens) |
| --- | --- | --- | --- | --- | --- |
| Gemini 2.0 Flash | $0.00008 | $0.00005 | $0.0005 | $0.001 | $0.004 |
| DeepSeek V3 | $0.00008 | $0.00006 | $0.0006 | $0.001 | $0.005 |
| GPT-4.1 nano | $0.00010 | $0.00006 | $0.0007 | $0.001 | $0.005 |
| Llama 4 Scout | $0.00009 | $0.00006 | $0.0006 | $0.001 | $0.005 |
| GPT-4.1 mini | $0.0004 | $0.0003 | $0.003 | $0.005 | $0.02 |
| DeepSeek V4 | $0.0005 | $0.0003 | $0.003 | $0.006 | $0.02 |
| Claude Haiku 3.5 | $0.001 | $0.0006 | $0.006 | $0.011 | $0.05 |
| Gemini 3.1 Pro | $0.001 | $0.0008 | $0.008 | $0.015 | $0.06 |
| GPT-4.1 | $0.002 | $0.001 | $0.013 | $0.024 | $0.10 |
| GPT-5.4 | $0.003 | $0.002 | $0.016 | $0.030 | $0.13 |
| Claude Sonnet 4 | $0.004 | $0.002 | $0.023 | $0.039 | $0.17 |
| Claude Opus 4.6 | $0.018 | $0.011 | $0.113 | $0.195 | $0.84 |

*All costs per single request. Data from TokenMix.ai price tracking, April 2026.*

---

Cost at Scale: Monthly Projections

Individual request costs are tiny. Monthly costs at production volume tell the real story.

**Scenario: Customer support chatbot (simple chat, 50K requests/month)**

| Model | Cost Per Request | Monthly Cost |
| --- | --- | --- |
| Gemini 2.0 Flash | $0.00008 | **$4** |
| DeepSeek V3 | $0.00008 | **$4** |
| GPT-4.1 nano | $0.00010 | **$5** |
| GPT-4.1 mini | $0.0004 | **$20** |
| GPT-4.1 | $0.002 | **$100** |
| Claude Sonnet 4 | $0.004 | **$200** |

**Scenario: Code review pipeline (code review, 5K requests/month)**

| Model | Cost Per Request | Monthly Cost |
| --- | --- | --- |
| GPT-4.1 mini | $0.003 | **$15** |
| DeepSeek V4 | $0.003 | **$15** |
| GPT-4.1 | $0.013 | **$65** |
| Claude Sonnet 4 | $0.023 | **$115** |

**Scenario: Document processing pipeline (doc summary, 10K requests/month)**

| Model | Cost Per Request | Monthly Cost |
| --- | --- | --- |
| Gemini 2.0 Flash | $0.001 | **$10** |
| GPT-4.1 mini | $0.005 | **$50** |
| GPT-4.1 | $0.024 | **$240** |
| Claude Opus 4.6 | $0.195 | **$1,950** |

**The pattern is clear:** model choice is the biggest cost lever. Switching from a flagship to a budget model on tasks that do not need flagship quality saves 80-95%.
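The projection behind these scenarios is just cost per request times volume. A minimal sketch using the customer-support scenario's figures:

```python
# Monthly projection: cost per request x requests per month.

def monthly_cost(cost_per_request: float, requests_per_month: int) -> float:
    """Return projected monthly spend in USD."""
    return cost_per_request * requests_per_month

# Customer support chatbot at 50K simple chats/month
for model, cpr in [("Gemini 2.0 Flash", 0.00008),
                   ("GPT-4.1 mini", 0.0004),
                   ("Claude Sonnet 4", 0.004)]:
    print(f"{model}: ${monthly_cost(cpr, 50_000):.2f}/month")
```

Running this reproduces the $4, $20, and $200 rows of the first scenario table.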

---

Hidden Cost Factors Most Developers Miss

**1. System prompt overhead.** If your system prompt is 500 tokens and you make 100K requests/month, that is 50M tokens just for system prompts. On GPT-4.1 at $2/M input, that is $100/month on instructions alone. Use [prompt caching](https://tokenmix.ai/blog/how-to-reduce-openai-api-cost) to cut this by 75%.

**2. Conversation history accumulation.** In multi-turn conversations, each message includes all previous turns. The 5th message in a conversation might include 3,000+ tokens of history. This means later messages cost 5-10x more than the first message.

**3. Tokenizer differences.** The same text produces different token counts on different providers. TokenMix.ai testing shows Claude tokenizes 8-12% more tokens than GPT for the same English text, and 15-25% more for Chinese text. This makes direct price comparisons misleading.

**4. Failed request costs.** Rate limit errors (429) and retries mean you pay for input tokens on the failed attempt too. Poorly managed retries add 10-20% to your bill. See our guide on [fixing 429 errors](https://tokenmix.ai/blog/gpt-api-rate-limit-error-429).

**5. Output verbosity.** Without explicit length constraints, models default to verbose responses. Setting `max_tokens` and adding conciseness instructions in your prompt can reduce output costs by 30-50%.
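Two of the factors above, history accumulation and output verbosity, can be bounded in code. This is a minimal sketch: it assumes you track a token count per message, and it uses an OpenAI-style request shape whose field names may differ on other providers.

```python
def trim_history(messages: list[dict], max_history_tokens: int) -> list[dict]:
    """Sliding window: keep the newest messages that fit the token budget."""
    kept, total = [], 0
    for msg in reversed(messages):      # walk newest-first
        total += msg["tokens"]
        if total > max_history_tokens:
            break
        kept.append(msg)
    return list(reversed(kept))         # restore chronological order

history = [{"role": "user", "tokens": 800},
           {"role": "assistant", "tokens": 900},
           {"role": "user", "tokens": 400},
           {"role": "assistant", "tokens": 500}]

request = {
    "model": "gpt-4.1-mini",
    "max_tokens": 150,  # hard cap on output tokens, and hence on output cost
    "messages": [{"role": "system",
                  "content": "Answer in at most two sentences."}]
                + trim_history(history, 1000),
}
print([m["tokens"] for m in request["messages"][1:]])  # [400, 500]
```

With a 1,000-token budget, only the two newest turns (900 tokens combined) are sent, so the request does not grow linearly with conversation length.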

---

How to Reduce Your AI API Cost Per Request

| Your Goal | Strategy | Expected Savings |
| --- | --- | --- |
| Lowest possible cost per request | Use budget models (Gemini Flash, DeepSeek V3, GPT-4.1 nano) | 90-95% vs flagship |
| Same quality, lower cost | Enable prompt caching | 50-75% on cached inputs |
| Non-real-time workloads | Use Batch API | 50% flat discount |
| Mixed-complexity workload | Route by task complexity via TokenMix.ai | 30-50% overall |
| Long documents | Use models with cheaper input rates | 40-80% on input costs |
| Reduce output costs | Set max_tokens, request concise responses | 20-40% on output |

**The optimal strategy for most teams:** Use TokenMix.ai to route requests to the cheapest model that meets quality requirements for each specific task. Simple classification goes to GPT-4.1 nano. Standard tasks go to GPT-4.1 mini. Complex reasoning stays on flagship models. This typically reduces overall costs by 40-60% compared to using one model for everything.
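TokenMix.ai's actual routing logic is not public, so the sketch below only illustrates the general idea of mapping task tiers to price tiers; the model names and tier labels are the ones used in this article.

```python
# Complexity-based routing sketch: send each task tier to the cheapest
# model that meets its quality bar.

def route_model(task_tier: str) -> str:
    """Map a task-complexity tier to a model name (illustrative mapping)."""
    tiers = {
        "classification": "gpt-4.1-nano",  # simple, high-volume tasks
        "standard": "gpt-4.1-mini",        # everyday tasks
        "complex": "claude-sonnet-4",      # reasoning-heavy tasks
    }
    return tiers.get(task_tier, "gpt-4.1-mini")  # default to the mid tier

print(route_model("classification"))  # gpt-4.1-nano
print(route_model("complex"))         # claude-sonnet-4
```

In production the tier decision itself is the hard part; a common approach is a cheap classifier model (or heuristics on prompt length and task type) in front of the router.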

Compare real-time costs across all 300+ models at [TokenMix.ai](https://tokenmix.ai).

---

Conclusion

AI API costs per request range from $0.00003 for budget-model classification to over $0.80 for long-document processing on premium models. The 200x price range across models means choosing the right model for each task is the single most impactful cost decision.

For most production applications, budget models (Gemini 2.0 Flash, DeepSeek V3, GPT-4.1 mini) deliver sufficient quality at 80-95% lower cost than flagship models. Reserve flagship models for tasks where the quality gap is measurable and meaningful.

TokenMix.ai tracks real-time pricing across 300+ models and provides intelligent routing to automatically select the best price-performance option for each request. Check current per-request costs at [TokenMix.ai](https://tokenmix.ai).

---

FAQ

How much does a single AI API call cost?

A single AI API call costs between $0.00003 and $0.50+ depending on the model and input/output length. A simple chat message on GPT-4.1 mini costs about $0.0004. A long document processed on Claude Opus 4.6 can cost $0.20-$0.80. Budget models like Gemini 2.0 Flash and DeepSeek V3 keep simple requests under $0.0001.

Why do output tokens cost more than input tokens?

Output tokens require the model to generate new content, which is computationally more expensive than processing input. Generating each output token requires running the full model forward pass, while input tokens are processed more efficiently in parallel. Most providers charge 2-5x more for output tokens.

How do I estimate my monthly AI API costs?

Multiply your average cost per request by your expected monthly request volume. For a customer support bot doing 50,000 simple chats/month on GPT-4.1 mini, the cost is approximately $0.0004 x 50,000 = $20/month. Use the tables in this guide to find cost per request for your model and use case.

Which AI model gives the best cost per quality ratio?

For most tasks, GPT-4.1 mini and DeepSeek V3 offer the best cost-to-quality ratio in April 2026. GPT-4.1 mini costs $0.40/M input with strong general performance. DeepSeek V3 costs $0.14/M input with competitive quality. Gemini 2.0 Flash at $0.075/M input is the cheapest capable option. TokenMix.ai tracks cost-performance ratios across all models.

Does using conversation history increase cost per request?

Yes, significantly. Each message in a multi-turn conversation includes all prior messages as input tokens. The 5th turn in a conversation might include 3,000+ tokens of history, making it 5-10x more expensive than the first turn. Use summarization or sliding window techniques to manage history length.

Can I reduce AI API costs without changing models?

Yes. Prompt caching saves 50-75% on repeated system prompts. The Batch API gives a flat 50% discount for non-real-time tasks. Prompt compression reduces token counts by 20-40%. Setting max_tokens limits output verbosity. Combined, these techniques can reduce costs by 40-60% on the same model.
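Note that stacked discounts multiply on the remaining cost rather than add. A small sketch, using illustrative fractions within the ranges quoted above:

```python
# Combined savings: each technique removes a fraction of whatever cost remains.

def combined_cost_factor(savings: list[float]) -> float:
    """Each saving is a fraction (0.5 = 50%); returns the remaining cost fraction."""
    factor = 1.0
    for s in savings:
        factor *= (1.0 - s)
    return factor

# e.g. a 30% workload-wide caching saving plus a 40% output reduction:
print(round(combined_cost_factor([0.30, 0.40]), 2))  # 0.42, i.e. ~58% saved
```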

---

*Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: [OpenAI Pricing](https://openai.com/api/pricing/), [Anthropic Pricing](https://www.anthropic.com/pricing), [Google AI Pricing](https://ai.google.dev/pricing), [TokenMix.ai](https://tokenmix.ai)*