TokenMix Research Lab · 2026-04-07

LLM Inference Cost Calculator: AI API Price Comparison per 1K Requests (2026)
Last Updated: 2026-04-29
Author: TokenMix Research Lab
Per-1K-request pricing across 17 major models for 4 task profiles. Quick lookup: small chat $0.04-$7.50/1K, code review $0.19-$27.50/1K, doc processing $0.83-$100/1K, agent loop $1.49-$200/1K — Groq 8B floor, Claude Opus ceiling.
This is the reference table you bookmark. Every major LLM API priced by actual cost per 1,000 requests across four real task sizes — not abstract per-million-token rates that tell you nothing about your actual bill. TokenMix.ai tracks 155+ models in real time. This page distills that data into the numbers you need to make a budget decision in under 60 seconds.
Use this LLM inference calculator to compare AI API costs for your specific workload. Find your task type, check the cost-per-1K-requests column, then multiply by your daily volume in thousands and by 30 days. That is your monthly budget.
Table of Contents
- How to Use This LLM Inference Calculator
- The Cost-Per-Task Framework
- Reference Table: Small Chat (500 in / 200 out)
- Reference Table: Code Review (3,000 in / 500 out)
- Reference Table: Document Processing (15,000 in / 1,000 out)
- Reference Table: Agent Loop (25,000 in / 3,000 out)
- Complete AI API Price Comparison: All Models, All Tasks
- Monthly Budget Calculator by Volume
- Hidden Cost Multipliers
- How to Choose: Decision Guide
- Conclusion
- FAQ
How to Use This LLM Inference Calculator
Three-step formula: find your task type, look up the cost per 1K requests, then multiply by (daily volume / 1,000) × 30 to get monthly cost. Example: 5K code reviews/day costs $2,475/month on Claude Sonnet vs $172.50/month on DeepSeek V4 (roughly a 14× difference). The three steps:
- Find your task type in the tables below (small chat, code review, document processing, or agent loop).
- Find your model and note the cost per 1,000 requests.
- Multiply: (cost per 1K requests) x (your daily requests / 1,000) x 30 = monthly cost.
Example: You run 5,000 code review requests per day on Claude Sonnet.
- Cost per 1K requests: $16.50
- Daily cost: $16.50 x 5 = $82.50
- Monthly cost: $82.50 x 30 = $2,475/month
That same workload on DeepSeek V4: $1.15 x 5 x 30 = $172.50/month. Roughly a 14x difference.
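The three-step formula can be sketched as a small helper. This is a minimal illustration, not the TokenMix.ai calculator itself: the function name `monthly_cost` is hypothetical, and the per-1K costs are the code review figures from the reference tables ($16.50 for Claude Sonnet, $1.15 for DeepSeek V4).

```python
def monthly_cost(cost_per_1k: float, daily_requests: int, days: int = 30) -> float:
    """Monthly spend in USD from a per-1K-request cost and a daily request volume."""
    return cost_per_1k * (daily_requests / 1000) * days


# Code review profile at 5,000 requests/day; per-1K costs copied from the tables.
sonnet = monthly_cost(16.50, 5000)
deepseek = monthly_cost(1.15, 5000)

print(f"Claude Sonnet: ${sonnet:,.2f}/month")
print(f"DeepSeek V4:   ${deepseek:,.2f}/month")
print(f"Ratio: {sonnet / deepseek:.1f}x")
```

The `days` parameter defaults to 30 to match the formula above; pass a different value if your billing period differs.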
The Cost-Per-Task Framework
Per-token pricing is an abstraction. Nobody buys tokens; they buy completed tasks. The cost of a completed task depends on three variables:
Input tokens: The prompt length — your system instructions, user message, context, and any documents or code being processed.
Output tokens: The response length — the model's answer, generated code, summary, or analysis.
Token efficiency: How many tokens the model needs to produce a useful response. A concise model at $2/M can be cheaper than a verbose model at $0.50/M if the verbose model needs more than 4x the tokens for the same task.
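To make the token-efficiency point concrete, here is a small sketch comparing two output-token bills. The token counts (300 and 1,500) are hypothetical values chosen for illustration; only the $2/M and $0.50/M prices come from the text above.

```python
def output_cost(tokens: int, price_per_million: float) -> float:
    """USD cost of generating `tokens` output tokens at a per-million-token price."""
    return tokens * price_per_million / 1_000_000


# Hypothetical response lengths: the concise model answers in 300 tokens,
# the verbose model needs 1,500 tokens for the same task.
concise = output_cost(300, 2.00)
verbose = output_cost(1500, 0.50)

# The cheaper-per-token model only wins while it stays under this token ratio.
break_even_ratio = 2.00 / 0.50

print(f"Concise model at $2.00/M: ${concise:.6f} per response")
print(f"Verbose model at $0.50/M: ${verbose:.6f} per response")
print(f"Break-even token ratio:   {break_even_ratio:.0f}x")
```

In this example the verbose model uses 5x the tokens, which is above the 4x break-even, so the model with the higher per-token price ends up cheaper per task.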
This calculator uses standardized task profiles based on TokenMix.ai's analysis of production workload data across thousands of API integrations:
| Task Type | Typical Input Tokens | Typical Output Tokens | Input-Heavy or Output-Heavy |
|---|---|---|---|
| Small Chat | 500 | 200 | Balanced |
| Code Review | 3,000 | 500 | Input-heavy |
| Document Processing | 15,000 | 1,000 | Very input-heavy |
| Agent Loop (5 steps) | 25,000 | 3,000 | Input-heavy (context grows) |
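The task profiles above turn per-million-token prices into a cost per 1K requests. The sketch below shows that conversion under stated assumptions: the function and dictionary names are hypothetical, the $0.05/M input rate for Groq Llama 8B is the figure quoted in this article, and the $0.08/M output rate is inferred from the reference tables rather than taken from a provider price sheet.

```python
# (input tokens, output tokens) per request for each standardized task profile.
TASK_PROFILES = {
    "small_chat": (500, 200),
    "code_review": (3_000, 500),
    "document_processing": (15_000, 1_000),
    "agent_loop": (25_000, 3_000),
}


def cost_per_1k_requests(profile: str, input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of 1,000 requests for a task profile, given per-million-token prices."""
    input_tokens, output_tokens = TASK_PROFILES[profile]
    per_request = (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000
    return per_request * 1000


# Assumed rates for Groq Llama 8B: $0.05/M input, $0.08/M output (output rate inferred).
for profile in TASK_PROFILES:
    print(f"{profile:<20} ${cost_per_1k_requests(profile, 0.05, 0.08):.3f} per 1K requests")
```

With these rates the helper yields the Groq Llama 8B figures listed in the reference tables, which is a quick way to sanity-check any row against a provider's published per-token prices.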
Reference Table: Small Chat (500 in / 200 out)
Groq Llama 8B at $0.041/1K is the cheapest small-chat option ($12/month at 10K daily); DeepSeek V4 at $0.25/1K is the frontier-quality bargain; Claude Opus at $7.50/1K is 183× more expensive than the floor.
Typical use: chatbot replies, Q&A, simple instructions, customer support.
| Model | Provider | Input Cost | Output Cost | Cost/Request | Cost/1K Requests | Cost/10K Daily (Monthly) |
|---|---|---|---|---|---|---|
| Groq Llama 8B | Groq | $0.000025 | $0.000016 | $0.000041 | $0.041 | $12 |
| Gemini Flash-Lite | Google | $0.000050 | $0.000080 | $0.000130 | $0.130 | $39 |
| Mistral Small | Mistral | $0.000100 | $0.000120 | $0.000220 | $0.220 | $66 |
| Grok 4.1 Fast | xAI | $0.000100 | $0.000100 | $0.000200 | $0.200 | $60 |
| GPT-5.4 Nano | OpenAI | $0.000100 | $0.000250 | $0.000350 | $0.350 | $105 |
| DeepSeek V4 | DeepSeek | $0.000150 | $0.000100 | $0.000250 | $0.250 | $75 |
| Gemini Flash | Google | $0.000150 | $0.000500 | $0.000650 | $0.650 | $195 |
| Groq Llama 70B | Groq | $0.000295 | $0.000158 | $0.000453 | $0.453 | $136 |
| DeepSeek R1 | DeepSeek | $0.000275 | $0.000438 | $0.000713 | $0.713 | $214 |
| Mistral Medium | Mistral | $0.000200 | $0.000400 | $0.000600 | $0.600 | $180 |
| GPT-5.4 Mini | OpenAI | $0.000375 | $0.000900 | $0.001275 | $1.275 | $383 |
| Mistral Large | Mistral | $0.001000 | $0.001200 | $0.002200 | $2.200 | $660 |
| Grok 4.20 | xAI | $0.001000 | $0.001200 | $0.002200 | $2.200 | $660 |
| Gemini Pro | Google | $0.001000 | $0.002400 | $0.003400 | $3.400 | $1,020 |
| GPT-5.4 | OpenAI | $0.001250 | $0.003000 | $0.004250 | $4.250 | $1,275 |
| Claude Haiku | Anthropic | $0.000500 | $0.001000 | $0.001500 | $1.500 | $450 |
| Claude Sonnet | Anthropic | $0.001500 | $0.003000 | $0.004500 | $4.500 | $1,350 |
| Claude Opus | Anthropic | $0.002500 | $0.005000 | $0.007500 | $7.500 | $2,250 |
Cheapest for small chat: Groq Llama 8B at $0.041 per 1K requests ($12/month at 10K daily). For quality chat, DeepSeek V4 at $0.250/1K ($75/month) is the frontier-quality bargain.
Reference Table: Code Review (3,000 in / 500 out)
DeepSeek V4 at $1.15/1K is the quality-adjusted winner (81% SWE-bench, 13× cheaper than GPT-5.4 at $15/1K); Groq 8B at $0.19/1K wins absolute price but quality limits complex code work.
Typical use: code review, bug detection, test generation, function implementation.
| Model | Provider | Input Cost | Output Cost | Cost/Request | Cost/1K Requests | Cost/10K Daily (Monthly) |
|---|---|---|---|---|---|---|
| Groq Llama 8B | Groq | $0.000150 | $0.000040 | $0.000190 | $0.190 | $57 |
| Gemini Flash-Lite | Google | $0.000300 | $0.000200 | $0.000500 | $0.500 | $150 |
| Grok 4.1 Fast | xAI | $0.000600 | $0.000250 | $0.000850 | $0.850 | $255 |
| Mistral Small | Mistral | $0.000600 | $0.000300 | $0.000900 | $0.900 | $270 |
| GPT-5.4 Nano | OpenAI | $0.000600 | $0.000625 | $0.001225 | $1.225 | $368 |
| DeepSeek V4 | DeepSeek | $0.000900 | $0.000250 | $0.001150 | $1.150 | $345 |
| Groq Llama 70B | Groq | $0.001770 | $0.000395 | $0.002165 | $2.165 | $650 |
| DeepSeek R1 | DeepSeek | $0.001650 | $0.001095 | $0.002745 | $2.745 | $824 |
| GPT-5.4 Mini | OpenAI | $0.002250 | $0.002250 | $0.004500 | $4.500 | $1,350 |
| Mistral Large | Mistral | $0.006000 | $0.003000 | $0.009000 | $9.000 | $2,700 |
| Grok 4.20 | xAI | $0.006000 | $0.003000 | $0.009000 | $9.000 | $2,700 |
| Gemini Pro | Google | $0.006000 | $0.006000 | $0.012000 | $12.000 | $3,600 |
| GPT-5.4 | OpenAI | $0.007500 | $0.007500 | $0.015000 | $15.000 | $4,500 |
| Claude Haiku | Anthropic | $0.003000 | $0.002500 | $0.005500 | $5.500 | $1,650 |
| Claude Sonnet | Anthropic | $0.009000 | $0.007500 | $0.016500 | $16.500 | $4,950 |
| Claude Opus | Anthropic | $0.015000 | $0.012500 | $0.027500 | $27.500 | $8,250 |
Cheapest for code review: Groq Llama 8B at $0.190/1K, but 8B models struggle with complex code tasks. DeepSeek V4 at $1.150/1K is the best quality-per-dollar for code — 81% SWE-bench at a fraction of GPT-5.4's cost ($15/1K).
Reference Table: Document Processing (15,000 in / 1,000 out)
Input pricing dominates here — at 15K input tokens per request, $0.05/M (Groq 8B) vs $5/M (Opus) is a 100× cost spread. Gemini Flash-Lite at $1.90/1K is the strongest major-provider value; Claude Opus at $100/1K is the ceiling.
Typical use: document summarization, contract analysis, report extraction, long-form Q&A.
| Model | Provider | Input Cost | Output Cost | Cost/Request | Cost/1K Requests | Cost/10K Daily (Monthly) |
|---|---|---|---|---|---|---|
| Groq Llama 8B | Groq | $0.000750 | $0.000080 | $0.000830 | $0.830 | $249 |
| Gemini Flash-Lite | Google | $0.001500 | $0.000400 | $0.001900 | $1.900 | $570 |
| Mistral Small | Mistral | $0.003000 | $0.000600 | $0.003600 | $3.600 | $1,080 |
| Grok 4.1 Fast | xAI | $0.003000 | $0.000500 | $0.003500 | $3.500 | $1,050 |
| GPT-5.4 Nano | OpenAI | $0.003000 | $0.001250 | $0.004250 | $4.250 | $1,275 |
| DeepSeek V4 | DeepSeek | $0.004500 | $0.000500 | $0.005000 | $5.000 | $1,500 |
| Groq Llama 70B | Groq | $0.008850 | $0.000790 | $0.009640 | $9.640 | $2,892 |
| DeepSeek R1 | DeepSeek | $0.008250 | $0.002190 | $0.010440 | $10.440 | $3,132 |
| GPT-5.4 Mini | OpenAI | $0.011250 | $0.004500 | $0.015750 | $15.750 | $4,725 |
| Gemini Flash | Google | $0.004500 | $0.002500 | $0.007000 | $7.000 | $2,100 |
| Mistral Large | Mistral | $0.030000 | $0.006000 | $0.036000 | $36.000 | $10,800 |
| Gemini Pro | Google | $0.030000 | $0.012000 | $0.042000 | $42.000 | $12,600 |
| GPT-5.4 | OpenAI | $0.037500 | $0.015000 | $0.052500 | $52.500 | $15,750 |
| Claude Haiku | Anthropic | $0.015000 | $0.005000 | $0.020000 | $20.000 | $6,000 |
| Claude Sonnet | Anthropic | $0.045000 | $0.015000 | $0.060000 | $60.000 | $18,000 |
| Claude Opus | Anthropic | $0.075000 | $0.025000 | $0.100000 | $100.000 | $30,000 |
Document processing is where input pricing dominates. At 15,000 input tokens per request, the difference between $0.05/M (Groq 8B) and $5.00/M (Opus) is 100x. Gemini Flash-Lite ($0.10/M input) is the strongest value from a major provider for this workload.
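One way to see how much input pricing dominates is to compute the input share of each request's cost. The sketch below does this for Claude Opus using the per-request input and output costs from the reference tables; the names `OPUS_COSTS` and `input_share` are hypothetical helpers for illustration.

```python
# (input cost, output cost) per request in USD for Claude Opus, copied from the reference tables.
OPUS_COSTS = {
    "small_chat": (0.0025, 0.0050),
    "code_review": (0.0150, 0.0125),
    "document_processing": (0.0750, 0.0250),
    "agent_loop": (0.1250, 0.0750),
}


def input_share(input_cost: float, output_cost: float) -> float:
    """Fraction of a request's total cost that comes from input tokens."""
    return input_cost / (input_cost + output_cost)


shares = {task: input_share(*costs) for task, costs in OPUS_COSTS.items()}
for task, share in shares.items():
    print(f"{task:<20} input share: {share:.1%}")
```

For this model, input tokens account for about a third of the small chat cost but three quarters of the document processing cost, which is why input price matters most for this workload.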
Reference Table: Agent Loop (25,000 in / 3,000 out)
Most expensive workload: context grows per step so input compounds. Claude Opus at $200/1K is 134× the floor (Groq 8B at $1.49/1K). DeepSeek V4 at $9/1K delivers frontier reasoning at 22× under Opus.
Typical use: multi-step tool-use agents, autonomous coding, research assistants, workflow automation. 5 steps with growing context.
| Model | Provider | Input Cost | Output Cost | Cost/Request | Cost/1K Requests | Cost/10K Daily (Monthly) |
|---|---|---|---|---|---|---|
| Groq Llama 8B | Groq | $0.001250 | $0.000240 | $0.001490 | $1.490 | $447 |
| Gemini Flash-Lite | Google | $0.002500 | $0.001200 | $0.003700 | $3.700 | $1,110 |
| Mistral Small | Mistral | $0.005000 | $0.001800 | $0.006800 | $6.800 | $2,040 |
| Grok 4.1 Fast | xAI | $0.005000 | $0.001500 | $0.006500 | $6.500 | $1,950 |
| GPT-5.4 Nano | OpenAI | $0.005000 | $0.003750 | $0.008750 | $8.750 | $2,625 |
| DeepSeek V4 | DeepSeek | $0.007500 | $0.001500 | $0.009000 | $9.000 | $2,700 |
| Groq Llama 70B | Groq | $0.014750 | $0.002370 | $0.017120 | $17.120 | $5,136 |
| DeepSeek R1 | DeepSeek | $0.013750 | $0.006570 | $0.020320 | $20.320 | $6,096 |
| Gemini Flash | Google | $0.007500 | $0.007500 | $0.015000 | $15.000 | $4,500 |
| GPT-5.4 Mini | OpenAI | $0.018750 | $0.013500 | $0.032250 | $32.250 | $9,675 |
| Mistral Large | Mistral | $0.050000 | $0.018000 | $0.068000 | $68.000 | $20,400 |
| Grok 4.20 | xAI | $0.050000 | $0.018000 | $0.068000 | $68.000 | $20,400 |
| Gemini Pro | Google | $0.050000 | $0.036000 | $0.086000 | $86.000 | $25,800 |
| GPT-5.4 | OpenAI | $0.062500 | $0.045000 | $0.107500 | $107.500 | $32,250 |
| Claude Haiku | Anthropic | $0.025000 | $0.015000 | $0.040000 | $40.000 | $12,000 |
| Claude Sonnet | Anthropic | $0.075000 | $0.045000 | $0.120000 | $120.000 | $36,000 |
| Claude Opus | Anthropic | $0.125000 | $0.075000 | $0.200000 | $200.000 | $60,000 |
Agent loops are the most expensive workload category. Context grows with each step, so input costs compound. Running Opus-level agents at 10K loops per day costs $60,000/month. The same workload on DeepSeek V4 costs $2,700/month — a 22x difference with comparable quality.
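The compounding effect can be sketched by summing the context resent at each step. The article gives only the totals (25,000 input and 3,000 output tokens over 5 steps), so the per-step split below (a 3,000-token starting context that grows by 1,000 tokens per step, with 600 output tokens per step) is a hypothetical breakdown chosen to reproduce those totals. The $0.30/M and $0.50/M rates for DeepSeek V4 are inferred from the reference tables.

```python
def agent_loop_tokens(steps: int, base_context: int, growth_per_step: int, output_per_step: int):
    """Return (total_input, total_output) when every step resends a growing context."""
    total_input = sum(base_context + i * growth_per_step for i in range(steps))
    total_output = steps * output_per_step
    return total_input, total_output


# Hypothetical split that matches the 25,000 in / 3,000 out agent loop profile.
total_in, total_out = agent_loop_tokens(steps=5, base_context=3_000, growth_per_step=1_000, output_per_step=600)

# Same five steps if the context never grew: input stays flat at the base size.
flat_in, _ = agent_loop_tokens(steps=5, base_context=3_000, growth_per_step=0, output_per_step=600)

# Assumed DeepSeek V4 rates, inferred from the tables: $0.30/M input, $0.50/M output.
cost = (total_in * 0.30 + total_out * 0.50) / 1_000_000

print(f"Input tokens with growing context: {total_in:,}")
print(f"Input tokens with flat context:    {flat_in:,}")
print(f"Cost per loop: ${cost:.6f}")
```

Under these assumptions, context growth alone adds 10,000 input tokens per loop compared with a flat context, and the resulting per-loop cost matches the DeepSeek V4 row in the table.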
Complete AI API Price Comparison: All Models, All Tasks
Master matrix: 16 models × 4 task profiles. Cheapest cells: Groq 8B for chat ($0.04) / code ($0.19) / docs ($0.83) / agents ($1.49). Most expensive: Claude Opus ranges $7.50-$200 per 1K depending on task.
Summary table: cost per 1,000 requests across all four task types. Use this as your LLM inference calculator reference.
| Model | Small Chat (/1K) | Code Review (/1K) | Doc Processing (/1K) | Agent Loop (/1K) |
|---|---|---|---|---|
| Groq Llama 8B | $0.04 | $0.19 | $0.83 | $1.49 |
| Gemini Flash-Lite | $0.13 | $0.50 | $1.90 | $3.70 |
| Grok 4.1 Fast | $0.20 | $0.85 | $3.50 | $6.50 |
| Mistral Small | $0.22 | $0.90 | $3.60 | $6.80 |
| DeepSeek V4 | $0.25 | $1.15 | $5.00 | $9.00 |
| GPT-5.4 Nano | $0.35 | $1.23 | $4.25 | $8.75 |
| Groq Llama 70B | $0.45 | $2.17 | $9.64 | $17.12 |
| DeepSeek R1 | $0.71 | $2.75 | $10.44 | $20.32 |
| Claude Haiku | $1.50 | $5.50 | $20.00 | $40.00 |
| GPT-5.4 Mini | $1.28 | $4.50 | $15.75 | $32.25 |
| Mistral Large | $2.20 | $9.00 | $36.00 | $68.00 |
| Grok 4.20 | $2.20 | $9.00 | $36.00 | $68.00 |
| Gemini Pro | $3.40 | $12.00 | $42.00 | $86.00 |
| GPT-5.4 | $4.25 | $15.00 | $52.50 | $107.50 |
| Claude Sonnet | $4.50 | $16.50 | $60.00 | $120.00 |
| Claude Opus | $7.50 | $27.50 | $100.00 | $200.00 |
This table is updated monthly by TokenMix.ai. Real-time pricing available at tokenmix.ai.
Monthly Budget Calculator by Volume
At 100K daily code reviews: Groq 8B = $570/month, Claude Sonnet = $49,500/month — a $48,930 monthly gap. Even DeepSeek V4 ($3,450) vs GPT-5.4 ($45,000) is $41,550/month. Model selection is the single biggest cost lever.
Quick reference: what your monthly bill looks like at different request volumes. Using the "code review" task profile as the baseline.
| Daily Requests | Groq 8B | DeepSeek V4 | GPT-5.4 Nano | GPT-5.4 Mini | GPT-5.4 | Claude Sonnet |
|---|---|---|---|---|---|---|
| 100 | $0.57 | $3.45 | $3.68 | $13.50 | $45.00 | $49.50 |
| 1,000 | $5.70 | $34.50 | $36.75 | $135 | $450 | $495 |
| 5,000 | $28.50 | $172.50 | $183.75 | $675 | $2,250 | $2,475 |
| 10,000 | $57 | $345 | $367.50 | $1,350 | $4,500 | $4,950 |
| 50,000 | $285 | $1,725 | $1,838 | $6,750 | $22,500 | $24,750 |
| 100,000 | $570 | $3,450 | $3,675 | $13,500 | $45,000 | $49,500 |
The scaling math is brutal. At 100K daily code review requests, the difference between Groq 8B ($570/month) and Claude Sonnet ($49,500/month) is $48,930/month. Even the DeepSeek V4 to GPT-5.4 gap is $41,550/month. Model selection is the single biggest cost lever for any AI-powered application.
Hidden Cost Multipliers
Four multipliers shift effective cost by 10-75% and are not reflected in the reference tables above: prompt caching (cuts input costs 25-72%), the OpenAI Batch API (50% off all OpenAI costs), Claude's tokenizer (8-12% more tokens than OpenAI's), and retry overhead (about 3% on a provider with 97.2% uptime).
Prompt Caching Discount
If your application reuses system prompts (most do), prompt caching cuts input costs by 25-72% depending on cache hit rate. This disproportionately benefits input-heavy tasks (document processing, agent loops).
Impact on the calculator: Multiply the input cost column by (1 - cache_hit_rate x cache_discount) for providers that support caching. With a 50% discount on cached tokens and an 80% cache hit rate, input costs drop by 40%.
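The caching adjustment can be sketched as follows. This is an illustration under the assumption of a flat 50% discount on cached input tokens; real discounts and cache-write surcharges vary by provider, and the function names are hypothetical. The example costs are the Claude Opus document processing figures from the reference table.

```python
def cached_input_cost(input_cost: float, cache_hit_rate: float, cache_discount: float = 0.5) -> float:
    """Input cost after caching: cached tokens are billed at (1 - cache_discount)."""
    return input_cost * (1 - cache_hit_rate * cache_discount)


def request_cost_with_cache(input_cost: float, output_cost: float,
                            cache_hit_rate: float, cache_discount: float = 0.5) -> float:
    """Total per-request cost; caching only reduces the input side."""
    return cached_input_cost(input_cost, cache_hit_rate, cache_discount) + output_cost


# Claude Opus, document processing: $0.075 input + $0.025 output per request (from the table).
base = 0.075 + 0.025
with_cache = request_cost_with_cache(0.075, 0.025, cache_hit_rate=0.8)
saving = 1 - with_cache / base

print(f"Base cost per request:   ${base:.4f}")
print(f"With 80% cache hit rate: ${with_cache:.4f}")
print(f"Total saving:            {saving:.0%}")
```

Because output tokens are unaffected, the total saving is always smaller than the input saving; the gap narrows as the workload becomes more input-heavy.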
Batch Processing Discount (OpenAI Only)
OpenAI's Batch API gives 50% off all costs for async workloads. Apply a 0.5x multiplier to all OpenAI costs in the tables above for batch-eligible workloads.
Token Counting Differences
Claude's tokenizer generates 8-12% more tokens than OpenAI's for the same text. TokenMix.ai testing confirms this across 500 test prompts. Effective Claude costs are 8-12% higher than the table suggests relative to OpenAI models.
Retry Overhead
Models with lower uptime require more retries. DeepSeek's 97.2% uptime adds approximately 3% to effective costs through retry token consumption. Factor this into high-volume calculations.
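The remaining multipliers can be combined in one helper. This sketch rests on simplifying assumptions that are not from the article: failures are independent, every failed attempt is retried until it succeeds, and a failed attempt is billed at the full request cost (a pessimistic upper bound). The function names are hypothetical, and the 10% tokenizer inflation is simply the midpoint of the 8-12% range quoted above.

```python
def retry_multiplier(uptime: float) -> float:
    """Expected attempts per successful request when each attempt succeeds with probability `uptime`."""
    return 1 / uptime


def effective_cost(base_cost: float, *, batch_discount: float = 0.0,
                   tokenizer_inflation: float = 0.0, uptime: float = 1.0) -> float:
    """Apply batch discount, tokenizer inflation, and retry overhead to a table cost."""
    return base_cost * (1 - batch_discount) * (1 + tokenizer_inflation) * retry_multiplier(uptime)


# Code review costs per 1K requests, copied from the reference table.
deepseek = effective_cost(1.15, uptime=0.972)                # retry overhead only
gpt_batch = effective_cost(15.00, batch_discount=0.5)        # OpenAI Batch API
sonnet = effective_cost(16.50, tokenizer_inflation=0.10)     # assumed 10% more tokens

print(f"DeepSeek V4 with retries:  ${deepseek:.3f} per 1K")
print(f"GPT-5.4 via Batch API:     ${gpt_batch:.2f} per 1K")
print(f"Claude Sonnet, +10% tokens: ${sonnet:.2f} per 1K")
```

Under this retry model, 97.2% uptime works out to roughly 2.9% overhead, in line with the approximately 3% figure above.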
Which Model Should You Pick? Decision Guide
Match priority to model: absolute cheapest → Groq 8B; quality-cost ratio → DeepSeek V4; cheapest GPT-class → Nano; async batch → GPT-5.4 Batch (50% off); best coding → Claude Opus; mixed workload → TokenMix.ai routing.
| Your Priority | Best Choice | LLM Inference Cost (Code Review/1K) | Trade-off |
|---|---|---|---|
| Absolute cheapest | Groq Llama 8B | $0.19 | Limited quality for complex tasks |
| Cheapest from major provider | Gemini Flash-Lite | $0.50 | Lower quality than frontier models |
| Best quality-to-cost ratio | DeepSeek V4 | $1.15 | 97.2% uptime, China data routing |
| Cheapest GPT-class quality | GPT-5.4 Nano | $1.23 | Smaller model, less capability |
| Cheapest frontier (async) | GPT-5.4 Batch | $7.50 | 24-hour latency |
| Best coding model | Claude Opus | $27.50 | Most expensive option |
| Multi-model optimization | TokenMix.ai routing | Varies | Route each request to cheapest provider |
Related: Compare all model pricing in our complete LLM API pricing comparison
What's the Bottom Line on LLM Inference Cost?
Three takeaways: the best-value model varies by task, input-heavy tasks show 100×+ cost differences between budget and premium models, and batch plus cache discounts can flip rankings. Bookmark this table and recheck monthly: providers adjust pricing 2-4 times per year, and cuts are more common than increases. This LLM inference cost calculator gives you the one number that matters: cost per completed task. Not cost per token, not cost per million, but cost per request at your actual usage pattern.
Three takeaways from the data:
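The best-value model changes by task type. Groq 8B is the cheapest option in every table and is sufficient for classification and simple chat. DeepSeek V4 gives the best quality per dollar for code and complex tasks. Gemini Flash-Lite is the strongest major-provider value for document processing. There is no single best option.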
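Input-heavy tasks amplify pricing differences. Document processing and agent loops show 100x+ cost differences between budget and premium models, and the dollar gap per 1K requests grows from about $7.46 for small chat to $198.51 for agent loops. This is where model selection matters most.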
Discounts change the rankings. OpenAI Batch API (50% off) and prompt caching (25-72% off input) can make premium models cheaper than budget alternatives for specific workload patterns.
Bookmark this page. TokenMix.ai updates these reference tables monthly as providers adjust pricing. For real-time cost calculations with your exact workload parameters, use the calculator at TokenMix.ai.
FAQ
How do I calculate my LLM API cost per month?
Use this formula: (cost per 1K requests from the tables above) x (daily requests / 1,000) x 30. For example, 5,000 daily code review requests on DeepSeek V4: $1.15 x 5 x 30 = $172.50/month. The tables in this LLM inference calculator provide cost per 1K requests for four standard task sizes.
Why is cost per token misleading for AI API price comparison?
Because different models generate different numbers of tokens for the same task. A model that generates 2x more output tokens at half the price costs the same per task. Additionally, input/output ratios vary by task — a model with cheap input but expensive output looks different for classification (input-heavy) versus content generation (output-heavy). Cost per completed task is the only meaningful metric.
Which LLM API is cheapest for high-volume classification?
Groq Llama 8B at $0.041 per 1,000 classification requests. For 100,000 daily requests, that is $123/month. The next cheapest is Gemini Flash-Lite at $0.130/1K ($390/month for 100K daily). Both provide sufficient quality for standard classification tasks.
How much does prompt caching reduce LLM inference costs?
With an 80% cache hit rate and a 50% cache discount, effective input costs drop by 40%. For input-heavy workloads like document processing (15,000 input tokens per request), this can reduce total request costs by 30-35%. TokenMix.ai monitors cache hit rates across providers and tasks to optimize routing.
Is DeepSeek V4 really 10x cheaper than GPT-5.4 for the same quality?
On benchmark scores, yes. DeepSeek V4 (81% SWE-bench) matches or exceeds GPT-5.4 (80% SWE-bench) at roughly 1/13th the cost per request on the code review profile: DeepSeek V4 costs $1.15/1K vs GPT-5.4 at $15.00/1K. The trade-offs are uptime (97.2% vs 99.7%) and data routing through China.
How often does LLM API pricing change?
Major providers adjust pricing 2-4 times per year. Price cuts are more common than increases — the overall trend is downward. TokenMix.ai tracks pricing changes in real time across all providers. Significant pricing events in the past 6 months include multiple providers cutting prices by 30-60% on mid-tier models.
Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: TokenMix.ai Real-Time Model Pricing, OpenAI Pricing, Anthropic Pricing, Google AI Pricing, DeepSeek Pricing, Groq Pricing, Mistral Pricing