TokenMix Research Lab · 2026-04-13

Groq Free Tier Limits in 2026: Exact RPM, TPM, and RPD by Model (Complete Reference)
Last Updated: 2026-04-29
Author: TokenMix Research Lab
Per Groq's official rate limits documentation, the free tier is capped at 30 RPM, 6,000 TPM, and 1,000 RPD for most models. Llama 4 Maverick gets half that: 15 RPM, 3,000 TPM, and 500 RPD. Gemma 2 9B is the only model with a higher TPM cap: 15,000, or 2.5x the default. RPD is the binding constraint: most developers exhaust their 1,000 daily requests long before RPM or TPM becomes a problem. Maxed out every day, the free tier is worth roughly $4-$17 per month at Developer tier rates ($0.59 per million input tokens and $0.79 per million output tokens for Llama 3.3 70B).
Groq's free tier is one of the fastest ways to test LLM inference without spending a dollar. But the limits are tighter than most developers expect. This guide breaks down every rate limit by model, shows you how to check remaining quota in real time, and tells you exactly when the free tier stops making sense.
TokenMix.ai tracks Groq's rate limits alongside 300+ other API providers. The data below reflects Groq's published limits as of April 2026.
Table of Contents
- Quick Overview: Groq Free Tier Limits by Model
- Understanding Groq's Rate Limit Structure
- Exact RPM, TPM, and RPD Limits for Every Free Tier Model
- How to Check Your Remaining Groq Rate Limits
- Free Tier vs Developer Tier: What You Actually Get for $0
- When to Upgrade from Groq Free Tier to Developer Tier
- Cost Breakdown: Free Tier Ceiling vs Paid Plans
- How Groq Free Tier Compares to Other Free LLM APIs
- Is Groq Free Tier Enough for Your Use Case?
- FAQ
Quick Overview: Groq Free Tier Limits by Model
Based on Groq's official rate limit documentation, the 9 free-tier models fall into three groups. Most share the default caps of 30 RPM, 6,000 TPM, and 1,000 RPD. Llama 4 Maverick gets half the quota: 15 RPM, 3,000 TPM, and 500 RPD. Gemma 2 9B keeps 30 RPM and 1,000 RPD but raises TPM to 15,000 (2.5x the default), paired with a smaller 8K context window. Hitting any one of the three caps returns a 429 status code until that window resets.
Here is the complete breakdown of Groq free tier rate limits by model, based on Groq's official documentation.
| Model | Requests/Min (RPM) | Tokens/Min (TPM) | Requests/Day (RPD) | Context Window |
|---|---|---|---|---|
| Llama 3.3 70B | 30 | 6,000 | 1,000 | 128K |
| Llama 3.1 8B | 30 | 6,000 | 1,000 | 128K |
| Llama 4 Scout 17B | 30 | 6,000 | 1,000 | 128K |
| Llama 4 Maverick 17B | 15 | 3,000 | 500 | 128K |
| DeepSeek R1 Distill 70B | 30 | 6,000 | 1,000 | 128K |
| Gemma 2 9B | 30 | 15,000 | 1,000 | 8K |
| Mistral Saba 24B | 30 | 6,000 | 1,000 | 32K |
| Qwen QwQ 32B | 30 | 6,000 | 1,000 | 128K |
| Allam 2 7B | 30 | 6,000 | 1,000 | 128K |
These numbers are hard limits. Hit any one of the three (RPM, TPM, or RPD) and your requests get rejected with a 429 status code until the window resets.
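Because every cap answers with the same 429, a client needs some retry policy. Below is a minimal sketch of exponential backoff with full jitter that prefers a numeric `retry-after` header when one is present. `RateLimited` is a hypothetical exception standing in for whatever your HTTP client raises (with the openai SDK, that would be `openai.RateLimitError`), and whether Groq sends `retry-after` on every 429 is an assumption here.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""

    def __init__(self, retry_after: Optional[str] = None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # raw retry-after header value, if the server sent one


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying; attempt counts from 0."""
    if retry_after is not None:
        try:
            # prefer the server's hint when numeric, clamped to [0, cap]
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to computed backoff
    # exponential backoff with "full jitter": uniform in [0, min(cap, base * 2**attempt)]
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimited until it succeeds or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            sleep(backoff_delay(attempt, exc.retry_after))
    raise RuntimeError("unreachable")
```

Note that the openai Python SDK already retries 429s a couple of times on its own (configurable via `max_retries`), and that no short backoff helps once the daily cap is exhausted: a 429 caused by RPD will not clear until the daily reset.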
Understanding Groq's Rate Limit Structure
Per Groq's published rate limits, the free tier enforces three limits at once. RPM is measured over a rolling 60-second window; 30 RPM averages out to one request every 2 seconds. TPM counts input and output tokens combined; at 6,000 TPM, a single long prompt with a detailed response can consume a full minute's budget. RPD is a hard daily ceiling that resets at midnight UTC; 1,000 RPD is about 42 requests per hour if spread evenly, but most developers burn through it in a few hours of active development. Limits are tracked per API key per model, so one key used with both Llama 3.3 70B and DeepSeek R1 Distill gets 1,000 RPD on each, or 2,000 in total.
Groq uses a three-dimensional rate limiting system. Most developers only think about requests per minute, but the daily cap is what actually kills free tier usage in production.
Requests Per Minute (RPM): The maximum number of API calls you can make in a rolling 60-second window. At 30 RPM, you get one request every 2 seconds.
Tokens Per Minute (TPM): The total input + output tokens processed per minute. At 6,000 TPM, a single long prompt with a detailed response can eat your entire minute's quota: a 4,000-token prompt plus a 2,000-token answer uses all 6,000 tokens.
Requests Per Day (RPD): The hard daily ceiling. At 1,000 RPD, that's roughly 42 requests per hour if spread evenly. In practice, most developers burn through this in a few hours of active development.
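The RPM and TPM windows described above can be enforced on the client side, so you slow down before Groq starts returning 429s. This is a minimal sketch of a rolling 60-second window limiter. The class and method names are hypothetical, the token count passed to `wait_time` is your own estimate (output tokens are not known until the response arrives), and Groq's server-side accounting may not match it exactly.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class RollingWindowLimiter:
    """Client-side guard that keeps requests and tokens under per-minute caps."""

    def __init__(self, rpm: int = 30, tpm: int = 6_000, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.rpm, self.tpm, self.window, self.clock = rpm, tpm, window, clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens) per request

    def _evict(self, now: float) -> None:
        # drop requests that have aged out of the rolling window
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def wait_time(self, tokens: int) -> float:
        """Seconds until a request costing `tokens` fits both caps (0.0 = send now)."""
        now = self.clock()
        self._evict(now)
        if tokens > self.tpm:
            raise ValueError("request is larger than the whole per-minute token budget")
        waits = [0.0]
        if len(self.events) >= self.rpm:
            # wait until enough old requests expire to free one request slot
            ts = self.events[len(self.events) - self.rpm][0]
            waits.append(ts + self.window - now)
        used = sum(t for _, t in self.events)
        if used + tokens > self.tpm:
            # walk oldest-first until enough tokens would have expired
            freed = 0
            for ts, t in self.events:
                freed += t
                if used - freed + tokens <= self.tpm:
                    waits.append(ts + self.window - now)
                    break
        return max(waits)

    def record(self, tokens: int) -> None:
        """Call after each request with the input + output tokens it actually used."""
        self.events.append((self.clock(), tokens))
```

Call `wait_time(estimate)` before each request, sleep for the returned number of seconds, then call `record(actual_tokens)` with the usage reported in the response. The clock is injectable so the logic can be tested without waiting a real minute.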
The key insight: for most free tier users, RPD is the binding constraint. During normal development you will hit the daily limit long before RPM or TPM becomes a problem.
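A quick way to see why RPD binds is to compute how many requests per day each cap would permit if you sent requests nonstop. The sketch below does that arithmetic with the default free tier caps from the table; it ignores bursts, which is why a separate per-minute figure is included. The function names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def daily_ceilings(tokens_per_request: int, rpm: int = 30, tpm: int = 6_000,
                   rpd: int = 1_000) -> dict:
    """Max requests per day each cap would allow at full speed, all day long."""
    if tokens_per_request < 1:
        raise ValueError("tokens_per_request must be positive")
    return {
        "RPM": rpm * MINUTES_PER_DAY,
        "TPM": (tpm // tokens_per_request) * MINUTES_PER_DAY,
        "RPD": rpd,
    }


def binding_constraint(tokens_per_request: int, **limits: int) -> str:
    """Name of the cap with the lowest daily ceiling."""
    ceilings = daily_ceilings(tokens_per_request, **limits)
    return min(ceilings, key=lambda name: ceilings[name])


def minute_ceiling(tokens_per_request: int, rpm: int = 30, tpm: int = 6_000) -> int:
    """Requests that fit inside a single minute (what limits short bursts)."""
    return min(rpm, tpm // tokens_per_request)
```

At 500 tokens per request, TPM allows only 12 requests in any single minute, but over a full day RPM and TPM would still permit tens of thousands of requests while RPD stops you at 1,000. RPD only stops being the binding cap when a single request exceeds the whole per-minute token budget.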
Groq's rate limits are tracked per model, not pooled across models. If you use the same API key with multiple models, each model has its own counters. This means you can make 1,000 requests to Llama 3.3 70B and another 1,000 requests to DeepSeek R1 Distill in the same day.
Exact RPM, TPM, and RPD Limits for Every Free Tier Model
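Per Groq's docs, the free tier models fall into three quota groups. The high-quota group (Llama 3.3 70B, Llama 3.1 8B, Llama 4 Scout, DeepSeek R1 Distill, Mistral Saba, Qwen QwQ, and Allam 2) shares 30 RPM / 6,000 TPM / 1,000 RPD. The reduced-quota group is Llama 4 Maverick alone, at 15 RPM / 3,000 TPM / 500 RPD, half the usual allowance. Gemma 2 9B is a special case at 30 RPM / 15,000 TPM / 1,000 RPD: 2.5x more token throughput per minute, traded against an 8K context window instead of the 32K-128K windows of the other models.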
Let's break these down by practical impact. Not all models on Groq's free tier are equal in terms of what you can actually build.
High-Quota Models (30 RPM / 6,000 TPM / 1,000 RPD)
Most models fall into this category. Llama 3.3 70B, Llama 3.1 8B, Llama 4 Scout, DeepSeek R1 Distill 70B, Mistral Saba 24B, Qwen QwQ 32B, and Allam 2 7B all share the same limits.
At these limits, you can realistically handle:
- 1,000 short chat completions per day (under 200 tokens each)
- 500 medium-length responses per day (400-600 tokens each)
- 200 long-form generations per day (1,000+ tokens each)
Reduced-Quota Models (15 RPM / 3,000 TPM / 500 RPD)
Llama 4 Maverick gets half the allowance of other models. At 500 RPD, you are limited to roughly 20 requests per hour. This model is best reserved for testing, not any kind of sustained workload.
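One way to make a small daily quota last is to space requests evenly. The spacing that never trips either cap over a full day is the larger of 60/RPM and 86,400/RPD seconds; the sketch below computes it (the function name is illustrative).

```python
SECONDS_PER_DAY = 86_400


def min_interval_seconds(rpm: int, rpd: int) -> float:
    """Smallest even spacing between requests that respects both RPM and RPD all day."""
    return max(60 / rpm, SECONDS_PER_DAY / rpd)
```

For Llama 4 Maverick (15 RPM, 500 RPD) that is one request every 172.8 seconds, or about 20.8 per hour, which is where the figure of roughly 20 requests per hour comes from.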
Special Case: Gemma 2 9B (30 RPM / 15,000 TPM / 1,000 RPD)
Gemma 2 9B is the only model with elevated TPM limits (15,000 vs 6,000). If your use case involves long prompts or lengthy outputs, Gemma 2 gives you 2.5x more token throughput per minute than other free tier models. The trade-off is a smaller 8K context window.
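To see what the higher TPM buys in practice, the sketch below counts how many requests of a given size fit in one minute under each model's TPM and context limits. It treats "8K" and "128K" as 8,192 and 131,072 tokens and assumes prompt plus output must fit inside the context window; both are assumptions, and `ModelLimits` and the helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    rpm: int
    tpm: int
    context_tokens: int


# Caps from the table above; context sizes in tokens are assumed values for "128K" and "8K"
DEFAULT_128K = ModelLimits(rpm=30, tpm=6_000, context_tokens=131_072)
GEMMA_2_9B = ModelLimits(rpm=30, tpm=15_000, context_tokens=8_192)


def requests_per_minute(limits: ModelLimits, prompt_tokens: int, output_tokens: int) -> int:
    """Requests of this size that fit in one minute; 0 if one request cannot fit at all."""
    total = prompt_tokens + output_tokens
    if total > limits.context_tokens:
        return 0  # larger than the context window
    return min(limits.rpm, limits.tpm // total)  # 0 if larger than the TPM budget
```

For a 1,500-token prompt with a 500-token answer, the default caps allow 3 such requests per minute and Gemma 2 9B allows 7. A 6,000-token prompt with a 1,000-token answer does not fit into a single minute's 6,000-token budget on the default models, while it still fits both Gemma's 8K context and its 15,000 TPM.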
How to Check Your Remaining Groq Rate Limits
Per Groq's official rate limits documentation, every response carries six rate limit headers: the limit, the remaining quota, and the reset time for both requests and tokens. The two you will check most often are x-ratelimit-remaining-requests (RPM left) and x-ratelimit-remaining-tokens (TPM left). The critical gap: Groq does not expose RPD in these headers, so you need to track your daily count yourself with a counter that resets at midnight UTC. Without it, you will discover the 1,000-request cap in the middle of a production run.
Groq returns rate limit information in the headers of every API response. Here is exactly what to look for.
Response headers you need:
| Header | What It Tells You |
|---|---|
| x-ratelimit-limit-requests | Your RPM cap |
| x-ratelimit-remaining-requests | Requests left this minute |
| x-ratelimit-reset-requests | When RPM resets (timestamp) |
| x-ratelimit-limit-tokens | Your TPM cap |
| x-ratelimit-remaining-tokens | Tokens left this minute |
| x-ratelimit-reset-tokens | When TPM resets (timestamp) |
Python code to check limits:
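```python
import os
import openai

# Groq exposes an OpenAI-compatible endpoint, so the official openai SDK works
client = openai.OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],  # keep the key out of source code
)

# with_raw_response returns the HTTP response, so the headers are accessible
response = client.chat.completions.with_raw_response.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello"}],
)
headers = response.headers
print(f"RPM remaining: {headers.get('x-ratelimit-remaining-requests')}")
print(f"TPM remaining: {headers.get('x-ratelimit-remaining-tokens')}")
print(f"RPM resets at: {headers.get('x-ratelimit-reset-requests')}")

# parse() returns the usual ChatCompletion object
completion = response.parse()
print(completion.choices[0].message.content)
```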
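Header values arrive as strings, so it helps to convert them once into typed fields. The sketch below does that for the six headers in the table. The reset format is an assumption: the table describes the reset headers as timestamps, but if your responses carry relative durations such as `7.66s` or `2m59.56s`, `parse_reset` converts them to seconds; any other format comes back as `None` and needs its own handling. All names here are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed duration grammar, e.g. "250ms", "7.66s", "2m59.56s", "1h2m3s"
_DURATION = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_reset(value: Optional[str]) -> Optional[float]:
    """Convert a reset header to seconds; None if missing or not a duration."""
    if not value:
        return None
    parts = _DURATION.findall(value)
    # require the pattern to cover the whole string, so timestamps are rejected
    if not parts or "".join(num + unit for num, unit in parts) != value:
        return None
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def _as_int(headers: Mapping[str, str], name: str) -> Optional[int]:
    value = headers.get(name)
    return int(value) if value is not None and value.isdigit() else None


@dataclass(frozen=True)
class RateLimitSnapshot:
    limit_requests: Optional[int]
    remaining_requests: Optional[int]
    reset_requests_s: Optional[float]
    limit_tokens: Optional[int]
    remaining_tokens: Optional[int]
    reset_tokens_s: Optional[float]


def snapshot(headers: Mapping[str, str]) -> RateLimitSnapshot:
    """Read the six x-ratelimit-* headers into typed fields."""
    return RateLimitSnapshot(
        limit_requests=_as_int(headers, "x-ratelimit-limit-requests"),
        remaining_requests=_as_int(headers, "x-ratelimit-remaining-requests"),
        reset_requests_s=parse_reset(headers.get("x-ratelimit-reset-requests")),
        limit_tokens=_as_int(headers, "x-ratelimit-limit-tokens"),
        remaining_tokens=_as_int(headers, "x-ratelimit-remaining-tokens"),
        reset_tokens_s=parse_reset(headers.get("x-ratelimit-reset-tokens")),
    )
```

The `response.headers` object from `with_raw_response` can be passed straight to `snapshot()`; missing headers simply come back as `None`.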
Important: Groq does not expose daily request counts (RPD) in response headers. The only way to track daily usage is to count requests on your side. Build a simple counter that resets at midnight UTC.
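Since the daily count has to live on your side, here is a minimal in-memory sketch of such a counter. It keys counts by model, because limits are tracked per model, and starts a new day whenever the UTC date changes. The class name and the `rpd_limits` mapping are hypothetical; the default of 1,000 comes from the table above.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Callable, DefaultDict, Mapping


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


class DailyRequestCounter:
    """In-memory per-model request counter that rolls over at midnight UTC."""

    def __init__(self, rpd_limits: Mapping[str, int], default_rpd: int = 1_000,
                 now: Callable[[], datetime] = _utc_now):
        self.rpd_limits = rpd_limits
        self.default_rpd = default_rpd
        self.now = now  # must return timezone-aware UTC datetimes
        self.day: date = self.now().date()
        self.counts: DefaultDict[str, int] = defaultdict(int)

    def _roll_over(self) -> None:
        today = self.now().date()
        if today != self.day:  # crossed midnight UTC: start a fresh day
            self.day = today
            self.counts = defaultdict(int)

    def remaining(self, model: str) -> int:
        """Requests left today for this model (negative if you overshot)."""
        self._roll_over()
        return self.rpd_limits.get(model, self.default_rpd) - self.counts[model]

    def record(self, model: str) -> None:
        """Call once per request sent to `model`."""
        self._roll_over()
        self.counts[model] += 1
```

The counts live in process memory, so they reset when your program restarts; a long-running service would persist them (to a file or a shared store) so that several workers see the same total.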
For real-time monitoring across multiple providers, TokenMix.ai tracks rate limit headers automatically and alerts you before you hit caps.
Free Tier vs Developer Tier: What You Actually Get for $0
Per Groq's published pricing, the Developer tier removes the daily cap entirely and raises RPM about 33x (from 30 to 1,000+) and TPM about 16x (from 6,000 to 100,000+). Llama 3.3 70B costs $0.59 per million input tokens and $0.79 per million output tokens. You also get all models (vs ~10 on the free tier), full access to audio models, the Batch API (50% discount), a 99.9% uptime SLA, and email support. Pricing may vary by region and commitment tier per Groq's documentation.
The gap between Groq's free tier and Developer tier is substantial.
| Dimension | Free Tier | Developer Tier |
|---|---|---|
| Price | $0 | Pay-per-token |
| RPM (most models) | 30 | 1,000+ |
| TPM (most models) | 6,000 | 100,000+ |
| RPD | 1,000 | No daily cap |
| Models available | ~10 | All models |
| Audio models | Limited | Full access |
| SLA | None | 99.9% uptime |
| Support | Community | Email support |
| Batch API | No | Yes |
The Developer tier removes the daily request cap entirely. That alone makes it worth the switch for any workload beyond prototyping. Pricing on the Developer tier is competitive: Llama 3.3 70B runs at $0.59 per million input tokens and $0.79 per million output tokens.
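Estimating a workload's cost at these rates is simple arithmetic. The sketch below uses the Llama 3.3 70B prices quoted above; modeling the Batch API's 50% discount as a flat multiplier on both input and output tokens is an assumption, and the function name is illustrative.

```python
# Llama 3.3 70B Developer tier rates quoted above, in USD per million tokens
INPUT_PER_M = 0.59
OUTPUT_PER_M = 0.79


def cost_usd(input_tokens: int, output_tokens: int, batch_discount: float = 0.0) -> float:
    """Cost of a workload; batch_discount=0.5 models the 50% Batch API discount."""
    cost = (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000
    return cost * (1 - batch_discount)
```

For example, 1,000 requests a day at 500 input and 500 output tokens each is 500,000 tokens of each kind, which comes to about $0.69 per day.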
When to Upgrade from Groq Free Tier to Developer Tier
Four triggers should prompt an upgrade: (1) you hit 1,000 RPD more than twice a week (at Groq's Developer rates, 1,000 requests of average length cost about $0.50-$1.00 per day); (2) you need consistent availability, since the free tier may be deprioritized at peak times per Groq's tier policy; (3) you are building a user-facing app, where a handful of concurrent users can saturate 30 RPM; (4) you need batch processing, since the Batch API and its 50% discount are not available on the free tier. Most developers building anything real upgrade within 2-3 weeks of starting.
The decision point is straightforward. Upgrade when any of these apply:
You are hitting 1,000 RPD regularly. If you exhaust daily limits more than twice a week, the free tier is holding you back. At Developer tier rates, 1,000 requests of average length cost about $0.50-$1.00 per day (roughly 700 to 1,500 tokens per request at Llama 3.3 70B rates, split evenly between input and output). That is a trivial expense compared to the development time lost waiting for limits to reset.
You need consistent availability. The free tier has no SLA. During peak hours, Groq may deprioritize free tier traffic. Developer tier gets priority routing.
You are building a product others will use. Any user-facing application needs more than 30 RPM. Even a small internal tool with 5 concurrent users saturates 30 RPM as soon as each of them sends a message every 10 seconds.
You need batch processing. The free tier does not support Groq's batch API, which offers 50% cost reduction on large-scale processing jobs.
The sweet spot: Use the free tier for development and testing. Switch to Developer tier for staging and production. TokenMix.ai data shows that most developers who start on Groq's free tier upgrade within 2-3 weeks if they are building anything real.
Cost Breakdown: Free Tier Ceiling vs Paid Plans
Maxed out every day and priced at Groq's published Developer rates, the free tier is worth about $4.14/month with short prompts (200 tokens average), $10.35/month with medium prompts (500 tokens), and $16.56/month with long prompts (1,000 tokens). At Developer tier scale, 100M tokens costs $69 on Llama 3.3 70B and $10 on Llama 3.1 8B. Per OpenAI's official pricing, GPT-4o costs roughly 3-5x more for the same volume. Pricing may vary by tier; these numbers reflect published rates as of April 2026.
Let's put real numbers on what the free tier is worth and what it costs to go beyond.
Free tier monthly value (if you max out every day):
| Usage Pattern | Daily Requests | Monthly Tokens (est.) | Equivalent Developer Tier Cost |
|---|---|---|---|
| Short prompts (200 tokens avg) | 1,000 | ~6M tokens | ~$4.14/month |
| Medium prompts (500 tokens avg) | 1,000 | ~15M tokens | ~$10.35/month |
| Long prompts (1,000 tokens avg) | 800 | ~24M tokens | ~$16.56/month |
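So the free tier is worth roughly $4-$17 per month per model, depending on usage patterns. These estimates assume 30 days a month and a blended $0.69 per million tokens (an even input/output split at Llama 3.3 70B rates). Not bad for zero cost, but also not a meaningful amount if you are building something real.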
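The table's figures can be reproduced with the arithmetic below: daily requests × average tokens × 30 days, priced at a blended $0.69 per million tokens (the mean of the $0.59 input and $0.79 output rates, which assumes an even input/output split). The names are illustrative.

```python
# Mean of the Llama 3.3 70B input and output rates: an assumed even split, $0.69/M
BLENDED_PER_M = (0.59 + 0.79) / 2


def monthly_equivalent_usd(daily_requests: int, avg_tokens: int, days: int = 30) -> float:
    """What a month of maxed-out free tier usage would cost at Developer rates."""
    monthly_tokens = daily_requests * avg_tokens * days
    return monthly_tokens / 1_000_000 * BLENDED_PER_M
```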
Developer tier costs at scale:
| Monthly Volume | Llama 3.3 70B Cost | Llama 3.1 8B Cost |
|---|---|---|
| 10M tokens | $6.90 | $1.00 |
| 100M tokens | $69.00 | $10.00 |
| 1B tokens | $690.00 | $100.00 |
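These figures correspond to blended rates of about $0.69 per million tokens for Llama 3.3 70B and $0.10 per million for Llama 3.1 8B. For comparison, the same volume through OpenAI GPT-4o would cost 3-5x more. Groq's pricing is aggressive, and routing through providers like TokenMix.ai can squeeze out additional savings.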
How Groq Free Tier Compares to Other Free LLM APIs
Per Google AI Studio's published limits, its free tier offers 1,500 RPD vs Groq's 1,000 (50% more daily volume) and 1M TPM vs Groq's 6,000 (about 167x more token throughput). Groq counters with speed: its docs cite sub-200ms time-to-first-token, which Groq's LPU benchmarks put at 3-10x faster than standard GPU inference. Cerebras's free tier matches Groq's 30 RPM / 1,000 RPD on Llama 3.3 70B, but with a smaller model catalog. Limits may change, as providers adjust their free tiers periodically.
Groq is not the only free option. Here is how it stacks up against other providers offering free API access.
| Provider | Free Tier RPM | Free Daily Limit | Best Free Model | Speed |
|---|---|---|---|---|
| Groq | 30 | 1,000 RPD | Llama 3.3 70B | Ultra-fast (LPU) |
| Google AI Studio | 15 | 1,500 RPD | Gemini 2.5 Flash | Fast |
| OpenRouter | Varies | Varies | Multiple | Varies |
| Hugging Face | 10 | 1,000 RPD | Multiple | Moderate |
| Cerebras | 30 | 1,000 RPD | Llama 3.3 70B | Very fast |
Groq's advantage is speed. Its LPU inference engine delivers sub-200ms time-to-first-token for most models, which is 3-10x faster than standard GPU inference. If latency matters for your prototype, Groq's free tier is the best option.
Google AI Studio offers more daily requests (1,500 vs 1,000) and access to Gemini models, which excel at multimodal tasks. For a detailed breakdown, see our Gemini API free tier guide.
OpenRouter aggregates multiple providers and offers free access to select models, though availability fluctuates. Check our OpenRouter alternatives comparison for the full picture.
For developers who want to compare free LLM API options across all providers, TokenMix.ai maintains a real-time dashboard showing which free tiers are currently available and their actual rate limits.
Is Groq Free Tier Enough for Your Use Case?
Per Groq's published rate limits, the free tier suits learning and experimenting (1,000 RPD covers casual testing), hackathons (2-day projects rarely exceed the limits), and personal projects (start free, but plan to upgrade within 2-3 weeks). It is not suitable for MVPs serving real users (a few concurrent users can saturate 30 RPM), production APIs (upgrade and add a fallback provider for reliability), or batch processing (the Batch API is not available on the free tier). At its ceiling, the free tier is worth about $4-$17 per month.
| Your Situation | Recommendation | Why |
|---|---|---|
| Learning/experimenting with LLMs | Groq free tier is enough | 1,000 RPD covers casual testing |
| Building a personal project | Start free, plan to upgrade | You will hit RPD limits within weeks |
| Hackathon or weekend project | Groq free tier is enough | 2-day projects rarely exceed limits |
| MVP for a startup | Upgrade immediately | 30 RPM cannot serve real users |
| Production API integration | Upgrade + add fallback | Add a second provider for reliability |
| Cost-sensitive batch processing | Use Developer + Batch API | Batch API saves 50% and is not on free tier |
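For the "add fallback" row, the core logic is to try providers in order and move on only when the failure is one a second provider can fix, such as a rate limit. The sketch below is provider-agnostic: each entry is a name plus a zero-argument callable, and `retryable` lists the exception types that should trigger the fallback. The function name is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def first_successful(
    providers: Sequence[Tuple[str, Callable[[], T]]],
    retryable: Tuple[Type[BaseException], ...],
) -> Tuple[str, T]:
    """Try each provider in order; fall through only on errors listed in `retryable`."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call()
        except retryable as exc:  # e.g. a 429 from the primary provider
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

With the openai SDK, each callable could wrap `client.chat.completions.create(...)` on a client configured with a different `base_url`, and `retryable` could be `(openai.RateLimitError, openai.APIConnectionError)`. Other errors, such as a malformed request, propagate immediately, because retrying them elsewhere would fail the same way.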
FAQ
What are the exact Groq free tier rate limits in 2026?
Groq's free tier allows 30 requests per minute (RPM), 6,000 tokens per minute (TPM), and 1,000 requests per day (RPD) for most models. Llama 4 Maverick is limited to 15 RPM, 3,000 TPM, and 500 RPD. These limits apply per API key per model.
How do I check my remaining Groq API rate limits?
Groq includes rate limit data in API response headers. Check x-ratelimit-remaining-requests for RPM and x-ratelimit-remaining-tokens for TPM. Daily limits (RPD) are not exposed in headers, so you need to track them on your side with a request counter.
When should I upgrade from Groq free tier to Developer tier?
Upgrade when you hit the 1,000 RPD limit more than twice a week, need more than 30 RPM for concurrent users, or require batch processing capabilities. The Developer tier removes daily caps entirely and costs $0.59 per million input tokens and $0.79 per million output tokens for Llama 3.3 70B.
Does Groq free tier have a daily request limit?
Yes. The daily limit is 1,000 requests for most models and 500 requests for Llama 4 Maverick. This is the most restrictive constraint for most developers. The counter resets at midnight UTC.
Can I use multiple Groq API keys to bypass free tier limits?
Groq's terms of service prohibit creating multiple accounts to circumvent rate limits. Doing so risks account suspension. If you need more capacity, the Developer tier is the legitimate path forward and costs less than most developers expect.
How does Groq free tier compare to Google AI Studio free tier?
Google AI Studio offers 1,500 requests per day (vs Groq's 1,000) and 1 million tokens per minute (vs Groq's 6,000). Google's free tier is more generous on volume. Groq's advantage is inference speed, delivering 3-10x faster response times through its LPU hardware. The choice depends on whether speed or volume matters more for your use case.
Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: Groq Official Documentation, Groq Pricing Page, TokenMix.ai API Tracker