TokenMix Research Lab · 2026-04-04

OpenAI Embedding Models Pricing in 2026: text-embedding-3-small vs 3-large vs ada-002 Cost Comparison
Last Updated: 2026-04-29
Author: TokenMix Research Lab
text-embedding-3-small at $0.02/M is the right default for 80% of RAG apps; 3-large at $0.13/M (6.5× more) only worth it for legal/medical accuracy; ada-002 is obsolete (3-small is 5× cheaper and better).
OpenAI's embedding models are the cheapest part of their API: text-embedding-3-small costs just $0.02 per million tokens, and batch processing halves it to $0.01/M. But choosing the wrong one of the three models can cost you 6.5x more than necessary. text-embedding-3-large at $0.13/M delivers better retrieval quality for RAG; whether the 6.5x premium is worth it depends on your use case. This guide breaks down every OpenAI embedding model's real cost, compares them against Voyage, Cohere, Google, and free alternatives, and shows exactly when to use which. All pricing verified against OpenAI's official docs and tracked by TokenMix.ai as of April 2026.
Table of Contents
- Quick OpenAI Embedding Pricing Overview
- text-embedding-3-small vs 3-large: When 6.5x More is Worth It
- text-embedding-ada-002: Should You Migrate?
- Batch API: Cut Embedding Costs by 50%
- OpenAI Embedding Pricing vs Voyage vs Cohere vs Free Options
- Real-World Embedding Cost Scenarios
- Which OpenAI Embedding Model Should You Pick?
- What's the Bottom Line on OpenAI Embedding Pricing?
- FAQ
Quick OpenAI Embedding Pricing Overview
Three OpenAI embedding tiers: 3-small at $0.02/M (1536 dim), 3-large at $0.13/M (3072 dim), legacy ada-002 at $0.10/M — all input-only, capped at 8,191 input tokens per request.
All prices per 1M tokens, OpenAI direct API, April 2026:
| Model | Standard | Batch (50% off) | Dimensions | Max Input |
|---|---|---|---|---|
| text-embedding-3-small | $0.02 | $0.01 | 1536 | 8,191 |
| text-embedding-3-large | $0.13 | $0.065 | 3072 | 8,191 |
| text-embedding-ada-002 | $0.10 | $0.05 | 1536 | 8,191 |
Key insight: Embeddings are input-only — you pay per token embedded, no output cost. This makes them fundamentally cheaper than chat models per token processed.
text-embedding-3-small at $0.02/M is 90% cheaper than GPT-5.4 Nano input ($0.20/M). If your workload is primarily search/retrieval rather than generation, embedding it costs a small fraction of what running the same tokens through a chat model would.
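Because embeddings are billed on input tokens only, the cost of any job is just tokens divided by one million, times the listed price. A minimal sketch of that arithmetic, using the prices from the table above; the function and dictionary names are illustrative, not part of any OpenAI SDK:

```python
# USD per 1M input tokens, copied from the pricing table above (April 2026).
PRICE_PER_M = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}
BATCH_DISCOUNT = 0.5  # the Batch column is listed at 50% off


def embedding_cost(tokens: int, model: str, batch: bool = False) -> float:
    """Return the USD cost of embedding `tokens` input tokens with `model`."""
    price = PRICE_PER_M[model]
    if batch:
        price *= BATCH_DISCOUNT
    return tokens / 1_000_000 * price
```

For example, `embedding_cost(200_000_000, "text-embedding-3-small")` works out to $4.00 at standard rates, and passing `batch=True` gives $2.00.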
text-embedding-3-small vs 3-large: When 6.5x More is Worth It
3-large costs 6.5× more for a 4-point MTEB lift (~62% → ~66%) and 2× the dimensions and storage. That is only worth it for legal, medical, or technical retrieval where wrong answers have high consequences.
The small model costs $0.02/M and the large model costs $0.13/M. What do you get for the 6.5x difference?
| Metric | 3-small | 3-large | Difference |
|---|---|---|---|
| Price/M tokens | $0.02 | $0.13 | 6.5x more expensive |
| Dimensions | 1536 | 3072 | 2x |
| MTEB retrieval score | ~62% | ~66% | +4 points |
| Storage per vector | ~6KB | ~12KB | 2x |
| Query latency | Faster | Slower | ~1.5x slower |
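The storage row follows from the dimension count if each dimension is stored as a 4-byte float32. That storage format is an assumption about your vector database, not something OpenAI dictates. A rough sketch of the arithmetic, with illustrative function names, counting raw vector data only and ignoring index overhead and metadata:

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def vector_bytes(dimensions: int) -> int:
    """Raw size of one embedding vector in bytes."""
    return dimensions * BYTES_PER_FLOAT32


def raw_index_gb(num_vectors: int, dimensions: int) -> float:
    """Raw vector storage in GB (1 GB = 1e9 bytes), excluding index overhead."""
    return num_vectors * vector_bytes(dimensions) / 1e9
```

Under that assumption, 1536 dimensions is 6,144 bytes per vector and 3072 dimensions is 12,288 bytes. For 10 million vectors that is about 61 GB versus 123 GB, which is why the storage difference matters more as the corpus grows.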
Use 3-small when:
- Cost is primary concern
- Document corpus is large (millions of docs — storage cost matters)
- Use case is simple semantic search or classification
- Retrieval quality "good enough" is acceptable
Use 3-large when:
- Retrieval quality is critical (legal, medical, technical RAG)
- Corpus is small-to-medium (storage cost difference is negligible)
- You need maximum accuracy for reranking or multi-step retrieval
- The 4-point MTEB improvement translates to meaningful answer quality
The 80/20 rule: For 80% of production RAG applications, 3-small is sufficient. The 4-point quality gap only matters at the margins — if your users notice wrong search results, upgrade.
text-embedding-ada-002: Should You Migrate?
Yes. text-embedding-3-small is 5× cheaper ($0.02 vs $0.10/M) and slightly higher quality (~62% vs ~61% MTEB), so the only real cost of switching is a one-time re-embed of your corpus.
ada-002 was OpenAI's standard embedding model before the v3 series launched. It's still available, but there's little reason to stay on it.
| Comparison | ada-002 | 3-small | 3-large |
|---|---|---|---|
| Price/M | $0.10 | $0.02 | $0.13 |
| Quality (MTEB) | ~61% | ~62% | ~66% |
| Dimensions | 1536 | 1536 | 3072 |
text-embedding-3-small is 5x cheaper AND slightly better than ada-002. There is no scenario where ada-002 is the right choice for new projects. If you're still on ada-002, migrate to 3-small — you get better quality at 20% of the cost.
Migration is straightforward: same API endpoint, change the model parameter. But you'll need to re-embed your entire corpus since the vector spaces aren't compatible.
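A minimal sketch of the re-embedding step, assuming the official `openai` Python package, whose client exposes `client.embeddings.create(model=..., input=[...])`. The helper names and the batch size of 100 are illustrative choices, and the client is passed in so the function can be exercised without network access:

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def reembed(client, texts: Sequence[str],
            model: str = "text-embedding-3-small",
            batch_size: int = 100) -> List[List[float]]:
    """Re-embed every text with the new model, preserving input order."""
    vectors: List[List[float]] = []
    for chunk in batched(texts, batch_size):
        # Same endpoint as before; only the `model` value changes.
        response = client.embeddings.create(model=model, input=list(chunk))
        # Each returned item carries the index of the input it belongs to.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors
```

With the real SDK you would construct the client as `OpenAI()` and pass it in. Write the new vectors to a fresh index rather than mixing them with ada-002 vectors, since the two vector spaces aren't comparable.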
Batch API: Cut Embedding Costs by 50%
Batch API halves embedding costs (3-small drops to $0.01/M, 3-large to $0.065/M). Always use it for corpus building, index migrations, and re-embedding; pay standard pricing only for live user queries.
Embedding workloads are the ideal Batch API candidate because they're inherently async: you're building or updating an index, not serving real-time queries.
| Model | Standard | Batch | Cost for 1B tokens/month (standard → batch) |
|---|---|---|---|
| text-embedding-3-small | $0.02 | $0.01 | $20 → $10 |
| text-embedding-3-large | $0.13 | $0.065 | $130 → $65 |
Always use batch for initial corpus embedding. Building a new search index? Migrating to a new model? Re-embedding after dimension changes? Use batch. There's no reason to pay standard rates for non-real-time embedding work.
For real-time embedding (live user queries), standard pricing applies, but at $0.02/M a 100-token query costs about $0.000002, so the cost is negligible.
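In practice, a Batch API job starts from a JSONL file with one request per line, each targeting the `/v1/embeddings` endpoint. A minimal sketch of building that file follows; the `doc-N` ID scheme and the function names are hypothetical, and you should check the current Batch API documentation for per-file size and request limits:

```python
import json
from typing import Iterable, List


def build_batch_lines(texts: Iterable[str],
                      model: str = "text-embedding-3-small") -> List[str]:
    """Return one JSON-encoded Batch API request per input text."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            "custom_id": f"doc-{i}",  # hypothetical ID scheme for matching results
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request))
    return lines


def write_batch_file(path: str, texts: Iterable[str],
                     model: str = "text-embedding-3-small") -> int:
    """Write the requests as JSONL and return how many were written."""
    lines = build_batch_lines(texts, model)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)

# Submission is not run here. With the `openai` package it is roughly:
#   uploaded = client.files.create(file=open(path, "rb"), purpose="batch")
#   client.batches.create(input_file_id=uploaded.id,
#                         endpoint="/v1/embeddings", completion_window="24h")
```

Results come back asynchronously as another JSONL file, and the `custom_id` on each line is how you map an embedding back to its source document.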
OpenAI Embedding Pricing vs Voyage vs Cohere vs Free Options
Cross-provider: Google text-embedding-005 cheapest hosted at $0.00625/M (3× under OpenAI), Voyage-3-large highest quality at 67% MTEB ($0.18/M, 9× more), nomic-embed-text free if you self-host.
| Provider | Model | Price/M tokens | Dimensions | Quality (MTEB) |
|---|---|---|---|---|
| OpenAI | text-embedding-3-small | $0.02 | 1536 | ~62% |
| OpenAI | text-embedding-3-large | $0.13 | 3072 | ~66% |
| Voyage AI | voyage-3-large | $0.18 | 1024 | ~67% |
| Cohere | embed-v4.0 | $0.10 | 1024 | ~65% |
| Google | text-embedding-005 | $0.00625 | 768 | ~64% |
| Free (local) | nomic-embed-text | $0 (self-host) | 768 | ~61% |
Key insights from TokenMix.ai cross-provider data:
Google text-embedding-005 is the cheapest hosted option at $0.00625/M — 3x cheaper than OpenAI 3-small. Quality is competitive at 64% MTEB.
Voyage-3-large leads on quality (67% MTEB) but costs 9x more than OpenAI 3-small. Only worth it for quality-critical applications.
Self-hosted nomic-embed-text is free if you have GPU capacity. Quality matches ada-002. Best for teams with existing ML infrastructure.
OpenAI 3-small offers the best balance of quality, price, ecosystem integration, and reliability. It's not the cheapest or the best, but it's "good enough and easy" for most teams.
Through TokenMix.ai, you can access OpenAI embedding models alongside alternatives from multiple providers through a single API.
Real-World Embedding Cost Scenarios
Two real workloads: 100K-doc startup RAG → $2.72 for entire Year 1 on 3-small (effectively free); 10M-doc enterprise search → $336 Year 1 on 3-small vs $199 on Google embed (saves $137/yr).
Scenario 1: Startup RAG app — 100K documents
- Average document: 2,000 tokens
- Total corpus: 200M tokens (one-time embedding)
- Daily queries: 1,000 (avg 100 tokens each)
| Model | Initial Embed | Monthly Query Cost | Total Year 1 |
|---|---|---|---|
| 3-small (batch) | $2.00 | $0.06 | $2.72 |
| 3-large (batch) | $13.00 | $0.39 | $17.68 |
| Voyage-3-large | $36.00 | $0.54 | $42.48 |
text-embedding-3-small costs $2.72 for an entire year of a 100K-document RAG application. Embedding costs are essentially free at this scale.
Scenario 2: Enterprise search — 10M documents
- Average document: 3,000 tokens
- Total corpus: 30B tokens
- Daily queries: 50,000 (avg 100 tokens each)
| Model | Initial Embed | Monthly Query Cost | Total Year 1 |
|---|---|---|---|
| 3-small (batch) | $300 | $3.00 | $336 |
| 3-large (batch) | $1,950 | $19.50 | $2,184 |
| Google embed | $187.50 | $0.94 | $199 |
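At enterprise scale, Google's embedding model saves $137 in Year 1 vs OpenAI 3-small. Most of that ($112.50) is the one-time initial embed; the ongoing query savings are about $2/month. Averaged over the year it comes to roughly $11/month, and if you're already using OpenAI for chat, the integration simplicity of staying with one provider often outweighs it.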
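The Year 1 totals in both scenarios can be reproduced with a few lines of arithmetic: a one-time corpus embed plus twelve months of query traffic. The sketch below assumes 30-day months and, for Scenario 2, 100 tokens per query (the value that matches the monthly figures in the table). The function name is illustrative:

```python
def year_one_cost(corpus_tokens: float, queries_per_day: int,
                  tokens_per_query: int, embed_price_per_m: float,
                  query_price_per_m: float, days_per_month: int = 30) -> dict:
    """Return initial, monthly, and Year 1 embedding costs in USD."""
    initial = corpus_tokens / 1e6 * embed_price_per_m
    monthly_tokens = queries_per_day * tokens_per_query * days_per_month
    monthly = monthly_tokens / 1e6 * query_price_per_m
    return {
        "initial": initial,
        "monthly": monthly,
        "year_one": initial + 12 * monthly,
    }


# Scenario 1 on 3-small: batch price for the corpus, standard price for queries.
startup = year_one_cost(200e6, 1_000, 100, 0.01, 0.02)
# Scenario 2 on 3-small and on Google text-embedding-005.
enterprise = year_one_cost(30e9, 50_000, 100, 0.01, 0.02)
google = year_one_cost(30e9, 50_000, 100, 0.00625, 0.00625)
```

Here `startup["year_one"]` comes to $2.72 and `enterprise["year_one"]` to $336, matching the tables. The Google figure is $198.75, which the table rounds to $199.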
Which OpenAI Embedding Model Should You Pick?
Default to 3-small for 80% of RAG apps; upgrade to 3-large only when wrong answers have high consequences (legal/medical). Always pair with Batch API for corpus building. ada-002 has no remaining use case.
| Your Situation | Recommended Model | Why |
|---|---|---|
| Most RAG applications | text-embedding-3-small | $0.02/M, quality sufficient for 80% of use cases |
| Quality-critical retrieval | text-embedding-3-large | +4 MTEB points worth 6.5x cost for legal/medical |
| Currently on ada-002 | Migrate to 3-small | 5x cheaper, slightly better quality |
| Maximum quality at any cost | Voyage-3-large | Leads MTEB benchmarks |
| Maximum cost savings | Google text-embedding-005 | $0.00625/M — cheapest hosted option |
| Bulk re-embedding, index building | Any model + Batch API | 50% off, no reason not to |
| Privacy-first, own infrastructure | nomic-embed-text (self-host) | Free, decent quality, full control |
What's the Bottom Line on OpenAI Embedding Pricing?
Embeddings are too cheap to overthink: text-embedding-3-small at $0.02/M means a 100K-doc RAG app costs under $3/year. Default to 3-small plus the Batch API, and spend 6.5× more on 3-large only when retrieval accuracy directly impacts user outcomes.
Because OpenAI's embedding models are this cheap, cost is rarely the deciding factor. The real decision is quality: pay for text-embedding-3-large if retrieval accuracy directly affects user experience, and stick with 3-small for everything else.
If you're still on ada-002, migrate today — 3-small is 5x cheaper and slightly better. And always use the Batch API for corpus embedding — there's no reason to pay standard rates for non-real-time work.
Real-time embedding model pricing across all providers is available at tokenmix.ai/models.
FAQ
How much does OpenAI text-embedding-3-small cost?
$0.02 per million tokens at standard rates. With the Batch API (50% off), it drops to $0.01/M. Embedding 1 million tokens costs about 2 cents — making it one of the cheapest API operations available.
Should I use text-embedding-3-small or 3-large?
Use 3-small for most applications — it's 6.5x cheaper ($0.02 vs $0.13/M) with only a 4-point quality gap on MTEB benchmarks. Use 3-large only when retrieval accuracy is critical: legal search, medical RAG, or applications where wrong results have high consequences.
Is text-embedding-ada-002 still worth using?
No. text-embedding-3-small is 5x cheaper ($0.02 vs $0.10/M) and slightly higher quality. Migrate to 3-small for immediate cost savings with no quality loss. You'll need to re-embed your corpus since vector spaces differ.
What are the cheapest embedding models available?
Google text-embedding-005 at $0.00625/M is the cheapest hosted option. OpenAI 3-small at $0.02/M is second. Self-hosted nomic-embed-text is free if you have GPU infrastructure.
How do I reduce OpenAI embedding costs?
Three strategies: (1) Use the Batch API for all non-real-time embedding — saves 50%. (2) Use 3-small instead of 3-large unless quality demands it. (3) Consider Google text-embedding-005 at $0.00625/M for pure cost optimization.
What is the max input size for OpenAI embeddings?
8,191 tokens per embedding request for all three models (3-small, 3-large, ada-002). Documents longer than 8,191 tokens must be chunked before embedding.
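A minimal sketch of that chunking step, working on a list of token IDs so it stays tokenizer-agnostic. It assumes you have already tokenized the document (for example with `tiktoken`'s `cl100k_base` encoding). The function name and the optional `overlap` parameter are illustrative choices, not something the limit itself requires:

```python
from typing import List, Sequence

MAX_INPUT_TOKENS = 8191  # limit stated above for all three models


def chunk_tokens(tokens: Sequence[int],
                 max_tokens: int = MAX_INPUT_TOKENS,
                 overlap: int = 0) -> List[List[int]]:
    """Split token IDs into chunks of at most `max_tokens`, optionally overlapping."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this chunk already reached the end of the document
    return chunks
```

A 20,000-token document splits into three chunks of 8,191, 8,191, and 3,618 tokens. Many RAG pipelines use much smaller chunks than the maximum for retrieval precision, so treat 8,191 as a ceiling rather than a target.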
Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: OpenAI Official Pricing, TokenMix.ai, and MTEB Leaderboard