TokenMix Research Lab · 2026-04-04

OpenAI Embedding Pricing 2026: $0.02/M vs $0.13/M (6.5x Premium)

OpenAI Embedding Models Pricing in 2026: text-embedding-3-small vs 3-large vs ada-002 Cost Comparison

Last Updated: 2026-04-29
Author: TokenMix Research Lab

text-embedding-3-small at $0.02/M is the right default for 80% of RAG apps; 3-large at $0.13/M (6.5× more) is only worth it for legal/medical accuracy; ada-002 is obsolete (3-small is 5× cheaper and better).

OpenAI's embedding models are the cheapest part of their API — text-embedding-3-small costs just $0.02 per million tokens, and batch processing halves it to $0.01/M. But choosing wrong between the three models can cost you 6.5x more than necessary. text-embedding-3-large at $0.13/M delivers better retrieval quality for RAG — whether the 6.5x premium is worth it depends on your use case. This guide breaks down every OpenAI embedding model's real cost, compares them against Voyage, Cohere, and free alternatives, and shows exactly when to use which. All pricing verified against OpenAI's official docs and tracked by TokenMix.ai as of April 2026.


Quick OpenAI Embedding Pricing Overview

Three OpenAI embedding tiers: 3-small at $0.02/M (1536 dim), 3-large at $0.13/M (3072 dim), legacy ada-002 at $0.10/M — all input-only, capped at 8,191 input tokens per request.

All prices per 1M tokens, OpenAI direct API, April 2026:

| Model | Standard | Batch (50% off) | Dimensions | Max Input (tokens) |
| --- | --- | --- | --- | --- |
| text-embedding-3-small | $0.02 | $0.01 | 1536 | 8,191 |
| text-embedding-3-large | $0.13 | $0.065 | 3072 | 8,191 |
| text-embedding-ada-002 | $0.10 | $0.05 | 1536 | 8,191 |

Key insight: Embeddings are input-only — you pay per token embedded, no output cost. This makes them fundamentally cheaper than chat models per token processed.

text-embedding-3-small at $0.02/M is 90% cheaper than GPT-5.4 Nano input ($0.20/M). If your workload is primarily search/retrieval rather than generation, most of your tokens are billed at embedding rates, which cuts costs substantially.
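Since embeddings are billed on input tokens only, the cost of any job is the token count times the per-million rate. The sketch below uses the prices from the table above; the helper name `embedding_cost` and the 50M-token example volume are illustrative assumptions, not part of any OpenAI SDK.

```python
# Prices in USD per 1M tokens, copied from the pricing table in this article.
PRICE_PER_M = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}
BATCH_DISCOUNT = 0.5  # the Batch API is listed at 50% off


def embedding_cost(model: str, tokens: int, batch: bool = False) -> float:
    """Return the USD cost of embedding `tokens` input tokens (hypothetical helper)."""
    rate = PRICE_PER_M[model] * (BATCH_DISCOUNT if batch else 1.0)
    return tokens / 1_000_000 * rate


# Example: an assumed 50M-token job on each model, standard vs batch.
for name in PRICE_PER_M:
    standard = embedding_cost(name, 50_000_000)
    batched = embedding_cost(name, 50_000_000, batch=True)
    print(f"{name}: ${standard:.2f} standard, ${batched:.2f} batch")
```

There is no output-token term in the formula, which is why the same token volume costs so much less here than on a chat model.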


text-embedding-3-small vs 3-large: When 6.5x More is Worth It

3-large costs 6.5× more for a 4-point MTEB lift (62% → 66%) and 2× the dimensions/storage. It is only worth it for legal/medical/technical retrieval where wrong answers have high consequences.

The small model costs $0.02/M. The large model costs $0.13/M. That's a 6.5x difference. What do you get for it?

| Metric | 3-small | 3-large | Improvement |
| --- | --- | --- | --- |
| Price/M tokens | $0.02 | $0.13 | 6.5x more expensive |
| Dimensions | 1536 | 3072 | 2x |
| MTEB retrieval score | ~62% | ~66% | +4 points |
| Storage per vector | ~6KB | ~12KB | 2x |
| Query latency | Faster | Slower | ~1.5x slower |

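Use 3-small when: you are building a typical production RAG or search application and a roughly 4-point MTEB gap is acceptable.

Use 3-large when: retrieval is legal, medical, or technical and wrong answers have high consequences.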

The 80/20 rule: For 80% of production RAG applications, 3-small is sufficient. The 4-point quality gap only matters at the margins — if your users notice wrong search results, upgrade.
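The storage figures in the comparison follow from the dimension counts if each value is stored as a 4-byte float. A minimal sketch of that arithmetic, assuming float32 storage and ignoring any index or metadata overhead a vector database adds (both are assumptions, and the function names are illustrative):

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as raw float32 values


def vector_bytes(dimensions: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of one embedding vector in bytes."""
    return dimensions * bytes_per_value


def index_gib(num_vectors: int, dimensions: int) -> float:
    """Raw size of `num_vectors` embeddings in GiB, excluding index overhead."""
    return num_vectors * vector_bytes(dimensions) / 2**30


for name, dims in [("3-small", 1536), ("3-large", 3072)]:
    per_vector_kib = vector_bytes(dims) / 1024
    print(f"{name}: {per_vector_kib:.0f} KiB per vector, "
          f"{index_gib(1_000_000, dims):.2f} GiB per 1M vectors")
```

Doubling the dimensions doubles the raw storage, so the 6.5x API price is not the only cost that grows when moving to 3-large.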


text-embedding-ada-002: Should You Migrate?

Yes, immediately: text-embedding-3-small is 5× cheaper ($0.02 vs $0.10/M) and slightly higher quality (~62% vs ~61% MTEB). There is no scenario where staying on ada-002 makes sense.

ada-002 was OpenAI's standard embedding model before the v3 series launched. It's still available, but there's little reason to stay on it.

| Comparison | ada-002 | 3-small | 3-large |
| --- | --- | --- | --- |
| Price/M | $0.10 | $0.02 | $0.13 |
| Quality (MTEB) | ~61% | ~62% | ~66% |
| Dimensions | 1536 | 1536 | 3072 |

text-embedding-3-small is 5x cheaper AND slightly better than ada-002. There is no scenario where ada-002 is the right choice for new projects. If you're still on ada-002, migrate to 3-small — you get better quality at 20% of the cost.

Migration is straightforward: same API endpoint, change the model parameter. But you'll need to re-embed your entire corpus since the vector spaces aren't compatible.
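A migration along these lines might look like the sketch below. It assumes the official `openai` Python SDK, whose client exposes `client.embeddings.create(model=..., input=...)`; the helper `reembed`, the batch size of 100, and the way results are written back to your vector store are hypothetical and left to your own stack.

```python
from typing import List

NEW_MODEL = "text-embedding-3-small"  # the only change from an ada-002 call


def reembed(client, texts: List[str], model: str = NEW_MODEL,
            batch_size: int = 100) -> List[List[float]]:
    """Re-embed every text with `model`, returning vectors in input order.

    `client` is expected to behave like `openai.OpenAI()`; it is passed in
    so this hypothetical helper can be exercised without network access.
    """
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        response = client.embeddings.create(model=model, input=chunk)
        # Each returned item carries the index of the input it belongs to.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(item.embedding)
    return vectors


# Usage (requires `pip install openai` and an API key in the environment):
#   from openai import OpenAI
#   new_vectors = reembed(OpenAI(), corpus_texts)
#   ...then replace the old ada-002 vectors in your index with new_vectors.
```

Old and new vectors should not be mixed in one index: queries embedded with 3-small need to be compared against documents embedded with 3-small.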


Batch API: Cut Embedding Costs by 50%

Batch API halves embedding costs (3-small drops to $0.01/M, 3-large to $0.065/M). Always use it for corpus building, index migrations, and re-embedding; pay standard pricing only for live user queries.

Embedding workloads are the ideal batch API candidate because they're inherently async (you're building/updating an index, not serving real-time queries).

| Model | Standard | Batch | Monthly cost for 1B tokens |
| --- | --- | --- | --- |
| text-embedding-3-small | $0.02 | $0.01 | $20 → $10 |
| text-embedding-3-large | $0.13 | $0.065 | $130 → $65 |

Always use batch for initial corpus embedding. Building a new search index? Migrating to a new model? Re-embedding after dimension changes? Use batch. There's no reason to pay standard rates for non-real-time embedding work.

For real-time embedding (live user queries), standard pricing applies — but individual queries are so cheap ($0.02/M) that the cost is negligible.
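For corpus jobs, the Batch API takes a JSONL file with one request per line. The sketch below builds such a file for the embeddings endpoint; the `custom_id`/`method`/`url`/`body` layout follows OpenAI's documented batch input format, while the helper name `build_batch_jsonl` and the `doc-N` ID scheme are illustrative assumptions.

```python
import json
from typing import Iterable


def build_batch_jsonl(texts: Iterable[str],
                      model: str = "text-embedding-3-small") -> str:
    """Return JSONL text with one /v1/embeddings request per input text."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            "custom_id": f"doc-{i}",   # hypothetical ID scheme to match results back
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request) + "\n")
    return "".join(lines)


payload = build_batch_jsonl(["first document", "second document"])
print(payload, end="")

# Submitting the file (sketch, assuming the official `openai` SDK; check the
# current Batch API docs before relying on these exact arguments):
#   uploaded = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
#   client.batches.create(input_file_id=uploaded.id,
#                         endpoint="/v1/embeddings",
#                         completion_window="24h")
```

Results come back asynchronously as another JSONL file, so the `custom_id` is what ties each vector to its source document.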


OpenAI Embedding Pricing vs Voyage vs Cohere vs Free Options

Cross-provider: Google text-embedding-005 cheapest hosted at $0.00625/M (3× under OpenAI), Voyage-3-large highest quality at 67% MTEB ($0.18/M, 9× more), nomic-embed-text free if you self-host.

| Provider | Model | Price/M tokens | Dimensions | Quality (MTEB) |
| --- | --- | --- | --- | --- |
| OpenAI | text-embedding-3-small | $0.02 | 1536 | ~62% |
| OpenAI | text-embedding-3-large | $0.13 | 3072 | ~66% |
| Voyage AI | voyage-3-large | $0.18 | 1024 | ~67% |
| Cohere | embed-v4.0 | $0.10 | 1024 | ~65% |
| Google | text-embedding-005 | $0.00625 | 768 | ~64% |
| Free (local) | nomic-embed-text | $0 (self-host) | 768 | ~61% |

Key insights from TokenMix.ai cross-provider data:

  1. Google text-embedding-005 is the cheapest hosted option at $0.00625/M — 3x cheaper than OpenAI 3-small. Quality is competitive at 64% MTEB.

  2. Voyage-3-large leads on quality (67% MTEB) but costs 9x more than OpenAI 3-small. Only worth it for quality-critical applications.

  3. Self-hosted nomic-embed-text is free if you have GPU capacity. Quality matches ada-002. Best for teams with existing ML infrastructure.

  4. OpenAI 3-small offers the best balance of quality, price, ecosystem integration, and reliability. It's not the cheapest or the best, but it's "good enough and easy" for most teams.

Through TokenMix.ai, you can access OpenAI embedding models alongside alternatives from multiple providers through a single API.


Real-World Embedding Cost Scenarios

Two real workloads: 100K-doc startup RAG → $2.72 for entire Year 1 on 3-small (effectively free); 10M-doc enterprise search → $336 Year 1 on 3-small vs $199 on Google embed (saves $137/yr).

Scenario 1: Startup RAG app — 100K documents

| Model | Initial Embed | Monthly Query Cost | Total Year 1 |
| --- | --- | --- | --- |
| 3-small (batch) | $2.00 | $0.06 | $2.72 |
| 3-large (batch) | $13.00 | $0.39 | $17.68 |
| Voyage-3-large | $36.00 | $0.54 | $42.48 |

text-embedding-3-small costs $2.72 for an entire year of a 100K-document RAG application. Embedding costs are essentially free at this scale.

Scenario 2: Enterprise search — 10M documents

| Model | Initial Embed | Monthly Query Cost | Total Year 1 |
| --- | --- | --- | --- |
| 3-small (batch) | $300 | $3.00 | $336 |
| 3-large (batch) | $1,950 | $19.50 | $2,184 |
| Google embed | $187.50 | $0.94 | $199 |

At enterprise scale, Google's embedding model saves $137 in Year 1 vs OpenAI 3-small. But if you're already using OpenAI for chat, the integration simplicity of staying with one provider often outweighs savings that average about $11/month.
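The Year 1 totals in both scenarios are the initial embedding cost plus twelve months of query cost. The sketch below reproduces the 3-small and Google rows; the token volumes (200M corpus tokens and 3M query tokens per month for Scenario 1, 30B and 150M for Scenario 2) are not stated in the article and are inferred here by dividing the table's dollar figures by the per-million prices.

```python
def year_one_cost(corpus_tokens: int, monthly_query_tokens: int,
                  index_price_per_m: float, query_price_per_m: float):
    """Return (initial, monthly, year-one total) in USD.

    `index_price_per_m` is the rate paid to embed the corpus (batch rate if
    the Batch API is used); `query_price_per_m` is the live-query rate.
    """
    initial = corpus_tokens / 1_000_000 * index_price_per_m
    monthly = monthly_query_tokens / 1_000_000 * query_price_per_m
    return initial, monthly, initial + 12 * monthly


# Token volumes below are inferred assumptions, not figures from the source.
scenarios = {
    "Scenario 1, 3-small (batch)": (200_000_000, 3_000_000, 0.01, 0.02),
    "Scenario 2, 3-small (batch)": (30_000_000_000, 150_000_000, 0.01, 0.02),
    "Scenario 2, Google embed": (30_000_000_000, 150_000_000, 0.00625, 0.00625),
}

for label, args in scenarios.items():
    initial, monthly, total = year_one_cost(*args)
    print(f"{label}: initial ${initial:,.2f}, monthly ${monthly:,.2f}, "
          f"year 1 ${total:,.2f}")
```

Splitting the total this way also shows that most of the Year 1 gap between providers is the one-time corpus embedding, not the recurring query cost.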


Which OpenAI Embedding Model Should You Pick?

Default to 3-small for 80% of RAG apps; upgrade to 3-large only when wrong answers have high consequences (legal/medical). Always pair with Batch API for corpus building. ada-002 has no remaining use case.

| Your Situation | Recommended Model | Why |
| --- | --- | --- |
| Most RAG applications | text-embedding-3-small | $0.02/M, quality sufficient for 80% of use cases |
| Quality-critical retrieval | text-embedding-3-large | +4 MTEB points worth 6.5x cost for legal/medical |
| Currently on ada-002 | Migrate to 3-small | 5x cheaper, slightly better quality |
| Maximum quality at any cost | Voyage-3-large | Leads MTEB benchmarks |
| Maximum cost savings | Google text-embedding-005 | $0.00625/M, cheapest hosted option |
| Bulk re-embedding, index building | Any model + Batch API | 50% off, no reason not to |
| Privacy-first, own infrastructure | nomic-embed-text (self-host) | Free, decent quality, full control |

Related: Compare all model pricing in our complete LLM API pricing comparison

What's the Bottom Line on OpenAI Embedding Pricing?

Embeddings are too cheap to over-think: text-embedding-3-small at $0.02/M means a 100K-doc RAG app costs under $3/year. Default to 3-small + Batch API. Spend 6.5× more on 3-large only when retrieval accuracy directly impacts user outcomes.

OpenAI's embedding models are so cheap that cost is rarely the deciding factor. The real decision is quality: spend 6.5x more for text-embedding-3-large if retrieval accuracy directly impacts user experience, and stick with 3-small for everything else.

If you're still on ada-002, migrate today — 3-small is 5x cheaper and slightly better. And always use the Batch API for corpus embedding — there's no reason to pay standard rates for non-real-time work.

Real-time embedding model pricing across all providers at tokenmix.ai/models.


FAQ

How much does OpenAI text-embedding-3-small cost?

$0.02 per million tokens at standard rates. With the Batch API (50% off), it drops to $0.01/M. Embedding 1 million tokens costs about 2 cents — making it one of the cheapest API operations available.

Should I use text-embedding-3-small or 3-large?

Use 3-small for most applications — it's 6.5x cheaper ($0.02 vs $0.13/M) with only a 4-point quality gap on MTEB benchmarks. Use 3-large only when retrieval accuracy is critical: legal search, medical RAG, or applications where wrong results have high consequences.

Is text-embedding-ada-002 still worth using?

No. text-embedding-3-small is 5x cheaper ($0.02 vs $0.10/M) and slightly higher quality. Migrate to 3-small for immediate cost savings with no quality loss. You'll need to re-embed your corpus since vector spaces differ.

What are the cheapest embedding models available?

Google text-embedding-005 at $0.00625/M is the cheapest hosted option. OpenAI 3-small at $0.02/M is second. Self-hosted nomic-embed-text is free if you have GPU infrastructure.

How do I reduce OpenAI embedding costs?

Three strategies: (1) Use the Batch API for all non-real-time embedding — saves 50%. (2) Use 3-small instead of 3-large unless quality demands it. (3) Consider Google text-embedding-005 at $0.00625/M for pure cost optimization.

What is the max input size for OpenAI embeddings?

8,191 tokens per embedding request for all three models (3-small, 3-large, ada-002). Documents longer than 8,191 tokens must be chunked before embedding.
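Chunking to the limit can be sketched as a sliding window over a document's token IDs. The function below is a hypothetical helper that works on any list of tokens; producing those tokens is left to a tokenizer such as `tiktoken` (for example its `cl100k_base` encoding), and which encoding matches your model is an assumption to verify. The optional `overlap` is an illustrative extra, not something the API requires.

```python
from typing import List, Sequence

MAX_INPUT_TOKENS = 8191  # per-input limit quoted in this article


def chunk_tokens(tokens: Sequence[int], max_tokens: int = MAX_INPUT_TOKENS,
                 overlap: int = 0) -> List[Sequence[int]]:
    """Split `tokens` into consecutive windows of at most `max_tokens`.

    `overlap` tokens are repeated between neighbouring windows so that text
    cut at a boundary still appears intact in one of the chunks.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Example with stand-in token IDs; real IDs would come from a tokenizer.
document_tokens = list(range(20_000))
for chunk in chunk_tokens(document_tokens, overlap=100):
    print(len(chunk))
```

Each chunk is then embedded as its own input, so a long document yields several vectors rather than one.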


Author: TokenMix Research Lab | Last Updated: April 2026 | Data Source: OpenAI Official Pricing, TokenMix.ai, and MTEB Leaderboard