TokenMix Research Lab · 2026-04-25

text-embedding-3-small: $0.02/MTok, 1536 Dims, MTEB 62.26 Guide

OpenAI's text-embedding-3-small is the cheapest production-quality embedding model from OpenAI as of April 2026 — $0.02 per million tokens standard ($0.01 batch), default 1,536 dimensions with Matryoshka support down to 256, and a 62.26 MTEB benchmark score. It's 5× cheaper than the retired text-embedding-ada-002 and 6.5× cheaper than text-embedding-3-large while losing only ~4 points on MTEB. For RAG, semantic search, and classification workloads, it's the OpenAI-ecosystem default for 80% of production use cases. This guide covers the real pricing math, dimension trade-offs, MTEB benchmark context, when to use it vs alternatives, and common deployment gotchas.

What text-embedding-3-small Is

Released January 2024 alongside text-embedding-3-large, this model replaces the older text-embedding-ada-002 as OpenAI's cost-efficient embedding tier. Two years on, it remains OpenAI's default recommendation for cost-sensitive embedding workloads.

Key attributes:

| Attribute | Value |
|---|---|
| Creator | OpenAI |
| Released | January 25, 2024 |
| Max input tokens | 8,191 per request |
| Default output dimensions | 1,536 |
| Matryoshka reduction | Down to 256 dims |
| MTEB score | 62.26 |
| Standard price | $0.02 / MTok |
| Batch price | $0.01 / MTok |
| Typical latency | 50-200ms per request |
| Status | Current production default |

Pricing Breakdown

At $0.02 per million tokens, practical costs for typical workloads:

| Workload | Monthly token volume | Monthly cost |
|---|---|---|
| Small RAG (1M docs, periodic re-embed) | ~5M | $0.10 |
| Medium RAG (10M docs, weekly refresh) | ~50M | $1.00 |
| Product catalog search (100K items, daily) | ~100M | $2.00 |
| Large enterprise RAG (10M docs, real-time ingestion) | ~500M | $10.00 |
| Massive document corpus (1B docs, one-time embed) | ~1B | $20.00 |

Batch tier is 50% off standard. Use batch API when your embedding workload isn't real-time — OpenAI processes asynchronously and returns results within 24 hours.
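The pricing arithmetic above is simple enough to sketch as a helper. This is an illustrative function (not an OpenAI API call) using the two rates quoted in this guide:

```python
# Rates from this guide: $0.02/MTok standard, $0.01/MTok batch.
STANDARD_PER_MTOK = 0.02
BATCH_PER_MTOK = 0.01

def monthly_cost(tokens: int, batch: bool = False) -> float:
    """Estimated embedding spend in dollars for a given token volume."""
    rate = BATCH_PER_MTOK if batch else STANDARD_PER_MTOK
    return tokens / 1_000_000 * rate

# Medium RAG row from the table: ~50M tokens/month at standard rate.
print(f"${monthly_cost(50_000_000):.2f}")
# One-time 1B-token corpus embed at the batch rate.
print(f"${monthly_cost(1_000_000_000, batch=True):.2f}")
```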

Cost comparison with competitors (per MTok), using the prices cited elsewhere in this guide:

| Model | Price per MTok |
|---|---|
| text-embedding-3-small | $0.02 ($0.01 batch) |
| Google text-embedding-005 | $0.025 |
| text-embedding-3-large | $0.13 |
| Voyage AI voyage-3.5 | $0.18 |

For most general-purpose RAG, text-embedding-3-small is near the bottom of the cost curve.


Dimension Flexibility (Matryoshka)

Default output is 1,536 dimensions. A trained-in Matryoshka representation allows shortening embeddings to any dimension between 256 and 1,536 via the dimensions API parameter — without retraining.

The trade-off:

| Dimensions | Quality retention | Use case |
|---|---|---|
| 1,536 (default) | 100% | Maximum accuracy, when storage isn't constraining |
| 1,024 | ~99% | Balanced default for most production |
| 768 | ~97% | Standard RAG with tight storage budgets |
| 512 | ~94% | Aggressive compression, acceptable for most search |
| 256 | ~87% | Minimum reasonable; still beats older models |

API usage:

from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here",
    dimensions=768,
)

Storage implications: a 1M-vector index at 1,536 dims is ~6GB (FP32). At 768 dims, it's 3GB. For vector databases (Qdrant, Pinecone, Weaviate), this halves your infrastructure cost.
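The storage math above can be sketched directly. Note this counts raw FP32 vector bytes only; real vector-DB indexes add graph and metadata overhead on top:

```python
def index_size_gb(n_vectors: int, dims: int, bytes_per_float: int = 4) -> float:
    """Raw vector storage in GB (FP32 by default), excluding index overhead."""
    return n_vectors * dims * bytes_per_float / 1e9

# 1M vectors at full vs. truncated dimensionality, as in the text above.
print(index_size_gb(1_000_000, 1536))  # ~6.1 GB
print(index_size_gb(1_000_000, 768))   # ~3.1 GB
```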

The common production pattern: store at 1,536 dimensions once, truncate to 768 or 512 for the production index. Keep the full version for re-indexing if you upgrade your retrieval strategy.
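Client-side truncation for this pattern is straightforward; a minimal sketch, assuming you keep the full 1,536-dim vector and cut it down at index time. After slicing a Matryoshka embedding you should re-normalize to unit length so cosine and dot-product scores stay consistent:

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components of a Matryoshka embedding,
    then re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]
```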


MTEB Benchmark Context

The Massive Text Embedding Benchmark (MTEB) evaluates 56 tasks across classification, clustering, retrieval, and similarity. text-embedding-3-small scores 62.26 — solid but not frontier.

For comparison:

| Model | MTEB | Relative quality |
|---|---|---|
| text-embedding-3-large | 64.6 | +2.34 vs 3-small |
| text-embedding-3-small | 62.26 | baseline |
| Google text-embedding-005 | ~63 | slightly better |
| Cohere embed-v4 | ~65 | frontier closed |
| Voyage AI voyage-3.5 | ~67 | frontier paid |
| BGE-m3 (open-weight) | ~66 | frontier open-weight |
| NVIDIA NV-Embed-v2 (open-weight) | ~70 | current open-weight leader |

What the score differences mean in practice: a 2-3 point MTEB gap translates to roughly 5-10% improvement in top-k retrieval precision on most real workloads. Meaningful but not transformative.

When MTEB points matter: high-stakes retrieval (legal, medical, financial where recall accuracy affects outcomes). When they don't: general Q&A, product search, classification.


Supported LLM Providers and Model Routing

text-embedding-3-small is accessible via the OpenAI API directly, Microsoft Azure OpenAI Service, and multi-model aggregators such as TokenMix.ai.

The aggregator path fits naturally in multi-model stacks. TokenMix.ai exposes text-embedding-3-small alongside Google text-embedding-005, Cohere embed-v4, Voyage AI voyage-3.5, and 300+ chat models (GPT-5.5, Claude Opus 4.7, DeepSeek V4-Pro, Kimi K2.6) through a single OpenAI-compatible API key — useful when you want to A/B test embedding models on your actual data without managing multiple vendor relationships.

Configuration is identical to direct OpenAI usage; only the base URL changes:

from openai import OpenAI

client = OpenAI(
    api_key="your-tokenmix-key",
    base_url="https://api.tokenmix.ai/v1",
)

embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="semantic search query",
).data[0].embedding

For teams routing through TokenMix.ai, the same API key covers embeddings, chat completion, vision, and tool-calling across all 300+ models — one billing relationship regardless of how many model types your stack uses.


When to Use It

| Your workload | text-embedding-3-small fit |
|---|---|
| General-purpose RAG | Excellent default |
| E-commerce / product search | Strong fit |
| Semantic deduplication | Strong |
| Classification as embedding | Strong |
| Legal / medical retrieval | Consider 3-large or Voyage |
| Multilingual (heavy non-English) | Consider Cohere embed-v4 or Google |
| Code retrieval (code-specific tasks) | Consider Voyage code-3 |
| Real-time streaming | Fine (50-200ms latency) |
| Batch processing | Use batch tier for 50% discount |
| Cost-critical at massive scale | Best in OpenAI ecosystem |

Comparison With Alternatives

vs text-embedding-3-large ($0.13 / MTok): 6.5× the price for +2.34 MTEB points. Worth it for high-stakes retrieval; overkill for general RAG.

vs Google text-embedding-005 ($0.025 / MTok): comparable price, slightly higher MTEB (~63), and stronger multilingual coverage. Ecosystem fit usually decides.

vs Voyage AI voyage-3.5 ($0.18 / MTok): 9× the price for roughly 5 MTEB points. Justifiable when retrieval precision directly drives revenue or risk.

vs open-weight BGE-m3 or NV-Embed-v2: no per-token fee, but you pay in GPU hosting and operations. Self-hosting breaks even only at sustained high volume.


Known Limitations

1. Max input is 8,191 tokens. Longer documents require chunking. Unlike some newer models, there's no automatic truncation — oversized inputs error.

2. No code-specific fine-tuning. For code retrieval specifically, Voyage code-3 or specialized code embedding models outperform.

3. Weaker on very short queries. Semantic search with 2-3 word queries sees noisier results than with full-sentence queries. This applies to most embedding models, not unique to 3-small.

4. Multilingual quality varies. Strong on English and major European languages. Less strong on Chinese, Japanese, Korean compared to purpose-built multilingual models.

5. No batch-specific cost tier below $0.01/MTok. If you need cheaper per-token pricing and can accept 24-hour latency, the batch tier is the floor. Below that, you're looking at self-hosted open-weight options.

6. Deprecation risk over multi-year horizons. OpenAI maintains embedding APIs but has retired older models (ada-002 deprecation). Plan for the possibility that a future text-embedding-4 replaces 3-small within 2-3 years.


Quick Usage Guide

Python SDK:

from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["document 1", "document 2"],
    dimensions=768,
)

vectors = [d.embedding for d in response.data]

Node.js SDK:

import OpenAI from "openai";
const client = new OpenAI();

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "document text",
  dimensions: 768,
});

const vector = response.data[0].embedding;

curl:

curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "hello", "dimensions": 768}'

Batch API (50% discount, async): submit a JSONL file with multiple embedding requests; get results within 24 hours. Use when latency isn't critical.
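Building that JSONL file can be done locally before upload. A sketch of the per-line shape used by OpenAI's batch format (`custom_id` / `method` / `url` / `body`); the `doc-N` IDs here are illustrative:

```python
import json

def batch_lines(docs: list[str],
                model: str = "text-embedding-3-small",
                dims: int = 768) -> list[str]:
    """One JSONL line per embedding request, in OpenAI batch format."""
    return [
        json.dumps({
            "custom_id": f"doc-{i}",          # your key for matching results
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": doc, "dimensions": dims},
        })
        for i, doc in enumerate(docs)
    ]

# Write the file, then upload it and create a batch job via the SDK.
with open("embed_batch.jsonl", "w") as f:
    f.write("\n".join(batch_lines(["first document", "second document"])))
```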


FAQ

Is text-embedding-3-small free?

No. $0.02 per million tokens. New OpenAI accounts get $5 in free credits — enough for ~250M tokens of embedding, more than enough for testing.

What dimensions should I use?

Default 1,536 for max quality. For most production RAG, 768 is the sweet spot — 97% quality retention at half the storage. Drop to 512 only when storage is a hard constraint.

Can I compare it against other embedding models?

Yes. Through TokenMix.ai you can hit text-embedding-3-small, text-embedding-3-large, Google text-embedding-005, Cohere embed-v4, and Voyage voyage-3.5 through one API key. Run the same query through each, measure retrieval quality on your ground truth, pick accordingly.

Does it support Chinese / Japanese?

Yes, with weaker quality than on English. For multilingual-heavy workloads, Cohere embed-v4 or Google text-embedding-005 are often better. Test on your specific language mix.

What's the max input length?

8,191 tokens per embedding request. For longer documents, chunk before embedding.
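A rough whitespace-based chunker for that pre-processing step. The 0.75 words-per-token ratio is a common English heuristic, not an exact count — use a real tokenizer (e.g. tiktoken with the cl100k_base encoding) when you need precise budgets:

```python
def chunk_words(text: str,
                max_tokens: int = 8000,
                words_per_token: float = 0.75) -> list[str]:
    """Split text into chunks that stay safely under the 8,191-token
    request limit, using an approximate words-per-token ratio."""
    max_words = int(max_tokens * words_per_token)
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```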

Does it support L2 normalization?

Embeddings are returned already L2-normalized to unit length, so cosine similarity and dot product give identical rankings. There is no unnormalized option; the encoding_format parameter only controls the wire format (float vs base64), not normalization.
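Because the vectors come back unit-length, dot product and cosine similarity coincide — a quick sanity check with hand-rolled helpers (stand-ins for whatever your vector math library provides):

```python
import math

def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.0])
# For unit vectors the two scores are identical.
assert abs(dot(a, b) - cosine(a, b)) < 1e-12
```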

Can I use it with Qdrant, Pinecone, Weaviate?

Yes, all major vector DBs accept text-embedding-3-small vectors directly. Set your collection's dimension to match your chosen dimensions parameter (1,536 / 1,024 / 768 / 512 / 256).

How does it compare to text-embedding-3-large for code search?

Both are general-purpose. For code-specific, dedicated code embedding models (Voyage code-3) outperform. If budget allows, Voyage code-3 > text-embedding-3-large > text-embedding-3-small for code retrieval specifically.

Is batch tier really 50% off?

Yes, $0.01/MTok for batch. Trade-off: 24-hour max latency, async workflow. For one-time bulk embedding of a corpus, always use batch.

Should I wait for text-embedding-4?

Not announced. text-embedding-3 series has been stable for 2+ years. No signals of imminent replacement. If a new model ships, it'll likely be superior but more expensive — keep 3-small as your cost tier. For current projects, no reason to wait.



Author: TokenMix Research Lab | Last Updated: April 25, 2026 | Data Sources: OpenAI embeddings announcement, OpenAI API pricing, OpenAI embeddings guide, MTEB leaderboard, Awesome Agents embedding pricing April 2026, TokenMix.ai multi-model embeddings