TokenMix Research Lab · 2026-04-25

text-embedding-3-small: $0.02/MTok, 1536 Dims, MTEB 62.26 Guide
OpenAI's text-embedding-3-small is the cheapest production-quality embedding model from OpenAI as of April 2026: $0.02 per million tokens standard ($0.01 batch), default 1,536 dimensions with Matryoshka support down to 256, and a 62.26 MTEB benchmark score. It's 5× cheaper than the retired text-embedding-ada-002 and 6.5× cheaper than text-embedding-3-large while losing only ~2.3 points on MTEB (62.26 vs 64.6). For RAG, semantic search, and classification workloads, it's the OpenAI-ecosystem default for 80% of production use cases. This guide covers the real pricing math, dimension trade-offs, MTEB benchmark context, when to use it vs alternatives, and common deployment gotchas.
Table of Contents
- What text-embedding-3-small Is
- Pricing Breakdown
- Dimension Flexibility (Matryoshka)
- MTEB Benchmark Context
- Supported LLM Providers and Model Routing
- When to Use It
- Comparison With Alternatives
- Known Limitations
- Quick Usage Guide
- FAQ
What text-embedding-3-small Is
Released January 2024 alongside text-embedding-3-large, this model replaces the older text-embedding-ada-002 as OpenAI's cost-efficient embedding tier. Two years on, it remains OpenAI's default recommendation for cost-sensitive embedding workloads.
Key attributes:
| Attribute | Value |
|---|---|
| Creator | OpenAI |
| Released | January 25, 2024 |
| Max input tokens | 8,191 per request |
| Default output dimensions | 1,536 |
| Matryoshka reduction | Down to 256 dims |
| MTEB score | 62.26 |
| Standard price | $0.02 / MTok |
| Batch price | $0.01 / MTok |
| Typical latency | 50-200ms per request |
| Status | Current production default |
Pricing Breakdown
At $0.02 per million tokens, practical costs for typical workloads:
| Workload | Monthly token volume | Monthly cost |
|---|---|---|
| Small RAG (1M docs, periodic re-embed) | ~5M | $0.10 |
| Medium RAG (10M docs, weekly refresh) | ~50M | $1.00 |
| Product catalog search (100K items, daily) | ~100M | $2.00 |
| Large enterprise RAG (10M docs, real-time ingestion) | ~500M | $10.00 |
| Massive document corpus (1B docs one-time embed) | ~1B | $20.00 |
Batch tier is 50% off standard. Use batch API when your embedding workload isn't real-time — OpenAI processes asynchronously and returns results within 24 hours.
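The table's arithmetic can be reproduced with a few lines of Python. This is a minimal sketch that assumes the two rates quoted above; the function and constant names are illustrative, not part of any SDK.

```python
# Rates quoted in this guide, in USD per million tokens.
STANDARD_USD_PER_MTOK = 0.02
BATCH_USD_PER_MTOK = 0.01


def embedding_cost_usd(tokens, batch=False):
    """Estimate embedding spend for a token volume at the quoted rates."""
    rate = BATCH_USD_PER_MTOK if batch else STANDARD_USD_PER_MTOK
    return tokens / 1_000_000 * rate


# Large enterprise RAG row: ~500M tokens per month.
standard = embedding_cost_usd(500_000_000)
batched = embedding_cost_usd(500_000_000, batch=True)
print(f"standard: ${standard:.2f}, batch: ${batched:.2f}")
```

Plugging in your own monthly token count is usually enough to decide whether the batch tier is worth the asynchronous workflow.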
Cost comparison with competitors (per MTok):
- text-embedding-3-small: $0.02
- text-embedding-3-large: $0.13
- text-embedding-ada-002 (deprecated): $0.10
- Google text-embedding-005: $0.025 (sometimes cheaper, depends on latency/quality needs)
- Cohere embed-v4: $0.10
- Voyage AI voyage-3.5: $0.18
For most general-purpose RAG, text-embedding-3-small is near the bottom of the cost curve.
Dimension Flexibility (Matryoshka)
Default output is 1,536 dimensions. A trained-in Matryoshka representation allows shortening embeddings to any dimension between 256 and 1,536 via the dimensions API parameter — without retraining.
The trade-off:
| Dimensions | Quality retention | Use case |
|---|---|---|
| 1,536 (default) | 100% | Maximum accuracy, when storage isn't constraining |
| 1,024 | ~99% | Balanced default for most production |
| 768 | ~97% | Standard RAG with tight storage budgets |
| 512 | ~94% | Aggressive compression, acceptable for most search |
| 256 | ~87% | Minimum reasonable; still beats older models |
API usage:
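from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here",
    dimensions=768,  # shortened output; omit to get the 1,536-dim default
)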
Storage implications: a 1M-vector index at 1,536 dims is ~6GB of raw FP32 vector data. At 768 dims, it's ~3GB. For vector databases (Qdrant, Pinecone, Weaviate), this roughly halves the storage and memory portion of your infrastructure cost.
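The storage figures follow directly from vectors × dimensions × bytes per value. A minimal sketch, assuming raw FP32 vectors only (index structures and metadata overhead in a real vector database are not included); the function name is illustrative.

```python
def index_size_gb(num_vectors, dims, bytes_per_value=4):
    """Raw vector storage in decimal gigabytes (FP32 = 4 bytes per value)."""
    return num_vectors * dims * bytes_per_value / 1e9


for dims in (1536, 1024, 768, 512, 256):
    print(f"{dims:>5} dims: {index_size_gb(1_000_000, dims):.2f} GB per 1M vectors")
```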
The common production pattern: store at 1,536 dimensions once, truncate to 768 or 512 for the production index. Keep the full version for re-indexing if you upgrade your retrieval strategy.
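The store-once, truncate-later pattern can be sketched as below. It assumes the stored vector is the full-length embedding and that a manually shortened vector should be rescaled to unit length before cosine or dot-product search; the helper name and the toy 4-dim vector are illustrative only.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


full = [0.6, 0.0, 0.8, 0.0]  # toy stand-in for a stored 1,536-dim vector
short = truncate_embedding(full, 2)
print(short)
```

Requesting the shorter size directly through the dimensions parameter avoids this step, but then the full-length vector is not available for later re-indexing.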
MTEB Benchmark Context
The Massive Text Embedding Benchmark (MTEB) evaluates 56 tasks across classification, clustering, retrieval, and similarity. text-embedding-3-small scores 62.26 — solid but not frontier.
For comparison:
| Model | MTEB | Relative quality |
|---|---|---|
| text-embedding-3-large | 64.6 | +2.34 vs 3-small |
| text-embedding-3-small | 62.26 | baseline |
| Google text-embedding-005 | ~63 | slightly better |
| Cohere embed-v4 | ~65 | frontier closed |
| Voyage AI voyage-3.5 | ~67 | frontier paid |
| BGE-m3 (open-weight) | ~66 | frontier open-weight |
| NVIDIA NV-Embed-v2 (open-weight) | ~70 | current open-weight leader |
What the score differences mean in practice: a 2-3 point MTEB gap translates to roughly 5-10% improvement in top-k retrieval precision on most real workloads. Meaningful but not transformative.
When MTEB points matter: high-stakes retrieval (legal, medical, financial where recall accuracy affects outcomes). When they don't: general Q&A, product search, classification.
Supported LLM Providers and Model Routing
text-embedding-3-small is accessible via:
- OpenAI direct (api.openai.com): official endpoint
- Azure OpenAI: same model, Microsoft-hosted
- OpenAI-compatible aggregators: TokenMix.ai, OpenRouter, and similar
The aggregator path fits naturally in multi-model stacks. TokenMix.ai exposes text-embedding-3-small alongside Google text-embedding-005, Cohere embed-v4, Voyage AI voyage-3.5, and 300+ chat models (GPT-5.5, Claude Opus 4.7, DeepSeek V4-Pro, Kimi K2.6) through a single OpenAI-compatible API key — useful when you want to A/B test embedding models on your actual data without managing multiple vendor relationships.
Configuration is identical to direct OpenAI usage; only the base URL changes:
from openai import OpenAI
client = OpenAI(
    api_key="your-tokenmix-key",
    base_url="https://api.tokenmix.ai/v1",
)
embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="semantic search query",
).data[0].embedding
For teams routing through TokenMix.ai, the same API key covers embeddings, chat completion, vision, and tool-calling across all 300+ models — one billing relationship regardless of how many model types your stack uses.
When to Use It
| Your workload | text-embedding-3-small fit |
|---|---|
| General-purpose RAG | Excellent default |
| E-commerce / product search | Strong fit |
| Semantic deduplication | Strong |
| Classification as embedding | Strong |
| Legal / medical retrieval | Consider 3-large or Voyage |
| Multilingual (heavy non-English) | Consider Cohere embed-v4 or Google |
| Code retrieval (code-specific tasks) | Consider Voyage code-3 |
| Real-time streaming | Fine (50-200ms latency) |
| Batch processing | Use batch tier for 50% discount |
| Cost-critical at massive scale | Best in OpenAI ecosystem |
Comparison With Alternatives
vs text-embedding-3-large ($0.13 / MTok):
- 6.5× cheaper
- ~2.3 MTEB points worse (62.26 vs 64.6)
- Same API, same dimensional flexibility
- Use 3-large only when the last few percent of quality matter
vs Google text-embedding-005 ($0.025 / MTok):
- Google is slightly higher quality, similar price
- Better multilingual performance
- Routes through Google Cloud or Vertex AI (some teams prefer, some avoid)
vs Voyage AI voyage-3.5 ($0.18 / MTok):
- Voyage is 9× more expensive
- Wins on code and technical content
- Strong domain-specific variants (code, finance, law)
- Use when quality ceiling matters more than cost
vs open-weight BGE-m3 or NV-Embed-v2:
- Free if self-hosted; requires GPU infrastructure
- Quality: BGE-m3 ~66 MTEB, NV-Embed-v2 ~70 MTEB (both higher than 3-small)
- Operational overhead: you maintain the inference server
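- Breakeven point vs API: set by your monthly GPU and operations cost. 500M tokens/month is only $10 of API spend at $0.02/MTok, so self-hosting wins on cost alone only at far higher volumes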
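The breakeven volume depends heavily on what self-hosting actually costs you. A rough way to estimate it is to divide your monthly self-hosting cost by the API rate. In the sketch below the $300 monthly figure is a hypothetical placeholder, not a quoted GPU price, and the function name is illustrative.

```python
API_USD_PER_MTOK = 0.02  # standard rate quoted in this guide


def breakeven_tokens_per_month(self_host_usd_per_month,
                               api_usd_per_mtok=API_USD_PER_MTOK):
    """Monthly token volume at which API spend equals self-hosting cost."""
    return self_host_usd_per_month / api_usd_per_mtok * 1_000_000


# Hypothetical all-in self-hosting cost (GPU rental plus operations).
hypothetical_monthly_cost = 300.0
tokens = breakeven_tokens_per_month(hypothetical_monthly_cost)
print(f"breakeven: {tokens / 1e9:.1f}B tokens/month")
```

Quality or data-residency requirements can justify self-hosting well below the cost breakeven, so treat this as one input rather than the decision.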
Known Limitations
1. Max input is 8,191 tokens. Longer documents require chunking. Unlike some newer models, there's no automatic truncation — oversized inputs error.
2. No code-specific fine-tuning. For code retrieval specifically, Voyage code-3 or specialized code embedding models outperform.
3. Weaker on very short queries. Semantic search with 2-3 word queries sees noisier results than with full-sentence queries. This applies to most embedding models, not unique to 3-small.
4. Multilingual quality varies. Strong on English and major European languages. Less strong on Chinese, Japanese, Korean compared to purpose-built multilingual models.
5. No batch-specific cost tier below $0.01/MTok. If you need cheaper per-token pricing and can accept 24-hour latency, the batch tier is the floor. Below that, you're looking at self-hosted open-weight options.
6. Deprecation risk over multi-year horizons. OpenAI maintains embedding APIs but has retired older models (ada-002 deprecation). Plan for the possibility that a future text-embedding-4 replaces 3-small within 2-3 years.
Quick Usage Guide
Python SDK:
from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["document 1", "document 2"],
    dimensions=768,
)
vectors = [d.embedding for d in response.data]
Node.js SDK:
import OpenAI from "openai";
const client = new OpenAI();
const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "document text",
  dimensions: 768,
});
const vector = response.data[0].embedding;
curl:
curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "hello", "dimensions": 768}'
Batch API (50% discount, async): submit a JSONL file with multiple embedding requests; get results within 24 hours. Use when latency isn't critical.
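Building the JSONL input can be sketched as follows. The per-line fields (custom_id, method, url, body) follow the OpenAI Batch API request format as commonly documented, so check them against the current API reference before relying on this. The doc-N ID scheme and the function name are assumptions, and uploading the file and creating the batch job are left out.

```python
import json


def build_batch_lines(texts, model="text-embedding-3-small", dimensions=768):
    """Return one JSON-encoded embedding request per input text."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            "custom_id": f"doc-{i}",  # hypothetical ID scheme for matching results
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text, "dimensions": dimensions},
        }
        lines.append(json.dumps(request))
    return lines


jsonl = "\n".join(build_batch_lines(["document 1", "document 2"]))
print(jsonl)
```

Each result line in the output file carries the same custom_id, which is how you map vectors back to documents.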
FAQ
Is text-embedding-3-small free?
No. $0.02 per million tokens. New OpenAI accounts get $5 in free credits — enough for ~250M tokens of embedding, more than enough for testing.
What dimensions should I use?
Default 1,536 for max quality. For most production RAG, 768 is the sweet spot — 97% quality retention at half the storage. Drop to 512 only when storage is a hard constraint.
Can I compare it against other embedding models?
Yes. Through TokenMix.ai you can hit text-embedding-3-small, text-embedding-3-large, Google text-embedding-005, Cohere embed-v4, and Voyage voyage-3.5 through one API key. Run the same query through each, measure retrieval quality on your ground truth, pick accordingly.
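Measuring retrieval quality on your ground truth can be as simple as recall@k per model. The sketch below assumes you have already run each query through each model and collected ranked document IDs. The model names, document IDs, and relevance labels are toy placeholders, not benchmark results.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(recall_at_k(ids, rel, k) for ids, rel in runs) / len(runs)


# Toy data: two queries per model, each with ranked results and ground truth.
runs_by_model = {
    "model_a": [(["d1", "d2", "d3"], {"d1", "d3"}), (["d9", "d4", "d5"], {"d4"})],
    "model_b": [(["d1", "d3", "d2"], {"d1", "d3"}), (["d4", "d9", "d5"], {"d4"})],
}
scores = {name: mean_recall_at_k(runs, k=2) for name, runs in runs_by_model.items()}
best = max(scores, key=scores.get)
print(scores, best)
```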
Does it support Chinese / Japanese?
Yes, with weaker quality than on English. For multilingual-heavy workloads, Cohere embed-v4 or Google text-embedding-005 are often better. Test on your specific language mix.
What's the max input length?
8,191 tokens per embedding request. For longer documents, chunk before embedding.
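A minimal chunking sketch, assuming the document has already been tokenized into a list of token IDs (for example with a tokenizer library such as tiktoken). The function name and the optional overlap parameter are illustrative choices, not part of the OpenAI API.

```python
MAX_TOKENS = 8191  # per-request input limit quoted in this guide


def chunk_tokens(tokens, max_tokens=MAX_TOKENS, overlap=0):
    """Split a token list into windows of at most max_tokens, with optional overlap."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


tokens = list(range(20_000))  # stand-in for real token IDs
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

In practice, chunks well below the limit and aligned to paragraph or section boundaries tend to retrieve better than maximum-size windows, so treat 8,191 as a ceiling rather than a target.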
Does it support L2 normalization?
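Embeddings are returned as L2-normalized (unit-length) vectors by default, so dot product and cosine similarity give the same ranking. The API has no option to return unnormalized vectors; the encoding_format parameter only switches the output between float and base64. If you truncate a full-length vector yourself instead of using the dimensions parameter, re-normalize it afterwards.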
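For unit-length vectors, the dot product equals cosine similarity, which is why many vector databases can use the cheaper dot-product metric. A small sketch with toy 2-dim vectors standing in for real embeddings:

```python
import math


def l2_norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; equals dot(a, b) when both vectors have unit length."""
    return dot(a, b) / (l2_norm(a) * l2_norm(b))


a = [0.6, 0.8]  # toy unit-length vectors
b = [1.0, 0.0]
print(l2_norm(a), dot(a, b), cosine(a, b))
```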
Can I use it with Qdrant, Pinecone, Weaviate?
Yes, all major vector DBs accept text-embedding-3-small vectors directly. Set your collection's dimension to match your chosen dimensions parameter (1,536 / 1,024 / 768 / 512 / 256).
How does it compare to text-embedding-3-large for code search?
Both are general-purpose. For code-specific, dedicated code embedding models (Voyage code-3) outperform. If budget allows, Voyage code-3 > text-embedding-3-large > text-embedding-3-small for code retrieval specifically.
Is batch tier really 50% off?
Yes, $0.01/MTok for batch. Trade-off: 24-hour max latency, async workflow. For one-time bulk embedding of a corpus, always use batch.
Should I wait for text-embedding-4?
Not announced. text-embedding-3 series has been stable for 2+ years. No signals of imminent replacement. If a new model ships, it'll likely be superior but more expensive — keep 3-small as your cost tier. For current projects, no reason to wait.
Author: TokenMix Research Lab | Last Updated: April 25, 2026 | Data Sources: OpenAI embeddings announcement, OpenAI API pricing, OpenAI embeddings guide, MTEB leaderboard, Awesome Agents embedding pricing April 2026, TokenMix.ai multi-model embeddings