TokenMix Research Lab · 2026-04-25

gemini-embedding-001: Dimensions, Pricing and Usage Guide (2026)
Last Updated: 2026-04-25
Author: TokenMix Research Lab
gemini-embedding-001 is Google's state-of-the-art text embedding model, currently holding top positions on the Massive Text Embedding Benchmark (MTEB) Multilingual leaderboard with an average task score of 68.32. Pricing: $0.15 per million input tokens standard, $0.075/MTok batch (50% off). Default output is 3,072 dimensions, and Matryoshka Representation Learning (MRL) support lets you truncate to 1,536 or 768 dimensions with minimal quality loss. Generally available since 2025 after an experimental launch, it is Google's embedding flagship as of April 2026. This guide covers pricing mechanics, MTEB context, Matryoshka dimension reduction, and when to pick it over OpenAI or Cohere alternatives. All data verified against Google's official documentation.
Table of Contents
- What gemini-embedding-001 Is
- Pricing: Standard vs Batch
- Output Dimensions: 3072 with Matryoshka
- MTEB Benchmark Performance
- Supported LLM Providers and Model Routing
- When to Use It vs Alternatives
- Context Window and Input Handling
- Quick Usage Guide
- Known Limitations
- FAQ
What gemini-embedding-001 Is
Google's production text embedding model, accessible via the Gemini API and Vertex AI. Replaces older Google embedding models (text-embedding-gecko and similar legacy versions). Optimized for:
- Multilingual semantic search
- RAG retrieval across languages
- Semantic similarity and classification
- Cross-lingual retrieval tasks
Key attributes:
| Attribute | Value |
|---|---|
| Creator | Google |
| Released | Experimental March 2025, GA 2025 |
| Model ID | gemini-embedding-001 |
| Default dimensions | 3,072 |
| Matryoshka reductions | 1,536, 768 (and smaller) |
| Max input | 2,048 tokens |
| Standard price | $0.15 / MTok |
| Batch price | $0.075 / MTok (50% off) |
| MTEB average task score | 68.32 |
| Status | Current production default |
Pricing: Standard vs Batch
$0.15 per million input tokens for standard, $0.075 for batch.
Practical monthly cost examples:
| Workload | Monthly token volume | Standard cost | Batch cost |
|---|---|---|---|
| Small RAG (1M docs, occasional re-embed) | ~5M | $0.75 | $0.38 |
| Medium RAG (10M docs, weekly refresh) | ~50M | $7.50 | $3.75 |
| Product catalog search | ~100M | $15.00 | $7.50 |
| Enterprise RAG (10M docs, streaming ingest) | ~500M | $75.00 | $37.50 |
| Massive one-time embed (1B docs) | ~1B | $150.00 | $75.00 |
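Every row in the table is plain per-token arithmetic. A minimal sketch that reproduces it, assuming only the two published rates ($0.15 and $0.075 per million input tokens); the function name `embedding_cost` is illustrative:

```python
# Published per-million-token input rates for gemini-embedding-001.
STANDARD_USD_PER_MTOK = 0.15
BATCH_USD_PER_MTOK = 0.075  # 50% batch discount


def embedding_cost(tokens: int, batch: bool = False) -> float:
    """Return the USD cost of embedding `tokens` input tokens."""
    rate = BATCH_USD_PER_MTOK if batch else STANDARD_USD_PER_MTOK
    return tokens / 1_000_000 * rate


if __name__ == "__main__":
    for label, tokens in [("Medium RAG", 50_000_000), ("Enterprise RAG", 500_000_000)]:
        print(f"{label}: ${embedding_cost(tokens):.2f} standard, "
              f"${embedding_cost(tokens, batch=True):.2f} batch")
```

Plug in your own monthly token volume; the cost scales linearly with tokens, so the batch tier always halves the bill.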
Cost comparison with competitors (standard tier, per MTok):
- gemini-embedding-001: $0.15
- OpenAI text-embedding-3-small: $0.02 (cheaper but lower MTEB)
- OpenAI text-embedding-3-large: $0.13 (comparable price, lower MTEB)
- Google text-embedding-005: $0.025 (older, cheaper, lower quality)
- Cohere embed-v4: $0.10
- Voyage AI voyage-3.5: $0.18
Pricing insight: gemini-embedding-001 is priced as a premium offering. It costs more than text-embedding-3-small but delivers measurably better MTEB scores, especially on multilingual tasks. For mixed-language workloads the premium is justified; for cost-sensitive English-only workloads, cheaper options may suffice.
Output Dimensions: 3072 with Matryoshka
Default output is 3,072 dimensions: large but flexible. The model uses Matryoshka Representation Learning (MRL), which allows truncation to smaller sizes with minimal quality loss.
Supported output dimensions:
| Dimensions | Typical quality retention | Use case |
|---|---|---|
| 3,072 (default) | 100% | Maximum accuracy, research |
| 1,536 | ~99% | Standard production RAG |
| 768 | ~97% | Cost/storage-optimized |
| 256-512 | ~90-93% | Aggressive compression |
API usage:
```python
from google import genai

client = genai.Client()  # picks up the API key from the environment (e.g. GEMINI_API_KEY)
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="Your text to embed",
    config={"output_dimensionality": 768},
)
```
Storage implications: stored as float32 (4 bytes per value), 1M vectors at 3,072 dimensions take ~12 GB of raw vector data; at 768 dimensions, ~3 GB. In vector DBs (Qdrant, Pinecone, Weaviate), this 4× storage difference translates directly into infrastructure cost.
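The storage figures can be checked with back-of-envelope arithmetic. This sketch assumes float32 storage (4 bytes per dimension) and counts only the raw vectors; real vector databases add index and metadata overhead on top. `index_size_bytes` is a hypothetical helper:

```python
def index_size_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw vector payload size; defaults to float32 (4 bytes per dimension).

    Excludes index structures (e.g. HNSW graphs) and metadata.
    """
    return num_vectors * dims * bytes_per_value


if __name__ == "__main__":
    for dims in (3072, 1536, 768):
        gb = index_size_bytes(1_000_000, dims) / 1e9
        print(f"{dims:>5} dims: {gb:.1f} GB")
```

For 1M vectors this works out to roughly 12.3 GB, 6.1 GB, and 3.1 GB, consistent with the ~12 GB and ~3 GB figures above.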
Production pattern: embed once at 3,072 dimensions for maximum fidelity, and index truncated 768- or 1,536-dimension copies for day-to-day search. Keep the 3,072-dimension vectors in cold storage for re-indexing if your retrieval strategy changes.
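Re-indexing from stored 3,072-dimension vectors means truncating them yourself. A minimal sketch of Matryoshka-style truncation, assuming (as MRL is designed) that the leading dimensions carry the coarse representation, so a 768-dim vector is the first 768 values of the full one. The prefix is then re-normalized to unit length: as we read Google's docs, only the full 3,072-dim output comes pre-normalized, so the same step applies to vectors requested with a smaller `output_dimensionality`. Spot-check a few truncated vectors against a direct `output_dimensionality=768` call before relying on this; `truncate_embedding` is a hypothetical helper name.

```python
import math


def truncate_embedding(vector: list[float], dims: int) -> list[float]:
    """Matryoshka-style truncation: keep the first `dims` values, then L2-normalize.

    A prefix of a unit vector is no longer unit length, so without renormalizing,
    dot-product scores would be skewed by vector magnitude.
    """
    if not 0 < dims <= len(vector):
        raise ValueError(f"dims must be in 1..{len(vector)}, got {dims}")
    prefix = vector[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in prefix]
```

Usage: `small = truncate_embedding(full_vector, 768)` for each stored vector, then bulk-load the results into the new index.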
MTEB Benchmark Performance
gemini-embedding-001 scores 68.32 (average task score) on MTEB Multilingual, near the top of that leaderboard.
Category-specific scores:
- Pair classification: 85.13
- Retrieval: 67.71
- Reranking: 65.58
- Bitext mining (cross-lingual): top tier
For comparison:
| Model | MTEB avg | Strengths |
|---|---|---|
| gemini-embedding-001 | 68.32 | Multilingual, balanced |
| Qwen-based embeddings (Alibaba) | ~67 | Open-weight, Chinese-strong |
| Mistral embeddings | ~66 | Strong retrieval |
| NVIDIA NV-Embed-v2 | ~70 | Current open-weight leader |
| Voyage AI voyage-3.5 | ~67 | Paid frontier |
| OpenAI text-embedding-3-large | ~64.6 | General purpose |
| OpenAI text-embedding-3-small | 62.26 | Cost-efficient |
| Cohere embed-v4 | ~65 | Multilingual, production |
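What the score gaps mean: on these figures, gemini-embedding-001 leads OpenAI's text-embedding-3-large by roughly 3-4 MTEB points (meaningful on retrieval tasks) but trails specialized open-weight leaders like NV-Embed-v2. Treat the comparison as approximate: the numbers are not all from the same MTEB version (OpenAI's published scores, for instance, are on the original English MTEB rather than MTEB Multilingual). For closed-source production embedding with Google ecosystem integration, it's a strong default.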
Supported LLM Providers and Model Routing
gemini-embedding-001 is accessible via:
- Google AI Studio / Gemini API (generativelanguage.googleapis.com)
- Google Vertex AI: enterprise deployment
- OpenAI-compatible aggregators — TokenMix.ai, OpenRouter, and similar
Through TokenMix.ai, you get OpenAI-compatible access to gemini-embedding-001 alongside OpenAI text-embedding-3-small/3-large, Voyage AI voyage-3.5, Cohere embed-v4, and 300+ other models (including chat models like Claude Opus 4.7, GPT-5.5, DeepSeek V4-Pro, Kimi K2.6) through a single API key. For RAG stacks using mixed embedding + LLM generation, unified access eliminates vendor management overhead.
Basic usage:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-tokenmix-key",
    base_url="https://api.tokenmix.ai/v1",
)
embedding = client.embeddings.create(
    model="gemini-embedding-001",
    input="Your search query",
    dimensions=768,
).data[0].embedding
```
Note: the Google native SDK has a different API shape. Through OpenAI-compatible endpoints, embedding requests work identically to OpenAI's pattern.
When to Use It vs Alternatives
| Your priority | Recommended |
|---|---|
| Best multilingual retrieval | gemini-embedding-001 |
| Cheapest English-focused | OpenAI text-embedding-3-small |
| Top-of-MTEB leaderboard | NV-Embed-v2 (self-hosted) or Voyage voyage-3.5 |
| Code retrieval specifically | Voyage code-3 |
| Already on Google Cloud | gemini-embedding-001 |
| Cohere ecosystem | Cohere embed-v4 |
| On-prem / strict privacy | Open-weight BGE-m3 or NV-Embed-v2 |
| Real-time low-latency | Any, all are <200ms |
Rule of thumb: if your content is multilingual or you're on Google Cloud, start with gemini-embedding-001. For English-only cost optimization, text-embedding-3-small wins on price. For absolute highest quality at any price, Voyage or self-hosted NV-Embed-v2.
Context Window and Input Handling
2,048-token input limit per text. Longer documents must be chunked.
Common chunking strategies:
- Sliding window: overlap 200 tokens between chunks
- Semantic chunking: break at paragraph/section boundaries
- Hybrid: chunk at paragraphs but cap each at ~1,500 tokens
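A sketch of the sliding-window strategy with the sizes used in this guide (1,500-token windows, 200-token overlap). It operates on an already-tokenized list; the whitespace split in the example is only a stand-in assumption, since Gemini's tokenizer counts differently, so leave headroom below 2,048.

```python
def sliding_window_chunks(tokens: list[str], max_tokens: int = 1500,
                          overlap: int = 200) -> list[list[str]]:
    """Split `tokens` into windows of at most `max_tokens`, each sharing
    `overlap` tokens with the previous window."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):  # this window reached the end
            break
    return chunks


if __name__ == "__main__":
    text = " ".join(f"w{i}" for i in range(3500))
    # Whitespace split is a hypothetical stand-in for a real tokenizer.
    for chunk in sliding_window_chunks(text.split()):
        print(len(chunk), chunk[0], chunk[-1])
```

The early `break` stops the loop once a window covers the tail, so the last chunk is never a tiny fragment that only repeats overlap tokens.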
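Gotcha: don't exceed 2,048 tokens. Depending on the endpoint and settings, over-length input is either rejected with an error or silently truncated (Vertex AI, for example, exposes an autoTruncate option), and silent truncation leaves the tail of your text out of the embedding. Validate input length client-side before submission.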
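One way to validate client-side without a tokenizer dependency is a conservative character-based estimate. This is a sketch: the 3-characters-per-token ratio is an assumption (roughly right for English; CJK text can run closer to 1 character per token, so lower the ratio there), and the helper names are hypothetical. For exact counts, use a real tokenizer.

```python
import math

MAX_INPUT_TOKENS = 2048  # per-input limit stated in this guide
CHARS_PER_TOKEN = 3.0    # assumption: deliberately pessimistic for English text


def estimated_tokens(text: str) -> int:
    """Rough token estimate; this is not Google's tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def check_embeddable(text: str, limit: int = MAX_INPUT_TOKENS) -> str:
    """Return `text` unchanged if it likely fits, else raise before calling the API."""
    est = estimated_tokens(text)
    if est > limit:
        raise ValueError(f"~{est} estimated tokens exceeds the {limit}-token limit; chunk it first")
    return text
```

Calling `check_embeddable(chunk_text)` right before each embedding request turns a silent truncation or a server-side error into an immediate, local failure.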
Quick Usage Guide
Via Google native SDK:
```python
from google import genai

client = genai.Client()
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="Your document text",
    config={"output_dimensionality": 768},
)
vector = result.embeddings[0].values
```
Via OpenAI-compatible aggregator (TokenMix.ai):
```python
from openai import OpenAI

client = OpenAI(api_key="your-key", base_url="https://api.tokenmix.ai/v1")
embedding = client.embeddings.create(
    model="gemini-embedding-001",
    input="Your document text",
    dimensions=768,
).data[0].embedding
```
Batch processing for large-scale indexing:
The Gemini API supports batch mode via its Batch API: submit a JSONL file of requests and get results, typically within 24 hours, at 50% of the standard price. Best for one-time corpus embedding.
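Preparing the JSONL input is ordinary file writing. A minimal sketch, with a loud caveat: the per-line schema here (a `key` plus a `request` shaped like an embedContent body with `content.parts[].text` and `output_dimensionality`) is an assumption for illustration, so confirm the exact field names in Google's Batch API documentation before submitting. `write_embedding_batch_jsonl` and the `chunk-N` keys are hypothetical.

```python
import json
from pathlib import Path


def write_embedding_batch_jsonl(texts: list[str], path: str, dims: int = 768) -> int:
    """Write one embedding request per line and return the number of lines written.

    ASSUMPTION: the per-line schema is illustrative; verify it against Google's docs.
    """
    with Path(path).open("w", encoding="utf-8") as f:
        for i, text in enumerate(texts):
            line = {
                "key": f"chunk-{i}",  # hypothetical ID for matching results back to chunks
                "request": {
                    "content": {"parts": [{"text": text}]},
                    "output_dimensionality": dims,
                },
            }
            f.write(json.dumps(line, ensure_ascii=False) + "\n")
    return len(texts)
```

Giving each line a stable key matters because batch results are matched back to your chunks by ID, not by position.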
Typical RAG flow:
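```python
# Assumes `client` is the OpenAI-compatible client created above.
# `corpus`, `chunk_document`, `vector_db`, and `user_query` are placeholders
# for your own documents, chunker, vector store, and incoming query.
MODEL = "gemini-embedding-001"
DIMS = 768  # use the same dimensionality at index time and query time

# Index phase: embed every chunk and store it with its metadata
for document in corpus:
    chunks = chunk_document(document, max_tokens=1500)  # stays under the 2,048-token limit
    for chunk in chunks:
        embedding = client.embeddings.create(
            model=MODEL,
            input=chunk.text,
            dimensions=DIMS,
        ).data[0].embedding
        vector_db.insert(embedding, chunk.metadata)

# Query phase: embed the query with the same model and dimensionality, then search
query_embedding = client.embeddings.create(
    model=MODEL,
    input=user_query,
    dimensions=DIMS,
).data[0].embedding
results = vector_db.search(query_embedding, top_k=10)
```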
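`vector_db` in the flow above is a placeholder. For local testing, a brute-force cosine-similarity store with the same `insert`/`search` shape is enough. This is a toy sketch (linear scan, no persistence, hypothetical class name), not a substitute for Qdrant, Pinecone, or Weaviate.

```python
import math


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for `vector_db`: stores (embedding, metadata) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], dict]] = []

    def insert(self, embedding: list[float], metadata: dict) -> None:
        self._items.append((embedding, metadata))

    def search(self, query: list[float], top_k: int = 10) -> list[tuple[float, dict]]:
        """Return up to `top_k` (score, metadata) pairs, best match first."""
        scored = [(_cosine(query, emb), meta) for emb, meta in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]
```

Cosine similarity is used instead of a raw dot product so the store behaves correctly even if some vectors were not normalized.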
Known Limitations
1. 2,048 token input limit. Longer documents require chunking.
2. Higher cost than OpenAI text-embedding-3-small. 7.5× more expensive per token. Worth it for multilingual, questionable for English-only cost-sensitive workloads.
3. Default dimensions are large. 3,072 dims consume storage fast. Use Matryoshka truncation to 768 for most production.
4. Google ecosystem lock-in risk. If your stack uses Google AI Studio or Vertex AI directly, migration away is engineering work. Route through an aggregator to avoid this.
5. No code-specific variant. For code retrieval, Voyage code-3 outperforms.
6. Batch tier requires separate API setup. Standard and batch use different endpoints in Google's native SDK. OpenAI-compatible wrappers may not expose batch cleanly.
FAQ
Is gemini-embedding-001 free?
No. $0.15/MTok standard, $0.075/MTok batch. Google AI Studio new users get free-tier quota for initial testing.
What's the difference between gemini-embedding-001 and text-embedding-005?
gemini-embedding-001 is the newer, higher-quality model. text-embedding-005 is older and cheaper ($0.025/MTok). For new work, use gemini-embedding-001.
Can I truncate embeddings to arbitrary dimensions?
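Google's documentation recommends 768, 1,536, or 3,072. Smaller values such as 256 also work but lose more quality, and reduced-dimension outputs should be normalized before similarity search. Stick to the recommended sizes unless storage forces otherwise.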
Does it support Chinese, Japanese, Korean?
Yes, excellent multilingual support. One of the reasons for its MTEB multilingual leadership.
How does it compare to NV-Embed-v2?
NV-Embed-v2 leads on English-dominant benchmarks but is open-weight (requires self-hosting). gemini-embedding-001 is API-accessible with strong multilingual. Pick based on your preference for managed vs self-hosted.
Is it available on Vertex AI?
Yes. Same model, enterprise deployment on Google Cloud.
What if I need more than 2,048 tokens per embedding?
Chunk first. Common pattern: 1,500-token chunks with 200-token overlap. Store chunk embeddings + metadata; rebuild document context at query time.
Can I fine-tune gemini-embedding-001?
Not directly via the standard API. Google Cloud Vertex AI may offer custom tuning options for enterprise accounts — check with your Google Cloud rep.
Where can I A/B test it against OpenAI or Voyage embeddings?
TokenMix.ai provides access to gemini-embedding-001, OpenAI text-embedding-3-large/small, Voyage AI voyage-3.5, and Cohere embed-v4 through a single API key — run the same query through each, measure retrieval quality on your test set, pick accordingly.
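To score "retrieval quality on your test set", a common metric is recall@k: the fraction of known-relevant documents that appear in a model's top k results. A minimal sketch, assuming you have hand-labeled relevant document IDs per query (`qrels`) and each model's ranked result IDs per query (`runs`); both names are illustrative.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        raise ValueError("need at least one relevant id per query")
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def mean_recall_at_k(runs: dict[str, list[str]], qrels: dict[str, set[str]],
                     k: int = 10) -> float:
    """Average recall@k over every labeled query; a missing run counts as no hits."""
    if not qrels:
        raise ValueError("qrels is empty")
    return sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in qrels.items()) / len(qrels)
```

Compute `mean_recall_at_k` once per embedding model over the same queries and labels; the model with the higher score retrieves more of what your users actually need.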
Data Sources: Google Gemini Embedding announcement, Google Embeddings API docs, MTEB Multilingual leaderboard, CometAPI Gemini embedding coverage, Embedding models pricing April 2026, TokenMix.ai multi-model embeddings