gemini-embedding-001: Dimensions, Pricing and Usage Guide (2026)
gemini-embedding-001 is Google's state-of-the-art text embedding model, currently holding a top position on the Massive Text Embedding Benchmark (MTEB) Multilingual leaderboard with an average task score of 68.32. Pricing: $0.15 per million input tokens standard, $0.075/MTok batch (50% off). Default output is 3,072 dimensions with Matryoshka Representation Learning support, so vectors can be truncated to 1,536 or 768 dimensions with minimal quality loss. Released as generally available in 2025 after an experimental launch, it is Google's embedding flagship as of April 2026. This guide covers pricing mechanics, MTEB context, Matryoshka dimension reduction, and when to pick it over OpenAI or Cohere alternatives. All data verified against Google's official documentation.
Google's production text embedding model, accessible via the Gemini API and Vertex AI. Replaces older Google embedding models (text-embedding-gecko and similar legacy versions). Optimized for:
- Multilingual semantic search
- RAG retrieval across languages
- Semantic similarity and classification
- Cross-lingual retrieval tasks
Key attributes:

| Attribute | Value |
|---|---|
| Creator | Google |
| Released | Experimental March 2025, GA 2025 |
| Model ID | gemini-embedding-001 |
| Default dimensions | 3,072 |
| Matryoshka reductions | 1,536, 768 (and smaller) |
| Max input | 2,048 tokens |
| Standard price | $0.15 / MTok |
| Batch price | $0.075 / MTok (50% off) |
| MTEB average task score | 68.32 |
| Status | Current production default |
Pricing: Standard vs Batch
$0.15 per million input tokens for standard, $0.075 for batch.
Practical monthly cost examples:

| Workload | Monthly token volume | Standard cost | Batch cost |
|---|---|---|---|
| Small RAG (1M docs, occasional re-embed) | ~5M | $0.75 | $0.38 |
| Medium RAG (10M docs, weekly refresh) | ~50M | $7.50 | $3.75 |
| Product catalog search | ~100M | $15.00 | $7.50 |
| Enterprise RAG (10M docs, streaming ingest) | ~500M | $75.00 | $37.50 |
| Massive one-time embed (1B docs) | ~1B | $150.00 | $75.00 |
Cost comparison with competitors (standard tier, per MTok):
- gemini-embedding-001: $0.15
- OpenAI text-embedding-3-small: $0.02 (cheaper but lower MTEB)
- Google text-embedding-005: $0.025 (older, cheaper, lower quality)
- Cohere embed-v4: $0.10
- Voyage AI voyage-3.5: $0.18
Pricing insight: gemini-embedding-001 is priced as a premium offering. It's more expensive than text-embedding-3-small but delivers measurably better MTEB, especially multilingual. For mixed-language workloads, the premium is justified; for English-only at cost sensitivity, cheaper options may suffice.
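The tier math above reduces to a one-line function. A minimal sketch using the prices quoted in this guide (an estimate only, not a billing calculator):

```python
STANDARD_PER_MTOK = 0.15   # USD per million input tokens, standard tier
BATCH_PER_MTOK = 0.075     # batch tier, 50% off

def embed_cost_usd(tokens: int, batch: bool = False) -> float:
    """Estimated embedding spend for a given token volume."""
    rate = BATCH_PER_MTOK if batch else STANDARD_PER_MTOK
    return tokens / 1_000_000 * rate

# Medium RAG workload: ~50M tokens/month
print(embed_cost_usd(50_000_000))              # 7.5
print(embed_cost_usd(50_000_000, batch=True))  # 3.75
```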
Output Dimensions: 3072 with Matryoshka
The default output is 3,072 dimensions: large, but flexible. The model uses Matryoshka Representation Learning (MRL), which allows truncation to smaller sizes with minimal quality loss.
Supported output dimensions:

| Dimensions | Typical quality retention | Use case |
|---|---|---|
| 3,072 (default) | 100% | Maximum accuracy, research |
| 1,536 | ~99% | Standard production RAG |
| 768 | ~97% | Cost/storage-optimized |
| 256-512 | ~90-93% | Aggressive compression |
API usage:
```python
from google import genai

client = genai.Client()
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="Your text to embed",
    config={"output_dimensionality": 768},
)
```
Storage implications: 1M vectors at 3,072 dimensions is ~12GB. At 768 dimensions, ~3GB. For vector DBs (Qdrant, Pinecone, Weaviate), this 4× storage difference translates directly to infrastructure cost.
Production pattern: store at 3,072 once for maximum fidelity. Index at 768 or 1,536 for day-to-day search. Keep 3,072 as cold storage for re-indexing if retrieval strategy changes.
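Truncation itself is a slice plus a re-normalization. A minimal sketch (assumptions: re-normalizing after the cut, since only the full 3,072-dim output is guaranteed unit length; `truncate_embedding` is an illustrative helper, not an SDK function):

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the leading `dims` components, then rescale to unit length.
    MRL training packs the most information into the leading dimensions,
    so cosine-similarity rankings are largely preserved after truncation."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5] * 3072                      # stand-in for a 3,072-dim embedding
small = truncate_embedding(full, 768)
print(len(small))                        # 768
```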
MTEB Benchmark Performance
gemini-embedding-001 achieves 68.32 on MTEB — near the top of the multilingual leaderboard.
Category-specific scores:
- Pair classification: 85.13
- Retrieval: 67.71
- Reranking: 65.58
- Bitext mining (cross-lingual): top tier
For comparison:

| Model | MTEB avg | Strengths |
|---|---|---|
| gemini-embedding-001 | 68.32 | Multilingual, balanced |
| Qwen-based embeddings (Alibaba) | ~67 | Open-weight, Chinese-strong |
| Mistral embeddings | ~66 | Strong retrieval |
| NVIDIA NV-Embed-v2 | ~70 | Current open-weight leader |
| Voyage AI voyage-3.5 | ~67 | Paid frontier |
| OpenAI text-embedding-3-large | ~64.6 | General purpose |
| OpenAI text-embedding-3-small | 62.26 | Cost-efficient |
| Cohere embed-v4 | ~65 | Multilingual, production |
What the score gaps mean: gemini-embedding-001 beats OpenAI's 3-large by 3-4 MTEB points (meaningful on retrieval tasks) but sits behind specialized open-weight leaders like NV-Embed-v2. For closed-source production embedding with Google ecosystem integration, it's a strong default.
Supported LLM Providers and Model Routing
gemini-embedding-001 is accessible via:
- Google AI Studio / Gemini API (generativelanguage.googleapis.com)
- Google Vertex AI (enterprise deployment)
- OpenAI-compatible aggregators (TokenMix.ai, OpenRouter, and similar)
Through TokenMix.ai, you get OpenAI-compatible access to gemini-embedding-001 alongside OpenAI text-embedding-3-small/3-large, Voyage AI voyage-3.5, Cohere embed-v4, and 300+ other models (including chat models like Claude Opus 4.7, GPT-5.5, DeepSeek V4-Pro, Kimi K2.6) through a single API key. For RAG stacks using mixed embedding + LLM generation, unified access eliminates vendor management overhead.
Note: the Google native SDK has a different API shape. Through OpenAI-compatible endpoints, embedding requests work identically to OpenAI's pattern.
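An OpenAI-style embedding request needs nothing beyond the standard library. A sketch (the gateway base URL, key, and exact response schema are placeholder assumptions; any endpoint mirroring OpenAI's `/v1/embeddings` contract should match):

```python
import json
import urllib.request

def build_embed_request(base_url: str, api_key: str,
                        text: str, dims: int = 768) -> urllib.request.Request:
    """Build an OpenAI-style embeddings POST for an aggregator gateway."""
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps({
            "model": "gemini-embedding-001",
            "input": text,
            "dimensions": dims,   # forwarded as output_dimensionality upstream
        }).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def embed_via_gateway(base_url: str, api_key: str, text: str) -> list[float]:
    """Send the request and pull the vector out of the OpenAI-shaped reply."""
    with urllib.request.urlopen(build_embed_request(base_url, api_key, text)) as resp:
        return json.load(resp)["data"][0]["embedding"]
```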
When to Use It vs Alternatives
| Your priority | Recommended |
|---|---|
| Best multilingual retrieval | gemini-embedding-001 |
| Cheapest English-focused | OpenAI text-embedding-3-small |
| Top-of-MTEB leaderboard | NV-Embed-v2 (self-hosted) or Voyage voyage-3.5 |
| Code retrieval specifically | Voyage code-3 |
| Already on Google Cloud | gemini-embedding-001 |
| Cohere ecosystem | Cohere embed-v4 |
| On-prem / strict privacy | Open-weight BGE-m3 or NV-Embed-v2 |
| Real-time low-latency | Any; all are <200 ms |
Rule of thumb: if your content is multilingual or you're on Google Cloud, start with gemini-embedding-001. For English-only cost optimization, text-embedding-3-small wins on price. For absolute highest quality at any price, Voyage or self-hosted NV-Embed-v2.
Context Window and Input Handling
2,048 token input limit per embedding request. Longer documents must be chunked.
Common chunking strategies:
- Sliding window: overlap 200 tokens between chunks
- Semantic chunking: break at paragraph/section boundaries
- Hybrid: chunk at paragraphs but cap each at ~1,500 tokens
Gotcha: don't exceed 2,048 tokens. The API errors rather than truncating. Validate input length client-side before submission.
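The sliding-window strategy is a few lines over pre-tokenized input. A minimal sketch (token counting depends on your tokenizer; here "tokens" are simply list items):

```python
def chunk_tokens(tokens: list, max_tokens: int = 1500, overlap: int = 200) -> list:
    """Sliding-window chunking: windows of at most max_tokens, each sharing
    `overlap` tokens with the previous window so boundary context survives."""
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

doc = [f"tok{i}" for i in range(4000)]        # stand-in tokenized document
chunks = chunk_tokens(doc)
print(len(chunks))                            # 3
print(chunks[1][:200] == chunks[0][-200:])    # True: 200-token overlap
```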
Quick Usage Guide
Via Google native SDK:
```python
from google import genai

client = genai.Client()
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="Your document text",
    config={"output_dimensionality": 768},
)
vector = result.embeddings[0].values
```
Google Gemini API supports batch mode via the Batch API — submit JSONL files of requests, get results within 24 hours at 50% cost. Best for one-time corpus embedding.
Typical RAG flow:
```python
# Index phase (OpenAI-compatible client shape, e.g. via an aggregator)
for document in corpus:
    chunks = chunk_document(document, max_tokens=1500)
    for chunk in chunks:
        embedding = client.embeddings.create(
            model="gemini-embedding-001",
            input=chunk.text,
            dimensions=768,
        ).data[0].embedding
        vector_db.insert(embedding, chunk.metadata)

# Query phase
query_embedding = client.embeddings.create(
    model="gemini-embedding-001",
    input=user_query,
    dimensions=768,
).data[0].embedding
results = vector_db.search(query_embedding, top_k=10)
```
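To run the flow above end to end without a real vector database, vector_db can be stubbed with a brute-force cosine index (fine for small experiments; Qdrant, Pinecone, or Weaviate take over at scale):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class InMemoryVectorDB:
    """Brute-force stand-in for the vector_db used in the RAG flow."""
    def __init__(self):
        self.rows = []                    # (embedding, metadata) pairs

    def insert(self, embedding, metadata):
        self.rows.append((embedding, metadata))

    def search(self, query_embedding, top_k=10):
        scored = [(cosine(query_embedding, e), m) for e, m in self.rows]
        return sorted(scored, key=lambda t: t[0], reverse=True)[:top_k]

db = InMemoryVectorDB()
db.insert([1.0, 0.0], {"id": "a"})
db.insert([0.0, 1.0], {"id": "b"})
print(db.search([0.9, 0.1], top_k=1)[0][1])   # {'id': 'a'}
```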
Limitations and Gotchas
1. Short input window. 2,048 tokens per request; long documents must be chunked before embedding, and the API errors rather than truncating.
2. Higher cost than OpenAI text-embedding-3-small. 7.5× more expensive per token. Worth it for multilingual; questionable for English-only cost-sensitive workloads.
3. Default dimensions are large. 3,072 dims consume storage fast. Use Matryoshka truncation to 768 for most production.
4. Google ecosystem lock-in risk. If your stack uses Google AI Studio or Vertex AI directly, migration away is engineering work. Route through an aggregator to avoid this.
5. No code-specific variant. For code retrieval, Voyage code-3 outperforms.
6. Batch tier requires separate API setup. Standard and batch use different endpoints in Google's native SDK. OpenAI-compatible wrappers may not expose batch cleanly.
FAQ
Is gemini-embedding-001 free?
No. $0.15/MTok standard, $0.075/MTok batch. Google AI Studio new users get free-tier quota for initial testing.
What's the difference between gemini-embedding-001 and text-embedding-005?
gemini-embedding-001 is the newer, higher-quality model. text-embedding-005 is older and cheaper ($0.025/MTok). For new work, use gemini-embedding-001.
Can I truncate embeddings to arbitrary dimensions?
Officially, recommended dimensions are 3,072 / 1,536 / 768 / 256. Other values work but may show quality variation. Stick to documented dims.
Does it support Chinese, Japanese, Korean?
Yes, excellent multilingual support. One of the reasons for its MTEB multilingual leadership.
How does it compare to NV-Embed-v2?
NV-Embed-v2 leads on English-dominant benchmarks but is open-weight (requires self-hosting). gemini-embedding-001 is API-accessible with strong multilingual. Pick based on your preference for managed vs self-hosted.
Is it available on Vertex AI?
Yes. Same model, enterprise deployment on Google Cloud.
What if I need more than 2,048 tokens per embedding?
Chunk first. Common pattern: 1,500-token chunks with 200-token overlap. Store chunk embeddings + metadata; rebuild document context at query time.
Can I fine-tune gemini-embedding-001?
Not directly via the standard API. Google Cloud Vertex AI may offer custom tuning options for enterprise accounts — check with your Google Cloud rep.
Where can I A/B test it against OpenAI or Voyage embeddings?
TokenMix.ai provides access to gemini-embedding-001, OpenAI text-embedding-3-large/small, Voyage AI voyage-3.5, and Cohere embed-v4 through a single API key — run the same query through each, measure retrieval quality on your test set, pick accordingly.