TokenMix Research Lab · 2026-04-24

Pinecone to Qdrant Migration Guide: 1 Day, 50% Cost Cut (2026)


Qdrant has quietly eaten Pinecone's production market share throughout 2026. The reason is boring and conclusive: Qdrant runs 2× faster at half the Pinecone cost on equivalent recall. At 10M vectors with 10K queries/day, Pinecone lists ~$70/month vs Qdrant Cloud at ~$45/month. At 50M vectors with 100K queries/day, self-hosted Qdrant on a $120/month VPS undercuts all managed options by 3-10×. Migration is measured in engineer-days, not weeks: export the Pinecone index, re-embed (or reuse embeddings), upsert to Qdrant, flip routing. Full guide below with export scripts, re-embedding logic, verification queries, and the four production workloads where staying on Pinecone is still the right call. Tested on Qdrant 1.14.1 (April 2026) and Pinecone Serverless.



Migration ROI Math

The migration is a one-engineer, one-day project. Break-even depends only on current Pinecone spend:

| Pinecone monthly spend | Qdrant equivalent | Monthly savings | Migration ROI |
|---|---|---|---|
| $100 | ~$50 | $50 | ~4 months (not worth it) |
| $300 | ~$106 | $194 | 60 days |
| $500 | ~$106 | $394 | 30 days |
| $1,000 | ~$106 | $894 | 13 days |
| $5,000 | ~$200 | $4,800 | 2 days |
| $20,000+ | ~$400 | $19,600+ | Immediate |

The ROI threshold is ~$300/month. Below that, your engineer's time is better spent on product work. Above it, migration pays for itself in weeks. At $1,000+/month, you should already have started.

The reason Qdrant scales so well: its pricing is compute-proportional, not vector-count-proportional. Pinecone's per-read and per-write unit charges grow with every query and write, so the bill tracks usage. Qdrant on a fixed VPS gives you deterministic costs regardless of query volume — a DevOps advantage as important as the raw savings.
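The break-even arithmetic can be sketched directly from the table above. The ~$400 migration cost (roughly one engineer-day) is an assumption implied by the table's ROI column; plug in your own rates.

```python
def payback_days(pinecone_monthly: float, qdrant_monthly: float,
                 migration_cost: float = 400.0) -> float:
    """Days of accumulated savings needed to repay the migration.

    migration_cost is an assumption (~one engineer-day at blended rates);
    the table's ROI figures imply roughly this number.
    """
    monthly_savings = pinecone_monthly - qdrant_monthly
    if monthly_savings <= 0:
        return float("inf")  # never pays back
    return migration_cost / (monthly_savings / 30)

print(round(payback_days(500, 106)))  # → 30, matching the $500 row
```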


Performance Comparison: Head-to-Head

Benchmark data at equivalent recall (95%+) on 10M 768-dim vectors:

| Metric | Pinecone Serverless | Qdrant Cloud | Qdrant Self-Hosted |
|---|---|---|---|
| p50 query latency | 11ms | 4ms | 3ms |
| p95 query latency | 45ms | 22ms | 18ms |
| p99 query latency | 180ms | 75ms | 45ms |
| Throughput (QPS) | ~2,000 | ~4,500 | ~8,000 |
| Cold-start penalty | ~900ms | ~80ms | 0ms |
| 10M vectors / 10K QPD cost | $70/mo | $45/mo | $106/mo (fixed) |
| 50M vectors / 100K QPD cost | ~$800/mo | ~$350/mo | $120/mo (fixed) |
| Filtered query speed | Moderate | 2-3× Pinecone | 2-3× Pinecone |

Where Qdrant wins hardest: filtered queries (vector + metadata constraints). Qdrant's HNSW implementation handles pre-filtering without rebuilding the index graph, so metadata-heavy queries ("find similar vectors where customer_id = 42 and created_after = 2026-04-01") run 2-3× faster than Pinecone equivalents.

Where Pinecone still matters: operational simplicity and SOC 2 Type 2 out of the box. We'll cover when these outweigh Qdrant's speed and cost below.


When to Migrate vs When to Stay on Pinecone

Migrate if:

  1. Your Pinecone bill is above ~$300/month (the ROI threshold above)
  2. Filtered queries (vector + metadata constraints) dominate your workload
  3. You want fixed, predictable infrastructure costs at scale
  4. You have even modest DevOps capacity, or will use Qdrant Cloud

Stay on Pinecone if:

  1. You spend under ~$300/month and engineer time is the scarcer resource
  2. You have zero operational capacity and are paying for managed simplicity
  3. You depend on true scale-to-zero serverless behavior
  4. Pinecone SDK-specific features are baked deep into your application code

The "stay" cases shrink every quarter. Qdrant Cloud added SOC 2 Type 2 in Q1 2026 and now supports HIPAA BAA, closing two of the historical gaps.


Step 1: Export Your Pinecone Index

Pinecone doesn't offer a native export CLI, so you'll iterate through the index in batches. For most production indexes:

from pinecone import Pinecone
import json
import os

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("your-index-name")

stats = index.describe_index_stats()
total_vectors = stats["total_vector_count"]
dimension = stats["dimension"]

print(f"Exporting {total_vectors} vectors of dim {dimension}")

all_ids = []
for prefix in ["a", "b", "c"]:  # ...extend to cover your ID scheme
    # index.list() yields pages of ID strings on Serverless indexes
    for id_page in index.list(prefix=prefix, limit=100):
        all_ids.extend(id_page)

batch_size = 1000
with open("pinecone_export.jsonl", "w") as f:
    for i in range(0, len(all_ids), batch_size):
        batch_ids = all_ids[i:i+batch_size]
        result = index.fetch(ids=batch_ids)
        for vec_id, vec in result.vectors.items():
            record = {
                "id": vec_id,
                "values": list(vec.values),
                "metadata": vec.metadata or {},
            }
            f.write(json.dumps(record) + "\n")
        print(f"Exported {i+len(batch_ids)}/{len(all_ids)}")

If you don't have a clean ID prefix scheme: use index.query with random query vectors and top_k=10000 to discover IDs, then fetch. Slower but works for any index.
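The random-probe fallback can be sketched as below. Coverage is probabilistic, so treat it as best-effort: compare the discovered count against `describe_index_stats()` before trusting it. The stall limit of 20 is an arbitrary assumption.

```python
import random

def discover_ids(index, dimension: int, total_count: int, top_k: int = 10_000):
    """Probe the index with random vectors until no new IDs turn up.

    Stops when total_count IDs are found or 20 consecutive probes
    discover nothing new (both thresholds are tunable assumptions).
    """
    seen: set = set()
    stalled = 0
    while len(seen) < total_count and stalled < 20:
        probe = [random.gauss(0.0, 1.0) for _ in range(dimension)]
        res = index.query(vector=probe, top_k=top_k, include_values=False)
        before = len(seen)
        seen.update(m.id for m in res.matches)
        stalled = stalled + 1 if len(seen) == before else 0
    return seen
```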

For indexes over 10M vectors: parallelize the fetch step across namespaces or shard IDs. A single-threaded export of 50M vectors takes ~6 hours; parallelized with 10 workers, ~45 minutes.
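A sketch of the parallel fetch with a thread pool. This assumes the Pinecone client tolerates concurrent reads (each fetch is an independent HTTP request); tune `workers` against your account's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(index, all_ids, batch_size: int = 1000, workers: int = 10):
    """Yield fetch results for ID batches concurrently, in order.

    Ten workers roughly matches the ~8× speedup cited above; the exact
    gain depends on network latency and rate limits.
    """
    batches = [all_ids[i:i + batch_size]
               for i in range(0, len(all_ids), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        yield from pool.map(lambda ids: index.fetch(ids=ids), batches)
```

The consumer then writes each result to the JSONL file exactly as in the single-threaded version.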

Export file size estimate: ~5KB per vector at 768-dim FP32. A 10M vector export is ~50GB.


Step 2: Decide on Re-embedding vs Reuse

Two paths:

Reuse existing embeddings (faster, free):

Re-embed from source text (more thorough):

Reuse is almost always right. The cost and time of re-embedding 10M documents is significant (~$500-2,000 in API costs at current rates, plus 2-8 hours of wall-clock time). Unless your embeddings are demonstrably bad, reuse them.

Exception: if migrating is already prompting a model upgrade, do both at once. You can re-embed using TokenMix.ai for unified access to OpenAI text-embedding-3-large, Voyage AI voyage-3.5, Cohere embed-v4, and Google's text-embedding-005 through one API — useful for A/B testing embedding quality before committing to one.
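If you do re-embed, any OpenAI-compatible endpoint takes the same request shape. A minimal sketch using only the standard library; the base URL and `TOKENMIX_API_KEY` environment variable are illustrative assumptions, not documented values.

```python
import json
import os
import urllib.request

def embed(texts: list[str], model: str = "text-embedding-3-large",
          base_url: str = "https://api.tokenmix.ai/v1") -> list[list[float]]:
    """POST to an OpenAI-compatible /embeddings endpoint.

    base_url and the TOKENMIX_API_KEY env var are illustrative; any
    OpenAI-compatible provider accepts this request body.
    """
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOKENMIX_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]
```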


Step 3: Bulk Upsert to Qdrant

Qdrant accepts batch upserts efficiently. For a 10M vector import:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import json
import uuid

client = QdrantClient(host="localhost", port=6333)

# recreate_collection is deprecated in recent clients; drop and re-create
if client.collection_exists("your-collection"):
    client.delete_collection("your-collection")
client.create_collection(
    collection_name="your-collection",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

batch_size = 500
points_batch = []

with open("pinecone_export.jsonl") as f:
    for line_num, line in enumerate(f):
        record = json.loads(line)
        # Qdrant point IDs must be unsigned integers or UUIDs, so map
        # arbitrary Pinecone string IDs deterministically and keep the
        # original ID in the payload for traceability
        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, str(record["id"])))
        points_batch.append(PointStruct(
            id=point_id,
            vector=record["values"],
            payload={**record["metadata"], "original_id": record["id"]},
        ))

        if len(points_batch) >= batch_size:
            client.upsert(
                collection_name="your-collection",
                points=points_batch,
                wait=False,
            )
            points_batch = []
            if (line_num + 1) % 10000 == 0:
                print(f"Upserted {line_num + 1} vectors")

    if points_batch:
        client.upsert(collection_name="your-collection", points=points_batch)

Three optimization tips:

  1. Use wait=False during bulk load so the client doesn't block waiting for each batch to be applied server-side.
  2. Set optimizers_config.indexing_threshold to 0 during the load to defer HNSW index construction until the import finishes, then restore the default so the index gets built. This is what dramatically speeds up upserts.
  3. Run upsert in parallel across 4-8 workers for 10M+ vector imports. The Qdrant server handles concurrent upserts efficiently.

Expected timing on a modest server (4 vCPU, 16GB RAM):


Step 4: Configure Filter Indexes

This is the step that unlocks Qdrant's biggest performance advantage. Create payload indexes for every metadata field you filter on:

from qdrant_client.models import PayloadSchemaType

client.create_payload_index(
    collection_name="your-collection",
    field_name="customer_id",
    field_schema=PayloadSchemaType.KEYWORD,
)

client.create_payload_index(
    collection_name="your-collection",
    field_name="created_at",
    field_schema=PayloadSchemaType.DATETIME,
)

client.create_payload_index(
    collection_name="your-collection",
    field_name="price",
    field_schema=PayloadSchemaType.FLOAT,
)

client.create_payload_index(
    collection_name="your-collection",
    field_name="tags",
    field_schema=PayloadSchemaType.KEYWORD,
)

Without payload indexes, filter queries degrade to full collection scans. With indexes, filtered queries run as fast as unfiltered (4ms p50 territory).

Pinecone doesn't expose explicit payload indexing controls, so this step has no Pinecone equivalent. It's pure gain from the migration.


Step 5: Verify Recall Parity

Before cutting traffic over, run a parallel query set against both systems and measure recall agreement:

import random

# all_query_vectors: a saved sample of real production query embeddings
sample_queries = random.sample(all_query_vectors, 1000)

agreements = 0
top_k = 10

for query_vec in sample_queries:
    pinecone_results = pc_index.query(
        vector=query_vec,
        top_k=top_k,
        include_values=False,
    )
    qdrant_results = qdrant_client.query_points(
        collection_name="your-collection",
        query=query_vec,
        limit=top_k,
        with_payload=True,
    ).points

    pinecone_ids = set(r["id"] for r in pinecone_results["matches"])
    # if string IDs were remapped to UUIDs at upsert time, compare the
    # original ID carried in the payload instead of the Qdrant point ID
    qdrant_ids = set(str((p.payload or {}).get("original_id", p.id))
                     for p in qdrant_results)

    overlap = len(pinecone_ids & qdrant_ids) / top_k
    if overlap >= 0.8:
        agreements += 1

print(f"Recall agreement: {agreements/len(sample_queries):.1%}")

Expected agreement: 95-99% on identical embeddings with default HNSW parameters. If you see <90% agreement, check:

  1. Distance metric matches (Pinecone cosine vs Qdrant cosine — both exist as Distance.COSINE)
  2. Vectors weren't normalized differently during export/upsert
  3. HNSW parameters are reasonable (Qdrant defaults work well for most cases)
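Check 2 is quick to script. Unit-norm embedding models (OpenAI's text-embedding-3 family ships pre-normalized) should all report norms of ~1.0; a wide spread points at export/upsert damage:

```python
import math

def norm_range(vectors: list[list[float]], sample: int = 1000):
    """Min/max L2 norm over a sample of exported vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors[:sample]]
    return min(norms), max(norms)
```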

Step 6: Cutover With Zero Downtime

Never cut over all traffic in one step. The canary pattern:

Phase 1 — Shadow traffic (2-3 days):

import asyncio

async def query_rag(query_vector):
    # serve the user from Pinecone; shadow the same query to Qdrant
    pinecone_task = asyncio.create_task(query_pinecone(query_vector))
    qdrant_task = asyncio.create_task(query_qdrant(query_vector))

    pinecone_result = await pinecone_task
    try:
        qdrant_result = await asyncio.wait_for(qdrant_task, timeout=0.5)
        log_comparison(pinecone_result, qdrant_result)
    except asyncio.TimeoutError:
        log_timeout()

    return pinecone_result

Run this for 2-3 days. Log divergences and investigate any >5% recall mismatch cases.

Phase 2 — Percentage cutover:

Start routing 5% of production traffic to Qdrant, hold for 24 hours, check error rates. Increase to 25%, then 50%, then 100% over 5-7 days.
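One way to implement the split (an illustrative sketch, not prescribed by either vendor) is stable hash-based bucketing, so a given user stays on one backend for the whole rollout instead of flapping between result sets:

```python
import hashlib

def route_to_qdrant(user_id: str, rollout_pct: int) -> bool:
    """Deterministic per-user bucket in [0, 100); compare against the
    current rollout percentage."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_pct
```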

Phase 3 — Decommission Pinecone:

After 2 weeks of 100% Qdrant traffic with no incidents, scale the Pinecone index down to its smallest footprint as rollback insurance, then delete it 30 days later.

Total elapsed time for zero-downtime cutover: 2-3 weeks. The actual engineering work is still 1-2 days; the rest is conservative rollout.


Self-Hosted vs Qdrant Cloud Decision

The cost math only makes sense at specific scale points:

| Scenario | Qdrant Cloud | Self-Hosted Qdrant |
|---|---|---|
| Under 1M vectors | $25-40/mo | $106/mo fixed |
| 1M - 10M vectors | $45-100/mo | $106/mo (break-even at ~5M) |
| 10M - 50M vectors | ~$200-400/mo | $106-250/mo |
| 50M - 200M vectors | ~$800-2,000/mo | $250-500/mo |
| 200M+ vectors | Contact sales | $500-2,000/mo |

Self-hosted is better past 10M vectors if you have DevOps capacity. Simplest deployment: Qdrant on a $40-120/month DigitalOcean or Hetzner VPS with 8GB-32GB RAM. For production HA: 3-node cluster on Kubernetes.

Qdrant Cloud is better if:

  1. You have no DevOps capacity to run, patch, and monitor a database
  2. You want backups, snapshots, and failover handled for you
  3. You need SOC 2 Type 2 or HIPAA paperwork without operating the controls yourself
  4. You're under ~5M vectors, where the managed premium is small
The hidden cost nobody mentions for self-hosted: backup and disaster recovery. Budget 10-20% extra for proper snapshot storage and off-site replicas. This is included in Qdrant Cloud but not in the raw VPS math.
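Qdrant exposes snapshots through the client; a minimal backup sketch. Note the snapshot lands on the Qdrant server's own disk, so real disaster recovery means downloading it and shipping it to off-box object storage on a schedule.

```python
def backup_collection(client, collection: str) -> str:
    """Trigger a server-side snapshot via qdrant-client's create_snapshot
    and return its name for later download/off-site copy."""
    snapshot = client.create_snapshot(collection_name=collection)
    return snapshot.name

# usage: backup_collection(QdrantClient(host="localhost", port=6333),
#                          "your-collection")
```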


Embedding Model Routing

Your migration is a natural point to reconsider your embedding model.

For production RAG with mixed content, text-embedding-3-large still has the widest compatibility. For cost-sensitive workloads at scale, Google's 768-dim model is 5× cheaper with marginal quality loss on English.

TokenMix.ai provides OpenAI-compatible access to the major embedding models through one API key. Useful for A/B testing before committing to re-embedding a production index.


Migration Checklist

  1. Export the Pinecone index to JSONL (IDs, vectors, metadata)
  2. Decide: reuse existing embeddings (the default) or re-embed
  3. Create the Qdrant collection and bulk upsert with deferred indexing
  4. Create payload indexes for every metadata field you filter on
  5. Verify recall agreement (target 95%+ over 1,000 sampled queries)
  6. Shadow traffic for 2-3 days, then canary 5% → 25% → 50% → 100%
  7. Keep Pinecone for 2 weeks as rollback insurance, then decommission
FAQ

How long does a Pinecone to Qdrant migration actually take?

Engineering work: 1-2 days for small indexes (under 10M vectors), 3-5 days for large indexes (50M+). Full cutover including shadow traffic and canary rollout: 2-3 weeks to be safe. The conservative rollout is where most of the calendar time goes — the actual code and data migration is fast.

Will my RAG recall change after migration?

Tested correctly, agreement between Pinecone and Qdrant results runs 95-99% on identical embeddings. Slight divergences come from HNSW graph construction differences, not quality differences. Net recall should be equivalent or slightly better on Qdrant due to more precise filtered query handling.

How does Qdrant handle hybrid search (sparse + dense)?

Qdrant supports hybrid search via collections with named dense and sparse (BM25-style) vectors stored per point. At query time, you combine scores via weighted average or reciprocal rank fusion. Performance and quality match Pinecone's hybrid implementation; setup is slightly more manual.

Can I run Qdrant serverless like Pinecone?

Qdrant Cloud has managed offerings but no true "scale-to-zero serverless." The closest analog is a small Qdrant Cloud plan ($25-40/mo) that's always-on. For true serverless (scale-to-zero) with vector search, consider Turbopuffer or Supabase's pgvector extension — but know that both trade off performance vs Qdrant.

What about Weaviate, Milvus, or Chroma instead of Qdrant?

Weaviate is Qdrant's closest competitor — similar performance, broader feature set (built-in RAG modules), slightly more complex operationally. Pick Weaviate if you need its specific features. Milvus scales to billions of vectors but has heavier operational overhead. Chroma is great for prototyping, not production. For 80% of Pinecone migration cases, Qdrant is the right default.

Does migration break any existing integrations?

Only if you're using Pinecone SDK-specific features in your code. The abstraction pattern that survives migration best: wrap vector search behind a repository interface (search_similar(query_vec, k, filter)) that can swap between Pinecone and Qdrant clients. If you're not using this pattern yet, the migration is a good excuse to add it.
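The repository pattern above can be sketched as follows; the class and method shapes are illustrative, not a prescribed interface:

```python
from typing import Any, Optional, Protocol

class VectorSearchRepo(Protocol):
    def search_similar(self, query_vec: list[float], k: int,
                       filter: Optional[Any] = None) -> list[str]: ...

class QdrantRepo:
    """One concrete backend; a PineconeRepo exposing the same method makes
    the cutover a dependency-injection change rather than a rewrite."""
    def __init__(self, client, collection: str):
        self.client, self.collection = client, collection

    def search_similar(self, query_vec, k, filter=None):
        hits = self.client.query_points(
            collection_name=self.collection,
            query=query_vec,
            query_filter=filter,  # a qdrant-client Filter, or None
            limit=k,
        ).points
        return [str(h.id) for h in hits]
```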

How do I handle namespaces in Pinecone when migrating to Qdrant?

Pinecone namespaces map to Qdrant collections 1:1. A 20-namespace Pinecone index becomes 20 Qdrant collections. Query routing logic changes from namespace="foo" to collection_name="foo". Collection-level operations (delete, snapshot, reindex) are easier in Qdrant since each collection is fully independent.

Where can I test Qdrant alongside embedding model alternatives?

Run Qdrant locally via Docker (docker run -p 6333:6333 qdrant/qdrant) and route embedding generation through TokenMix.ai. This gives you a single API key to compare OpenAI, Voyage, Cohere, and Google embeddings feeding the same Qdrant index — the fastest way to evaluate embedding-model changes alongside vector DB migration without multiplying vendor relationships.



Sources: Qdrant official benchmarks, Pinecone vs Qdrant — Particula.tech, RankSquire Pinecone pricing 2026, Vector DB cost comparison 2026 — LeanOps, Markaicode — Pinecone vs Qdrant billion scale, Qdrant documentation, TokenMix.ai embedding model aggregation