Vector Search in Production: pgvector vs Dedicated Vector Databases
A SaaS team in Hamburg spent three weeks last quarter benchmarking pgvector vs Pinecone, Weaviate, and Qdrant for a 4-million-chunk knowledge base. Engineering pushed for Pinecone because the docs were friendliest. Platform pushed for Qdrant because the licence was right. The CTO wanted Postgres for the same reason every CTO wants Postgres: one fewer system to babysit. The decision sat on a Confluence page for forty days while a feature that would have shipped on pgvector in a long weekend earned exactly zero revenue. When they finally measured their actual workload — 4 million chunks, ~6 QPS at peak, sub-300 ms p95 acceptable — pgvector won on every dimension that mattered. The vendor evaluation had been a tax paid to the imagined version of the system, not the real one.
This post is the comparison we wish that team had read first. It is not a feature matrix scraped from vendor landing pages; it is a load-driven look at vector search production trade-offs, with concrete thresholds for when pgvector is the right answer and when a dedicated vector database earns its operational cost. The default in 2026 should be Postgres with pgvector, and you should be able to defend the move to anything else with numbers.
What Actually Drives the Decision
Most vector database comparison posts rank the contenders on benchmarks none of their readers will hit. The variables that matter for a real product are narrower and more boring.
Corpus size. How many vectors do you need to index, and how fast is that number growing? A few hundred thousand chunks is hobby scale. A few million is the meat of B2B SaaS. Tens of millions is the threshold where the game changes. Above that, you start paying serious attention to memory layout and disk I/O.
Query rate and latency budget. A back-office "search our docs" feature sees 1–5 QPS at peak with a 500 ms latency budget. A user-facing autocomplete tied to embeddings sees hundreds of QPS with a 50 ms budget. Those are different problems with different answers, even on the same corpus.
Update cadence. A static legal corpus that changes monthly is a different system than a customer support index that changes every minute. Real-time index updates change the calculus on HNSW (which is expensive to rebuild) versus IVFFlat (which is cheaper but less accurate).
Filter selectivity. Almost every production query has metadata filters — tenant, language, document type, date range. The tighter your filter, the more brute-force scans become acceptable. The looser your filter, the more you depend on a fast approximate index.
Operational reality. You already run Postgres. You probably already run Redis. Adding a fourth stateful service to your platform is not free, and the cost is mostly invisible until something breaks at 03:00 on a Sunday.
These five variables, not the marketing-page benchmarks, decide which tool fits.
pgvector: Where Postgres Vector Search Wins
The pgvector extension turns Postgres into a competent vector store with HNSW and IVFFlat indexes, cosine, L2 and inner-product distance, and predictable performance up to roughly 50 million rows on commodity hardware. For most B2B SaaS workloads that ceiling is several years above where the product is now.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunk (
  id          BIGSERIAL PRIMARY KEY,
  tenant_id   UUID NOT NULL,
  document_id UUID NOT NULL,
  content     TEXT NOT NULL,
  embedding   vector(1536),
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX chunk_tenant_idx ON chunk (tenant_id);

CREATE INDEX chunk_embedding_idx
  ON chunk USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
A single-table schema, two indexes, and you have a tenant-aware vector store backed by the database your application already trusts. The wins compound in production.
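The retrieval side is a plain SELECT against that table. A sketch, with :query_embedding and :tenant_id as placeholder parameters your application would bind, and an ef_search value that is a starting point rather than a recommendation:

```sql
-- Trade recall for speed at query time; higher ef_search = better recall.
SET hnsw.ef_search = 40;

-- <=> is cosine distance, matching the vector_cosine_ops index above.
SELECT id, document_id, content,
       1 - (embedding <=> :query_embedding) AS cosine_similarity
FROM chunk
WHERE tenant_id = :tenant_id
ORDER BY embedding <=> :query_embedding
LIMIT 10;
```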
Transactional consistency is the biggest one. When you embed a new document, you write the document, the chunks, and the vectors in one transaction. There is no eventual consistency between your source of truth and your vector store, no synchronisation job to reconcile drift, and no scenario where the application sees a chunk that the index does not yet know about. Teams who have run a separate vector database know how much engineering goes into pretending that consistency exists.
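In practice that single-transaction property looks like this. The document table and the parameter names are illustrative, not part of the schema above:

```sql
BEGIN;

-- Assumes a document table alongside the chunk table shown earlier.
INSERT INTO document (id, tenant_id, title)
VALUES (:doc_id, :tenant_id, :title);

-- Chunks and their embeddings commit with the document row:
-- either everything is visible to search, or nothing is.
INSERT INTO chunk (tenant_id, document_id, content, embedding)
VALUES
  (:tenant_id, :doc_id, :chunk_text_1, :embedding_1),
  (:tenant_id, :doc_id, :chunk_text_2, :embedding_2);

COMMIT;
```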
Filter pushdown is the second. Postgres knows how to combine an HNSW scan with a B-tree index on tenant_id or language better than most dedicated systems. With pgvector 0.8+ and the iterative scan option, highly selective filters stop being a foot-gun. For multi-tenant SaaS, that single property eliminates an entire category of vendor lock-in.
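With a very selective filter, the iterative scan setting keeps an HNSW query from coming back with fewer than LIMIT rows after filtering; relaxed_order is the cheaper alternative when strict distance ordering is not required:

```sql
-- Keep walking the HNSW graph until LIMIT rows survive the filter.
SET hnsw.iterative_scan = strict_order;

SELECT id, content
FROM chunk
WHERE tenant_id = :tenant_id
  AND created_at > now() - interval '30 days'
ORDER BY embedding <=> :query_embedding
LIMIT 10;
```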
Operational fit is the third. Your backups, your monitoring, your point-in-time recovery, your replicas, your access control — all of it covers vector search the day you ship the feature. There is no separate dashboard, no separate IAM model, no second on-call rotation.
Where pgvector wears thin: corpora above ~50 million chunks, where HNSW memory cost gets uncomfortable; sustained loads above a few hundred QPS per primary; multi-vector retrieval (ColBERT-style late interaction), which needs an architecture pgvector does not natively support; and very high-dimensional embeddings, where the standard vector type caps indexable dimensions at 2,000 and a 3072-dim model pushes you to halfvec or dimensionality reduction. Those are the real ceilings. Almost no B2B SaaS hits them.
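The memory ceiling is worth a back-of-envelope before anyone books a vendor meeting. A rough sketch, not a model of pgvector internals: vector data is four bytes per dimension, and layer 0 of an HNSW graph keeps roughly 2*m neighbour links per vector.

```python
def hnsw_memory_gb(n_vectors: int, dims: int, m: int = 16) -> float:
    """Rough lower bound on HNSW index size in GB: raw vector data
    plus ~2*m 4-byte neighbour links per vector at layer 0."""
    vector_bytes = dims * 4        # float4 per dimension
    link_bytes = 2 * m * 4         # layer-0 neighbour ids, roughly
    return n_vectors * (vector_bytes + link_bytes) / 1024**3

# 4M chunks at 1536 dims fits in RAM on a midsize instance (~23 GB)...
small = hnsw_memory_gb(4_000_000, 1536)
# ...while 50M chunks wants roughly 290 GB, which is where pgvector wears thin.
large = hnsw_memory_gb(50_000_000, 1536)
```

The estimate ignores Postgres page overhead and upper HNSW layers, so treat it as a floor, not a quote.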
Pinecone, Weaviate, Qdrant: What You Actually Get
The three dedicated systems are not interchangeable, and conflating them is the most common mistake in vendor evaluations.
Pinecone is the managed-only option. You hand them vectors and queries; they hand you back results with predictable latency. Pricing is per-pod or usage-based depending on the tier, the operational story is "there is no operational story," and the speed-to-feature is excellent. The cost is loss of control: no self-hosting, US data residency unless you pay for the EU tier, and the kind of vendor lock-in that becomes a board-level discussion when the bill grows. Pinecone makes sense when your team has zero infrastructure capacity and your data is not GDPR-sensitive.
Weaviate is the most feature-rich. Built-in hybrid (BM25 + vectors), modular embedders that can run inside the database, multi-tenancy, GraphQL and REST APIs. The cloud option is real and the self-hosted option works. The cost is conceptual surface area: Weaviate is a database in its own right and treating it as "just a vector store" leaves performance on the table. Weaviate makes sense when hybrid search and tight embedding integration matter more than schema simplicity.
Qdrant is the performance-and-control option. Written in Rust, very fast, strong filter performance, an Apache 2.0 licence, and a clean self-hosted story. The cloud option is fine; the self-hosted option is what most Qdrant users actually run. The cost is that you are now running a stateful service that nobody on your team wakes up at night thinking about. Qdrant makes sense when raw performance matters and the team has the platform muscle to operate it well.
For European clients, an underrated factor is data residency. Qdrant self-hosted in your own infrastructure is the simplest GDPR story; Weaviate with EU hosting is reasonable; Pinecone EU is workable but more expensive and more locked in. A pgvector setup on a Hetzner or OVH Postgres needs no separate review at all.
The Numbers That Should Drive Your Choice
Pick a tool by the load it has to carry, not the load you imagine it might one day carry. The thresholds below come from production deployments we have audited and built; treat them as starting points, not laws.
Stay on pgvector when your corpus is below 20 million chunks, your peak QPS is below 100, your update rate is moderate (under a few thousand vectors per minute), and your filters are selective. This covers the overwhelming majority of B2B SaaS knowledge bases, internal search, and RAG features.
Consider Qdrant when your corpus is between 20 and 200 million chunks, you need sub-50 ms p99 at three-figure QPS, and you have engineers who can operate it. Self-hosted Qdrant on a serious VM with NVMe storage holds up impressively at this scale, and the cost story beats Pinecone by a wide margin.
Consider Weaviate when hybrid search is a first-class requirement, you want to keep embedding-model logic inside the database, or you need a GraphQL API to feed a frontend without writing a custom service.
Consider Pinecone when the team has zero capacity to run another stateful service, GDPR is not a concern (or the EU tier fits the budget), and shipping the feature next month matters more than per-query cost.
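The thresholds above compress into a short rule of thumb. The cut-offs are the starting points from this post, not laws, and the function is a sketch, not a product:

```python
def pick_vector_store(corpus_chunks: int, peak_qps: int,
                      needs_hybrid: bool = False,
                      can_operate_stateful: bool = True) -> str:
    """Encode the rough decision thresholds discussed above."""
    if corpus_chunks < 20_000_000 and peak_qps < 100:
        return "pgvector"        # the default; the meeting is over
    if needs_hybrid:
        return "weaviate"        # hybrid search as a first-class feature
    if can_operate_stateful:
        return "qdrant"          # performance and control, self-hosted
    return "pinecone"            # managed simplicity, at a price

# The Hamburg team's workload: 4M chunks, ~6 QPS at peak.
print(pick_vector_store(4_000_000, 6))   # prints "pgvector"
```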
The honest threshold to switch off pgvector is rarely "we hit 50 million vectors." It is usually "the bottleneck moved from retrieval quality to retrieval latency, we have measured it, and we have ruled out cheaper fixes." A great deal of money has been spent on dedicated vector databases by teams who could have closed the same gap with a better re-ranker, smarter chunking, or an HNSW parameter change.
A Decision Framework for Vector DB Selection
A short checklist is more useful than another benchmark table. Run through it the next time someone schedules a Pinecone-vs-Weaviate-vs-Qdrant meeting.
Is the corpus under 20 million chunks today and unlikely to triple in the next twelve months? Is peak QPS under 100? Are filter clauses selective? Is the data sensitive enough that staying inside an EU-hosted Postgres is a hard win? Does the team already operate Postgres in production? If five out of five are yes, pgvector is the answer and the meeting is over.
If the answer is no on corpus or QPS, run an actual benchmark on representative data and traffic against your top one or two alternatives — not all three. Decide on numbers: p95 retrieval latency, recall@10 against a curated eval set, monthly cost at projected volume, and a realistic estimate of platform engineering hours per quarter.
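Of those four numbers, recall@10 is the one teams most often hand-wave. It is a few lines of code against a curated eval set:

```python
def recall_at_k(retrieved_ids: list, relevant_ids: set, k: int = 10) -> float:
    """Fraction of known-relevant chunks that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)

# Hypothetical eval row: 3 chunks judged relevant, 2 retrieved in the top 10.
score = recall_at_k([7, 3, 42, 9, 1, 5, 8, 2, 6, 4], {3, 42, 99})
# score == 2/3
```

Average it over a few hundred labelled queries per candidate system and the benchmark meeting gets much shorter.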
Then commit. The teams that thrash on this decision usually pay more in delayed shipping than they could possibly save on infrastructure cost.
Migration: How to Switch Without Drama
If a benchmark says you genuinely need to leave pgvector, the migration is a straightforward two-phase rollout. Keep pgvector as the source of truth, dual-write embeddings to the new system behind a feature flag, run both retrievers in shadow mode for a week with logged comparison metrics, then flip traffic. The new system handles retrieval; pgvector continues to be the consistent canonical store.
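A minimal sketch of the shadow phase, with stub retrievers standing in for pgvector and the new system; all names here are illustrative:

```python
import logging

def search_with_shadow(primary, shadow, query: str, k: int = 10):
    """Serve results from the primary retriever; run the shadow retriever
    on the side, log the overlap, and never let it affect users."""
    primary_ids = primary(query, k)
    try:
        shadow_ids = shadow(query, k)
        overlap = len(set(primary_ids) & set(shadow_ids)) / k
        logging.info("shadow overlap@%d for %r: %.2f", k, query, overlap)
    except Exception:
        logging.exception("shadow retriever failed; primary unaffected")
    return primary_ids  # users only ever see the primary's results

# Stub retrievers for illustration.
pgvector_stub = lambda q, k: list(range(k))        # ids 0..9
qdrant_stub = lambda q, k: list(range(2, k + 2))   # ids 2..11

results = search_with_shadow(pgvector_stub, qdrant_stub, "refund policy")
# results come from the primary; the logged overlap here is 8/10
```

A week of logged overlap and latency deltas is what turns the flip into a formality.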
The same blueprint works in reverse. We have helped European teams migrate from a managed vector database back to pgvector after the bill, the latency profile, and the GDPR review aligned in one direction. That kind of pragmatic backtracking is part of healthy legacy code optimization, even when the "legacy" code is twelve months old.
What to Take Away
The default for vector search production at most B2B SaaS scales is Postgres with pgvector. It eliminates a synchronisation problem, fits the operational shape of a team that already runs Postgres, and holds up to loads most products will not exceed. Dedicated vector databases earn their place above clear thresholds, and the right one depends on whether you are optimising for managed simplicity (Pinecone), feature richness (Weaviate), or raw performance with control (Qdrant).
Picking a stack on real numbers — corpus size, QPS, update cadence, filter selectivity, and the operational headcount you actually have — is dramatically cheaper than picking one on vendor charisma. If you are evaluating vector database options for a regulated European product, or rescuing a vector search system whose costs and latencies have drifted past the point of comfort, this is the kind of architecture review a tech stack strategy engagement is built for. Contact us at hello@wolf-tech.io or visit wolf-tech.io to talk through the right vector store for your workload.

