Vector Search in Production: pgvector vs Dedicated Vector Databases
Choosing a vector search backend should take an afternoon. Instead, most teams spend weeks reading benchmark posts that contradict each other, sitting through vendor demos, and eventually picking the option with the best-looking dashboard. Then they rebuild the decision six months later when production traffic reveals what the benchmarks didn't.
This post is about making that choice correctly the first time. The short version: pgvector is the right answer for most SaaS products, and the thresholds where a dedicated vector database genuinely wins are more specific than the vendor marketing suggests. We'll walk through those thresholds concretely, and give you the criteria to apply to your own workload.
What Vector Search Actually Involves at the Infrastructure Layer
Before comparing options, it's worth being precise about what you're choosing between.
Vector search finds the k nearest neighbors to a query embedding across a stored collection of vectors. The two dominant index types are:
- HNSW (Hierarchical Navigable Small World): A graph-based index. Fast at query time, memory-hungry at build time, excellent recall. The dominant approach in 2026.
- IVFFlat (Inverted File Index): Clusters vectors into buckets, searches a subset. Lower memory overhead, faster to build, worse recall unless you tune the probes parameter (ivfflat.probes) carefully.
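Both index types, with their main tuning knobs, look like this in pgvector. A sketch — the table and column names are illustrative, and the WITH values shown are pgvector's documented defaults:

```sql
-- HNSW: better recall and query speed, slower and more memory-hungry to build.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- IVFFlat: cheaper to build; recall depends on how many lists you probe.
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- probes defaults to 1; raising it trades query speed for recall.
SET ivfflat.probes = 10;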
Both pgvector and every dedicated vector database support HNSW. The differences are in how well each system handles the surrounding concerns: transactional consistency with your relational data, index update throughput, operational overhead, and cost at the scale you actually run.
pgvector: The Case For Starting Here
pgvector is a PostgreSQL extension. You store vectors as a column type alongside the rest of your relational data. You query with <-> (L2 distance), <=> (cosine distance), or <#> (inner product).
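As a sketch, the minimal schema behind a query like the one below — the vector(1536) dimension matches OpenAI's text-embedding-3-small and should be adjusted for your model:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id        bigserial PRIMARY KEY,
  user_id   bigint NOT NULL,
  content   text NOT NULL,
  embedding vector(1536)  -- dimension must match your embedding model
);
```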
SELECT id, content, embedding <=> $2 AS distance
FROM documents
WHERE user_id = $1
ORDER BY distance
LIMIT 10;
That's it. No separate service. No synchronization job. No separate authentication layer. No additional infrastructure cost until you exceed what a well-tuned Postgres instance can handle.
Where pgvector genuinely wins
Transactional consistency. When a user deletes a document, the row disappears from vector search results immediately — within the same transaction. With any dedicated vector database, you now have an eventual consistency problem. You need a deletion propagation mechanism, and you need to test it under failure conditions. That's not a reason never to use a dedicated store, but it's a real engineering cost that benchmarks don't capture.
Metadata filtering. Hybrid queries that filter on relational attributes before or after vector search are first-class SQL. In Postgres, a query like "find the 10 most similar documents to this embedding, written by users in the enterprise tier, created in the last 90 days" is a standard query planner problem. In most dedicated vector databases, metadata filtering is a bolt-on with subtly different semantics and performance characteristics depending on filter selectivity.
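That enterprise-tier example maps onto ordinary SQL. A sketch, assuming hypothetical users.tier and documents.created_at columns:

```sql
SELECT d.id, d.content, d.embedding <=> $1 AS distance
FROM documents d
JOIN users u ON u.id = d.user_id
WHERE u.tier = 'enterprise'
  AND d.created_at > now() - interval '90 days'
ORDER BY d.embedding <=> $1
LIMIT 10;
```

One caveat: when filters are highly selective, the planner may bypass the HNSW index and compute exact distances over the filtered rows — usually acceptable, since the filtered set is small.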
Operational simplicity. Your team already runs Postgres. You already have backup procedures, monitoring, access control, and runbooks for it. Adding pgvector means adding an extension and a column type — not a new infrastructure tier with its own failure modes.
Cost at moderate scale. A well-provisioned Postgres instance with pgvector on a 4-core/32GB node comfortably handles 5–10 million vectors with HNSW and sub-50ms p99 query latency for typical SaaS workloads. At Pinecone's standard pricing, the equivalent performance would cost 3–5× more per month. That spread matters at early growth stages.
The honest limitations
pgvector's HNSW implementation has improved substantially with each release, but it has real constraints:
- Memory requirements. HNSW indexes are loaded entirely into memory for fast query performance. At 10M+ vectors with high-dimensional embeddings (1,536 dimensions for OpenAI's text-embedding-3-small), your index consumes tens of gigabytes. This is manageable, but it shapes your instance sizing.
- Index build time. Building an HNSW index on 50M+ vectors is a multi-hour operation, and a plain CREATE INDEX blocks concurrent writes to the table for its duration (CREATE INDEX CONCURRENTLY avoids the lock at the cost of an even slower build). Dedicated databases handle live index updates more gracefully.
- Throughput under heavy concurrent writes. At high ingest rates (thousands of vectors per second), Postgres's general-purpose write path creates contention that dedicated systems avoid through purpose-built ingest pipelines.
Dedicated Vector Databases: Pinecone, Weaviate, Qdrant
The dedicated options each have distinct positioning:
Pinecone is the managed-only option — no self-hosting. It's operationally the simplest of the dedicated databases, with a clean API and strong managed-infrastructure guarantees. The tradeoff is cost and lock-in. Pricing is consumption-based and climbs quickly past 10M vectors with significant query throughput.
Weaviate is open-source with a managed cloud option. It combines vector search with a graph-like object model, which is appealing for knowledge graph use cases but adds conceptual overhead for teams who just want nearest-neighbor search. Strong multimodal support (images, audio) if that's relevant to your product.
Qdrant is open-source, written in Rust, and has emerged as the strongest self-hosted option in terms of performance and operational simplicity. It supports sparse vectors alongside dense vectors natively, which makes hybrid BM25+vector search straightforward. For teams who want a dedicated vector database without Pinecone's pricing, Qdrant is usually the right choice in 2026.
When a dedicated database actually pays off
The cases where dedicated vector databases genuinely justify the additional operational layer:
Collections exceeding ~50M vectors. At this scale, Postgres's general-purpose storage format and write path become real bottlenecks. Dedicated databases optimize storage layout specifically for vector data and handle index updates more gracefully at this volume.
Ingest throughput above ~1,000 vectors/second sustained. If your product ingests documents or events at high velocity — real-time user activity streams, continuous document processing pipelines — dedicated systems handle this more cleanly. They're built around the assumption that your write path is vector-heavy.
Multitenancy at extreme scale. If you're building a platform where each tenant has a separate embedding namespace with tens of millions of vectors, dedicated systems have first-class namespace/collection isolation. pgvector requires you to handle this with row-level filtering, which adds query complexity.
Search-as-a-core-product. If similarity search is your primary product surface — not a supporting feature for an AI assistant or document retrieval — the additional tuning controls in dedicated databases (HNSW parameter tuning, quantization options, hybrid reranking pipelines) are worth learning.
Hybrid Search: BM25 + Vector
Most production retrieval systems benefit from combining sparse keyword search (BM25) with dense vector search. Pure semantic similarity misses exact-match queries ("what does field X in our API response mean?") that keyword search handles better.
In pgvector, you can implement hybrid search by combining a tsvector full-text search score with the vector distance score using a weighted combination or RRF (Reciprocal Rank Fusion). This is fully achievable in SQL, but it requires deliberate query construction.
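A minimal RRF sketch in SQL, assuming a tsv tsvector column maintained alongside the embedding; $1 is the query text, $2 the query embedding, and 60 is the conventional RRF smoothing constant:

```sql
WITH lexical AS (
  SELECT id,
         row_number() OVER (
           ORDER BY ts_rank(tsv, plainto_tsquery('english', $1)) DESC
         ) AS r
  FROM documents
  WHERE tsv @@ plainto_tsquery('english', $1)
  ORDER BY r
  LIMIT 50
),
semantic AS (
  SELECT id,
         row_number() OVER (ORDER BY embedding <=> $2) AS r
  FROM documents
  ORDER BY embedding <=> $2
  LIMIT 50
)
SELECT id,
       COALESCE(1.0 / (60 + l.r), 0.0) +
       COALESCE(1.0 / (60 + s.r), 0.0) AS rrf_score
FROM lexical l
FULL OUTER JOIN semantic s USING (id)
ORDER BY rrf_score DESC
LIMIT 10;
```

The FULL OUTER JOIN keeps documents that appear in only one of the two candidate lists, which is the point of RRF: a strong keyword match can surface even when it is semantically distant, and vice versa.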
Qdrant has native sparse vector support that makes BM25+vector hybrid search a first-class operation. Weaviate and Pinecone both support hybrid search modes as well.
If hybrid search is central to your retrieval architecture — not just a nice-to-have — this is the one area where dedicated databases offer a noticeably better developer experience than pgvector today.
Evaluation Metrics That Actually Matter
When evaluating retrieval quality, the standard metrics are:
- Recall@k: Of the true k nearest neighbors, how many does the index actually return? HNSW typically achieves 95–99% recall with default settings; IVFFlat requires tuning.
- p95/p99 query latency: Median latency is misleading. What matters for user-facing features is the tail.
- Index build time and online update latency: How long does adding new vectors take to be searchable?
The most important evaluation, which most teams skip: end-to-end retrieval quality on your actual data, with your actual queries. Generic benchmarks use synthetic datasets. Your users' documents and query patterns may have very different characteristics. Build a small evaluation set from real production queries before committing to an index type or backend.
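Recall itself can be measured directly in SQL against your own table. A sketch for a single query vector $1, using the documents table from the earlier example — average the result over a sample of real production queries:

```sql
-- Step 1: exact top-10, forcing a sequential scan so the index is bypassed.
BEGIN;
SET LOCAL enable_indexscan = off;
CREATE TEMP TABLE exact_top10 AS
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;

-- Step 2: approximate top-10 via the HNSW index, compared to the exact set.
SELECT count(*) / 10.0 AS recall_at_10
FROM (SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10) approx
WHERE approx.id IN (SELECT id FROM exact_top10);
```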
A Practical Decision Path
Starting a new project or adding vector search to an existing system:
- Default to pgvector. You almost certainly don't need anything else at launch. Ship with it, measure, and revisit when you have real production data.
- Revisit at 10M+ vectors if you're experiencing query latency issues, high index memory pressure, or ingest throughput bottlenecks. Instrument these metrics from day one so the revisit is data-driven.
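Instrumenting index memory pressure can start with Postgres's built-in size functions — a sketch, with a hypothetical index name:

```sql
-- On-disk footprint of the HNSW index versus its table; the index needs to
-- fit in RAM for fast queries, so track this against your instance memory.
SELECT pg_size_pretty(pg_relation_size('documents_embedding_idx')) AS index_size,
       pg_size_pretty(pg_relation_size('documents'))               AS table_size;
```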
- Choose Qdrant if migrating off pgvector and you want to self-host. It has the best performance/operational-complexity ratio among the self-hosted options and native support for the hybrid search patterns you'll likely need.
- Choose Pinecone if your team has no capacity to operate vector infrastructure and you need managed reliability guarantees with predictable SLAs.
Migrating From pgvector to a Dedicated Store (When the Time Comes)
If you start with pgvector and need to migrate later, the path is straightforward in principle:
- Export embeddings from Postgres with their associated IDs.
- Bulk-ingest into the new store.
- Run your evaluation suite against both backends in parallel before cutting over.
- Maintain the Postgres foreign key structure; the new store holds only the vectors and IDs, not your relational data.
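The export step can be sketched with psql's client-side \copy (the filename is illustrative):

```sql
-- Dumps ids and embeddings as CSV for bulk ingest into the new store.
\copy (SELECT id, embedding FROM documents) TO 'embeddings.csv' WITH (FORMAT csv)
```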
The biggest migration risk is that your metadata filtering queries, which work naturally in SQL, need to be translated to whatever query API the new system provides. Plan for this translation effort upfront — it's rarely as simple as the migration guides suggest.
What This Means for Your Architecture
If you're building an AI-assisted SaaS product with document retrieval, semantic search, or any embedding-based personalization, the right starting point is pgvector inside your existing Postgres database.
Add a vector(1536) column. Build your HNSW index. Write your retrieval queries in SQL alongside your other queries. Keep the operational surface area of your product minimal while you're finding product-market fit and learning your actual access patterns.
The dedicated vector database conversation becomes productive once you have real production data showing you where pgvector's constraints actually matter for your specific workload — not before.
If you're evaluating your vector search architecture or building out an AI feature stack and want an independent read on the tradeoffs for your specific situation, we're happy to take a look. Reach out at hello@wolf-tech.io or via wolf-tech.io.
Frequently Asked Questions
Is pgvector production-ready in 2026? Yes. With recent HNSW improvements and proper tuning, pgvector handles the vector search workloads of the vast majority of SaaS products without issue. The main constraints — memory requirements for large indexes and ingest throughput limits — only become binding at scale thresholds most products don't reach in their first few years.
What embedding dimensions does pgvector support?
pgvector's vector type stores up to 16,000 dimensions, but its HNSW and IVFFlat indexes support at most 2,000 (4,000 with the half-precision halfvec type). OpenAI's text-embedding-3-small uses 1,536 dimensions; text-embedding-3-large can be reduced to 256–1,536 dimensions. Cohere and Voyage AI embeddings also fit within the indexable limit.
Can I run hybrid BM25 + vector search in pgvector?
Yes, using Postgres's built-in tsvector full-text search combined with pgvector's distance operators. The query is more verbose than Qdrant's native hybrid mode, but it's fully functional and avoids a separate service.
When should I use Pinecone instead of Qdrant? Pinecone is the better choice when your team has no capacity to operate self-hosted infrastructure and you need fully managed reliability. Qdrant is the better choice when you're comfortable operating a service and want to avoid Pinecone's per-query pricing at scale.
How does pgvector perform compared to Pinecone? On equivalent hardware, a well-tuned pgvector HNSW index achieves comparable recall and query latency to Pinecone for collections under ~20M vectors. Above that threshold, Pinecone's purpose-built infrastructure typically shows better throughput under concurrent load.

