Hybrid Search in Postgres: Combining pgvector and Full-Text Search Without Elasticsearch

#hybrid search Postgres
Sandor Farkas - Founder & Lead Developer at Wolf-Tech

Sandor Farkas

Founder & Lead Developer

Expert in software development and legacy code optimization

A startup in Amsterdam built a support knowledge base on pgvector. Semantic search was genuinely impressive in demos — queries like "how do I cancel my subscription" matched articles titled "ending your plan" even though not a single word overlapped. Then they shipped to production and discovered the failure mode: users searching for a specific error code like ERR_AUTH_EXPIRED got back generic articles about authentication. The vector model had no idea that a six-character token was the most important part of the query. They added Elasticsearch as a second system. Six months later they were paying two sets of infrastructure bills, maintaining two sync pipelines, and debugging the gap between what Postgres knew and what Elasticsearch knew. There is a better path.

Hybrid search in Postgres — combining pgvector embeddings with native tsvector full-text search and merging the results with reciprocal rank fusion — gives you most of what Elasticsearch does for search quality at none of the operational overhead. For most mid-size SaaS teams handling up to a few million documents, it is the right default in 2026.

Why Neither System Alone Is Enough

Pure vector search excels at intent matching. It handles synonyms, paraphrases, and cross-lingual queries that BM25 simply cannot. Ask "how do I fix a broken login" and it will correctly surface articles titled "troubleshooting authentication failures." But it fails on specificity. Product codes, error identifiers, version numbers, proper nouns, and technical abbreviations are compressed into the same embedding space as everything else, and they lose their salience. A query for symfony/http-kernel 6.4 should return the exact changelog or migration guide, not a cloud of semantically adjacent articles.

Pure full-text search — whether Postgres tsvector, Elasticsearch BM25, or Solr — is the complement. It is extremely precise on exact tokens and partial matches, handles numbers and codes correctly, and is fast. But it has no concept of meaning. Ask "show me articles about ending my subscription" and it finds nothing if the word "cancellation" does not appear in the query.

Hybrid search uses both and merges the ranked lists. The only design decision is how to combine two score scales that are not on the same axis.

Reciprocal Rank Fusion: The Simplest Merger That Works

Reciprocal rank fusion (RRF) solves the score-combination problem by ignoring raw scores entirely and working only with ranks. Each result gets a score of 1 / (k + rank) in each system, where k is a constant (typically 60) that dampens the influence of the very top result. The final score is the sum across systems. A result that is ranked 1st in vector search and 5th in full-text search will beat a result that is ranked 2nd in one system and absent from the other.

The formula is simple, robust, and does not require tuning the relative weights of the two systems. Teams that try weighted sum approaches spend weeks calibrating coefficients per query type. RRF degrades gracefully: if a result only appears in one retrieval system, it still contributes positively rather than being discarded.

The Schema

A clean schema for hybrid search keeps the full-text index on the same row as the vector. Maintaining two separate tables that must stay in sync is the same class of problem as maintaining Elasticsearch alongside Postgres — just cheaper.

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE document_chunk (
    id            BIGSERIAL PRIMARY KEY,
    tenant_id     UUID        NOT NULL,
    document_id   UUID        NOT NULL,
    content       TEXT        NOT NULL,
    embedding     vector(1536),
    search_vector TSVECTOR GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(content, ''))
    ) STORED,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Vector index: HNSW for approximate nearest-neighbour at scale
CREATE INDEX chunk_embedding_hnsw_idx
    ON document_chunk
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Full-text index: GIN for fast tsvector lookups
CREATE INDEX chunk_fts_gin_idx
    ON document_chunk
    USING gin (search_vector);

-- B-tree index for tenant isolation (almost every query filters by tenant)
CREATE INDEX chunk_tenant_idx ON document_chunk (tenant_id);

The GENERATED ALWAYS AS ... STORED column keeps the tsvector in sync with content without application code. You write the content; Postgres maintains the index. Choose your dictionary (english, german, simple, etc.) based on the primary language of your content — or maintain multiple tsvector columns for multilingual corpora.

The Hybrid Query

The core SQL merges two CTEs and applies RRF in a single query. Postgres evaluates both retrieval paths in one round trip.

WITH vector_results AS (
    SELECT
        id,
        ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rank
    FROM document_chunk
    WHERE tenant_id = $2
    ORDER BY embedding <=> $1::vector
    LIMIT 60
),
fts_results AS (
    SELECT
        id,
        ROW_NUMBER() OVER (
            ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', $3)) DESC
        ) AS rank
    FROM document_chunk
    WHERE tenant_id = $2
      AND search_vector @@ websearch_to_tsquery('english', $3)
    ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', $3)) DESC
    LIMIT 60
),
rrf AS (
    SELECT
        COALESCE(v.id, f.id) AS id,
        COALESCE(1.0 / (60 + v.rank), 0) +
        COALESCE(1.0 / (60 + f.rank), 0) AS rrf_score
    FROM vector_results v
    FULL OUTER JOIN fts_results f ON v.id = f.id
)
SELECT
    dc.id,
    dc.content,
    dc.document_id,
    rrf.rrf_score
FROM rrf
JOIN document_chunk dc ON dc.id = rrf.id
ORDER BY rrf.rrf_score DESC
LIMIT 10;

$1 is the query embedding (a float[] cast to vector), $2 is the tenant UUID, and $3 is the raw query string for websearch_to_tsquery. The FULL OUTER JOIN ensures results that appear in only one system still surface — they contribute one RRF term rather than two, which naturally ranks them below results present in both.

Symfony and Doctrine Integration

Doctrine's native query support makes it straightforward to wrap this in a repository method. The key is passing the embedding as a native parameter and bypassing DQL for the hybrid query, which uses Postgres-specific syntax.

// src/Repository/DocumentChunkRepository.php

class DocumentChunkRepository extends ServiceEntityRepository
{
    public function __construct(
        ManagerRegistry $registry,
        private readonly EmbeddingClient $embeddings,
    ) {
        parent::__construct($registry, DocumentChunk::class);
    }

    /**
     * @return array<array{id: int, content: string, document_id: string, rrf_score: float}>
     */
    public function hybridSearch(string $tenantId, string $query, int $limit = 10): array
    {
        $embedding = $this->embeddings->embed($query);
        $embeddingLiteral = '[' . implode(',', $embedding) . ']';

        $sql = <<<SQL
            WITH vector_results AS (
                SELECT id,
                       ROW_NUMBER() OVER (ORDER BY embedding <=> :embedding::vector) AS rank
                FROM document_chunk
                WHERE tenant_id = :tenant
                ORDER BY embedding <=> :embedding::vector
                LIMIT 60
            ),
            fts_results AS (
                SELECT id,
                       ROW_NUMBER() OVER (
                           ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', :query)) DESC
                       ) AS rank
                FROM document_chunk
                WHERE tenant_id = :tenant
                  AND search_vector @@ websearch_to_tsquery('english', :query)
                ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', :query)) DESC
                LIMIT 60
            ),
            rrf AS (
                SELECT COALESCE(v.id, f.id) AS id,
                       COALESCE(1.0 / (60 + v.rank), 0) +
                       COALESCE(1.0 / (60 + f.rank), 0) AS rrf_score
                FROM vector_results v
                FULL OUTER JOIN fts_results f ON v.id = f.id
            )
            SELECT dc.id, dc.content, dc.document_id, rrf.rrf_score
            FROM rrf
            JOIN document_chunk dc ON dc.id = rrf.id
            ORDER BY rrf.rrf_score DESC
            LIMIT :limit
        SQL;

        $conn = $this->getEntityManager()->getConnection();
        $result = $conn->executeQuery($sql, [
            'embedding' => $embeddingLiteral,
            'tenant'    => $tenantId,
            'query'     => $query,
            'limit'     => $limit,
        ]);

        return $result->fetchAllAssociative();
    }
}

Keep embedding generation outside the transaction boundary. A slow model API call inside a database transaction is a reliable way to exhaust your connection pool under load. Embed first, then search.

Index Tuning for Sub-100ms p95

With the right indexes in place, a hybrid search on a few million rows stays well under 100ms p95 on standard hardware. The variables that shift this are HNSW construction parameters, the ef_search runtime parameter, and the LIMIT passed to each CTE.

HNSW m and ef_construction. Higher m (the number of connections per node) improves recall at the cost of memory and slower index builds. For most document corpora, m = 16 with ef_construction = 64 is the right starting point. Raising ef_construction to 128 improves recall by a few percent; raising it to 256 rarely justifies the index build time.

ef_search at query time. This controls how many candidates the HNSW graph explores per query. The default is 40. Setting it higher improves recall but increases latency. Add it as a session-level setting before your hybrid queries if you have strict recall requirements and a generous latency budget.

SET hnsw.ef_search = 80;

CTE LIMIT. Both CTEs retrieve 60 candidates before fusion. Increasing this improves recall — you are less likely to miss a relevant document that ranks just outside the top 60 in one system — at the cost of more rows in the RRF join. 60 is the right default; 100 is reasonable if recall is critical and your corpus is large.

ANALYZE your tenant column. Because most queries include WHERE tenant_id = $2, Postgres needs accurate statistics on that column to choose the right plan. Running ANALYZE document_chunk after the first major data load and after large bulk inserts keeps the planner honest and prevents it from defaulting to sequential scans at the wrong moment.

Evaluating Search Quality

Before declaring the system production-ready, instrument it. A hybrid search you cannot measure is a hybrid search you cannot improve.

The minimum evaluation harness is a set of golden query-document pairs: queries you know the correct answers to. Run your hybrid search against each, record the rank of the expected result, and report mean reciprocal rank (MRR) and recall@10. Track these in CI. Any change to the embedding model, the tokenisation dictionary, or the RRF k constant should be validated against this test set before reaching production.

A/B test embedding models against each other with the same harness. Moving from text-embedding-3-small to text-embedding-3-large sometimes improves MRR by 15%; sometimes it makes no difference. The cost difference is real and the benefit depends entirely on your corpus. Measure both before committing to the more expensive model.

Also instrument at the application layer: log query strings, result counts from each CTE leg, and final RRF ranks. Queries where the vector leg returns zero results (because the corpus has no semantically similar content) or the FTS leg returns zero results (because the query is conversational and matches nothing) are the most important cases to catch early. In both scenarios the hybrid query degrades to a single-system query, which is still correct behaviour — but knowing it is happening tells you where to invest in better content coverage.

When to Escalate Beyond Postgres

Hybrid search in Postgres covers most B2B SaaS workloads well. The thresholds where moving to a dedicated system becomes rational are narrower than vendor marketing suggests.

Move to Elasticsearch if you need faceted search with complex aggregations across millions of distinct values, streaming document updates at thousands of writes per second, or language-specific analysers and token filters well beyond what Postgres dictionaries provide. Elasticsearch is a significantly better search engine for content-heavy products with editorial workflows or e-commerce catalogues.

Move to a dedicated vector database if your corpus exceeds 50 to 100 million chunks, if you need multi-vector retrieval (ColBERT-style late interaction), or if query rates push into hundreds of concurrent vector searches per second on a single primary. These are real thresholds — above them, pgvector's HNSW memory footprint and single-writer model become genuine bottlenecks. Below them, which is where most B2B SaaS sits for most of its lifetime, the operational simplicity of doing everything in Postgres is worth more than any feature a dedicated system offers.

The Amsterdam team who added Elasticsearch after the fact would have saved six months of dual-system complexity with this pattern. Postgres was already the right tool. They just needed both indexes from the start.

Building This Into an Existing Product

Adding hybrid search to a working Symfony application is a one-sprint project if the schema is clean and the embedding pipeline is already in place. The bigger investment is the evaluation harness: defining golden query sets, running them in CI, and building the muscle to treat search quality as a metric rather than a feeling.

If your team is at that stage — past the prototype, approaching the scale where search quality starts affecting retention — and wants a second opinion on the architecture before committing to an approach, hello@wolf-tech.io is the right starting point. We work with SaaS teams building search into existing Symfony and Next.js products, and we can help you get the architecture right before the corpus grows to the point where changes are expensive. More on the kinds of problems we work on at wolf-tech.io.