
BM25 Ranking in PostgreSQL with pg_textsearch

Today I learned about pg_textsearch, a new PostgreSQL extension by Timescale that brings BM25 relevance-ranked full-text search to Postgres.

What is BM25?

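BM25 (Best Matching 25) is the default ranking function in Lucene and in search engines built on it, such as Elasticsearch. Unlike simple term-frequency counting, BM25 provides smarter relevance scoring through:

  - Inverse document frequency (IDF): terms that are rare across the collection count for more than common ones
  - Term-frequency saturation: repeating a term gives diminishing returns instead of a linear boost
  - Document-length normalization: long documents are not favored just for containing more words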

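The formula uses two tunable parameters:

  - k1: controls how quickly term frequency saturates (commonly set around 1.2)
  - b: controls how strongly document length is normalized, from 0 (off) to 1 (full); commonly 0.75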
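
To make the two parameters concrete, here is a minimal Python sketch of BM25 scoring. It is an illustration of the textbook formula, not pg_textsearch's implementation: the whitespace tokenizer, the Lucene-style IDF variant, the function name `bm25_scores`, and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Assumes a non-empty `docs` list, whitespace tokenization, and the
    Lucene-style IDF ln(1 + (N - n + 0.5) / (n + 0.5)), which is never negative.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        # Length normalization: b=0 disables it, b=1 applies it fully.
        norm = 1 - b + b * len(toks) / avg_len
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Saturation: the tf factor approaches (k1 + 1) as tf grows.
            score += idf * tf * (k1 + 1) / (tf + k1 * norm)
        scores.append(score)
    return scores


docs = [
    "postgres full text search",
    "search search search search search search search search",
    "cooking pasta at home",
]
scores = bm25_scores("search", docs)
```

The second document repeats the query term eight times but scores well under eight times the first one, which is the saturation effect controlled by k1; the third document contains no query term and scores zero.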

Basic Usage

-- Create a BM25 index
CREATE INDEX docs_idx ON documents USING bm25(content)
  WITH (text_config='english');

-- Search with the <@> operator (scores are negated BM25 values, so lower is better)
SELECT title, content <@> 'database search' as score
FROM documents
ORDER BY content <@> 'database search'
LIMIT 10;
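
The sign convention is easy to trip over in application code. As far as I understand, the scores are negated so that a plain ascending ORDER BY returns the best matches first. The Python sketch below uses made-up rows (the titles and scores are hypothetical, not real query output) to show how to handle the values once they reach the client.

```python
# Hypothetical (title, score) rows, shaped like the result set of the query above.
rows = [
    ("Intro to indexing", -1.2),
    ("Postgres search guide", -4.7),
    ("Database search basics", -3.1),
]

# An ascending sort puts the most negative (most relevant) score first,
# mirroring the ascending ORDER BY in the SQL.
ranked = sorted(rows, key=lambda row: row[1])

# Flip the sign when a conventional "higher is better" number is needed for display.
display = [(title, -score) for title, score in ranked]
```

Here `ranked` starts with "Postgres search guide", and `display` carries the same order with positive scores.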

pg_textsearch vs PostgreSQL Built-in FTS

| Aspect | PostgreSQL FTS | pg_textsearch (BM25) |
|---|---|---|
| Ranking | Basic term frequency | Probabilistic with IDF |
| Operator | @@ with tsquery | <@> |
| Boolean queries | Rich (&, \|, !, <->) | Simple (implicit AND) |
| Phrase search | Yes | No |
| Highlighting | ts_headline() | No |
| Relevance quality | Basic | Industry-standard |
| Maturity | Battle-tested | Prerelease (v0.1.x) |

When to Use Each

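Use PostgreSQL FTS when you need:

  - Rich boolean queries (&, |, !, <->) or phrase search
  - Result highlighting with ts_headline()
  - A battle-tested, built-in implementation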

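Use pg_textsearch when you need:

  - BM25 relevance ranking with IDF rather than basic term-frequency ranking
  - Simple keyword queries where boolean operators and phrase search are not required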

Hybrid Approach

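You can combine both: use FTS for filtering and BM25 for ranking. The query below assumes the articles table has a tsvector column body_tsv for FTS and a BM25 index on body: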

SELECT title, body <@> 'database performance' as score
FROM articles
WHERE body_tsv @@ to_tsquery('english', 'database & !mysql')  -- FTS filter
ORDER BY body <@> 'database performance'  -- BM25 ranking
LIMIT 10;

The same idea extends to vector search. Pure vector search can miss exact matches, and pure keyword search misses synonyms. A hybrid of the two covers both cases: a search for “ML algorithms” finds documents about “machine learning techniques” (semantic match) as well as documents containing the exact phrase (keyword match).

This article outlines, with code samples, how to combine BM25 with vector search in PostgreSQL.
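
One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The Python sketch below illustrates that general technique; it is not necessarily the method the article above uses. The function name, the document ids, and the two input rankings are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ordering.

    Each document earns 1 / (k + rank) for every list it appears in
    (rank starts at 1); k=60 is the constant commonly used with RRF.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable result.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical document ids, best match first in each list.
bm25_ranking = ["a", "b", "c"]    # keyword (BM25) results
vector_ranking = ["c", "a", "d"]  # semantic (vector) results

fused_ranking = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Documents that appear near the top of both lists ("a" and "c") rise above those found by only one method. RRF works on ranks alone, so the negative BM25 scores and the vector distances never need to be put on a common scale.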

Key Takeaways

  1. BM25 is what makes modern search engines feel “smart”—it’s not just counting words
  2. pg_textsearch brings this to Postgres without needing Elasticsearch
  3. The <@> operator returns negative scores (more negative = more relevant)
  4. When you need both boolean filtering and relevance ranking, consider the hybrid approach: FTS for filtering, BM25 for ranking
  5. The extension is still prerelease—not yet recommended for production

Resources