Today I learned about pg_textsearch, a new PostgreSQL extension by Timescale that brings BM25 relevance-ranked full-text search to Postgres.
BM25 (Best Matching 25) is the industry-standard ranking algorithm used by search engines like Elasticsearch and Lucene. Unlike simple term frequency counting, BM25 provides smarter relevance scoring through:
The formula uses two tunable parameters:
k1 (default 1.2): Controls term frequency saturationb (default 0.75): Controls length normalization (0 = none, 1 = full)-- Create a BM25 index
CREATE INDEX docs_idx ON documents USING bm25(content)
WITH (text_config='english');
-- Search with the <@> operator (returns negative scores—lower is better)
SELECT title, content <@> 'database search' as score
FROM documents
ORDER BY content <@> 'database search'
LIMIT 10;
| Aspect | PostgreSQL FTS | pg_textsearch (BM25) |
|---|---|---|
| Ranking | Basic term frequency | Probabilistic with IDF |
| Operator | @@ with tsquery |
<@> |
| Boolean queries | Rich (&, \|, !, <->) |
Simple (implicit AND) |
| Phrase search | Yes | No |
| Highlighting | ts_headline() |
No |
| Relevance quality | Basic | Industry-standard |
| Maturity | Battle-tested | Prerelease (v0.1.x) |
Use PostgreSQL FTS when you need:
(postgresql | mysql) & !oracle)'full text search')data:*)Use pg_textsearch when you need:
You can combine both—use FTS for filtering and BM25 for ranking:
SELECT title, body <@> 'database performance' as score
FROM articles
WHERE body_tsv @@ to_tsquery('english', 'database & !mysql') -- FTS filter
ORDER BY body <@> 'database performance' -- BM25 ranking
LIMIT 10;
Pure vector search might miss exact matches; pure keyword search misses synonyms. Hybrid gets the best of both — a search for “ML algorithms” would find documents about “machine learning techniques” (semantic) and those with the exact phrase (keyword).
This article outlines with code sample how to combine BM25 with Vector Search in PostgreSQL.
<@> operator returns negative scores (more negative = more relevant)