PostgreSQL Full-Text Search: Implementation Guide
“We need search. Should we add Elasticsearch?”
This question appeared in every project planning meeting for years. Elasticsearch is powerful—also operationally heavy. Another cluster to monitor, another JVM to tune, another sync pipeline to break. For many apps, the answer was hiding in the database we already had.
PostgreSQL’s full-text search won’t replace Elasticsearch for billion-document scale or complex faceting. But for “users need to find articles/products/docs quickly”? It’s remarkably capable. We ran full-text search on 2 million articles with sub-50ms p95 queries—no additional infrastructure.
Here’s the implementation guide from that project: indexing, ranking, highlighting, and the gotchas that don’t appear in the basic tutorials.
How PostgreSQL Full-Text Search Works
PostgreSQL converts text into tsvector—a sorted list of normalized lexemes (word stems) with positions:
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
Notice: “jumps” → “jump”, “lazy” → “lazi”. Stemming and stop word removal happen automatically with the english configuration.
Search queries become tsquery:
SELECT to_tsquery('english', 'fox & dog');
-- Result: 'fox' & 'dog'
The @@ operator matches vectors against queries:
SELECT title, content
FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'database & performance');
Simple. The performance trap is calling to_tsvector() on every query without an index. That’s a sequential scan. Don’t do that.
Indexing: GIN Makes It Fast
-- GIN index on computed tsvector
CREATE INDEX articles_content_idx ON articles
USING GIN(to_tsvector('english', content));
SELECT title, content
FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'search term');
GIN (Generalized Inverted Index) is the right index type for full-text search. GiST works too but GIN is typically faster for read-heavy search workloads.
Better approach: precompute the tsvector in a generated column:
ALTER TABLE articles
ADD COLUMN content_search_vector tsvector
GENERATED ALWAYS AS (to_tsvector('english',
coalesce(title, '') || ' ' || coalesce(content, '')
)) STORED;
CREATE INDEX articles_search_idx ON articles USING GIN(content_search_vector);
-- Queries use the precomputed column
SELECT title, content
FROM articles
WHERE content_search_vector @@ to_tsquery('english', 'search term');
Generated columns compute once on insert/update, not on every query. Combine title + content into one vector because users don’t care which field matched—they care that it matched.
Ranking: Making Results Feel Relevant
Matching isn’t enough. Users expect the best result first.
ts_rank: Basic Relevance
SELECT
title,
content,
ts_rank(content_search_vector, query) AS rank
FROM articles,
to_tsquery('english', 'postgresql & performance') AS query
WHERE content_search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
ts_rank considers term frequency—documents where terms appear more often rank higher.
ts_rank_cd: Better for Phrases
SELECT
title,
content,
ts_rank_cd(content_search_vector, query) AS rank
FROM articles,
to_tsquery('english', 'postgresql & performance') AS query
WHERE content_search_vector @@ query
ORDER BY rank DESC;
ts_rank_cd (cover density) considers term proximity. “PostgreSQL performance” appearing as a phrase ranks higher than terms scattered across paragraphs. I default to ts_rank_cd for user-facing search.
Query Patterns Users Actually Type
Phrase Search
SELECT title, content
FROM articles
WHERE content_search_vector @@ phraseto_tsquery('english', 'quick brown fox');
phraseto_tsquery requires terms adjacent and in order. Good for exact quotes.
Prefix Matching (Autocomplete)
SELECT title, content
FROM articles
WHERE content_search_vector @@ to_tsquery('english', 'postgre:*');
The :* suffix matches any word starting with “postgre”—postgresql, postgres, etc. Essential for search-as-you-type.
Boolean Operators
-- AND: both terms required
WHERE content_search_vector @@ to_tsquery('english', 'postgresql & tuning');
-- OR: either term
WHERE content_search_vector @@ to_tsquery('english', 'postgresql | mysql');
-- NOT: exclude term
WHERE content_search_vector @@ to_tsquery('english', 'postgresql & !oracle');
In application code, I default user queries to AND (all terms required). OR queries return too many irrelevant results for casual searchers.
Multi-Field Search with Weights
Not all fields are equal. Title matches should rank above matches in footnotes.
ALTER TABLE articles
ADD COLUMN weighted_search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(content, '')), 'B') ||
setweight(to_tsvector('english', coalesce(tags, '')), 'C')
) STORED;
CREATE INDEX articles_weighted_idx ON articles USING GIN(weighted_search_vector);
SELECT
title,
ts_rank(weighted_search_vector, query) AS rank
FROM articles,
to_tsquery('english', 'kubernetes') AS query
WHERE weighted_search_vector @@ query
ORDER BY rank DESC;
Weights: A (highest) → D (lowest). Title=’A’, content=’B’, tags=’C’ is a sensible default for articles/docs.
Highlighting: Show Users Why It Matched
SELECT
title,
ts_headline(
'english',
content,
query,
'StartSel=<mark>, StopSel=</mark>, MaxFragments=2, MaxWords=35'
) AS snippet
FROM articles,
to_tsquery('english', 'postgresql & index') AS query
WHERE weighted_search_vector @@ query
ORDER BY ts_rank(weighted_search_vector, query) DESC;
ts_headline extracts relevant fragments with search terms highlighted. Configure MaxFragments and MaxWords for your UI—snippets, not full documents.
Application Integration
const { Pool } = require('pg');
class SearchService {
constructor(pool) {
this.pool = pool;
}
async searchArticles(query, limit = 20, offset = 0) {
const searchQuery = this.buildTsQuery(query);
const result = await this.pool.query(`
SELECT
id,
title,
ts_headline('english', content, $1::tsquery,
'StartSel=<mark>, StopSel=</mark>') AS snippet,
ts_rank_cd(weighted_search_vector, $1::tsquery) AS rank
FROM articles
WHERE weighted_search_vector @@ $1::tsquery
AND status = 'published'
ORDER BY rank DESC
LIMIT $2 OFFSET $3
`, [searchQuery, limit, offset]);
return result.rows;
}
buildTsQuery(query) {
// Sanitize and convert user input to tsquery
const terms = query
.trim()
.split(/\s+/)
.filter(term => term.length > 0)
.map(term => term.replace(/[^\w]/g, '')) // Strip special chars
.filter(term => term.length > 0)
.map(term => `${term}:*`); // Prefix match
if (terms.length === 0) return '';
return terms.join(' & ');
}
}
Critical: never pass raw user input to to_tsquery. Characters like &, |, !, : have special meaning. Sanitize or use plainto_tsquery for simple cases:
-- Safer for raw user input
WHERE content_search_vector @@ plainto_tsquery('english', $1)
plainto_tsquery treats input as plain text, converting spaces to AND. Less flexible but injection-safe.
Performance Optimization
Partial Indexes
-- Index only published articles (if most are drafts)
CREATE INDEX articles_published_search_idx
ON articles USING GIN(weighted_search_vector)
WHERE status = 'published';
Smaller index, faster queries, less storage—when your queries always filter on status.
Pagination
SELECT title, ts_rank_cd(weighted_search_vector, query) AS rank
FROM articles, to_tsquery('english', 'search term') AS query
WHERE weighted_search_vector @@ query
ORDER BY rank DESC
LIMIT 20 OFFSET 0;
OFFSET gets slow on deep pages (Postgres still scans skipped rows). For infinite scroll at scale, consider cursor-based pagination or search-after patterns.
When to Reach for Elasticsearch
PostgreSQL FTS is enough when:
- < 10-50M documents
- Simple faceting (use
GROUP BYor JSONB) - Search isn’t your primary product feature
- You want zero additional infrastructure
Reach for Elasticsearch/OpenSearch when:
- Sub-10ms search at billions of docs
- Complex aggregations and faceting
- Fuzzy matching, synonyms, ML ranking
- Dedicated search team to operate it
Troubleshooting
Slow queries: Missing GIN index, or calling to_tsvector() at query time instead of using a stored column. Run EXPLAIN ANALYZE.
Poor relevance: Try ts_rank_cd, add field weights, check if stemming is too aggressive (english stems “running” to “run”—sometimes you want simple config).
No results for valid terms: Check if terms are stop words (“the”, “a”, “is” are ignored). Use plainto_tsquery to debug.
Conclusion
PostgreSQL full-text search is the most underrated feature in the most popular open-source database. GIN indexes, generated tsvector columns, weighted fields, and ts_rank_cd give you search that feels production-grade without operating a separate search cluster.
Start with a generated column combining your searchable fields. Add a GIN index. Use ts_rank_cd for ordering. Add ts_headline for snippets. Sanitize user input before converting to tsquery.
We deleted our Elasticsearch sync pipeline, reduced operational complexity, and search got faster because data wasn’t replicated across systems. For most applications, that’s the right trade. Save Elasticsearch for when you genuinely outgrow Postgres—and you’ll know because ranking and scale will hurt in measurable ways, not because someone said you should use it.
PostgreSQL full-text search from July 2020, covering tsvector, tsquery, and GIN indexes.