PostgreSQL Query Optimization: A Practical Guide
After spending countless hours optimizing database queries in production, I’ve learned that PostgreSQL performance isn’t magic—it’s about understanding how the database thinks. Let me share practical techniques that have helped me turn 30-second queries into sub-second responses.
Understanding EXPLAIN
The EXPLAIN command is your best friend for query optimization. It shows you how PostgreSQL plans to execute your query:
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2016-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
Output breakdown:
Limit (cost=1234.56..1234.58 rows=10 width=48) (actual time=45.123..45.125 rows=10 loops=1)
-> Sort (cost=1234.56..1289.12 rows=21824 width=48) (actual time=45.121..45.122 rows=10 loops=1)
Sort Key: (count(o.id)) DESC
Sort Method: top-N heapsort Memory: 25kB
-> HashAggregate (cost=789.45..890.23 rows=21824 width=48) (actual time=42.345..43.678 rows=15000 loops=1)
Group Key: u.id
-> Hash Left Join (cost=234.12..678.90 rows=25000 width=40) (actual time=12.345..35.678 rows=25000 loops=1)
Key metrics to watch:
- cost: Estimated query cost (not real time)
- rows: Estimated number of rows
- actual time: Real execution time in milliseconds
- loops: Number of times a node was executed
Index Basics
Indexes are crucial for query performance. Here’s when to use each type:
B-tree Index (Default)
Best for equality and range queries:
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created_at ON orders(created_at);
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
Partial Index
Index only rows that match a condition:
-- Only index active users
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
-- Only index recent orders
CREATE INDEX idx_recent_orders ON orders(created_at)
WHERE created_at > '2015-01-01';
GIN Index
For full-text search and JSONB:
-- Full-text search
CREATE INDEX idx_posts_content_gin ON posts USING GIN(to_tsvector('english', content));
-- JSONB columns
CREATE INDEX idx_user_preferences ON users USING GIN(preferences);
GiST Index
For geometric and full-text search:
CREATE INDEX idx_locations ON stores USING GIST(location);
Common Query Anti-patterns
1. N+1 Query Problem
Bad:
-- First query
SELECT * FROM posts WHERE author_id = 123;
-- Then for each post (N queries)
SELECT * FROM comments WHERE post_id = ?;
Good:
-- Single query with JOIN
SELECT p.*, c.*
FROM posts p
LEFT JOIN comments c ON p.id = c.post_id
WHERE p.author_id = 123;
2. SELECT * Instead of Specific Columns
Bad:
SELECT * FROM users WHERE email = 'john@example.com';
Good:
SELECT id, name, email FROM users WHERE email = 'john@example.com';
This allows PostgreSQL to use covering indexes and reduces I/O.
3. Function Calls in WHERE Clause
Bad:
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
This prevents index usage. Good:
-- Create functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Or store data in lowercase
SELECT * FROM users WHERE email = lower('john@example.com');
4. OR Conditions Across Different Columns
Bad:
SELECT * FROM orders
WHERE user_id = 123 OR created_at > '2016-01-01';
Good:
-- Use UNION when indexes exist on both columns
SELECT * FROM orders WHERE user_id = 123
UNION
SELECT * FROM orders WHERE created_at > '2016-01-01';
Optimizing JOINs
Use Appropriate JOIN Types
-- INNER JOIN: Only matching rows
SELECT o.*, u.name
FROM orders o
INNER JOIN users u ON o.user_id = u.id;
-- LEFT JOIN: All rows from left table
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id;
-- EXISTS: More efficient than IN for subqueries
SELECT * FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id AND o.status = 'completed'
);
Index Foreign Keys
Always index foreign key columns:
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
Window Functions for Complex Queries
Window functions can replace inefficient subqueries:
Bad:
SELECT u.*,
(SELECT COUNT(*) FROM orders WHERE user_id = u.id) as total_orders,
(SELECT SUM(amount) FROM orders WHERE user_id = u.id) as total_spent
FROM users u;
Good:
SELECT u.*,
COUNT(o.id) OVER (PARTITION BY u.id) as total_orders,
SUM(o.amount) OVER (PARTITION BY u.id) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
Analyzing Query Performance
Check Index Usage
-- See which indexes are being used
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
Find Slow Queries
Enable slow query logging in postgresql.conf:
log_min_duration_statement = 1000 # Log queries taking > 1 second
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_statement = 'all'
Table Statistics
Keep statistics up to date:
-- Manual analyze
ANALYZE users;
-- Vacuum and analyze
VACUUM ANALYZE orders;
-- Auto-vacuum settings in postgresql.conf
autovacuum = on
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
Practical Optimization Example
Let’s optimize a real-world query:
Original (30 seconds):
SELECT
u.name,
u.email,
COUNT(o.id) as order_count,
SUM(o.total) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2015-01-01'
AND (u.subscription_status = 'active' OR o.status = 'completed')
GROUP BY u.id, u.name, u.email
HAVING COUNT(o.id) > 10
ORDER BY total_spent DESC;
Problems identified by EXPLAIN:
- No index on
users.created_at - OR condition prevents index usage
- Sequential scan on large table
Optimized (0.3 seconds):
-- Add indexes
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_users_subscription_status ON users(subscription_status);
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Rewrite query
WITH active_users AS (
SELECT id, name, email
FROM users
WHERE created_at > '2015-01-01'
AND subscription_status = 'active'
),
completed_orders AS (
SELECT user_id, COUNT(*) as order_count, SUM(total) as total_spent
FROM orders
WHERE status = 'completed'
GROUP BY user_id
HAVING COUNT(*) > 10
)
SELECT
au.name,
au.email,
COALESCE(co.order_count, 0) as order_count,
COALESCE(co.total_spent, 0) as total_spent
FROM active_users au
LEFT JOIN completed_orders co ON au.id = co.user_id
ORDER BY total_spent DESC;
Configuration Tuning
Key PostgreSQL parameters for performance:
# Memory Settings
shared_buffers = 256MB # 25% of RAM
effective_cache_size = 1GB # 50-75% of RAM
work_mem = 16MB # Per operation
maintenance_work_mem = 128MB # For VACUUM, CREATE INDEX
# Query Planning
random_page_cost = 1.1 # Lower for SSD
effective_io_concurrency = 200 # Higher for SSD
# WAL Settings
wal_buffers = 16MB
checkpoint_completion_target = 0.9
Monitoring Tools
Use these tools to monitor performance:
-- Current active queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;
-- Table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Index hit ratio (should be > 95%)
SELECT
sum(idx_blks_hit) / (sum(idx_blks_hit) + sum(idx_blks_read)) AS index_hit_ratio
FROM pg_statio_user_indexes;
Conclusion
PostgreSQL query optimization is an iterative process:
- Identify slow queries using EXPLAIN ANALYZE
- Add appropriate indexes
- Rewrite queries to use indexes effectively
- Monitor and tune configuration
- Keep statistics up to date
The difference between a slow and fast query often comes down to proper indexing and understanding how PostgreSQL’s query planner works. Start with EXPLAIN, be patient, and the performance improvements will come.
Performance tips are based on PostgreSQL 9.5 features available in early 2016.