# PostgreSQL Performance Tuning: A Practical Guide
PostgreSQL is incredibly powerful out of the box, but proper tuning can improve performance by orders of magnitude. This guide covers practical techniques I've used to optimize databases handling billions of rows.
## Understanding Query Execution

Before optimizing, you need to understand how PostgreSQL executes queries.

### Using EXPLAIN ANALYZE

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.*, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;
```
Key metrics to watch:
- Actual time: Real execution time
- Rows: Number of rows processed
- Buffers: Pages read from disk/cache
- Loops: How many times an operation runs
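These metrics appear inline in the plan tree. An abbreviated, illustrative plan for the query above (the figures below are invented for illustration, not real output) might look like:

```text
Nested Loop Left Join  (actual time=0.05..41.30 rows=31000 loops=1)
  Buffers: shared hit=1220 read=85
  ->  Seq Scan on users u  (actual time=0.01..4.20 rows=9500 loops=1)
        Filter: (created_at > '2024-01-01')
  ->  Index Scan using idx_orders_user_id on orders o
        (actual time=0.003..0.004 rows=3 loops=9500)
```

Note the inner index scan runs once per outer row (`loops=9500`), so its per-loop time must be multiplied by the loop count to get its real cost.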
## Indexing Strategies

### B-Tree Indexes (Default)

Best for equality and range queries:

```sql
-- Simple index
CREATE INDEX idx_users_email ON users (email);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at DESC);

-- Partial index (for specific conditions)
CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending';

-- Covering index (includes extra columns for index-only scans)
CREATE INDEX idx_orders_covering ON orders (user_id, created_at)
    INCLUDE (total_amount, status);
```
### When to Use Different Index Types

```sql
-- GIN index for array/JSONB containment
CREATE INDEX idx_tags ON articles USING GIN (tags);

-- GiST for geometric/range data
CREATE INDEX idx_location ON stores USING GiST (location);

-- BRIN for naturally ordered data (like timestamps)
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);

-- Hash for equality-only lookups (rarely needed)
CREATE INDEX idx_lookup ON cache USING HASH (key);
```
## Query Optimization Techniques

### 1. Avoid SELECT *

```sql
-- Bad: fetches all columns
SELECT * FROM users WHERE id = 1;

-- Good: fetch only the needed columns
SELECT id, name, email FROM users WHERE id = 1;
```
### 2. Use EXISTS Instead of IN for Large Sets

On modern PostgreSQL the planner often rewrites both forms into the same semi-join, but `EXISTS` is the safer habit. The difference really matters for the negated forms: `NOT IN` returns no rows at all if the subquery produces any NULL, while `NOT EXISTS` behaves as expected.

```sql
-- Can be slower for large subqueries
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

-- EXISTS stops at the first matching row
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.user_id = u.id AND o.total > 1000
);
```
### 3. Optimize JOINs

```sql
-- Ensure join columns are indexed
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- Use the appropriate join type:
-- INNER JOIN when you need matches in both tables
-- LEFT JOIN when you need all rows from the left table
```
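The comments above can be turned into a concrete query. As a sketch, reusing the `users`/`orders` columns from earlier examples, the index on `orders(user_id)` lets the planner drive this join with index scans rather than repeated sequential scans:

```sql
-- INNER JOIN: only users who actually have orders are returned;
-- the o.user_id condition can use idx_orders_user_id
SELECT u.id, u.name, o.created_at, o.total_amount
FROM users u
INNER JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01';
```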
### 4. Pagination Done Right

```sql
-- Bad: OFFSET scans and discards every skipped row
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;

-- Good: keyset pagination seeks straight to the boundary
SELECT * FROM posts
WHERE created_at < '2024-01-15 10:30:00'
ORDER BY created_at DESC
LIMIT 20;
```
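One caveat with keyset pagination: if `created_at` is not unique, rows that share the boundary timestamp can be skipped or repeated between pages. A common fix, sketched here assuming `posts` has an `id` primary key (the literal id is illustrative), is to add a unique tie-breaker with a row-value comparison:

```sql
-- The row comparison resumes deterministically after ties
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-15 10:30:00', 918273)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Backed by a matching composite index
CREATE INDEX idx_posts_created_id ON posts (created_at DESC, id DESC);
```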
## Configuration Tuning

### Memory Settings

```ini
# postgresql.conf

# Shared buffers: ~25% of RAM (but usually not more than 8GB)
shared_buffers = 4GB

# Work memory for sorts/hashes (allocated per operation, not per connection!)
work_mem = 256MB

# Maintenance operations (VACUUM, CREATE INDEX)
maintenance_work_mem = 1GB

# Effective cache size: ~75% of total RAM (a planner hint, not an allocation)
effective_cache_size = 12GB
```
### Write Performance

```ini
# WAL settings for write-heavy workloads
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB

# For SSDs
random_page_cost = 1.1
effective_io_concurrency = 200
```
### Connection Pooling

```ini
# Each connection has real memory and scheduling overhead,
# so avoid raising this far beyond what the workload needs
max_connections = 200

# Better: put PgBouncer (or a similar pooler) in front of PostgreSQL
# and keep max_connections low
```
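For context, a minimal PgBouncer configuration sketch; the database name, port, and pool sizes below are illustrative placeholders, not recommendations:

```ini
; pgbouncer.ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
; transaction pooling gives the best connection reuse,
; but breaks session-level features (e.g. session advisory locks)
pool_mode = transaction
default_pool_size = 20
max_client_conn = 1000
```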
## Table Maintenance

### Regular VACUUM and ANALYZE

```sql
-- Update statistics (run regularly)
ANALYZE users;

-- Reclaim space from dead tuples
VACUUM users;

-- Full vacuum (takes an exclusive lock and rewrites the table; use sparingly)
VACUUM FULL users;

-- Tune automatic maintenance per table
ALTER TABLE users SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);
```
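To check whether autovacuum is keeping up, `pg_stat_user_tables` also exposes dead-tuple counts and the last maintenance times:

```sql
-- Tables with many dead tuples are candidates for more aggressive autovacuum
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```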
### Table Partitioning

For very large tables, partitioning improves query performance:

```sql
-- Create the partitioned table
CREATE TABLE logs (
    id BIGSERIAL,
    created_at TIMESTAMPTZ NOT NULL,
    message TEXT
) PARTITION BY RANGE (created_at);

-- Create the partitions
CREATE TABLE logs_2024_01 PARTITION OF logs
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE logs_2024_02 PARTITION OF logs
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
```
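The win comes from partition pruning: a query that filters on the partition key scans only the matching partitions, which you can confirm with EXPLAIN. Since PostgreSQL 11, an index created on the parent also propagates to every partition:

```sql
-- Only logs_2024_01 should appear in this plan (pruning at work)
EXPLAIN SELECT * FROM logs
WHERE created_at >= '2024-01-10' AND created_at < '2024-01-20';

-- One statement indexes every current and future partition
CREATE INDEX idx_logs_created_at ON logs (created_at);
```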
## Monitoring Queries

### Find Slow Queries

```sql
-- Enable query logging; in postgresql.conf:
--   log_min_duration_statement = 1000  (log statements slower than 1s)

-- Or use pg_stat_statements (requires
-- shared_preload_libraries = 'pg_stat_statements' and a server restart)
CREATE EXTENSION pg_stat_statements;

SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```
### Identify Missing Indexes

```sql
-- Tables with heavy sequential-scan traffic may need an index
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_tup_read DESC;
```
## Common Anti-Patterns

- Functions on indexed columns: `WHERE LOWER(email) = 'test@example.com'` won't use a plain index on `email`
- Implicit type casting: ensure types match between columns and parameters
- Over-indexing: each index slows down writes
- Not using prepared statements: leads to repeated query planning
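Two of these anti-patterns have direct SQL fixes. An expression index makes the `LOWER(email)` predicate indexable, and `PREPARE` caches a plan for reuse (the statement name below is illustrative):

```sql
-- Expression index: matches the function call in the WHERE clause
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
SELECT id, name FROM users WHERE LOWER(email) = 'test@example.com';

-- Prepared statement: planned once, executed many times
PREPARE get_user(bigint) AS
    SELECT id, name FROM users WHERE id = $1;
EXECUTE get_user(42);
```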
## Conclusion
PostgreSQL performance tuning is an iterative process. Start with EXPLAIN ANALYZE to understand your queries, add appropriate indexes, tune your configuration based on workload, and maintain your tables regularly. Monitor continuously and adjust as your data grows.