# PostgreSQL Performance Tuning: A Practical Guide
PostgreSQL is incredibly powerful out of the box, but proper tuning can improve performance by orders of magnitude. This guide covers practical techniques I've used to optimize databases handling billions of rows.
## Understanding Query Execution

Before optimizing, you need to understand how PostgreSQL executes queries.

### Using EXPLAIN ANALYZE

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.*, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;
```
Key metrics to watch:
- Actual time: Real execution time
- Rows: Number of rows processed
- Buffers: Pages read from disk/cache
- Loops: How many times an operation runs
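These metrics appear inline in the plan tree. An abbreviated, illustrative plan for the query above (the figures below are invented for illustration, not real output) might look like:

```text
Nested Loop Left Join  (actual time=0.05..41.30 rows=31000 loops=1)
  Buffers: shared hit=1220 read=85
  ->  Seq Scan on users u  (actual time=0.01..4.20 rows=9500 loops=1)
        Filter: (created_at > '2024-01-01')
  ->  Index Scan using idx_orders_user_id on orders o
        (actual time=0.003..0.004 rows=3 loops=9500)
```

Note the inner index scan runs once per outer row (`loops=9500`), so its per-loop time must be multiplied by the loop count to get its real cost.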
## Indexing Strategies

### B-Tree Indexes (Default)

Best for equality and range queries:

```sql
-- Simple index
CREATE INDEX idx_users_email ON users (email);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at DESC);

-- Partial index (for specific conditions)
CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending';

-- Covering index (includes extra columns for index-only scans)
CREATE INDEX idx_orders_covering ON orders (user_id, created_at)
    INCLUDE (total_amount, status);
```
### When to Use Different Index Types

```sql
-- GIN index for array/JSONB containment
CREATE INDEX idx_tags ON articles USING GIN (tags);

-- GiST for geometric/range data
CREATE INDEX idx_location ON stores USING GiST (location);

-- BRIN for naturally ordered data (like timestamps)
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);

-- Hash for equality-only lookups (rarely needed)
CREATE INDEX idx_lookup ON cache USING HASH (key);
```
## Query Optimization Techniques

### 1. Avoid SELECT *

```sql
-- Bad: fetches all columns
SELECT * FROM users WHERE id = 1;

-- Good: fetch only the needed columns
SELECT id, name, email FROM users WHERE id = 1;
```
### 2. Use EXISTS Instead of IN for Large Sets

On modern PostgreSQL the planner often rewrites both forms into the same semi-join, but `EXISTS` is the safer habit. The difference really matters for the negated forms: `NOT IN` returns no rows at all if the subquery produces any NULL, while `NOT EXISTS` behaves as expected.

```sql
-- Can be slower for large subqueries
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

-- EXISTS stops at the first matching row
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.user_id = u.id AND o.total > 1000
);
```
### 3. Optimize JOINs

```sql
-- Ensure join columns are indexed
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- Use the appropriate join type:
-- INNER JOIN when you need matches in both tables
-- LEFT JOIN when you need all rows from the left table
```
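The comments above can be turned into a concrete query. As a sketch, reusing the `users`/`orders` columns from earlier examples, the index on `orders(user_id)` lets the planner drive this join with index scans rather than repeated sequential scans:

```sql
-- INNER JOIN: only users who actually have orders are returned;
-- the o.user_id condition can use idx_orders_user_id
SELECT u.id, u.name, o.created_at, o.total_amount
FROM users u
INNER JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01';
```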
### 4. Pagination Done Right

```sql
-- Bad: OFFSET scans and discards every skipped row
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;

-- Good: keyset pagination seeks straight to the boundary
SELECT * FROM posts
WHERE created_at < '2024-01-15 10:30:00'
ORDER BY created_at DESC
LIMIT 20;
```
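One caveat with keyset pagination: if `created_at` is not unique, rows that share the boundary timestamp can be skipped or repeated between pages. A common fix, sketched here assuming `posts` has an `id` primary key (the literal id is illustrative), is to add a unique tie-breaker with a row-value comparison:

```sql
-- The row comparison resumes deterministically after ties
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-15 10:30:00', 918273)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Backed by a matching composite index
CREATE INDEX idx_posts_created_id ON posts (created_at DESC, id DESC);
```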
## Configuration Tuning

### Memory Settings

```ini
# postgresql.conf

# Shared buffers: ~25% of RAM (but usually not more than 8GB)
shared_buffers = 4GB

# Work memory for sorts/hashes (allocated per operation, not per connection!)
work_mem = 256MB

# Maintenance operations (VACUUM, CREATE INDEX)
maintenance_work_mem = 1GB

# Effective cache size: ~75% of total RAM (a planner hint, not an allocation)
effective_cache_size = 12GB
```
### Write Performance

```ini
# WAL settings for write-heavy workloads
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB

# For SSDs
random_page_cost = 1.1
effective_io_concurrency = 200
```
### Connection Pooling

```ini
# Each connection has real memory and scheduling overhead,
# so avoid raising this far beyond what the workload needs
max_connections = 200

# Better: put PgBouncer (or a similar pooler) in front of PostgreSQL
# and keep max_connections low
```
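For context, a minimal PgBouncer configuration sketch; the database name, port, and pool sizes below are illustrative placeholders, not recommendations:

```ini
; pgbouncer.ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
; transaction pooling gives the best connection reuse,
; but breaks session-level features (e.g. session advisory locks)
pool_mode = transaction
default_pool_size = 20
max_client_conn = 1000
```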
## Table Maintenance

### Regular VACUUM and ANALYZE

```sql
-- Update statistics (run regularly)
ANALYZE users;

-- Reclaim space from dead tuples
VACUUM users;

-- Full vacuum (takes an exclusive lock and rewrites the table; use sparingly)
VACUUM FULL users;

-- Tune automatic maintenance per table
ALTER TABLE users SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);
```
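To check whether autovacuum is keeping up, `pg_stat_user_tables` also exposes dead-tuple counts and the last maintenance times:

```sql
-- Tables with many dead tuples are candidates for more aggressive autovacuum
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```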
### Table Partitioning

For very large tables, partitioning improves query performance:

```sql
-- Create the partitioned table
CREATE TABLE logs (
    id BIGSERIAL,
    created_at TIMESTAMPTZ NOT NULL,
    message TEXT
) PARTITION BY RANGE (created_at);

-- Create the partitions
CREATE TABLE logs_2024_01 PARTITION OF logs
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE logs_2024_02 PARTITION OF logs
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
```
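The win comes from partition pruning: a query that filters on the partition key scans only the matching partitions, which you can confirm with EXPLAIN. Since PostgreSQL 11, an index created on the parent also propagates to every partition:

```sql
-- Only logs_2024_01 should appear in this plan (pruning at work)
EXPLAIN SELECT * FROM logs
WHERE created_at >= '2024-01-10' AND created_at < '2024-01-20';

-- One statement indexes every current and future partition
CREATE INDEX idx_logs_created_at ON logs (created_at);
```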
## Monitoring Queries

### Find Slow Queries

```sql
-- Enable query logging; in postgresql.conf:
--   log_min_duration_statement = 1000  (log statements slower than 1s)

-- Or use pg_stat_statements (requires
-- shared_preload_libraries = 'pg_stat_statements' and a server restart)
CREATE EXTENSION pg_stat_statements;

SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```
### Identify Missing Indexes

```sql
-- Tables with heavy sequential-scan traffic may need an index
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_tup_read DESC;
```
## Common Anti-Patterns

- Functions on indexed columns: `WHERE LOWER(email) = 'test@example.com'` won't use a plain index on `email`
- Implicit type casting: ensure types match between columns and parameters
- Over-indexing: each index slows down writes
- Not using prepared statements: leads to repeated query planning
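Two of these anti-patterns have direct SQL fixes. An expression index makes the `LOWER(email)` predicate indexable, and `PREPARE` caches a plan for reuse (the statement name below is illustrative):

```sql
-- Expression index: matches the function call in the WHERE clause
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
SELECT id, name FROM users WHERE LOWER(email) = 'test@example.com';

-- Prepared statement: planned once, executed many times
PREPARE get_user(bigint) AS
    SELECT id, name FROM users WHERE id = $1;
EXECUTE get_user(42);
```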
## Conclusion
PostgreSQL performance tuning is an iterative process. Start with EXPLAIN ANALYZE to understand your queries, add appropriate indexes, tune your configuration based on workload, and maintain your tables regularly. Monitor continuously and adjust as your data grows.