Query Optimization
Choosing indexes to improve your query performance.
When working with Postgres, or any relational database, indexing is key to improving query performance. Aligning indexes with common query patterns can speed up data retrieval by an order of magnitude.
This guide is intended to:
- help identify parts of a query that have the potential to be improved by indexes
- introduce tooling to help identify useful indexes
This is not a comprehensive resource, but rather a helpful starting point for your optimization journey.
If you're new to query optimization, you may be interested in index_advisor
, our tool for automatically detecting indexes that improve performance on a given query.
Example query
Consider the following example query that retrieves customer names and purchase dates from two tables:
_10select_10 a.name,_10 b.date_of_purchase_10from_10 customers as a_10 join orders as b on a.id = b.customer_id_10where a.sign_up_date > '2023-01-01' and b.status = 'shipped'_10order by b.date_of_purchase_10limit 10;
In this query, there are several parts that indexes could likely help in optimizing the performance:
where
Clause:
The where
clause filters rows based on certain conditions, and indexing the columns involved can improve this process:
a.sign_up_date
: If filtering bysign_up_date
is common, indexing this column can speed up the query.b.status
: Indexing the status may be beneficial if the column has diverse values.
_10create index idx_customers_sign_up_date on customers (sign_up_date);_10_10create index idx_orders_status on orders (status);
join
Columns
Indexes on the columns used for joining tables can help Postgres avoid scanning tables in their entirety when connecting tables.
- Indexing
a.id
andb.customer_id
would likely improve the performance of the join in this query. - Note that if
a.id
is the primary key of thecustomers
table it is already indexed
_10create index idx_orders_customer_id on orders (customer_id);
order by
Clause
Sorting can also be optimized by indexing:
- An index on
b.date_of_purchase
can improve the sorting process, and is particularly beneficial when a subset of rows is being returned with alimit
clause.
_10create index idx_orders_date_of_purchase on orders (date_of_purchase);
Key concepts
Here are some concepts and tools to keep in mind to help you identify the best index for the job, and measure the impact that your index had:
Analyze the query plan
Use the explain
command to understand the query's execution. Look for slow parts, such as Sequential Scans or high cost numbers. If creating an index does not reduce the cost of the query plan, remove it.
For example:
_10explain select * from customers where sign_up_date > 25;
Use appropriate index types
Postgres offers various index types like B-tree, Hash, GIN, etc. Select the type that best suits your data and query pattern. Using the right index type can make a significant difference. For example, using a BRIN index on a field that always increases and lives within a table that updates infrequently - like created_at
on an orders
table - routinely results in indexes that are +10x smaller than the equivalent default B-tree index. That translates into better scalability.
_10create index idx_orders_created_at ON customers using brin(created_at);
Partial indexes
For queries that frequently target a subset of data, a partial index could be faster and smaller than indexing the entire column. A partial index contains a where
clause to filter the values included in the index. Note that a query's where
clause must match the index for it to be used.
_10create index idx_orders_status on orders (status)_10where status = 'shipped';
Composite indexes
If filtering or joining on multiple columns, a composite index prevents Postgres from referring to multiple indexes when identifying the relevant rows.
_10create index idx_customers_sign_up_date_priority on customers (sign_up_date, priority);
Over-Indexing
Avoid the urge to index columns you operate on infrequently. While indexes can speed up reads, they also slow down writes, so it's important to balance those factors when making indexing decisions.
Statistics
Postgres maintains a set of statistics about the contents of your tables. Those statistics are used by the query planner to decide when it's is more efficient to use an index vs scanning the entire table. If the collected statistics drift too far from reality, the query planner may make poor decisions. To avoid this risk, you can periodically analyze
tables.
_10analyze customers;
By following this guide, you'll be able to discern where indexes can optimize queries and enhance your Postgres performance. Remember that each database is unique, so always consider the specific context and use case of your queries.