How to pick distribution keys and sort keys for Amazon Redshift?

3 minute read

Content level: Expert

Amazon Redshift ATO automatically does a lot of data storage optimization for your warehouse uses advanced artificial intelligence methods, but you can follow these simple rules to hit the ground running for your tables to get the best query performance from day one.

Amazon Redshift's Automatic table optimization (ATO) is a self-tuning capability that automatically optimizes the design of tables by applying sort and distribution keys intelligently. Furthermore, ATO continuously observes how queries interact with your tables, and can change sort and distribution keys to optimize performance for the cluster's workload.

Also with Amazon Redshift you can specify the Primary Key (PK) and Foreign Key (FK) to signify the relationships between your tables. These relationships often describe how the tables will be joined to each other, and Amazon Redshift leverages that to pick appropriate distribution keys.

Specify PK and FK for Redshift tables

ATO will pick Distribution Style (DS) and Sort Key (SK) automatically for you.

If those do not work then proceed with below, in order

1. Table mostly used in join on single column, and no filters applied

DS:Key on the join column

SK on the join column

2. Table mostly used in join on multi-column, and no filters applied

DS:Key on the highest cardinality join column

SK on highest cardinality join column

3. Table mostly used in join on single column, and column filters applied

DS:Key on the join column

SK on the most frequent column eliminating most rows

4. Table mostly used in join on multi-column, and column filters applied

DS:Key on the highest cardinality join column

SK on the most frequent column eliminating most rows

5. Relatively LARGE table, mostly NOT used in joins, and column filters applied

DS:Even

SK on the most frequent column eliminating most rows

6. Relatively LARGE table, mostly NOT used in joins, and no filters applied

DS:Even

no selection/AUTO

7. Relatively SMALL table, mostly used with column filters applied

DS:All

SK on the most frequent column eliminating most rows

8. Relatively SMALL table, mostly used with no filters applied

DS:All

no selection/AUTO

IMPORTANT

After manually selecting the DK and SK, you should then alter the tables distribution style to auto and sort key auto. This will make the table ATO eligible again and the ATO will use your current DK/SK selection as baseline.

You can apply these table rules to Materialized Views too!

A table with fewer than 5 Million rows is considered a small table, however if a 100 Million rows table is frequently joined with a 10 Billion rows table then it is considered a relatively small table.

Topics

Analytics

Relevant content

Improve COPY ingestion performance for large data loads on Amazon Redshift
EXPERT
MilindOke
published 4 days ago
Best Practices for loading Electronic Health Claims Data to Amazon Redshift
EXPERT
Phil Bates
published a year ago
Optimizing Levenshtein User-Defined Function in Amazon Redshift
EXPERT
Harshida Patel
published a year ago
Redshift Automatic Table Optimization and table swaps
Accepted Answer
Daniel_Y
asked 4 years ago
Recommend for designing tables in Redshift
Rafael
asked a year ago
Redshift Auto Sortkeys in ATO
Nilesh Pisat
asked 5 months ago
How do I choose the right primary key for my Amazon DynamoDB table?
AWS OFFICIALUpdated 2 years ago
What are some of the best practices to create a composite key for my Amazon DynamoDB table?
AWS OFFICIALUpdated 2 years ago
What are the best practices for migrating a RDBMS database to Amazon Redshift using AWS DMS?
AWS OFFICIALUpdated 2 years ago
Why does a table in an Amazon Redshift provisioned cluster consume more or less disk storage space than expected?
AWS OFFICIALUpdated 8 months ago