I'm trying to determine the best indexes for a table in PostgreSQL. I expect on the order of ~10b rows and ~10TB data.
The table has 5 main columns used for filtering and/or sorting
CREATE TABLE table (
filter_key_1 AS BYTEA, -- filtering
filter_key_2 AS BYTEA, -- filtering
filter_key_3 AS BYTEA, -- filtering
sort_key_1 AS INTEGER, -- filtering & sorting
sort_key_2 AS INTEGER -- filtering & sorting
)
Queries will be:
SELECT * FROM table WHERE filter_key_1 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_2 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_3 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_1 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_2 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_3 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
What are the ideal indexes for the table? How large will they get with ~10b rows? How much will they limit write throughput?
What if I want to add additional queries such as below. Would the indexes from above hold-up?
SELECT * FROM table WHERE filter_key_1 = $1 AND filter_key_2 = $2 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM table WHERE filter_key_1 = $1 AND filter_key_2 = $2 AND filter_key_3 = $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
-- ...
The workload is heavy read, low write.
Read speed is important. Write speed is less important (can live with up-to 3 seconds per insert)
Since you need to run these queries all the time, you will have to optimize them as much as possible. That would mean
CREATE INDEX ON tab (filter_key_1, sort_key_1, sort_key_2);
CREATE INDEX ON tab (filter_key_2, sort_key_1, sort_key_2);
CREATE INDEX ON tab (filter_key_3, sort_key_1, sort_key_2);
Together, these indexes should be substantially larger than your table.