I could not reach any conclusive answers reading some of the existing posts on this topic.
I have certain data at 100 locations for the past 10 years. The table has about 800 million rows. I primarily need to generate yearly statistics for each location. Sometimes I need to generate monthly variation statistics and hourly variation statistics as well. I'm wondering whether I should create two indexes, one on location and another on year, or a single index on both location and year. My primary key is currently a serial number (I could probably use location and timestamp as the primary key instead).
Thanks.
Update:
As @MondKin mentioned in the comments, certain queries can actually use several indexes on the same relation. One example is a query with OR clauses like a = 123 OR b = 456 (assuming there are indexes on both columns). In this case PostgreSQL would perform bitmap index scans on both indexes, build a union of the resulting bitmaps, and use it for a bitmap heap scan. Under certain conditions the same scheme may be used for AND queries, but with an intersection instead of a union.
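A minimal sketch of how this could be checked with EXPLAIN. The table `measurements`, its columns (`recorded_at` stands in for the timestamp column), the index names, and the literal values are all hypothetical and not taken from the question. Whether the planner actually picks a bitmap plan depends on table size and statistics, so the comments describe what may appear rather than a guaranteed result.

```sql
-- Hypothetical table; all names here are illustrative assumptions.
CREATE TABLE IF NOT EXISTS measurements (
    id          bigserial PRIMARY KEY,
    location    text             NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision
);

-- Two independent single-column indexes.
CREATE INDEX IF NOT EXISTS measurements_location_idx
    ON measurements (location);
CREATE INDEX IF NOT EXISTS measurements_recorded_at_idx
    ON measurements (recorded_at);

-- OR: the planner may scan both indexes and combine them with a BitmapOr node.
EXPLAIN
SELECT *
FROM measurements
WHERE location = 'site_17'
   OR recorded_at >= '2015-01-01';

-- AND: the planner may combine the same two indexes with a BitmapAnd node.
EXPLAIN
SELECT *
FROM measurements
WHERE location = 'site_17'
  AND recorded_at >= '2015-01-01'
  AND recorded_at <  '2016-01-01';
```

On a small or empty table the planner will usually prefer a sequential scan, so plans like these are only meaningful against realistic data volumes.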
Note that queries like ... WHERE timestamp BETWEEN smth AND smth will not use the index above (the one with location as its first attribute), while queries like ... WHERE location = 'smth' or ... WHERE location = 'smth' AND timestamp BETWEEN smth AND smth will. This is because the first attribute in an index is crucial for searching and sorting.
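As a rough illustration of the point about the leading attribute, here is a sketch using a two-column index with location first. The table `measurements`, the column `recorded_at` (standing in for the timestamp column), the index name, and the literal values are assumptions made for this example only.

```sql
-- Hypothetical table; all names here are illustrative assumptions.
CREATE TABLE IF NOT EXISTS measurements (
    id          bigserial PRIMARY KEY,
    location    text             NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision
);

-- One composite index with location as the leading column.
CREATE INDEX IF NOT EXISTS measurements_location_recorded_at_idx
    ON measurements (location, recorded_at);

-- Leading column is constrained: the composite index is a good candidate.
EXPLAIN
SELECT count(*), avg(value)
FROM measurements
WHERE location = 'site_17'
  AND recorded_at >= '2015-01-01'
  AND recorded_at <  '2016-01-01';

-- Only the second column is constrained: the index cannot be searched
-- efficiently, so a sequential scan is the likely plan.
EXPLAIN
SELECT count(*), avg(value)
FROM measurements
WHERE recorded_at >= '2015-01-01'
  AND recorded_at <  '2016-01-01';
```

The first query has the shape of a per-location yearly statistic, which is the main workload described in the question.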
Don't forget to perform
ANALYZE;
after index creation in order to collect statistics.
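On a table of this size it may be preferable to analyze only the affected table rather than the whole database. A short sketch, assuming a hypothetical table named `measurements`:

```sql
-- Collect planner statistics for one table only (hypothetical table name).
ANALYZE measurements;

-- Check when statistics were last gathered, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'measurements';
```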