sqlpostgresqlindexingcompound-index

Compound index on database Log table: (Status, CreatedTime) or (CreatedTime, Status)?


I usually use a compound key (CreatedTime, Status) for my Log table, but I’m reconsidering this design. Since CreatedTime is typically very unique and Status only has 3-5 possible values, it seems that Status might not add much to further filtering after CreatedTime.

Most of my queries involve retrieving logs for a specific time range, optionally filtering or counting by Status. Conceptually, if I were working with a physical log book sorted by time, identifying entries with a specific Status (e.g., "Successful") would be cumbersome. On the other hand, having separate log books for each Status, all sorted by time, could make searching more efficient—though combining and re-sorting results for all Status values might complicate things. Does the database optimize for such scenarios?

I've already asked three different AIs about this, but their answers were vague and contradictory (and even the same AI gives different answers just by asking slightly different), and I can't find much on Google and SO. Could someone confirm whether my intuition here is correct?


Solution

  • Your indexes should be based on how you typically query your data. Since you typically query logs for a given date range, then an index on CreatedTime would be most efficient. I doubt that the secondary kay on Status makes a significant difference unless you query a very large number of logs without your date range any only a few match the status you want. Also, since you most likely do not have multiple logs of different statuses at the exact same time, the sub-index is not helping since it's going to have to scan all index records for the given timeframe anyway.

    Indexing by Status, CreatedTime is not going to be significantly faster than CreatedDate, Status if you want logs of one status, and will be less performance if you force the engine to scan through several statuses and consolidate the results.

    Of secondary importance is how the data is added. Since you almost certainly add logs sequentially, indexing by Status, CreatedTime will be less efficient since you'll be inserting records in the middle quite often, making it harder to add records. Indexing on CreatedTime means you'll almost always be adding records to the end of your table, barring unusual activity like bulk imports of older logs.