I am creating a table to register the events from multiple sensors in a MySQL 8.0 database.
Since the data is going to be accessed mainly over time ranges (from one day to a month), I decided to partition it based on its event_datetime
, with an additional index on the sensor_id
Here is the DDL statement:
CREATE TABLE `events_table` (
`event_id` int NOT NULL,
`event_datetime` datetime NOT NULL,
`sensor_id` int NOT NULL,
`zone` int NOT NULL,
`created_at` datetime NOT NULL DEFAULT (utc_timestamp()),
PRIMARY KEY (`event_id`,`created_at`,`event_datetime`),
KEY `ix_events_sensor_id` (`sensor_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
PARTITION BY RANGE (year(`event_datetime`))
SUBPARTITION BY HASH (month(`event_datetime`))
SUBPARTITIONS 12
(PARTITION p2021 VALUES LESS THAN (2022) ENGINE = InnoDB,
PARTITION p2022 VALUES LESS THAN (2023) ENGINE = InnoDB,
PARTITION p2023 VALUES LESS THAN (2024) ENGINE = InnoDB,
PARTITION p2024 VALUES LESS THAN (2025) ENGINE = InnoDB,
PARTITION p2025 VALUES LESS THAN (2026) ENGINE = InnoDB,
PARTITION p2026 VALUES LESS THAN (2027) ENGINE = InnoDB,
PARTITION p2027 VALUES LESS THAN (2028) ENGINE = InnoDB,
PARTITION p2028 VALUES LESS THAN (2029) ENGINE = InnoDB,
PARTITION p2029 VALUES LESS THAN (2030) ENGINE = InnoDB,
PARTITION p2030 VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
However, when making queries over a given month, I found out that multiple partitions and subpartitions could be accessed, which basically rendered the partitioning useless, here is an example:
EXPLAIN SELECT event_id, event_datetime, sensor_id, zone
FROM events_table
WHERE events_datetime BETWEEN "2023-12-01 00:00:00" and "2023-12-01 00:00:00"
AND sensor_id IN (...)
Would output the following in the "partitions" column:
p2023_p2023sp0,p2023_p2023sp1,p2023_p2023sp2,p2023_p2023sp3,p2023_p2023sp4,p2023_p2023sp5,p2023_p2023sp6,p2023_p2023sp7,p2023_p2023sp8,p2023_p2023sp9,p2023_p2023sp10
(note that it is also (I can actually FORCE INDEX if needed)using_where
instead of the index, even if I specify USE INDEX(ix_events_sensor_id)
))
From my understanding, this means that even though I specified the datetime to be in December, it will look into all partitions of 2023 ??
It is even worse if I specify the YEAR and MONTH instead of the BETWEEN statement:
EXPLAIN SELECT event_id, event_datetime, sensor_id, zone
FROM events_table
WHERE YEAR(events_datetime) = 2023 AND MONTH(event_datetime) = 12
AND sensor_id IN (...)
Returns that it will look into ALL subpartitions of ALL partitions.
I'm a bit puzzled here, as I assumed that partitioning for such series of data would allow me to access only a very limited portion of my data and thus make much faster queries, but neither the (sub)partitions nor the index seem to be used the way I expect it.
Am I missing something obvious there ? How can I nudge MySQL into reading only the subpartitions where the relevant data is ? Bonus point if I can also use the index
https://dev.mysql.com/doc/refman/8.0/en/partitioning-pruning.html says:
Pruning can also be applied for tables partitioned on a
DATE
orDATETIME
column when the partitioning expression uses theYEAR()
orTO_DAYS()
function. Pruning can also be applied for such tables when the partitioning expression uses theTO_SECONDS()
function.
The MONTH()
function is not listed as supporting partition pruning.
You could get a better result by using this partitioning strategy:
PARTITION BY RANGE (YEAR(`event_datetime`))
SUBPARTITION BY HASH (TO_DAYS(`event_datetime`))
SUBPARTITIONS 12
When I test EXPLAIN on your query, it shows a much better pruned list of partitions:
mysql> EXPLAIN SELECT event_id, event_datetime, sensor_id, zone
-> FROM events_table
-> WHERE event_datetime BETWEEN "2023-12-01 00:00:00" and "2023-12-01 00:00:00"\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: events_table
partitions: p2023_p2023sp8
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where
This shows that TO_DAYS()
supports partition pruning.
Admittedly, this won't give you what you probably want. If you were to query for a range of days in the same month, you expect to still prune to a single subpartition, but since my solution above subpartitions by days instead of months, a range of days in the same month will have to read multiple partitions.
When partitioning was introduced years ago, it was presented as a way to optimize queries. But it has turned out to be much more difficult to use for that purpose. There are good uses for partitioning (e.g. quick deletions of large batches of rows), but in general for query optimization it's easier and more flexible to use indexes.