mysqldatetimemariadbquery-optimizationvirtual-column

MySQL: Optimize left join on formatted date


I'm trying to optimize the speed of this query:

      SELECT t.t_date td, v.visit_date vd
      FROM temp_dates t
      LEFT JOIN visits v ON DATE_FORMAT(v.visit_date, '%Y-%m-%d') = t.t_date
      ORDER BY t.t_date

v.visit_date is of type DATETIME and t.t_date is a string of format '%Y-%m-%d'. Simply creating an index on v.visitdate didn't improve the speed. Therefore I intended to try the solution @oysteing gave here: How to optimize mysql group by with DATE_FORMAT I successfully created a virtual column by this SQL ALTER TABLE visits ADD COLUMN datestr varchar(10) AS (DATE_FORMAT(visit_date, '%Y-%m-%d')) VIRTUAL; However when I try to create an index on this column by CREATE INDEX idx_visit_date on visits(datestr) I get this error:

#1901 - Function or expression 'date_format()' cannot be used in the GENERATED ALWAYS AS clause of datestr

What am I doing wrong? My DB is Maria DB 10.4.8

Best regards - Ulrich


Solution

  • date_format() cannot be used for persistent generated columns either. And in an index it cannot be just virtual, it has to be persisted.

    I could not find an explicit statement in the manual, but I believe this is due to the fact that the output of date_format() can depend on the locale and isn't strictly deterministic therefore.

    Instead of date_format() you can build the string using deterministic functions such as concat(), year(), month(), day() and lpad().

    ...
    datestr varchar(10) AS (concat(year(visit_date),
                                   '-',
                                   lpad(month(visit_date), 2, '0'),
                                   '-',
                                   lpad(day(visit_date), 2, '0')))
    ...
    

    But as I already mentioned in a comment, you're fixing the wrong end. Dates/times should never be stored as strings. So you should rather promote temp_dates.t_date to a date and use date() to extract the date portion of visit_date in the generated, indexed column

    ...
    visit_date_date date AS (date(visit_date))
    ...
    

    And you might also want to try to also index temp_dates.t_date.