sqlpostgresqlperformance

Speed up UPDATE from another table


I have the following code:

UPDATE tableOne
SET columnOne = CASE
                 WHEN tableOne.columnTwo LIKE '%-02-%' OR tableOne.columnTwo LIKE '%-03-%' OR
                      tableOne.columnTwo LIKE '%-04-%' OR
                      tableOne.columnTwo LIKE '%-05-%' OR
                      tableOne.columnTwo LIKE '%-06-%' OR
                      tableOne.columnTwo LIKE '%-07-%' OR tableOne.columnTwo LIKE '%-08-%' OR
                      tableOne.columnTwo LIKE '%-09-%'
                         THEN tableTwo.columnOne :: text
                 ELSE tableOne.columnOne
    END

FROM tableTwo
WHERE tableTwo.tableId = tableOne.tableId

I have two tables. tableOne consists of 100 millions of rows (and 40 columns) and tableTwo consists of 90 millions of rows. Above query is already in progress for more than 2 days. I am not sure it will ever finish. Is there a way to optimize the query?

If helpful LIKE does the following: Checks if the string (e.g. 2018-06-30 08:20:17) has listed month. If yes, pick value from tableTwo (and CAST it to type text), else keep self value (already type text).


Solution

  • Move the case condition to the where clause:

    UPDATE tableOne
        SET columnOne = tableTwo.columnOne::text
    FROM tableTwo
    WHERE tableTwo.tableId = tableOne.tableId AND
          tableOne.columnTwo ~ '-0[2-9]-' and
          tableOne.columnOne is distinct from tableTwo.columnOne::text;
    

    Regular expressions are not really that much faster than a bunch of likes. The win here is in not updating rows that don't need to be updated. If the format of tableOne.columnTwo is a known format, you could use substring operations instead.