I have two tables.

table1:
column1: varchar(20)
column2: varchar(20)
column3: varchar(20)

table2:
column1: varchar(20)
column2: varchar(20)
column3: varchar(20) <- empty

column1 and column2 each have a separate FULLTEXT index in table1. Both tables hold 20 million rows.
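For reference, a minimal DDL sketch of the two tables as described (index names and the ENGINE clause are assumptions; the real tables may differ):

-- Sketch only: column types and FULLTEXT indexes as stated above;
-- index names and ENGINE are invented for illustration
CREATE TABLE table1 (
    column1 VARCHAR(20),
    column2 VARCHAR(20),
    column3 VARCHAR(20),
    FULLTEXT INDEX ft_c1 (column1),
    FULLTEXT INDEX ft_c2 (column2)
) ENGINE=InnoDB;

CREATE TABLE table2 (
    column1 VARCHAR(20),
    column2 VARCHAR(20),
    column3 VARCHAR(20)    -- empty, to be filled
) ENGINE=InnoDB;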
I need to fill column3 of table2 by matching column1 & column2 from table2 to column1 & column2 from table1, then take the value in column3 from table1 and put it into column3 of table2. column1 & column2 might not match exactly, so the query I use for this is:
UPDATE table1, table2
SET table2.column3 = table1.column3
WHERE table2.column1 LIKE CONCAT('%', table1.column1, '%')
  AND table2.column2 LIKE CONCAT('%', table1.column2, '%');
This query never finishes. I let it run for two weeks and it still hadn't produced a result. It utilized one CPU core at 100%, caused little SSD I/O, and apparently needs to be optimized somehow.
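On MySQL 5.6 or newer, EXPLAIN also works on UPDATE statements and shows why this query cannot use an index (a diagnostic sketch, assuming the tables above):

EXPLAIN UPDATE table1, table2
SET table2.column3 = table1.column3
WHERE table2.column1 LIKE CONCAT('%', table1.column1, '%')
  AND table2.column2 LIKE CONCAT('%', table1.column2, '%');
-- Expect type: ALL and key: NULL for both tables, i.e. a full scan of
-- table2 for every row of table1 (20M x 20M row combinations)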
EDIT
You can increase the threads in the config (InnoDB). For the update itself I recommend first creating a temp table and then copying the result into table2.
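A minimal sketch of that temp-table idea, assuming exact matches and the column names from the question:

-- Build the matches once ...
CREATE TEMPORARY TABLE temp_match AS
SELECT t2.column1, t2.column2, t1.column3
FROM table1 AS t1
JOIN table2 AS t2
  ON t1.column1 = t2.column1
 AND t1.column2 = t2.column2;

-- ... then apply them in one pass
UPDATE table2
JOIN temp_match USING (column1, column2)
SET table2.column3 = temp_match.column3;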
Why do you use the LIKE operator if you do not use wildcard characters? Replace it with =. Also, do you have a multi-column index on the columns in the WHERE criteria in each of the tables?
The query you wrote will perform 20M x 20M lookups (for each row in table1, look up all rows in table2). You can't write just anything and expect it to work because you have an SSD or a good CPU. If you have arrived at this point, it's time to think before you start writing SQL. What is it that you need to do, what tools do you have at your disposal, and what is the middle part that you don't know -- those are the questions to answer every time before you issue a 400-trillion-lookup query.
That is the scenario I am facing, though. The basic "update this, where that matches" query apparently doesn't apply here, so I am trying to figure out a more advanced solution.
table1:
+---------+---------+-------------+---------+---------+---------+
| column1 | column2 | column3     | column4 | column5 | columnN |
+---------+---------+-------------+---------+---------+---------+
| John | Doe_ | employee001 | xyz | 12345 | ... |
| Jim | Doe | employee002 | abc | 67890 | ... |
+---------+---------+-------------+---------+---------+---------+
table2:
+---------+---------+---------+
| column1 | column2 | column3 |
+---------+---------+---------+
| John | Doe | |
| Jim | Doe | |
+---------+---------+---------+
Here, a LIKE query would fill both rows of table2 if it matched "Doe_" against "Doe". But by writing this down, I just realized that a LIKE query is no option here: the variations are not constrained to a suffix of column2 in table1, so various LIKE patterns would be required (leading AND trailing variants for both columns in both tables). That in turn would multiply the number of required matches. So let's forget about LIKE and concentrate on exact matching only.
FULLTEXT and LIKE have nothing to do with each other.
"Might not match exactly" -- You will need more limitations on this non-restriction. Else, any attempt at a query will continue to take weeks.
t2.c1 LIKE CONCAT('%', t1.c1, '%') requires checking every row of t1 against every row of t2; that's 400 trillion tests. No hardware can do that in a reasonable length of time.
FULLTEXT works with "words". If your c1 and c2 are strings of words, then there is some hope to use FULLTEXT. FULLTEXT is much faster than LIKE because it has an index structure based on words.
However, even FULLTEXT is nowhere near the speed of t2.c1 = t1.c1. Still, that would need a composite INDEX(c1, c2). Then it would be a full table scan (20M rows) of one table, plus 20M probes via a BTree index into the other table. That is about 40M operations -- a lot better than 400T for LIKE.
In order to proceed, please think through your definition of "Might not match exactly" and present the best you can live with.
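For the exact-match route, the composite index would be created like this (index names are made up; strictly only the probed table needs it, but having both lets the optimizer pick which side to scan and which to probe):

ALTER TABLE table1 ADD INDEX idx_c1_c2 (column1, column2);
ALTER TABLE table2 ADD INDEX idx_c1_c2 (column1, column2);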
Ok, since I decided to drop the LIKE requirement, what exactly do you propose to use as an index? I read your post like this:
ALTER TABLE `table1` ADD FULLTEXT INDEX `indexname1` (`column1`, `column2`);
ALTER TABLE `table2` ADD FULLTEXT INDEX `indexname2` (`column1`, `column2`);
UPDATE `table1`, `table2`
SET `table2`.`column3` = `table1`.`column3`
WHERE CONCAT(`table1`.`column1`, `table1`.`column2`) = CONCAT(`table2`.`column1`, `table2`.`column2`);
Is this correct?
Two followup questions though:
Is the update in your opinion as fast as, faster than, or slower than creating a new table, i.e.:
CREATE TABLE merged AS
SELECT table1.column1, table1.column2, table1.column3
FROM table1, table2
WHERE CONCAT(table1.column1, table1.column2) = CONCAT(table2.column1, table2.column2);
Would the indexes and/or the matching be case sensitive? If yes, can I adapt the query without having to change column1 & column2 to all upper case (or all lower case)?
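Regarding case sensitivity: in MySQL this is decided by the columns' collation (which the index also follows), not by the index type. A quick way to check (sketch):

-- A collation name ending in _ci (e.g. utf8mb4_general_ci) means
-- comparisons on that column are already case-insensitive
SHOW FULL COLUMNS FROM table1;

-- With a case-sensitive or binary collation, a per-query override is
-- possible, but it may prevent index use:
-- WHERE t1.column1 = t2.column1 COLLATE utf8mb4_general_ci ...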
Edit

WHERE CONCAT(t1.c1, t1.c2) = CONCAT(t2.c1, t2.c2) is a lot worse than saying WHERE t1.c1 = t2.c1 AND t1.c2 = t2.c2. The latter will run fast with INDEX(c1,c2).
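Putting the pieces together, a sketch of the fast form of the update (names taken from the question; assumes the composite indexes from above; explicit JOIN syntax is equivalent to the comma-join used earlier):

UPDATE table2
JOIN table1
  ON table1.column1 = table2.column1
 AND table1.column2 = table2.column2
SET table2.column3 = table1.column3;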