I am trying to summarize some data from two data sources. The tables are following
Table t1
| auto_id | unique_column | value |
|---|---|---|
| 1 | d1 | 10 |
| 2 | d2 | 5 |
| 3 | d3 | 15 |
Table t2
| auto_id | unique_column | value |
|---|---|---|
| null | d3 | 5 |
| null | d4 | 6 |
Expected data in t1 after query:
| auto_id | unique_column | value |
|---|---|---|
| 1 | d1 | 10 |
| 2 | d2 | 5 |
| 3 | d3 | 20 |
| 4 | d4 | 6 |
To solve this I was planning to do the following query
REPLACE INTO t1
SELECT auto_id, unique_column, SUM(value) value FROM
(
SELECT * FROM t1
UNION ALL
SELECT * FROM t2
) GROUP BY unique_column;
Issue with this query
SELECT auto_id, unique_column, SUM(value) value FROM
(
SELECT * FROM t1
UNION ALL
SELECT * FROM t2
) GROUP BY unique_column;
The above query will give us
| auto_id | unique_column | value |
|---|---|---|
| 1 | d1 | 10 |
| 2 | d2 | 5 |
| 3 | d3 | 20 |
| NULL | d4 | 6 |
Which is okay. In the insert query we will get the following result:
| auto_id | unique_column | value |
|---|---|---|
| 1 | d1 | 10 |
| 2 | d2 | 5 |
| 3 | d3 | 20 |
| 4 | d4 | 6 |
But the following query
SELECT auto_id, unique_column, SUM(value) value FROM
(
SELECT * FROM t2
UNION ALL
SELECT * FROM t1
) GROUP BY unique_column;
will give us
| auto_id | unique_column | value |
|---|---|---|
| 1 | d1 | 10 |
| 2 | d2 | 5 |
| NULL | d3 | 20 |
| NULL | d4 | 6 |
So, Insert query will product the following unexpected result
| auto_id | unique_column | value |
|---|---|---|
| 1 | d1 | 10 |
| 2 | d2 | 5 |
| 3 | d3 | 15 |
| 4 | d3 | 20 |
| 5 | d4 | 6 |
Alternative thought was, full outer join but mysql decided not to implement this. What other alternative can be done with mysql?
You were close:
REPLACE INTO t1
SELECT MAX(auto_id), unique_column, SUM(value) value FROM
(
SELECT * FROM t1
UNION ALL
SELECT * FROM t2
) t
GROUP BY unique_column;
MAX(auto_id) will skip NULL fields, which will give us just 3 for 'd3'; for the 'd4' case, auto_id will still be NULL, but (as long as the column is declared as AUTO_INCREMENT NOT NULL), NULL will be replaced by the next number in the sequence.
EDIT for your edit: as long as you properly aggregate auto_id using MAX (or MIN), the only scenario NULL appears is if you have no prior auto_id; it is not sensitive to ordering.