I have a table, say, Product
(Id, Name
):
Id | Name |
---|---|
1 | 'one' |
2 | 'two' |
3 | 'three' |
I have a new version of data that is needed to update the table, with possible duplicate values.
For example:
Name |
---|
'one' |
'two' |
'two' |
'two' |
'four' |
I need the result table to consist of as many rows of each "Name" as there are in the second table, i.e.
Id | Name |
---|---|
1 | 'one' |
2 | 'two' |
3 | 'three' |
4 | 'two' |
5 | 'two' |
6 | 'four' |
So, if the value, like 'one' already exists, we don't need to insert a duplicate, but if there are more rows of the same value in the second table, like 'two' or 'four', we need to insert respectively 2 and 1 more values into the table.
How do I do it using SQL?
I've tried answers that suggest inserting into the table without duplicates, which is not my task.
First I set up som tables for the testing:
create temp table Product(id integer primary key generated always as identity, name text);
insert into Product (name) values ('one'),('two'),('three');
select * from Product;
create temp table newdata(name text);
insert into newdata values ('one'),('two'),('two'),('two'),('four');
select * from newdata;
Then I use the analytical function "row_number()" which creates row numbers on the result. That is just a way of counting how many instances of each name is in your list
select name,row_number() over (partition by name order by name) rn from newdata;
+------+----+
| name | rn |
+------+----+
| four | 1 |
| one | 1 |
| two | 1 |
| two | 2 |
| two | 3 |
+------+----+
For product:
select name,row_number() over (partition by name order by name) rn from product
+-------+----+
| name | rn |
+-------+----+
| one | 1 |
| three | 1 |
| two | 1 |
+-------+----+
Finding the differences between the two tells me which names I am missing:
select name,row_number() over (partition by name order by name) rn from newdata
except
select name,row_number() over (partition by name order by name) rn from product;
+------+----+
| name | rn |
+------+----+
| two | 3 |
| four | 1 |
| two | 2 |
+------+----+
Then I just need to insert the missing names:
insert into product (name)
select name from (
select name,row_number() over (partition by name order by name) rn from newdata
except
select name,row_number() over (partition by name order by name) rn from product
) a;
select * from product
+----+-------+
| id | name |
+----+-------+
| 1 | one |
| 2 | two |
| 3 | three |
| 4 | two |
| 5 | four |
| 6 | two |
+----+-------+