I have three tables:
x (column1,..., x_id)
y (column1, column2, colum3, ..., x_id, y_id)
z (column1, column2, colum3,..., x_id, y_id, z_id)
My final goal: delete duplicates in y table if all of their linked values (based on foreign keys) in z table are also exactly the same. For example, This is just an example.
For more clarification consider this example: y table: my table structure is very similar to this and this table is close to the reality of what I have.
column 1 | column 2 | x_id | y_id |
---|---|---|---|
foo | bar | 1 | 1 |
foo | bar | 1 | 2 |
xx | yy | 2 | 3 |
zz | kk | 2 | 4 |
tt | mm | 2 | 5 |
baz | qux | 3 | 6 |
baz | qux | 3 | 7 |
has two groups with the same column 1 and column 2 and x_id values (the y_ids = (1 and 2) and y_id == (6,7).
Then for these rows, I want to check a second table to see if all the items are the same. meaning they have the same items in the z table. The items of the y_id = 1 and y_id = 2 are the same, so one of them in the y table should be deleted.
{color:red;} column 1 | column 2 | column | x_id | y_id | z_id |
---|---|---|---|---|---|
foo | bar | qux | 1 | 1 | 1 |
foo2 | bar2 | qux2 | 1 | 1 | 2 |
foo | bar | qux | 1 | 2 | 3 |
foo2 | bar2 | qux2 | 1 | 2 | 4 |
baz | foo | qux | 3 | 6 | 5 |
baz2 | foo2 | qux1 | 3 | 6 | 6 |
baz3 | foo3 | qux2 | 3 | 6 | 7 |
baz | foo | qux | 3 | 7 | 8 |
baz2 | foo2 | qux1 | 3 | 7 | 9 |
I want to delete the row with y_id = 1 from the y table ( or y_id = 2, which duplicate row doesn't make a difference). The tables are very big so a fast solution would be great. I know I can join the tables, but I was hoping I could find another solution. A solution that can use the group by IDs instead and just look for the found items there inside the z table (third table)
I believe the below example will help me to delete the rows I want:
WITH cte AS (
SELECT
a.column1 AS a_column1,
a.column2 AS a_column2,
a.y_id AS a_y_id,
a.x_id AS a_x_id,
b.column1 AS a_column1,
b.colum2 AS a_column2,
b.y_id AS b_y_id ,
b.x_id AS b_x_id
FROM YTable a
JOIN YTable b
ON a.column1 = b.column1
AND a.column2 = b.column2
AND a.x_id = b.x_id
AND a.y_id != b.y_id
WHERE a.y_id < b.y_id
)
SELECT
cte.*,
CASE
WHEN EXISTS (
SELECT 1
FROM (
SELECT * FROM ZTable
WHERE y_id = cte.y_id
) table1
FULL JOIN (
SELECT * FROM ZTable
WHERE y_id = cte.y_id
) table2
ON table1.colum1 = table2.column1
AND table1.colum2 = table2.colum2
AND table1.colum3 = table2.colum3
WHERE table1.column1 IS NULL OR table2.column1 IS NULL
)
THEN 'Not Identical'
ELSE 'Identical'
END AS duplicated
FROM cte;