We have a plugin, which imports an XML file from an FTP server into the database. This XML is further processed, and data is inserted into the database tables. These rows are added to the "pricing" table (with help of the package w 10k+ rows), where prices can be set. My problem is that duplicate values often come through during the import. The system processes them, but during pairing, only the first match is taken into table.
To simplify, there is a table.. the important columns are: the ID, the column "SITE", and the column LINKED, where the ID is attached with an exclamation mark (!) to the PK.
How can I find the pairs and filter out those that don’t have a pair or those that are linked to the same ID more than once?
After this, I would like to make a procedure to update the column to fix the pairing - with the correct ID.
table:
I have colored the pairs to make it easier to visualize and understand what the issue is. Here you can see, that SITE: 292 ID: 4 is not paired to ID:6.
I have no IDEA, how can i link the correct.. i can find the not good one-s, with sub-select:
select * from pricing p where p.PK='3452400012026'
and p.site='292'
and
(select count(*) from pricing po
where po.PK = p.PK and
po.LINKED=p.PK||'!'||p.ID) != 1
So if it's not 1, it's problematic...
EDIT:
If I'm not wrong - this could be done with SQL alone using MERGE INTO to do the update. Here is the code with link to the fiddle. Details below the code:
MERGE INTO PRICING p
USING ( WITH
links AS
( Select p.ID, p.LINKED, p.SITE, p.PK,
To_Number(Substr(p.LINKED, InStr(p.LINKED, '!') + 1)) "LINK_ID",
Case When p.LINKED Is Not Null
Then Row_Number() Over(Partition By p.SITE, p.PK, To_Number(Substr(p.LINKED, InStr(p.LINKED, '!') + 1))
Order By p.SITE, p.PK, p.ID)
End "RN_LINK",
Count(p.ID) Over(Partition By p.SITE, p.PK, To_Number(Substr(p.LINKED, InStr(p.LINKED, '!') + 1))) "COUNT_LINK"
From pricing p
Where p.LINKED Is Not Null
),
grid AS
( Select *
From links
UNION ALL
Select ID, LINKED, SITE, PK, ID,
Row_Number() Over(Partition By SITE, PK Order By ID),
Count(ID) Over()
From pricing
Where LINKED Is Null And
ID Not In(Select LINK_ID From links)
)
-- M a i n S Q L :
SELECT *
FROM ( Select g.ID, g.LINKED, g.SITE, g.PK,
Case When g.COUNT_LINK > 1 And
g.LINKED Is Not Null And
g.RN_LINK = 1
Then g.PK || '!' || g.LINK_ID
When g.COUNT_LINK > 1 And
g.RN_LINK > 1 And
g.ID != g.LINK_ID
Then g.PK || '!' || ( Select ID From grid Where LINKED Is Null And RN_LINK = g.RN_LINK - 1 )
End "LINKED_2"
From grid g
)
WHERE LINKED_2 Is Not Null And SubStr(LINKED_2, -1) != '!'
) x ON (x.ID = p.ID)
WHEN MATCHED
THEN UPDATE SET p.LINKED = x.LINKED_2
WHERE p.LINKED != x.LINKED_2
See the fiddle and the result here.
Details:
-- l i n k s :
/*
ID LINKED SITE PK LINK_ID RN_LINK COUNT_LINK
--- --------------- ---- ------------- ------- ------- ----------
1 3452400012026!2 99 3452400012026 2 1 1
3 3452400012026!8 99 3452400012026 8 1 2
5 3452400012026!7 99 3452400012026 7 1 1
6 3452400012026!8 99 3452400012026 8 2 2
9 3452400012026!10 99 3452400012026 10 1 1 */
-- here we have the ID - LINK_ID pairs along with row numbers per LINK_ID with total rows per LINK_ID
-- notice that LINK_ID 8 has two rows - for IDs 3 and 6
-- g r i d
/*
ID LINKED SITE PK LINK_ID RN_LINK COUNT_LINK
--- --------------- ---- ------------- ------- ------- ----------
1 3452400012026!2 99 3452400012026 2 1 1
3 3452400012026!8 99 3452400012026 8 1 2
4 null 292 3452400012026 4 1 1
5 3452400012026!7 99 3452400012026 7 1 1
6 3452400012026!8 99 3452400012026 8 2 2
9 3452400012026!10 99 3452400012026 10 1 1 */
-- m a i n :
/*
ID LINKED SITE PK LINKED_2
-- ---------------- ---- -------------- ---------------
3 3452400012026!8 99 3452400012026 3452400012026!8
6 3452400012026!8 99 3452400012026 3452400012026!4 */
The resulting dataset is MERGED INTO pricing to do the update.