I have a situation that I cannot figure out on my own and would like to have support in it how to solve it: I’m currently building a data warehouse on SQL server 18. In the staging db I have a SoLine table that combines the data from several Oracle tables. Among those tables is ONT.OE_Order_Price_Attribs which contains de Header_Id and Line_Id to join on the base table ONT.OE_Order_Lines_All:
SELECT
OL.*,
OPA.*
FROM ONT.OE_Order_Lines_All OL
LEFT JOIN ONT.OE_Order_Price_Attribs OPA
ON OL.Header_Id = OPA.Header_Id
AND OL.Line_Id = OPA.Line_Id
OPA contains sometimes multiple records per Header_Id, Line_Id and Last_Update_Date (so those contain all the same values):
Header_Id | Line_Id | Last_Update_Date | Pricing_Context | Order_Price_Attrib_Id | Att1 | Att2 | Att3 | Att4 | Att5 |
---|---|---|---|---|---|---|---|---|---|
123 | 456 | 2023-12-13 10:00:00 | 01 | 789 | A | B | C | D | E |
123 | 456 | 2023-12-13 10:00:00 | 01 | 790 | A | B | C | D | F |
The rest of the columns are also identical except for column Order_Price_Attrib_Id. It can occur that the att* columns are the same or different though.
What I want to achieve is that I have only 1 record as result. In case when the columns Header_Id, Line_Id and Last_Update_Date are the same, it should show the record with the latest Order_price_Attrib_Id. So in the above example Id 790, despite if the att* are the same or different. But I do want to see the att* columns
So for each Header_Id and Line_Id combination from the order line table, it needs to find the latest Header_Id and Line_id of the attrib tables based on the Order_Price_Attrib_Id for each record in the SoLine table. So a loop is (at least I think) needed to find each combination.
I have tried to use the MAX function on Order_Price_Attrib_Id but that didn’t work out unfortunately. I also tried to use a sub-query with a count(*) and such. But no results unfortunatly so far.
I have a hard time to write these more advanced/difficult queries since I’m not a real expert in writing queries. I can write the basics but these type of queries are a bit out of reach yet at the moment.
So any help is highly appreciated and therefore already thank you in advance.
Yoeri
I don't seem to get in any way to only have 1 record as a result per record in the SoLine table
We can use ROW_NUMBER()
to assign a unique sequential number to each row within defined groups, specified by PARTITION BY
clause, then we select the latest row per group, by using ORDER BY
clause to sort the data in descending order :
SELECT *
FROM (
SELECT
OL.*,
OPA.*,
ROW_NUMBER() OVER (PARTITION BY OL.Header_Id, OL.Line_Id, OL.Last_Update_Date ORDER BY Order_Price_Attrib_Id DESC) AS rn
FROM OE_Order_Lines_All OL
LEFT JOIN OE_Order_Price_Attribs OPA
ON OL.Header_Id = OPA.Header_Id
AND OL.Line_Id = OPA.Line_Id
) AS S
WHERE rn = 1;