sql-serverloopsleft-joindata-warehouse

How do I get 1 record as result including a left join including a loop?


I have a situation that I cannot figure out on my own and would like to have support in it how to solve it: I’m currently building a data warehouse on SQL server 18. In the staging db I have a SoLine table that combines the data from several Oracle tables. Among those tables is ONT.OE_Order_Price_Attribs which contains de Header_Id and Line_Id to join on the base table ONT.OE_Order_Lines_All:


SELECT
    OL.*,
    OPA.*
FROM ONT.OE_Order_Lines_All OL

LEFT JOIN ONT.OE_Order_Price_Attribs OPA
    ON OL.Header_Id = OPA.Header_Id
    AND OL.Line_Id = OPA.Line_Id

OPA contains sometimes multiple records per Header_Id, Line_Id and Last_Update_Date (so those contain all the same values):

Header_Id Line_Id Last_Update_Date Pricing_Context Order_Price_Attrib_Id Att1 Att2 Att3 Att4 Att5
123 456 2023-12-13 10:00:00 01 789 A B C D E
123 456 2023-12-13 10:00:00 01 790 A B C D F

The rest of the columns are also identical except for column Order_Price_Attrib_Id. It can occur that the att* columns are the same or different though.

What I want to achieve is that I have only 1 record as result. In case when the columns Header_Id, Line_Id and Last_Update_Date are the same, it should show the record with the latest Order_price_Attrib_Id. So in the above example Id 790, despite if the att* are the same or different. But I do want to see the att* columns

So for each Header_Id and Line_Id combination from the order line table, it needs to find the latest Header_Id and Line_id of the attrib tables based on the Order_Price_Attrib_Id for each record in the SoLine table. So a loop is (at least I think) needed to find each combination.

I have tried to use the MAX function on Order_Price_Attrib_Id but that didn’t work out unfortunately. I also tried to use a sub-query with a count(*) and such. But no results unfortunatly so far.

I have a hard time to write these more advanced/difficult queries since I’m not a real expert in writing queries. I can write the basics but these type of queries are a bit out of reach yet at the moment.

So any help is highly appreciated and therefore already thank you in advance.

Yoeri

I don't seem to get in any way to only have 1 record as a result per record in the SoLine table


Solution

  • We can use ROW_NUMBER() to assign a unique sequential number to each row within defined groups, specified by PARTITION BY clause, then we select the latest row per group, by using ORDER BY clause to sort the data in descending order :

    SELECT *
    FROM (
      SELECT
        OL.*,
        OPA.*,
        ROW_NUMBER() OVER (PARTITION BY OL.Header_Id, OL.Line_Id, OL.Last_Update_Date ORDER BY Order_Price_Attrib_Id DESC) AS rn
      FROM OE_Order_Lines_All OL
      LEFT JOIN OE_Order_Price_Attribs OPA
         ON OL.Header_Id = OPA.Header_Id
         AND OL.Line_Id = OPA.Line_Id
    ) AS S
    WHERE rn = 1;