sql-serverazure-data-lakeu-sql

U-Sql, how do i join without getting cartesian product


I have a large file with rows for each day per ID. There can be more then one record per ID per day but only the newest value is valid.

DailyValues:

ID int,
date datetime,
version datetime,
value1 float,
value2 float,
value3 float,
value4 float,

I T-SQL i would select MAX(version) and group by ID, Date, then join in the values with cross apply.

select
  B.*
from
(
   Select
     ID,
     Date,
     MAX(Version)
     From DailyValues D1
   group by
     ID,
     Date
) as A
CROSS APPLY (
   select top 1 *
   from DailyValues D2
   where D1.ID = D2.ID
   and D1.Date = D2.Date
   and D1.Version = D2.version
   order Version desc
) as B

The file is to big for me to do it in T-sql.

How can i do this in U-sql


Solution

  • You can first extract the CSV file into a rowset. After, extracting, you can select the latest version, as given below:

    @DailyValues = 
          EXTRACT ID int,
        date datetime,
        version datetime,
        value1 float,
        value2 float,
        value3 float,
        value4 float
           FROM "/Samples/Data/DailyValues.csv"
           USING Extractors.Csv(encoding: Encoding.[ASCII]);
    
    SELECT ID, Date, Version
    FROM
    (
    SELECT ID, Date,Version, ROW_Number() OVER(PARTITION BY ID, Date ORDER BY version DESC) AS rn
    FROM @DailyValues) AS t
    WHERE t.rn == 1;