I have a large file with rows for each day per ID. There can be more then one record per ID per day but only the newest value is valid.
DailyValues:
ID int,
date datetime,
version datetime,
value1 float,
value2 float,
value3 float,
value4 float,
I T-SQL i would select MAX(version) and group by ID, Date, then join in the values with cross apply.
select
B.*
from
(
Select
ID,
Date,
MAX(Version)
From DailyValues D1
group by
ID,
Date
) as A
CROSS APPLY (
select top 1 *
from DailyValues D2
where D1.ID = D2.ID
and D1.Date = D2.Date
and D1.Version = D2.version
order Version desc
) as B
The file is to big for me to do it in T-sql.
How can i do this in U-sql
You can first extract the CSV file into a rowset. After, extracting, you can select the latest version, as given below:
@DailyValues =
EXTRACT ID int,
date datetime,
version datetime,
value1 float,
value2 float,
value3 float,
value4 float
FROM "/Samples/Data/DailyValues.csv"
USING Extractors.Csv(encoding: Encoding.[ASCII]);
SELECT ID, Date, Version
FROM
(
SELECT ID, Date,Version, ROW_Number() OVER(PARTITION BY ID, Date ORDER BY version DESC) AS rn
FROM @DailyValues) AS t
WHERE t.rn == 1;