powerbidaxpowerquery

Get value from next row when column does not contain consectutive values in Power BI


I have a table with the following data

record_id site start_date activity_1 activity_2 activity_3 activity_4
10 1 9/24/2022 basketball baseball
10 1 10/1/2022 basketball
10 1 10/3/2022 basketball
10 1 10/11/2022 baseball football
10 1 11/1/2022 football soccer
10 1 12/12/2022

I want to do two things: combine the activity columns into one column while retaining the same record_id, site and start_date row values and create a new end_date column. The end_date will come from the next start_date after that activity has a blank field. My expected result is below.

record_id site start_date activity end_date
10 1 9/24/2022 basketball 10/11/2022
10 1 10/1/2022 basketball 10/11/2022
10 1 10/3/2022 basketball 10/11/2022
10 1 9/24/2022 baseball 10/1/2022
10 1 10/11/2022 baseball 11/1/2022
10 1 10/11/2022 football 12/12/2022
10 1 11/1/2022 football 12/12/2022
10 1 11/1/2022 soccer 12/12/2022

I've tried this query, but it only gives me the next start_date after the activity, even if the next one is not blank.

let
// Step 1: Load the data
Source = Sheet2,
// Step 2: Unpivot activity columns into one column
UnpivotedColumns = Table.UnpivotOtherColumns(Source, {"record_id", "Site", "start_date"}, "ActivityType", "activity"),

// Step 3: Remove rows where activity is null or empty
FilteredRows = Table.SelectRows(UnpivotedColumns, each ([activity] <> null and [activity] <> "")),

// Step 4: Sort data by record_id, Site, and start_date
SortedTable = Table.Sort(FilteredRows, {{"record_id", Order.Ascending}, {"Site", Order.Ascending}, {"start_date", Order.Ascending}}),

// Step 5: Create a table with DISTINCT start dates
DistinctStartDates = Table.Distinct(SortedTable[[record_id], [Site], [start_date]]),

// Step 6: Add a column to find the next start_date for each group
AddedNextStartDate = Table.AddColumn(DistinctStartDates, "next_start_date", each 
    let 
        CurrentRecordID = [record_id],
        CurrentSite = [Site],
        CurrentStartDate = [start_date],
        // Find the next available start_date
        NextDates = Table.SelectRows(DistinctStartDates, each ([record_id] = CurrentRecordID and [Site] = CurrentSite and [start_date] > CurrentStartDate)),
        NextStartDate = if Table.RowCount(NextDates) > 0 then NextDates[start_date]{0} else null
    in 
        NextStartDate, type nullable date
),

// Step 7: Merge this next_start_date back into the main table
MergedTable = Table.NestedJoin(SortedTable, {"record_id", "Site", "start_date"}, AddedNextStartDate, {"record_id", "Site", "start_date"}, "NextDateTable", JoinKind.LeftOuter),
ExpandedTable = Table.ExpandTableColumn(MergedTable, "NextDateTable", {"next_start_date"}),

// Step 8: Fill in null end_dates with the maximum date (if applicable)
//MaxEndDate = Date.From("12/12/2022"), // Replace with a suitable max date
MaxEndDate = List.Max(Source[start_date]), // Finds the latest date in the Start_Date column
FinalTable = Table.TransformColumns(ExpandedTable, {"next_start_date", each if _ = null then MaxEndDate else _, type date}),

// Step 9: Rename next_start_date to end_date
RenamedTable = Table.RenameColumns(FinalTable, {{"next_start_date", "end_date"}}),
#"Sorted Rows" = Table.Sort(RenamedTable,{{"ActivityType", Order.Ascending}})
in
#"Sorted Rows"

Any help would be appreciated


Solution

  • This seems to work, although I have to think there should be a more efficient method of accomplishing this task.

    Edit: Code edited to a more efficient version

    Custom Function
    (rename as per the comments)
    processes one activity at a time

    //Rename "fnEndDate"
    
    (tbl as table, act as text)=>
    
    [cn=List.FirstN(Table.ColumnNames(tbl),3),
     a=Table.SelectColumns(tbl,cn & {act}),
     b=Table.FromColumns(
         Table.ToColumns(a) &
         {{null} & List.RemoveLastN(Table.Column(a,act),1)},
         cn & {"Activity","Shifted"}),
     c=Table.ReplaceValue(
         b,
         each [Shifted],
         each [start_date],
         (x,y,z)=>if x=null and y<>null then z else x,
         {"Activity"}),
     d=Table.RemoveColumns(c,{"Shifted"}),
     e=Table.FillDown(d,{"Activity"}),
     f=Table.AddColumn(e,"end_date", each if not (try Date.From([Activity]))[HasError] 
                                            then [Activity] 
                                            else null, type nullable date),
     g=Table.FillUp(f,{"end_date"}),
     h=Table.SelectRows(g, each (try Date.From([Activity]))[HasError]),
     i=Table.TransformColumnTypes(h,{"Activity", type text})
    ][i]
    

    Main Code

    let
    
    //Change next line to reflect actual data source
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{
            {"record_id", type text}, {"site", Int64.Type}, {"start_date", type date} &
                List.Transform(List.Skip(Table.ColumnNames(Source),3), each {_, type any})}),
        #"End Dates" = List.Accumulate(
            List.Skip(Table.ColumnNames(#"Changed Type"),3),
            #table({},{}),
            (s,c)=> Table.Combine({s,fnEndDate(#"Changed Type",c)})
        )
    in
        #"End Dates"
    

    Original Data
    enter image description here

    Results
    enter image description here