I have a table with the following data
record_id | site | start_date | activity_1 | activity_2 | activity_3 | activity_4 |
---|---|---|---|---|---|---|
10 | 1 | 9/24/2022 | basketball | baseball | ||
10 | 1 | 10/1/2022 | basketball | |||
10 | 1 | 10/3/2022 | basketball | |||
10 | 1 | 10/11/2022 | baseball | football | ||
10 | 1 | 11/1/2022 | football | soccer | ||
10 | 1 | 12/12/2022 |
I want to do two things: combine the activity columns into one column while retaining the same record_id, site and start_date row values and create a new end_date column. The end_date will come from the next start_date after that activity has a blank field. My expected result is below.
record_id | site | start_date | activity | end_date |
---|---|---|---|---|
10 | 1 | 9/24/2022 | basketball | 10/11/2022 |
10 | 1 | 10/1/2022 | basketball | 10/11/2022 |
10 | 1 | 10/3/2022 | basketball | 10/11/2022 |
10 | 1 | 9/24/2022 | baseball | 10/1/2022 |
10 | 1 | 10/11/2022 | baseball | 11/1/2022 |
10 | 1 | 10/11/2022 | football | 12/12/2022 |
10 | 1 | 11/1/2022 | football | 12/12/2022 |
10 | 1 | 11/1/2022 | soccer | 12/12/2022 |
I've tried this query, but it only gives me the next start_date after the activity, even if the next one is not blank.
let
// Step 1: Load the data
Source = Sheet2,
// Step 2: Unpivot activity columns into one column
UnpivotedColumns = Table.UnpivotOtherColumns(Source, {"record_id", "Site", "start_date"}, "ActivityType", "activity"),
// Step 3: Remove rows where activity is null or empty
FilteredRows = Table.SelectRows(UnpivotedColumns, each ([activity] <> null and [activity] <> "")),
// Step 4: Sort data by record_id, Site, and start_date
SortedTable = Table.Sort(FilteredRows, {{"record_id", Order.Ascending}, {"Site", Order.Ascending}, {"start_date", Order.Ascending}}),
// Step 5: Create a table with DISTINCT start dates
DistinctStartDates = Table.Distinct(SortedTable[[record_id], [Site], [start_date]]),
// Step 6: Add a column to find the next start_date for each group
AddedNextStartDate = Table.AddColumn(DistinctStartDates, "next_start_date", each
let
CurrentRecordID = [record_id],
CurrentSite = [Site],
CurrentStartDate = [start_date],
// Find the next available start_date
NextDates = Table.SelectRows(DistinctStartDates, each ([record_id] = CurrentRecordID and [Site] = CurrentSite and [start_date] > CurrentStartDate)),
NextStartDate = if Table.RowCount(NextDates) > 0 then NextDates[start_date]{0} else null
in
NextStartDate, type nullable date
),
// Step 7: Merge this next_start_date back into the main table
MergedTable = Table.NestedJoin(SortedTable, {"record_id", "Site", "start_date"}, AddedNextStartDate, {"record_id", "Site", "start_date"}, "NextDateTable", JoinKind.LeftOuter),
ExpandedTable = Table.ExpandTableColumn(MergedTable, "NextDateTable", {"next_start_date"}),
// Step 8: Fill in null end_dates with the maximum date (if applicable)
//MaxEndDate = Date.From("12/12/2022"), // Replace with a suitable max date
MaxEndDate = List.Max(Source[start_date]), // Finds the latest date in the Start_Date column
FinalTable = Table.TransformColumns(ExpandedTable, {"next_start_date", each if _ = null then MaxEndDate else _, type date}),
// Step 9: Rename next_start_date to end_date
RenamedTable = Table.RenameColumns(FinalTable, {{"next_start_date", "end_date"}}),
#"Sorted Rows" = Table.Sort(RenamedTable,{{"ActivityType", Order.Ascending}})
in
#"Sorted Rows"
Any help would be appreciated
This seems to work, although I have to think there should be a more efficient method of accomplishing this task.
Edit: Code edited to a more efficient version
Custom Function
(rename as per the comments)
processes one activity at a time
//Rename "fnEndDate"
(tbl as table, act as text)=>
[cn=List.FirstN(Table.ColumnNames(tbl),3),
a=Table.SelectColumns(tbl,cn & {act}),
b=Table.FromColumns(
Table.ToColumns(a) &
{{null} & List.RemoveLastN(Table.Column(a,act),1)},
cn & {"Activity","Shifted"}),
c=Table.ReplaceValue(
b,
each [Shifted],
each [start_date],
(x,y,z)=>if x=null and y<>null then z else x,
{"Activity"}),
d=Table.RemoveColumns(c,{"Shifted"}),
e=Table.FillDown(d,{"Activity"}),
f=Table.AddColumn(e,"end_date", each if not (try Date.From([Activity]))[HasError]
then [Activity]
else null, type nullable date),
g=Table.FillUp(f,{"end_date"}),
h=Table.SelectRows(g, each (try Date.From([Activity]))[HasError]),
i=Table.TransformColumnTypes(h,{"Activity", type text})
][i]
Main Code
let
//Change next line to reflect actual data source
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"record_id", type text}, {"site", Int64.Type}, {"start_date", type date} &
List.Transform(List.Skip(Table.ColumnNames(Source),3), each {_, type any})}),
#"End Dates" = List.Accumulate(
List.Skip(Table.ColumnNames(#"Changed Type"),3),
#table({},{}),
(s,c)=> Table.Combine({s,fnEndDate(#"Changed Type",c)})
)
in
#"End Dates"