How can I calculate running sum in Power Query faster? I used table buffering and List.Generate
but it is still too slow.
I have a table like this:
datetime | val | maxval | otherCols |
---|---|---|---|
2024-12-12 00:00:00 | 0 | 3 | ... |
2024-12-12 00:00:01 | 0 | 3 | ... |
2024-12-12 00:00:03 | 1 | 3 | ... |
2024-12-12 00:00:04 | 2 | 3 | ... |
2024-12-12 00:00:10 | 3 | 3 | ... |
2024-12-12 00:00:11 | 0 | 2 | ... |
2024-12-12 00:00:12 | 1 | 2 | ... |
2024-12-12 00:00:13 | 2 | 2 | ... |
2024-12-12 00:00:15 | 2 | 2 | ... |
2024-12-12 00:00:16 | 0 | 5 | ... |
val
starts from 0, increases until it is equal to maxval
, then restarts from 0. There might be multiple consecutive val
s with same value.
I should create batches of val
s (from 0 to maxval
) using Power Query (running on PowerQueryNet). maxval
of multiple batches can be the same.
My approach was creating a batchChange
column, which is 1 for each row that is start of a new batch and 0 for all other, then calculate running sum of batchChange
which gives new batchId
for each batch.
datetime | val | maxval | batchChange | batchId | otherCols |
---|---|---|---|---|---|
2024-12-12 00:00:00 | 0 | 3 | 0 | 0 | ... |
2024-12-12 00:00:01 | 0 | 3 | 0 | 0 | ... |
2024-12-12 00:00:03 | 1 | 3 | 0 | 0 | ... |
2024-12-12 00:00:04 | 2 | 3 | 0 | 0 | ... |
2024-12-12 00:00:10 | 3 | 3 | 0 | 0 | ... |
2024-12-12 00:00:11 | 0 | 2 | 1 | 1 | ... |
2024-12-12 00:00:12 | 1 | 2 | 0 | 1 | ... |
2024-12-12 00:00:13 | 2 | 2 | 0 | 1 | ... |
2024-12-12 00:00:15 | 2 | 2 | 0 | 1 | ... |
2024-12-12 00:00:16 | 0 | 5 | 1 | 2 | ... |
Calculating batchChange
was easy. Using two indexes (from 0 and 1) and joining them, I have val
and maxval
of previous row in each row and an if
can detect if the row is start of a new batch. For ~6000 rows, it takes less than a second.
However, calculating batchId
takes forever! I buffered batchChangedTbl
and disabled query folding which improved it to 30 seconds but it is still too long.
batchId = List.Generate(
() => [idx = 1, bid = 0],
each [idx] <= Table.RowCount(batchChangedTbl),
each [
bid = [bid] + batchChangedTbl{idx - 1}[batchChange],
idx = [idx] + 1
],
each [bid]
)
I think it recalculates entire list for each row as time grows exponentially when row count is increased.
How can I create batchId
faster? Is there any faster approach in Power Query? I know tools like Pandas or DAX are better for this problem, but I'm limited to Power Query.
If you start with
then you can use this to create new consecutive batch numbers. If you remove the last two steps you would have unique but non-consecutive batch numbers
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each
if [Index]=0 then 0 else
if [val]=0 then
if #"Added Index"{[Index]-1}[val]=0 then null else [Index]
else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
// next two steps just to get consecutive batch numbers. You could leave as is
#"Grouped Rows" = Table.Group(#"Filled Down", {"Custom"}, {{"data", each _, type table }}),
#"Added Index1" = Table.AddIndexColumn(#"Grouped Rows", "NewBatch", 0, 1, Int64.Type)
in #"Added Index1"