powerbipowerqueryetlm

Power Query `List.Generate` runs too slow


TL;DR

How can I calculate running sum in Power Query faster? I used table buffering and List.Generate but it is still too slow.

Details

I have a table like this:

datetime val maxval otherCols
2024-12-12 00:00:00 0 3 ...
2024-12-12 00:00:01 0 3 ...
2024-12-12 00:00:03 1 3 ...
2024-12-12 00:00:04 2 3 ...
2024-12-12 00:00:10 3 3 ...
2024-12-12 00:00:11 0 2 ...
2024-12-12 00:00:12 1 2 ...
2024-12-12 00:00:13 2 2 ...
2024-12-12 00:00:15 2 2 ...
2024-12-12 00:00:16 0 5 ...

val starts from 0, increases until it is equal to maxval, then restarts from 0. There might be multiple consecutive vals with same value.

I should create batches of vals (from 0 to maxval) using Power Query (running on PowerQueryNet). maxval of multiple batches can be the same.

My approach was creating a batchChange column, which is 1 for each row that is start of a new batch and 0 for all other, then calculate running sum of batchChange which gives new batchId for each batch.

datetime val maxval batchChange batchId otherCols
2024-12-12 00:00:00 0 3 0 0 ...
2024-12-12 00:00:01 0 3 0 0 ...
2024-12-12 00:00:03 1 3 0 0 ...
2024-12-12 00:00:04 2 3 0 0 ...
2024-12-12 00:00:10 3 3 0 0 ...
2024-12-12 00:00:11 0 2 1 1 ...
2024-12-12 00:00:12 1 2 0 1 ...
2024-12-12 00:00:13 2 2 0 1 ...
2024-12-12 00:00:15 2 2 0 1 ...
2024-12-12 00:00:16 0 5 1 2 ...

Calculating batchChange was easy. Using two indexes (from 0 and 1) and joining them, I have val and maxval of previous row in each row and an if can detect if the row is start of a new batch. For ~6000 rows, it takes less than a second.

However, calculating batchId takes forever! I buffered batchChangedTbl and disabled query folding which improved it to 30 seconds but it is still too long.

batchId = List.Generate(
    () => [idx = 1, bid = 0],
    each [idx] <= Table.RowCount(batchChangedTbl),
    each [
        bid = [bid] + batchChangedTbl{idx - 1}[batchChange],
        idx = [idx] + 1
    ],
    each [bid]
)

I think it recalculates entire list for each row as time grows exponentially when row count is increased.

How can I create batchId faster? Is there any faster approach in Power Query? I know tools like Pandas or DAX are better for this problem, but I'm limited to Power Query.


Solution

  • If you start with

    enter image description here

    then you can use this to create new consecutive batch numbers. If you remove the last two steps you would have unique but non-consecutive batch numbers

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each 
        if [Index]=0 then 0 else 
        if [val]=0 then 
            if #"Added Index"{[Index]-1}[val]=0 then null else [Index]
        else null),
    #"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
    // next two steps just to get consecutive batch numbers. You could leave as is
    #"Grouped Rows" = Table.Group(#"Filled Down", {"Custom"}, {{"data", each _, type table }}),  
    #"Added Index1" = Table.AddIndexColumn(#"Grouped Rows", "NewBatch", 0, 1, Int64.Type)
    in  #"Added Index1"
    

    enter image description here