powerbipowerquery

Grouped running total with Power Query M


This is how my table looks like (1.7 million rows):

enter image description here

I'm trying to build a running total per customer ID and date.

This is easy to express using DAX, but unfortunately I don't have enough memory on my machine (16GB RAM).

So, I'm trying to find an alternative with Power Query M using buffered tables, etc. but that is too complicated for me.

Can anyone help? Thank you so much in advance!

EDIT: After sorting by Date and CustomerID, added index and added a custom column with:

= Table.AddColumn(#"Added Index", "Personalizado", each (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]))

I get the following: enter image description here

EDIT2: The whole code:

let
    Origem = dataset,
    #"Linhas Agrupadas" = Table.Group(Origem, {"Date", "CustomerID"}, {{"Sales", each List.Sum([Sales]), type nullable number}}),
    #"Linhas Ordenadas" = Table.Sort(#"Linhas Agrupadas",{{"Date", Order.Ascending}, {"CustomerID", Order.Ascending}}),
    #"Linhas Filtradas" = Table.SelectRows(#"Linhas Ordenadas", each [Sales] <> 0),
    #"Added Index" = Table.AddIndexColumn(#"Linhas Filtradas", "Index", 0, 1, Int64.Type),
    #"Personalizado Adicionado" = Table.AddColumn(#"Added Index","CumSum",(i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]), type number )
in
    #"Personalizado Adicionado"

Solution

  • Method1

    Sort your data to start with, perhaps on the date column and CustomerID column. However it appears on screen is the row order it is going to accumulate the total

    Add column .. index column...

    Add column .. custom column with this formula, going back to remove the first instance of each before the (i) in the formula bar

    = (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales])
    

    Right click index column and remove it

    Likely adding a Table.Buffer() around the index step will help speed things up

    Sample full code:

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Sorted Rows" = Table.Sort(Source,{{"CustomerID", Order.Ascending}, {"Date", Order.Ascending}}),
    #"Added Index" = Table.Buffer(Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1)),
    #"Added Custom" = Table.AddColumn(#"Added Index","CumSum",(i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]), type number ),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"})
    in  #"Removed Columns"
    

    Method 2:

    Create function fn_cum_total

    (Input) =>
    let withindex = Table.AddIndexColumn(Input, "Index", 1, 1),
    cum = Table.AddColumn(withindex, "Total",each List.Sum(List.Range(withindex[Sales],0,[Index])))[Total]
    in cum
    

    Create query that uses that function to add cumulative totals to Sales column after grouping on CustomerID

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Sorted Rows" = Table.Buffer(Table.Sort(Source,{{"CustomerID", Order.Ascending}, {"Date", Order.Ascending}})),
    Running_Total = Table.Group(#"Sorted Rows",{"CustomerID"},{{"Data",
        (Input as table) as table =>  let  zz = fn_cum_total(Input),
         result = Table.FromColumns(Table.ToColumns(Input)&{zz}, Value.Type(Table.AddColumn(Input, "total", each null, type number))) in result, type table}} ),
    #"Expanded Data" = Table.ExpandTableColumn(Running_Total, "Data", {"Date", "Sales", "total"}, {"Date", "Sales", "total"})
    in #"Expanded Data"
    

    I cannot take credit for method 2, borrowed long ago, but do not recall source