excelpowerquerypercentile

How do I calculate Percentiles in PowerQuery based on grouping variables?


I have a few columns of data, I need to convert the excel version of "PERCENTILE" into Powerquery format.

I have some code which adds in as a function but doesnt apply accurately as it doesnt allow for grouping of the data by CATEGORY and YEAR. So anything that is in Full Discretionary 1.5-2.5 AND 2014 needs to be added to the percentile array, equally anything that falls in Full discretionary 2.5-3.5 AND 2014 needs to go into a different percentile array

let

 Source = (list as any, k as number) => let

 Source = list,

 #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),

 #"Sorted Rows" = Table.Sort(#"Converted to Table",{{"Column1", Order.Ascending}}),

 #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1),

 #"Added Custom" = Table.AddColumn(#"Added Index", "TheIndex", each Table.RowCount(#"Converted to Table")*k/100),

 #"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Index] >= [TheIndex] and [Index] <= [TheIndex]+1),

Custom1 = List.Average(#"Filtered Rows"[Column1])

   in

 Custom1

in

 Source

So Expected results would be that anything that matches off on the 2 columns (Year,Category) should be applied within the same array. Currently invoking the above function just gives me errors. I have also tried using grouping and outputting the "Min, Median, and Max" outputs but I also require 10% and 90% Percentiles.

Thank you in advance


Solution

  • Based on some findings on other websites and alot of googling (most folk just want to use DAX but if youre only using Power Query you cant!) someone posted an answer which is very helpful:

    https://social.technet.microsoft.com/Forums/en-US/a57bfbea-52d1-4231-b2de-fa993d9bb4c9/can-the-quotpercentilequot-be-calculated-in-power-query?forum=powerquery

    Basically:

    /PercentileInclusive Function
    
    (inputSeries as list, percentile as number) => 
    let
        SeriesCount = List.Count(inputSeries),
        PercentileRank = percentile*(SeriesCount-1)+1, //percentile value between 0 and 1
        PercentileRankRoundedUp = Number.RoundUp(PercentileRank),
        PercentileRankRoundedDown = Number.RoundDown(PercentileRank),
        Percentile1 = List.Max(List.MinN(inputSeries,PercentileRankRoundedDown)),
        Percentile2 = List.Max(List.MinN(inputSeries,PercentileRankRoundedUp)),
        Percentile = Percentile1+(Percentile2-Percentile1)*(PercentileRank-PercentileRankRoundedDown)
    in
        Percentile
    

    The above will replicate the PERCENTILE function found within Excel - you pass this as a query using "New Query" and advanced editor. Then call it in after grouping your data -

    Table.Group(RenamedColumns, {"Country"}, {{"Sales Total", each List.Sum([Amount Sales]), type number}, {"95 Percentile Sales", each List.Average([Amount Sales]), type number}})

    In the above formula, RenamedColumns is the name of the previous step in the script. Change the name to match your actual case. I've assumed that the pre-grouping sales amount column is "Amount Sales." Names of grouped columns are "Sales Total" and "95 Percentile Sales."

    Next modify the group formula, substituting List.Average with PercentileInclusive:

    Table.Group(RenamedColumns, {"Country"}, {{"Sales Total", each List.Sum([Amount Sales]), type number}, {"95 Percentile Sales", each PercentileInclusive([Amount Sales],0.95), type number}})

    This worked for my data set and matches similar