Based on my previous post below url, I have prepared aggregate table from source raw table using materialized view.
Materialized view over materialized view || ADX || KQL || Kusto
Calculating Percentile value on raw data from my source table which is different from aggregate table.
Below is percentile calculating on raw data or source table.
Below is percentile calculating on aggregate data which is from raw data or source table. stg01 is aggregate table in my real time scenario which is Materialized view from source table.
Below is code for calculating Percentile from source table
let MySourceTable = datatable(col1:int, col2:int, Value:int, EventUTCDateTime:datetime)
[
1,2,10,'2022-01-01 01:00:01',
1,2,20,'2022-01-01 02:01:01',
1,2,30,'2022-01-01 03:05:01',
1,2,40,'2022-01-02 01:30:01',
1,2,20,'2022-01-02 02:30:01',
1,2,50,'2022-01-03 07:30:01',
1,2,50,'2022-01-05 06:30:01'
];
MySourceTable
| extend EventUTCDate = bin(EventUTCDateTime,30m)
| project col1,col2,Value,EventUTCDate
| summarize RecordsCount=count(), TotalValue=sum(Value), Min=min(Value), Max=max(Value),Median=percentile(Value,50) by col1,col2
| project col1,col2, Mean=TotalValue/RecordsCount, Min, Max,Median
Below is code for calculating Percentile from aggregated table
let MySourceTable = datatable(col1:int, col2:int, Value:int, EventUTCDateTime:datetime)
[
1,2,10,'2022-01-01 01:00:01',
1,2,20,'2022-01-01 02:01:01',
1,2,30,'2022-01-01 03:05:01',
1,2,40,'2022-01-02 01:30:01',
1,2,20,'2022-01-02 02:30:01',
1,2,50,'2022-01-03 07:30:01',
1,2,50,'2022-01-05 06:30:01'
];
let stg01 = MySourceTable
| extend EventUTCDate = bin(EventUTCDateTime,1d)
| project col1,col2,Value,EventUTCDate
| summarize RecordsCount=count(), TotalValue=sum(Value), Min=min(Value), Max=max(Value),Median=percentile(Value,50) by col1,col2,EventUTCDate;
stg01
| summarize RecordsCount=sum(RecordsCount), TotalValue=sum(TotalValue),Min=min(Min), Max=max(Max),Median=percentile(Median,50) by col1,col2
| project col1,col2, Mean=TotalValue/RecordsCount, Min, Max,Median
Need suggestions and inputs how to mitigate this issue, I want to present same percentile value from both source table and aggregate table.
I think you're looking for tdigest()
- see using hll and tdigest. If you preserve the tdigest
from the source table, you can calculate the 2nd aggregation based on that, instead of over the percentile.