kql, azure-data-explorer, kusto-explorer

Generate random values that do not change after refresh in Kusto?


I have the following data table in Kusto:

let DataSource = datatable(Id:string, Name:string, Value:real)
[
    '1', 'Name1', 36.27,
    '2', 'Name2', 10.85,
    '3', 'Name3', 722.49,
    '4', 'Name4', 216.1,
    '5', 'Name5', 345.44,
    '6', 'Name6', 100.83
];
DataSource
| extend y = rand(100)

I want to generate random values, but they must never change once generated: whenever I run the KQL query again, it should produce the same values.

Is there a way to do this?


Solution

  • You can use hashing to generate values that are not random in a statistical sense but are deterministic: the same input always produces the same output. It works best if you have high cardinality (highly diverse values) and many columns to combine. With low-cardinality data, many rows hash the same input and get the same value, so this method won't work as well. Hashing is commonly used this way to make train-test splits repeatable in machine learning.

    For "random" integer values from 0 to 99, here is a sample:

    let DataSource = datatable(Id:string, Name:string, Value:real)
    [
        '1', 'Name1', 36.27,
        '2', 'Name2', 10.85,
        '3', 'Name3', 722.49,
        '4', 'Name4', 216.1,
        '5', 'Name5', 345.44,
        '6', 'Name6', 100.83
    ];
    DataSource
    // Hash the concatenated columns; % 100 maps the result into 0..99
    | extend y = hash(strcat(Id, Name, Value)) % 100
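
    A possible variation, offered as a sketch rather than a definitive recipe: hash() also accepts the modulo as a second argument, and prepending a fixed salt lets you switch to a different (but still repeatable) set of values. The name Seed and its value 'v1' are hypothetical, and the '|' delimiter is an assumption added so that different column combinations cannot concatenate to the same string.

    // Seed is a hypothetical salt; change it to get a different stable set of values
    let Seed = 'v1';
    let DataSource = datatable(Id:string, Name:string, Value:real)
    [
        '1', 'Name1', 36.27,
        '2', 'Name2', 10.85,
        '3', 'Name3', 722.49
    ];
    DataSource
    // The '|' delimiter keeps e.g. ('1', '23') and ('12', '3') from hashing alike
    // hash(source, 100) applies the modulo inside the function, giving 0..99
    | extend y = hash(strcat(Seed, '|', Id, '|', Name), 100)

    As long as Seed and the hashed columns stay the same, rerunning the query should return the same y for each row.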
    

    You can check what the generated values look like by hashing the integers from 0 to 10000. The result looks close enough to being uniformly distributed.

    print N = range(0,10000)
    | mv-expand N to typeof(int)
    | extend y = hash(N) %100
    | summarize count() by bin(y,5)
    | render barchart with (xtitle="Hash Value", ytitle="count")
    

    Figure: resulting distribution of the hash values, which looks close enough to being uniformly distributed.
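
    The repeatable train-test split mentioned at the start of this answer could be sketched along the same lines. This is only an illustration: the column names Bucket and Split and the 80/20 threshold are assumptions, not part of the original question.

    let DataSource = datatable(Id:string, Name:string, Value:real)
    [
        '1', 'Name1', 36.27,
        '2', 'Name2', 10.85,
        '3', 'Name3', 722.49,
        '4', 'Name4', 216.1,
        '5', 'Name5', 345.44,
        '6', 'Name6', 100.83
    ];
    DataSource
    // Each Id maps to a stable bucket in 0..99
    | extend Bucket = hash(Id, 100)
    // Buckets below 80 go to 'train', the rest to 'test'
    | extend Split = iff(Bucket < 80, 'train', 'test')

    Because the bucket depends only on Id, a row stays in the same split on every run. With only a handful of rows the proportions will not be exactly 80/20; they get closer to that as the number of distinct Id values grows.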