duplicateskqlazure-data-explorerkusto-explorer

Deduplicate Records in Kusto/Survive Non-Null Values in Case of Duplicates


I have a table in Kusto where I'm trying to deduplicate. Here is the table structure:

column1 column2 column3 column4
1234 business orange paypal
1234 business orange
3456 business apple
4845 business banana venmo

I'm struggling to deduplicate this table so in case of duplicates per combination of column1 , column2 , column3, only non-null values of the column4 survives. please note, in some cases, like column1 == 3456, there is a single records per column1, column2, and column3 where the column4 is in fact null and those records must survive.

tbl
| distinct column1 , column2 , column3, column4
| sort by column4  desc nulls last , column1 , column2 , column3
| extend priority_ = row_number(1, prev(column1) != column1 and prev(column2) != column2 and prev(column3) != column3 )
| where priority_ == 1

Because I don't know how to deprioritize the null values of column4, this returns incorrect results. Can you help me figure out how to correctly survive the non-null values in case there are duplicates?


Solution

  • Check if this works for you:

    datatable(column1:string, column2:string, column3:string, column4:string)
    [
        "1234", "business", "orange", "paypal",
        "1234", "business", "orange", "",
        "3456", "business", "apple", "",
        "4845", "business", "banana", "venmo"
    ]
    | summarize make_set(column4) by column1, column2, column3 
    // check whether there's only a single element for {column1,column2,column3} combination
    | extend single_element = array_length(set_column4) == 1
    | mv-expand colummn4 = set_column4 to typeof(string)
    // include only non empty or combinations with only a single column4 value (even if it's empty)
    | where single_element or isnotempty(colummn4)
    | project-away set_column4, single_element
    
    column1 column2 column3 colummn4
    1234 business orange paypal
    3456 business apple
    4845 business banana venmo